Open
Description
We (@grig-guz and I) are trying to store a large CSV dataset as a sparse matrix and then save it to a JLD file. Saving works, but loading the file again fails with a type error: ERROR: stored type SparseArrays.SparseMatrixCSC{Core.Int8,Core.Int64} does not match currently loaded type
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.3 (2020-11-09)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |
julia> using JLD
julia> load("data_matrices.jld")
┌ Warning: type SparseArrays.SparseMatrixCSC{Core.Int8,Core.Int64} not present in workspace; reconstructing
└ @ JLD ~/.julia/packages/JLD/jeGJb/src/jld_types.jl:697
ERROR: stored type SparseArrays.SparseMatrixCSC{Core.Int8,Core.Int64} does not match currently loaded type
Stacktrace:
[1] jldatatype(::JLD.JldFile, ::HDF5.HDF5Datatype) at /home/em/.julia/packages/JLD/jeGJb/src/jld_types.jl:723
[2] read(::JLD.JldDataset) at /home/em/.julia/packages/JLD/jeGJb/src/JLD.jl:367
[3] read(::JLD.JldFile, ::String) at /home/em/.julia/packages/JLD/jeGJb/src/JLD.jl:343
[4] #43 at ./array.jl:0 [inlined]
[5] iterate at ./generator.jl:47 [inlined]
[6] collect(::Base.Generator{Array{String,1},JLD.var"#43#45"{JLD.JldFile}}) at ./array.jl:686
[7] (::JLD.var"#42#44")(::JLD.JldFile) at /home/em/.julia/packages/JLD/jeGJb/src/JLD.jl:1246
[8] jldopen(::JLD.var"#42#44", ::String, ::Vararg{String,N} where N; kws::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/em/.julia/packages/JLD/jeGJb/src/JLD.jl:243
[9] jldopen(::Function, ::String, ::String) at /home/em/.julia/packages/JLD/jeGJb/src/JLD.jl:241
[10] load(::FileIO.File{FileIO.DataFormat{:JLD}}) at /home/em/.julia/packages/JLD/jeGJb/src/JLD.jl:1245
[11] load(::String; options::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/em/.julia/packages/FileIO/wN5rD/src/loadsave.jl:133
[12] load(::String) at /home/em/.julia/packages/FileIO/wN5rD/src/loadsave.jl:133
[13] top-level scope at REPL[2]:1
[14] run_repl(::REPL.AbstractREPL, ::Any) at /build/julia/src/julia-1.5.3/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:288
julia>
The following script is how we build the matrices and save them:
import Pkg; Pkg.add("CSV"); Pkg.add("DataFrames"); Pkg.add("JLD")
using CSV
using DataFrames
using SparseArrays
using JLD
# Data is in the format Item-User-Rating-Timestamp
frame = CSV.read("../data/data.csv", header=["item", "user", "rating", "time_stamp"], DataFrame)
unique_items = unique(frame[!, "item"])
unique_users = unique(frame[!, "user"])
# String id to numerical id
items_map = Dict(zip(unique_items, range(1, length=length(unique_items))))
users_map = Dict(zip(unique_users, range(1, length=length(unique_users))))
num_rows = nrow(frame)
I = zeros(Int64, num_rows); J = zeros(Int64, num_rows); V = ones(Int8, num_rows)
V_ratings= convert(Array{Int8,1}, frame[!, "rating"])
for (index, row) in enumerate(eachrow(frame))
    I[index] = items_map[row["item"]]
    J[index] = users_map[row["user"]]
end
Sigma = sparse(I, J, V)
B = sparse(I, J, V_ratings)
save("../data/data_matrices.jld", "sigma", Sigma, "b", B)
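For anyone who wants to see what the script builds without our CSV file, here is a minimal sketch of the same id mapping and sparse construction on toy data. The item, user, and rating values are made up for illustration, and the names rows and cols are ours (the script above calls them I and J). It only needs the SparseArrays standard library and does not touch JLD, so it shows the shape and element type of the matrices we save rather than reproducing the load error itself. On a 64-bit machine the result should have the same SparseMatrixCSC{Int8,Int64} type that the error message mentions.

```julia
using SparseArrays

# Toy stand-in for the CSV columns (hypothetical values, not our real data)
items   = ["a", "b", "a", "c"]
users   = ["u1", "u1", "u2", "u3"]
ratings = [5, 3, 4, 1]

# String id to numerical id, numbered in order of first appearance
items_map = Dict(id => idx for (idx, id) in enumerate(unique(items)))
users_map = Dict(id => idx for (idx, id) in enumerate(unique(users)))

# Row and column index of every observation
rows = [items_map[i] for i in items]
cols = [users_map[u] for u in users]

# Sigma marks which (item, user) pairs were observed; B holds the ratings.
# Note: sparse() adds up the values of duplicate (row, col) pairs.
Sigma = sparse(rows, cols, ones(Int8, length(rows)))
B     = sparse(rows, cols, Int8.(ratings))

println(typeof(B))
println(size(B))
```

One thing to keep in mind with this construction: because sparse() sums duplicates, a dataset in which the same user rates the same item more than once would end up with summed entries, which can matter with a narrow type like Int8.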