I noticed that datasets using Jaccard similarity were recently added, namely MovieLens and Kosarak. I downloaded the HDF5 file for Kosarak and extracted the 'train' split. However, the shape I get doesn't look correct. Am I missing something?
>>> kosarak = np.load("kosarak-jaccard.train.npy")
>>> kosarak.shape
(4167103,)
>>>
This is very different from what the README says (74962 vectors, each of dimensionality 27983). I would appreciate clarification on this.
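One possible reading of the 1-D shape, offered as an assumption and not confirmed from the repository: sparse set data may be stored as a single concatenated array of item IDs, plus a separate array giving how many items belong to each vector. Under that assumption, the sketch below splits such a flat array back into per-vector sets and rebuilds a dense 0/1 matrix with the (vectors × dimension) layout the README describes. The function names `unflatten_sets` and `to_dense` are hypothetical, and the toy data is made up for illustration.

```python
import numpy as np


def unflatten_sets(flat, sizes):
    """Split a concatenated 1-D array of item IDs into one array per set.

    Assumption: sizes[i] is the number of items in set i, in storage order.
    """
    flat = np.asarray(flat)
    sizes = np.asarray(sizes)
    if sizes.sum() != flat.shape[0]:
        raise ValueError("sizes do not add up to the length of the flat array")
    # Offsets where every set after the first one begins.
    offsets = np.cumsum(sizes)[:-1]
    return np.split(flat, offsets)


def to_dense(sets, dimension):
    """Build a (num_sets, dimension) 0/1 matrix from a list of item-ID arrays."""
    dense = np.zeros((len(sets), dimension), dtype=np.uint8)
    for row, items in enumerate(sets):
        dense[row, items] = 1
    return dense


# Toy data (hypothetical): three sets over a universe of 6 items.
flat = np.array([0, 2, 5, 1, 3, 0, 4, 5])
sizes = np.array([3, 2, 3])
sets = unflatten_sets(flat, sizes)
print([s.tolist() for s in sets])  # [[0, 2, 5], [1, 3], [0, 4, 5]]
print(to_dense(sets, 6).shape)     # (3, 6)
```

If the HDF5 file does carry such a per-vector length array, listing the file's keys with h5py would show its actual name; whatever it is called, it and the 'train' array could be passed to `unflatten_sets`. Note that materializing the full dense matrix for Kosarak would take roughly 74962 × 27983 bytes with `uint8`, so keeping the sets as index arrays is usually preferable.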