
Bug: sparse_inverted_index returns incorrect results #647

@coroluca


Hello,
I'm using the sparse_inverted_index (introduced in PR #517) with the following configuration:

create index on documents using vectors (term_freqs svector_dot_ops) with (options = '[indexing.sparse_inverted_index]');

I've encountered a critical issue: when the index is enabled (vectors.enable_index = on), the query returns seemingly random vectors instead of the expected results.
For example, the following query searches with document 300's own vector, so document ID 300 should appear among the top-ranked results:

with input(tf) as (
    select term_freqs from documents where id = 300 limit 1
)
select id, (term_freqs <#> (select tf from input)) as score
from documents
order by term_freqs <#> (select tf from input)
limit 500;

However, with the index enabled, document ID 300 doesn't appear as expected in the results.
When I disable the index (vectors.enable_index = off), the query returns the correct results.
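To narrow this down, the exact (index off) and indexed (index on) results can be compared row by row. The script below is a sketch. It assumes that `vectors.enable_index` can be toggled per session with `SET` and that turning it off yields an exact scan, which the correct results above suggest. The temp table names `exact_top` and `indexed_top` are hypothetical. `<#>` is a negative dot product, so even the exact ranking does not guarantee that document 300 comes first, because a vector with larger weights on the same terms can score higher. The more reliable signal is whether the indexed run drops rows that the exact run returns.

```sql
-- Step 1: exact baseline.
-- Assumption: with vector indexes disabled, every score is computed exactly.
SET vectors.enable_index = off;

CREATE TEMP TABLE exact_top AS
WITH input(tf) AS (
    SELECT term_freqs FROM documents WHERE id = 300 LIMIT 1
)
SELECT id, term_freqs <#> (SELECT tf FROM input) AS score
FROM documents
ORDER BY term_freqs <#> (SELECT tf FROM input)
LIMIT 500;

-- Step 2: the same query with the sparse_inverted_index enabled.
SET vectors.enable_index = on;

-- Check that the plan actually uses the index before trusting the comparison.
EXPLAIN
WITH input(tf) AS (
    SELECT term_freqs FROM documents WHERE id = 300 LIMIT 1
)
SELECT id
FROM documents
ORDER BY term_freqs <#> (SELECT tf FROM input)
LIMIT 500;

CREATE TEMP TABLE indexed_top AS
WITH input(tf) AS (
    SELECT term_freqs FROM documents WHERE id = 300 LIMIT 1
)
SELECT id, term_freqs <#> (SELECT tf FROM input) AS score
FROM documents
ORDER BY term_freqs <#> (SELECT tf FROM input)
LIMIT 500;

-- Step 3: does the indexed result contain the query document itself?
SELECT EXISTS (SELECT 1 FROM indexed_top WHERE id = 300) AS self_found;

-- Step 4: rows returned by the exact scan but missing from the indexed scan.
SELECT e.id, e.score
FROM exact_top e
LEFT JOIN indexed_top i ON i.id = e.id
WHERE i.id IS NULL
ORDER BY e.score;
```

Rows that tie on score at the 500th position may legitimately differ between the two runs. Any other rows in Step 4's output, and especially document 300 itself, would point at the index.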

Thanks for considering this issue. I appreciate any help in resolving this problem as it's currently blocking our implementation.
