refactor: Rework limit=0 for vector adapters #1450
Conversation
Please make sure all the checkboxes are checked.
Walkthrough

Updates search APIs across vector and hybrid adapters to accept `Optional[int]` limits, adding explicit `None` handling. Call sites switch from `limit=0` to `limit=None`. Interface types and related docstrings are aligned. Tests are added for None-limit behavior across ChromaDB, LanceDB, and PGVector. A demo adds pre-run prune steps.
Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Caller
  participant VectorDBInterface as VectorDBInterface
  participant Adapter as SpecificAdapter
  participant Store as UnderlyingDB
  Caller->>VectorDBInterface: search(collection, query, limit=None|n<=0|n>0)
  VectorDBInterface->>Adapter: search(..., limit)
  alt limit is None
    Adapter->>Store: count(collection)
    Store-->>Adapter: total_count
    alt total_count <= 0
      Adapter-->>VectorDBInterface: []
    else total_count > 0
      Adapter->>Store: query(limit=total_count)
      Store-->>Adapter: results
      Adapter-->>VectorDBInterface: results
    end
  else limit <= 0
    Adapter-->>VectorDBInterface: []
  else limit > 0
    Adapter->>Store: query(limit=limit)
    Store-->>Adapter: results
    Adapter-->>VectorDBInterface: results
  end
  note over Adapter,Store: Some adapters adjust logging (e.g., warn on exceptions)
```
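The branching in the diagram above can be sketched in plain Python. This is a minimal illustration of the new limit contract, not the adapters' actual code; `resolve_results` and its callbacks are hypothetical stand-ins for per-backend count and query operations:

```python
from typing import Callable, List, Optional


def resolve_results(
    limit: Optional[int],
    count: Callable[[], int],
    query: Callable[[int], List[dict]],
) -> List[dict]:
    """Apply the reworked limit contract: None => everything, <= 0 => nothing."""
    if limit is None:
        total = count()  # "no limit": use the collection size as the limit
        return query(total) if total > 0 else []
    if limit <= 0:  # explicit zero/negative now short-circuits to empty
        return []
    return query(limit)


rows = [{"id": i} for i in range(5)]
fetch = lambda n: rows[:n]

print(len(resolve_results(None, lambda: len(rows), fetch)))  # 5
print(resolve_results(0, lambda: len(rows), fetch))          # []
print(len(resolve_results(2, lambda: len(rows), fetch)))     # 2
```

Each real adapter implements the same three branches against its own store (Chroma's `n_results`, SQL `LIMIT`, etc.).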
```mermaid
sequenceDiagram
  autonumber
  participant Graph as CogneeGraph
  participant Vector as Vector Engine
  Graph->>Vector: search(collection="...", limit=None)
  Vector-->>Graph: results (all/expanded set)
  Graph->>Graph: map distances to edges
```

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches

- ❌ Failed checks (2 warnings)
- ✅ Passed checks (1 passed)
```yaml
name: LanceDB Tests
runs-on: ubuntu-22.04
steps:
  - name: Check out
    uses: actions/checkout@v4
    with:
      fetch-depth: 0

  - name: Cognee Setup
    uses: ./.github/actions/cognee_setup
    with:
      python-version: ${{ inputs.python-version }}

  - name: Run LanceDB Tests
    env:
      ENV: 'dev'
      LLM_MODEL: ${{ secrets.LLM_MODEL }}
      LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
      LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
      LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
      EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
      EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
      EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
      EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
    run: uv run python ./cognee/tests/test_lancedb.py
```
Check warning — Code scanning / CodeQL: Workflow does not contain permissions (Medium)

Copilot Autofix (AI, about 1 month ago)
To fix this issue, set the least-privilege permissions for the GITHUB_TOKEN by adding a permissions block at the root of the workflow. This applies the minimum necessary permissions (contents: read) to all jobs in the workflow unless overridden at the job level. This approach is the simplest and the most likely to preserve existing functionality: reading repository contents is required for workflow operations like checking out code, but no write access is needed for the existing jobs shown.

The single best step is to add the following under the workflow `name:` key and before the `on:` block:

```yaml
permissions:
  contents: read
```

No additional code changes, imports, or definitions are needed beyond this single block.
```diff
@@ -1,4 +1,6 @@
 name: Reusable Vector DB Tests
+permissions:
+  contents: read
 
 on:
   workflow_call:
```
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️  Outside diff range comments (3)
cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (1)

268-276: None should not be treated as invalid — implement "no limit" semantics or clearly document the exception

Current logic treats `None` as invalid and forces `limit=10`, which conflicts with the PR objective where `limit=None` means "no limit." If Neptune cannot truly do unlimited topK, at least emulate it by counting collection size and using that as the limit; otherwise, document that Neptune is an exception and why.

Apply this diff to honor `None` and return empty on non-positive limits:

```diff
-        # In the case of excessive limit, or None / zero / negative value, limit will be set to 10.
-        if not limit or limit <= self._TOPK_LOWER_BOUND or limit > self._TOPK_UPPER_BOUND:
-            logger.warning(
-                "Provided limit (%s) is invalid (None, zero, negative, or exceeds maximum). "
-                "Defaulting to limit=10.",
-                limit,
-            )
-            limit = self._TOPK_UPPER_BOUND
+        # Handle limit semantics:
+        # - None => "no limit": try to count matching nodes in the collection
+        # - <= 0 => return empty
+        # - > max => cap (if an upper bound must be enforced for this backend)
+        if limit is None:
+            count_query = (
+                f"MATCH (n :{self._VECTOR_NODE_LABEL}) "
+                f"WHERE n.{self._COLLECTION_PREFIX} = $collection_name "
+                f"RETURN count(n) as cnt"
+            )
+            try:
+                cnt_res = self._client.query(count_query, {"collection_name": collection_name})
+                limit = int(cnt_res[0].get("cnt", 0)) if cnt_res else 0
+            except Exception as e:
+                logger.warning(
+                    'Failed to count collection "%s": %s. Defaulting to %d.',
+                    collection_name,
+                    e,
+                    self._TOPK_UPPER_BOUND,
+                )
+                limit = self._TOPK_UPPER_BOUND
+        elif limit <= self._TOPK_LOWER_BOUND:
+            return []
+        elif limit > self._TOPK_UPPER_BOUND:
+            logger.warning(
+                "Provided limit (%s) exceeds maximum. Capping to %d.",
+                limit,
+                self._TOPK_UPPER_BOUND,
+            )
+            limit = self._TOPK_UPPER_BOUND
```

If Neptune enforces a hard topK cap, please document it in the adapter and repo docs so users understand the deviation from the global contract.

Also applies to: 271-274
cognee/infrastructure/databases/vector/vector_db_interface.py (1)

109-126: batch_search `limit` should be Optional[int] in all adapters

`vector_db_interface.batch_search` declares `limit: Optional[int]`; several adapters use non-matching signatures — update them to `limit: Optional[int] = None` and adjust internal handling.

- cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py — batch_search(..., limit: int = 5, ...) (≈ line 442)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py — batch_search(..., limit: int = None, ...) (≈ line 370)
- cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py — batch_search(..., limit: int, ...) (≈ line 322)
- cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py — batch_search(..., limit: int = None, ...) (≈ line 766)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)

439-475: Align batch_search `limit` parameter with interface (use Optional[int] = None)

VectorDBInterface.batch_search declares `limit: Optional[int]`, but cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py uses `limit: int = 5` (around lines 438-444). Change to `limit: Optional[int] = None` to match the interface, or document/justify the intentional concrete default and update the interface/adapters consistently.
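One way to catch this kind of signature drift automatically is a small `inspect`-based check. This is only a sketch: the two classes below are stand-ins for the real interface and adapter, not imports from the codebase:

```python
import inspect
from typing import List, Optional


class VectorDBInterface:
    # Stand-in for the real interface; only the signature matters here.
    async def batch_search(
        self, collection_name: str, query_texts: List[str], limit: Optional[int] = None
    ): ...


class ChromaLikeAdapter(VectorDBInterface):
    # Mirrors the concrete-default mismatch flagged in the review.
    async def batch_search(
        self, collection_name: str, query_texts: List[str], limit: int = 5
    ): ...


def limit_annotation(cls):
    """Return the declared type annotation of batch_search's `limit` parameter."""
    return inspect.signature(cls.batch_search).parameters["limit"].annotation


assert limit_annotation(VectorDBInterface) == Optional[int]
assert limit_annotation(ChromaLikeAdapter) is int  # the drift a unit test would flag
```

A pytest that iterates over the real adapter classes and asserts each `limit` annotation matches the interface would prevent this from regressing.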
🧹 Nitpick comments (13)
cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (2)

237-239: Signature updated to Optional[int] — update docstring and batch_search typing for consistency

Docstring still says "limit (int)" and `batch_search(..., limit: int, ...)` remains `int`. Align them with `Optional[int]` to prevent confusion.

Apply this diff to the docstring line to reflect Optional:

```diff
-            - limit (int): The maximum number of results to return from the search.
+            - limit (Optional[int]): The maximum number of results to return from the search. None means "no limit".
```

Optionally, update the `batch_search` signature to:

```diff
-    async def batch_search(
-        self, collection_name: str, query_texts: List[str], limit: int, with_vectors: bool = False
-    ):
+    async def batch_search(
+        self, collection_name: str, query_texts: List[str], limit: Optional[int], with_vectors: bool = False
+    ):
```
268-269: Allow topK > 10 — remove _TOPK_UPPER_BOUND check

Neptune docs state topK is a positive integer (default 10) with no documented maximum; remove/relax the upper-bound check in cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (lines 268-269) so callers can request values >10 while still defaulting to 10 when limit is falsy.
cognee/modules/retrieval/temporal_retriever.py (1)

131-133: Using limit=None aligns with new API — watch for performance on large corpora

Unlimited search may be expensive. If event sets are large, consider a guardrail (e.g., cap to a reasonable maximum, or make it configurable) to avoid latency spikes.
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)

145-148: Switched to limit=None per new semantics — consider a configurable cap

Running unlimited searches across multiple collections can be heavy. Consider a config/env override to cap per-collection results in high-cardinality deployments.
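The config/env override suggested above could look like this minimal sketch. Note that `COGNEE_MAX_SEARCH_RESULTS` is a hypothetical variable name, not an existing setting in the codebase:

```python
import os
from typing import Optional


def effective_limit(limit: Optional[int]) -> Optional[int]:
    """Resolve the search limit against an optional env-var cap.

    Only "no limit" (None) searches are capped; explicit limits pass through
    unchanged, so callers who asked for a specific count still get it.
    """
    cap = os.environ.get("COGNEE_MAX_SEARCH_RESULTS")  # hypothetical setting
    if limit is None and cap is not None:
        return int(cap)
    return limit


os.environ["COGNEE_MAX_SEARCH_RESULTS"] = "1000"
print(effective_limit(None))  # 1000
print(effective_limit(5))     # 5
```

An adapter would call `effective_limit(limit)` at the top of `search` before deciding whether to count the collection.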
cognee/tests/test_chromadb.py (1)

99-101: Add a complementary test for limit=0 returning empty

To fully validate the new contract, add a small test ensuring `limit=0` yields no results.

Example to add alongside this test:

```python
async def test_vector_engine_search_zero_limit():
    from cognee.infrastructure.databases.vector import get_vector_engine

    vector_engine = get_vector_engine()
    query_vector = (await vector_engine.embedding_engine.embed_text(["Alice"]))[0]
    result = await vector_engine.search(
        collection_name="Entity_name", query_vector=query_vector, limit=0
    )
    assert result == []
```

I can open a follow-up PR with this test if you'd like.

cognee/tests/test_pgvector.py (1)
cognee/tests/test_pgvector.py (1)
100-102: Mirror limit=0 test here as wellAdd a PGVector counterpart that asserts
limit=0returns[]to prevent regressions.Suggested snippet:
async def test_vector_engine_search_zero_limit_pg(): from cognee.infrastructure.databases.vector import get_vector_engine vector_engine = get_vector_engine() query_vector = (await vector_engine.embedding_engine.embed_text(["Alice"]))[0] result = await vector_engine.search(collection_name="Entity_name", query_vector=query_vector, limit=0) assert result == []cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (3)
314-323: Avoid COUNT(*) for None-limit; just omit the LIMIT

Counting adds an extra query and can be expensive on large tables. For None, simply don't apply a LIMIT and guard the non-positive check.

Apply this diff to simplify:

```diff
-        if limit is None:
-            async with self.get_async_session() as session:
-                query = select(func.count()).select_from(PGVectorDataPoint)
-                result = await session.execute(query)
-                limit = result.scalar_one()
-
-        # If limit is still 0, no need to do the search, just return empty results
-        if limit <= 0:
+        # If a non-positive limit is explicitly provided, return no results
+        if limit is not None and limit <= 0:
             return []
```
334-336: Guard LIMIT application when limit is None

Preserve unlimited search semantics.

```diff
-        if limit > 0:
+        if limit is not None and limit > 0:
             query = query.limit(limit)
```
366-372: Type annotation nit: make batch_search limit Optional[int]

Aligns with the interface and other adapters.

```diff
-        limit: int = None,
+        limit: Optional[int] = None,
```

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
221-229: Optional typing for query params (nit)

Consider annotating query_text and query_vector as Optional for consistency.

```diff
-        query_text: str = None,
-        query_vector: List[float] = None,
+        query_text: Optional[str] = None,
+        query_vector: Optional[List[float]] = None,
```

cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
435-435: Use logging placeholders instead of f-strings

Improves performance and satisfies linters.

Apply:

```diff
-            logger.warning(f"Error in search: {str(e)}")
+            logger.warning("Error in search: %s", e)
```

cognee/tests/test_lancedb.py (2)
17-26: Avoid md5 in tests; use sha256

Silences security linters without cost.

```diff
 import hashlib
@@
-    data_hash = hashlib.md5(encoded_text).hexdigest()
+    data_hash = hashlib.sha256(encoded_text).hexdigest()
```
187-190: Reduce brittleness of history length assertion

Hard-coding 8 can be flaky if steps change. Prefer a lower bound.

```diff
-    assert len(history) == 8, "Search history is not correct."
+    assert len(history) >= 8, "Search history is not correct."
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- .github/workflows/vector_db_tests.yml is excluded by !**/*.yml
📒 Files selected for processing (13)
- cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (2 hunks)
- cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (3 hunks)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (3 hunks)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (3 hunks)
- cognee/infrastructure/databases/vector/vector_db_interface.py (3 hunks)
- cognee/modules/graph/cognee_graph/CogneeGraph.py (1 hunks)
- cognee/modules/retrieval/insights_retriever.py (1 hunks)
- cognee/modules/retrieval/temporal_retriever.py (1 hunks)
- cognee/modules/retrieval/utils/brute_force_triplet_search.py (1 hunks)
- cognee/tests/test_chromadb.py (2 hunks)
- cognee/tests/test_lancedb.py (1 hunks)
- cognee/tests/test_pgvector.py (2 hunks)
- examples/python/cognee_simple_document_demo.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
cognee/tests/test_lancedb.py (11)

- cognee/shared/logging_utils.py (1): get_logger (182-194)
- cognee/infrastructure/files/storage/get_storage_config.py (1): get_storage_config (13-18)
- cognee/modules/data/models/Data.py (1): Data (12-59)
- cognee/modules/users/methods/get_default_user.py (1): get_default_user (13-41)
- cognee/infrastructure/databases/vector/vector_db_interface.py (3): search (81-105), prune (146-150), get_connection (170-175)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (3): search (221-261), prune (310-322), get_connection (56-70)
- cognee/modules/search/operations/get_history.py (1): get_history (8-27)
- cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1): delete_data_entity (268-317)
- cognee/modules/users/permissions/methods/get_document_ids_for_user.py (1): get_document_ids_for_user (10-72)
- cognee/infrastructure/databases/vector/get_vector_engine.py (1): get_vector_engine (5-7)
- cognee/api/v1/config/config.py (3): set_vector_db_config (159-168), data_root_directory (36-38), system_root_directory (18-33)

cognee/tests/test_pgvector.py (5)

- cognee/infrastructure/databases/vector/vector_db_interface.py (1): search (81-105)
- cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1): search (350-436)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1): search (221-261)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1): search (297-364)
- cognee/infrastructure/databases/vector/get_vector_engine.py (1): get_vector_engine (5-7)

examples/python/cognee_simple_document_demo.py (5)

- cognee/infrastructure/databases/vector/vector_db_interface.py (1): prune (146-150)
- cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1): prune (535-548)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1): prune (310-322)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1): prune (397-399)
- cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (1): prune (411-417)

cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)

- cognee/shared/logging_utils.py (1): warning (176-176)

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)

- cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1): get_async_session (87-96)

cognee/tests/test_chromadb.py (5)

- cognee/infrastructure/databases/vector/vector_db_interface.py (1): search (81-105)
- cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1): search (350-436)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1): search (221-261)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1): search (297-364)
- cognee/infrastructure/databases/vector/get_vector_engine.py (1): get_vector_engine (5-7)

cognee/infrastructure/databases/vector/vector_db_interface.py (5)

- cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1): batch_search (438-512)
- cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1): batch_search (263-282)
- cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1): batch_search (366-385)
- cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (1): batch_search (321-348)
- cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (1): batch_search (762-800)
🪛 Ruff (0.13.1)
cognee/tests/test_lancedb.py
25-25: Probable use of insecure hash functions in hashlib: md5
(S324)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py
435-435: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🪛 Pylint (3.3.8)
cognee/tests/test_lancedb.py
[refactor] 103-103: Too many local variables (18/15)
(R0914)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: End-to-End Tests / S3 Bucket Test
 - GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
 - GitHub Check: End-to-End Tests / Deduplication Test
 - GitHub Check: Basic Tests / Run Unit Tests
 - GitHub Check: End-to-End Tests / Server Start Test
 - GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
 - GitHub Check: CLI Tests / CLI Integration Tests
 - GitHub Check: CLI Tests / CLI Functionality Tests
 
🔇 Additional comments (15)
cognee/modules/retrieval/insights_retriever.py (1)

28-31: Optional[int] top_k aligns with new None-limit semantics — LGTM

Passing `self.top_k` through to vector search will correctly treat `None` as "no limit," and list slicing with `[: self.top_k]` safely handles `None`. No changes needed.

examples/python/cognee_simple_document_demo.py (1)

18-20: Pre-run pruning for deterministic demos — LGTM

Good call to ensure a clean slate before add/cognify.

cognee/modules/graph/cognee_graph/CogneeGraph.py (1)

161-166: Edge search uses limit=None consistently — LGTM

This keeps edge mapping behavior consistent with node paths after the None-limit change.

cognee/tests/test_chromadb.py (2)

70-102: Solid None-limit coverage for ChromaDB

Test exercises None-limit end-to-end and asserts more-than-default results. Looks good.

201-202: Direct invocation in main() — LGTM

Ensures the new test runs in the demo flow as well.

cognee/tests/test_pgvector.py (2)

71-102: None-limit test for PGVector — LGTM

Covers the new contract on another backend.

210-211: Main includes the None-limit test — LGTM

Keeps parity with the ChromaDB test runner.

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)

297-305: Limit now Optional[int] with default=15 — aligns with new semantics

Interface alignment looks good. Please confirm all call sites stopped using limit=0 and now use limit=None for "no limit."

cognee/infrastructure/databases/vector/vector_db_interface.py (1)

80-104: Interface updated to Optional[int] for limit — good

This matches the PR goal and downstream adapter changes.

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (3)

226-229: Limit now Optional[int] with default=15 — OK

Matches new semantics.

238-244: None-limit handling and non-positive short-circuit — OK

Early return on non-positive limits prevents backend errors. Good.

264-269: Batch search limit Optional[int] — OK

Signature aligns with interface and adapter usage.

cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)

350-358: Limit now Optional[int] with default=15 — OK

Signature reflects new behavior.

389-395: None-limit resolves to collection count and non-positive short-circuit — OK

Consistent with API needs (n_results requires an int).

cognee/tests/test_lancedb.py (1)

99-101: Good assertion for None-limit path

Ensures we didn't accidentally fall back to a default limit.
```diff
@@ -0,0 +1,1772 @@
+CHAPTER I.
+Down the Rabbit-Hole
```
Can we reuse the Alice example from the examples data folder? So we don't store two copies
Actually let's not use this file at all for tests as it's big and expensive to process. Use something small like the NLP text from the test_data folder in the tests directory
I didn't want to make a nasty path to the Alice in examples, so I just put it closer, and was gonna ask about that.
And okay, I'll use some smaller example, although I wanted a bigger one just to see if it will return all results when there is more than a couple of rows.
Also, Alice is not the greatest example since it fails with Azure OpenAI because Azure flags it as inappropriate content. I forgot about this, and saw that tests failed on CI :)
EDIT: The extra copy of Alice is actually a mistake, I changed up the tests while working on this, so I forgot to delete it, it is actually not used anywhere, my bad.
Description
Until now, limit=0 in vector search meant "no limit": return everything. This caused confusion and errors, so the behavior has been reworked so that limit=None means no limit on the search. If someone passes limit=0, no results are returned, which makes more sense and is less error prone.
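A minimal before/after sketch of this contract change (illustrative functions over a toy in-memory list, not the real adapter code):

```python
from typing import List, Optional

ROWS = ["a", "b", "c"]


def search_old(limit: int = 0) -> List[str]:
    # Old contract: limit=0 meant "no limit" -- easy to trigger by accident.
    return list(ROWS) if limit == 0 else ROWS[:limit]


def search_new(limit: Optional[int] = None) -> List[str]:
    # New contract: None means "no limit"; zero or negative returns nothing.
    if limit is None:
        return list(ROWS)
    if limit <= 0:
        return []
    return ROWS[:limit]


print(search_old(0))     # ['a', 'b', 'c']  (surprising)
print(search_new(0))     # []               (explicit and safe)
print(search_new(None))  # ['a', 'b', 'c']
```

Separating "give me everything" (`None`) from "give me zero" (`0`) removes the ambiguity that caused the original bugs.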
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.