
Conversation

Contributor

@siillee siillee commented Sep 23, 2025

Description

Until now, limit=0 in vector search meant that there was no limit and everything should be returned. This caused confusion and errors, so the behavior has been reworked: limit=None now means no limit on the search, while limit=0 returns no results, which is more intuitive and less error-prone.
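For illustration, a minimal sketch of the new contract at a call site, using the same get_vector_engine()/search() usage the added tests exercise (the "Entity_name" collection and the "Alice" query text are just the examples from those tests):

from cognee.infrastructure.databases.vector import get_vector_engine

async def demo_limit_semantics():
    vector_engine = get_vector_engine()
    query_vector = (await vector_engine.embedding_engine.embed_text(["Alice"]))[0]

    # limit=None: no limit, the adapter returns every matching result.
    all_results = await vector_engine.search(
        collection_name="Entity_name", query_vector=query_vector, limit=None
    )

    # limit=0 (or any non-positive value): the search short-circuits and returns [].
    no_results = await vector_engine.search(
        collection_name="Entity_name", query_vector=query_vector, limit=0
    )
    assert no_results == []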

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Other (please specify):

Changes Made

Testing

Screenshots/Videos (if applicable)

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

Related Issues

Additional Notes

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

Contributor

coderabbitai bot commented Sep 23, 2025

Walkthrough

Updates search APIs across vector and hybrid adapters to accept Optional[int] limits, adding explicit None handling. Call sites switch from limit=0 to limit=None. Interface types and related docstrings are aligned. Tests are added for None-limit behavior across ChromaDB, LanceDB, and PGVector. A demo adds pre-run prune steps.
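Roughly, the shared adapter pattern looks like the sketch below. This is schematic only; count_collection and run_query are placeholder names standing in for each backend's own calls:

from typing import Optional

async def search_with_limit(adapter, collection_name: str, query_vector, limit: Optional[int]):
    if limit is None:
        # "No limit": resolve the effective limit from the collection size.
        limit = await adapter.count_collection(collection_name)
    if limit <= 0:
        # An explicit zero/negative limit (or an empty collection) returns nothing.
        return []
    return await adapter.run_query(collection_name, query_vector, limit=limit)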

Changes

Cohort / File(s) / Summary

Vector interface
  • cognee/infrastructure/databases/vector/vector_db_interface.py
    search and batch_search limit types changed to Optional[int]; docstrings updated; minor formatting.

Vector adapters
  • .../vector/chromadb/ChromaDBAdapter.py, .../vector/lancedb/LanceDBAdapter.py, .../vector/pgvector/PGVectorAdapter.py
    search (and LanceDB batch_search) now accept Optional[int]. New handling for limit=None (resolve via count) and early return on non-positive limits; ChromaDB logs downgraded from error to warning on exceptions.

Hybrid adapter
  • .../hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py
    search limit type changed to Optional[int]; logs/comments updated to mention None as invalid input defaulting to 10.

Retrieval and graph
  • cognee/modules/graph/cognee_graph/CogneeGraph.py, cognee/modules/retrieval/temporal_retriever.py, cognee/modules/retrieval/utils/brute_force_triplet_search.py, cognee/modules/retrieval/insights_retriever.py
    Call sites switch from limit=0 to limit=None; InsightsRetriever.__init__ top_k type changed to Optional[int].

Tests
  • cognee/tests/test_chromadb.py, cognee/tests/test_lancedb.py, cognee/tests/test_pgvector.py
    Add async tests validating that limit=None returns large result sets; the LanceDB test suite includes file deletion and user-doc retrieval checks; main runners invoke the new tests.

Example
  • examples/python/cognee_simple_document_demo.py
    Adds prune_data() and prune_system(metadata=True) before add/cognify.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant VectorDBInterface as VectorDBInterface
  participant Adapter as SpecificAdapter
  participant Store as UnderlyingDB

  Caller->>VectorDBInterface: search(collection, query, limit=None|n<=0|n>0)
  VectorDBInterface->>Adapter: search(..., limit)

  alt limit is None
    Adapter->>Store: count(collection)
    Store-->>Adapter: total_count
    alt total_count <= 0
      Adapter-->>VectorDBInterface: []
    else total_count > 0
      Adapter->>Store: query(limit=total_count)
      Store-->>Adapter: results
      Adapter-->>VectorDBInterface: results
    end
  else limit <= 0
    Adapter-->>VectorDBInterface: []
  else limit > 0
    Adapter->>Store: query(limit=limit)
    Store-->>Adapter: results
    Adapter-->>VectorDBInterface: results
  end

  note over Adapter,Store: Some adapters adjust logging (e.g., warn on exceptions)
sequenceDiagram
  autonumber
  participant Graph as CogneeGraph
  participant Vector as Vector Engine
  Graph->>Vector: search(collection="...", limit=None)
  Vector-->>Graph: results (all/expanded set)
  Graph->>Graph: map distances to edges

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

run-checks

Suggested reviewers

  • Vasilije1990
  • borisarzentar
  • lxobr

Poem

I hopped through limits, none to see,
A warren wide of query spree.
From graphy burrows to vector glades,
I fetched the lot—no partial shades.
With ears alert and tests in line,
I nibble bytes—results divine! 🥕🐇

Pre-merge checks

❌ Failed checks (2 warnings)

  • Description Check: ⚠️ Warning
    Explanation: The PR description explains the intent and high-level behavior change (None becomes "no limit", 0 returns no results) but does not fully follow the repository template: the "Changes Made" and "Testing" sections are empty, and there is no per-file change list, test-run detail, or related-issue reference, which reduces reviewer confidence and traceability.
    Resolution: Complete the template by listing the specific changes (key files and what was changed), describe how the change was tested (commands or CI results, plus any new tests), and link related issues or docs; also verify that the Pre-submission checklist items and the DCO affirmation are fully addressed so reviewers can validate the change quickly.

  • Docstring Coverage: ⚠️ Warning
    Explanation: Docstring coverage is 33.33%, which is below the required threshold of 80.00%.
    Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title Check: ✅ Passed
    Explanation: The title "refactor: Rework limit=0 for vector adapters" is a concise, single-sentence summary that accurately describes the primary change (reworking limit semantics across vector adapters) and contains no noisy details; it clearly signals a refactor and the intended behavioral change to reviewers scanning history.

@siillee siillee changed the title from "feat: Rework limit=0 for vector adapters" to "refactor: Rework limit=0 for vector adapters" on Sep 23, 2025
Comment on lines +106 to +130
    name: LanceDB Tests
    runs-on: ubuntu-22.04
    steps:
      - name: Check out
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Cognee Setup
        uses: ./.github/actions/cognee_setup
        with:
          python-version: ${{ inputs.python-version }}

      - name: Run LanceDB Tests
        env:
          ENV: 'dev'
          LLM_MODEL: ${{ secrets.LLM_MODEL }}
          LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
          LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
          EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
          EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
          EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
          EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
        run: uv run python ./cognee/tests/test_lancedb.py

Check warning

Code scanning / CodeQL

Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix (AI, about 1 month ago)

To fix this issue, set the least-privilege permissions for the GITHUB_TOKEN by adding a permissions block at the root of the workflow. This will apply the minimum necessary permissions (contents: read) to all jobs in the workflow, unless overridden at the job level. This approach is both the simplest and the most likely to continue existing functionality, as reading repository contents is required for workflow operations like checking out code, but no write-access is needed for the existing jobs shown.

The single best step is to add the following under the workflow name: and before the on: block:

permissions:
  contents: read

No additional code changes, imports, or definitions are needed beyond this single block.


Suggested changeset 1
.github/workflows/vector_db_tests.yml

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/vector_db_tests.yml b/.github/workflows/vector_db_tests.yml
--- a/.github/workflows/vector_db_tests.yml
+++ b/.github/workflows/vector_db_tests.yml
@@ -1,4 +1,6 @@
 name: Reusable Vector DB Tests
+permissions:
+  contents: read
 
 on:
   workflow_call:
EOF
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (1)

268-276: None should not be treated as invalid — implement “no limit” semantics or clearly document the exception

Current logic treats None as invalid and forces limit=10, which conflicts with the PR objective where limit=None means “no limit.” If Neptune cannot truly do unlimited topK, at least emulate it by counting collection size and using that as the limit; otherwise, document that Neptune is an exception and why.

Apply this diff to honor None and return empty on non‑positive limits:

-        # In the case of excessive limit, or None / zero / negative value, limit will be set to 10.
-        if not limit or limit <= self._TOPK_LOWER_BOUND or limit > self._TOPK_UPPER_BOUND:
-            logger.warning(
-                "Provided limit (%s) is invalid (None, zero, negative, or exceeds maximum). "
-                "Defaulting to limit=10.",
-                limit,
-            )
-            limit = self._TOPK_UPPER_BOUND
+        # Handle limit semantics:
+        # - None => "no limit": try to count matching nodes in the collection
+        # - <= 0  => return empty
+        # - > max => cap (if an upper bound must be enforced for this backend)
+        if limit is None:
+            count_query = (
+                f"MATCH (n :{self._VECTOR_NODE_LABEL}) "
+                f"WHERE n.{self._COLLECTION_PREFIX} = $collection_name "
+                f"RETURN count(n) as cnt"
+            )
+            try:
+                cnt_res = self._client.query(count_query, {"collection_name": collection_name})
+                limit = int(cnt_res[0].get("cnt", 0)) if cnt_res else 0
+            except Exception as e:
+                logger.warning(
+                    'Failed to count collection "%s": %s. Defaulting to %d.',
+                    collection_name,
+                    e,
+                    self._TOPK_UPPER_BOUND,
+                )
+                limit = self._TOPK_UPPER_BOUND
+        elif limit <= self._TOPK_LOWER_BOUND:
+            return []
+        elif limit > self._TOPK_UPPER_BOUND:
+            logger.warning(
+                "Provided limit (%s) exceeds maximum. Capping to %d.",
+                limit,
+                self._TOPK_UPPER_BOUND,
+            )
+            limit = self._TOPK_UPPER_BOUND

If Neptune enforces a hard topK cap, please document it in the adapter and repo docs so users understand the deviation from the global contract.

Also applies to: 271-274

cognee/infrastructure/databases/vector/vector_db_interface.py (1)

109-126: Batch_search 'limit' should be Optional[int] in all adapters

vector_db_interface.batch_search declares limit: Optional[int]; several adapters use non-matching signatures—update them to limit: Optional[int] = None and adjust internal handling.

  • cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py — batch_search(..., limit: int = 5, ...) (≈ line 442)
  • cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py — batch_search(..., limit: int = None, ...) (≈ line 370)
  • cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py — batch_search(..., limit: int, ...) (≈ line 322)
  • cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py — batch_search(..., limit: int = None, ...) (≈ line 766)
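For reference, a hedged sketch of the aligned signature (argument order follows the Neptune adapter's batch_search shown below; the Optional[int] = None default is the suggested change, not the current code):

from typing import List, Optional

async def batch_search(
    self,
    collection_name: str,
    query_texts: List[str],
    limit: Optional[int] = None,
    with_vectors: bool = False,
):
    ...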
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)

439-475: Align batch_search limit parameter with interface (use Optional[int] = None)

VectorDBInterface.batch_search declares limit: Optional[int] but cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py uses limit: int = 5 (around lines 438–444). Change to limit: Optional[int] = None to match the interface, or document/justify the intentional concrete default and update the interface/adapters consistently.

🧹 Nitpick comments (13)
cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (2)

237-239: Signature updated to Optional[int] — update docstring and batch_search typing for consistency

Docstring still says “limit (int)” and batch_search(..., limit: int, ...) remains int. Align them with Optional[int] to prevent confusion.

Apply this diff to the docstring line to reflect Optional:

-            - limit (int): The maximum number of results to return from the search.
+            - limit (Optional[int]): The maximum number of results to return from the search. None means "no limit".

Optionally, update batch_search signature to:

-    async def batch_search(
-        self, collection_name: str, query_texts: List[str], limit: int, with_vectors: bool = False
-    ):
+    async def batch_search(
+        self, collection_name: str, query_texts: List[str], limit: Optional[int], with_vectors: bool = False
+    ):

268-269: Allow topK > 10 — remove _TOPK_UPPER_BOUND check

Neptune docs state topK is a positive integer (default 10) with no documented maximum; remove/relax the upper-bound check in cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (lines 268–269) so callers can request values >10 while still defaulting to 10 when limit is falsy.

cognee/modules/retrieval/temporal_retriever.py (1)

131-133: Using limit=None aligns with new API — watch for performance on large corpora

Unlimited search may be expensive. If event sets are large, consider a guardrail (e.g., cap to a reasonable maximum, or make it configurable) to avoid latency spikes.

cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)

145-148: Switched to limit=None per new semantics — consider a configurable cap

Running unlimited searches across multiple collections can be heavy. Consider a config/env override to cap per-collection results in high-cardinality deployments.
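One possible shape for such a guardrail, as a sketch only (the environment variable name is hypothetical and purely illustrative):

import os
from typing import Optional

def effective_limit(requested: Optional[int]) -> Optional[int]:
    # Hypothetical env override to cap "unlimited" searches in high-cardinality deployments.
    cap = os.getenv("COGNEE_VECTOR_SEARCH_MAX_RESULTS")
    if requested is None and cap is not None:
        return int(cap)
    return requested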

cognee/tests/test_chromadb.py (1)

99-101: Add a complementary test for limit=0 returning empty

To fully validate the new contract, add a small test ensuring limit=0 yields no results.

Example to add alongside this test:

async def test_vector_engine_search_zero_limit():
    from cognee.infrastructure.databases.vector import get_vector_engine
    vector_engine = get_vector_engine()
    query_vector = (await vector_engine.embedding_engine.embed_text(["Alice"]))[0]
    result = await vector_engine.search(collection_name="Entity_name", query_vector=query_vector, limit=0)
    assert result == []

I can open a follow-up PR with this test if you’d like.

cognee/tests/test_pgvector.py (1)

100-102: Mirror limit=0 test here as well

Add a PGVector counterpart that asserts limit=0 returns [] to prevent regressions.

Suggested snippet:

async def test_vector_engine_search_zero_limit_pg():
    from cognee.infrastructure.databases.vector import get_vector_engine
    vector_engine = get_vector_engine()
    query_vector = (await vector_engine.embedding_engine.embed_text(["Alice"]))[0]
    result = await vector_engine.search(collection_name="Entity_name", query_vector=query_vector, limit=0)
    assert result == []
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (3)

314-323: Avoid COUNT(*) for None-limit; just omit the LIMIT

Counting adds an extra query and can be expensive on large tables. For None, simply don’t apply a LIMIT and guard the non-positive check.

Apply this diff to simplify:

-        if limit is None:
-            async with self.get_async_session() as session:
-                query = select(func.count()).select_from(PGVectorDataPoint)
-                result = await session.execute(query)
-                limit = result.scalar_one()
-
-        # If limit is still 0, no need to do the search, just return empty results
-        if limit <= 0:
+        # If a non-positive limit is explicitly provided, return no results
+        if limit is not None and limit <= 0:
             return []

334-336: Guard LIMIT application when limit is None

Preserve unlimited search semantics.

-            if limit > 0:
+            if limit is not None and limit > 0:
                 query = query.limit(limit)

366-372: Type annotation nit: make batch_search limit Optional[int]

Aligns with the interface and other adapters.

-        limit: int = None,
+        limit: Optional[int] = None,
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)

221-229: Optional typing for query params (nit)

Consider annotating query_text and query_vector as Optional for consistency.

-        query_text: str = None,
-        query_vector: List[float] = None,
+        query_text: Optional[str] = None,
+        query_vector: Optional[List[float]] = None,
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)

435-435: Use logging placeholders instead of f-strings

Improves performance and satisfies linters.

Apply:

-            logger.warning(f"Error in search: {str(e)}")
+            logger.warning("Error in search: %s", e)
cognee/tests/test_lancedb.py (2)

17-26: Avoid md5 in tests; use sha256

Silences security linters without cost.

-    import hashlib
+    import hashlib
@@
-        data_hash = hashlib.md5(encoded_text).hexdigest()
+        data_hash = hashlib.sha256(encoded_text).hexdigest()

187-190: Reduce brittleness of history length assertion

Hard-coding 8 can be flaky if steps change. Prefer a lower bound.

-    assert len(history) == 8, "Search history is not correct."
+    assert len(history) >= 8, "Search history is not correct."
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f14751b and 541377a.

⛔ Files ignored due to path filters (1)
  • .github/workflows/vector_db_tests.yml is excluded by !**/*.yml
📒 Files selected for processing (13)
  • cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (2 hunks)
  • cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (3 hunks)
  • cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (3 hunks)
  • cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (3 hunks)
  • cognee/infrastructure/databases/vector/vector_db_interface.py (3 hunks)
  • cognee/modules/graph/cognee_graph/CogneeGraph.py (1 hunks)
  • cognee/modules/retrieval/insights_retriever.py (1 hunks)
  • cognee/modules/retrieval/temporal_retriever.py (1 hunks)
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py (1 hunks)
  • cognee/tests/test_chromadb.py (2 hunks)
  • cognee/tests/test_lancedb.py (1 hunks)
  • cognee/tests/test_pgvector.py (2 hunks)
  • examples/python/cognee_simple_document_demo.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
cognee/tests/test_lancedb.py (11)
cognee/shared/logging_utils.py (1)
  • get_logger (182-194)
cognee/infrastructure/files/storage/get_storage_config.py (1)
  • get_storage_config (13-18)
cognee/modules/data/models/Data.py (1)
  • Data (12-59)
cognee/modules/users/methods/get_default_user.py (1)
  • get_default_user (13-41)
cognee/infrastructure/databases/vector/vector_db_interface.py (3)
  • search (81-105)
  • prune (146-150)
  • get_connection (170-175)
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (3)
  • search (221-261)
  • prune (310-322)
  • get_connection (56-70)
cognee/modules/search/operations/get_history.py (1)
  • get_history (8-27)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
  • delete_data_entity (268-317)
cognee/modules/users/permissions/methods/get_document_ids_for_user.py (1)
  • get_document_ids_for_user (10-72)
cognee/infrastructure/databases/vector/get_vector_engine.py (1)
  • get_vector_engine (5-7)
cognee/api/v1/config/config.py (3)
  • set_vector_db_config (159-168)
  • data_root_directory (36-38)
  • system_root_directory (18-33)
cognee/tests/test_pgvector.py (5)
cognee/infrastructure/databases/vector/vector_db_interface.py (1)
  • search (81-105)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
  • search (350-436)
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
  • search (221-261)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
  • search (297-364)
cognee/infrastructure/databases/vector/get_vector_engine.py (1)
  • get_vector_engine (5-7)
examples/python/cognee_simple_document_demo.py (5)
cognee/infrastructure/databases/vector/vector_db_interface.py (1)
  • prune (146-150)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
  • prune (535-548)
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
  • prune (310-322)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
  • prune (397-399)
cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (1)
  • prune (411-417)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
cognee/shared/logging_utils.py (1)
  • warning (176-176)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
cognee/infrastructure/databases/relational/sqlalchemy/SqlAlchemyAdapter.py (1)
  • get_async_session (87-96)
cognee/tests/test_chromadb.py (5)
cognee/infrastructure/databases/vector/vector_db_interface.py (1)
  • search (81-105)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
  • search (350-436)
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
  • search (221-261)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
  • search (297-364)
cognee/infrastructure/databases/vector/get_vector_engine.py (1)
  • get_vector_engine (5-7)
cognee/infrastructure/databases/vector/vector_db_interface.py (5)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
  • batch_search (438-512)
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (1)
  • batch_search (263-282)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
  • batch_search (366-385)
cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (1)
  • batch_search (321-348)
cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (1)
  • batch_search (762-800)
🪛 Ruff (0.13.1)
cognee/tests/test_lancedb.py

25-25: Probable use of insecure hash functions in hashlib: md5

(S324)

cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py

435-435: Use explicit conversion flag

Replace with conversion flag

(RUF010)

🪛 Pylint (3.3.8)
cognee/tests/test_lancedb.py

[refactor] 103-103: Too many local variables (18/15)

(R0914)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: End-to-End Tests / S3 Bucket Test
  • GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
  • GitHub Check: End-to-End Tests / Deduplication Test
  • GitHub Check: Basic Tests / Run Unit Tests
  • GitHub Check: End-to-End Tests / Server Start Test
  • GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
  • GitHub Check: CLI Tests / CLI Integration Tests
  • GitHub Check: CLI Tests / CLI Functionality Tests
🔇 Additional comments (15)
cognee/modules/retrieval/insights_retriever.py (1)

28-31: Optional[int] top_k aligns with new None-limit semantics — LGTM

Passing self.top_k through to vector search will correctly treat None as “no limit,” and list slicing with [: self.top_k] safely handles None. No changes needed.
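A quick illustration of why the slice stays safe when top_k is None:

results = ["a", "b", "c"]
top_k = None
assert results[:top_k] == ["a", "b", "c"]  # slicing with None keeps everything
assert results[:2] == ["a", "b"]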

examples/python/cognee_simple_document_demo.py (1)

18-20: Pre-run pruning for deterministic demos — LGTM

Good call to ensure a clean slate before add/cognify.

cognee/modules/graph/cognee_graph/CogneeGraph.py (1)

161-166: Edge search uses limit=None consistently — LGTM

This keeps edge mapping behavior consistent with node paths after the None-limit change.

cognee/tests/test_chromadb.py (2)

70-102: Solid None-limit coverage for ChromaDB

Test exercises None-limit end-to-end and asserts more-than-default results. Looks good.


201-202: Direct invocation in main() — LGTM

Ensures the new test runs in the demo flow as well.

cognee/tests/test_pgvector.py (2)

71-102: None-limit test for PGVector — LGTM

Covers the new contract on another backend.


210-211: Main includes the None-limit test — LGTM

Keeps parity with the ChromaDB test runner.

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)

297-305: Limit now Optional[int] with default=15 — aligns with new semantics

Interface alignment looks good.

Please confirm all call sites stopped using limit=0 and now use limit=None for “no limit.”

cognee/infrastructure/databases/vector/vector_db_interface.py (1)

80-104: Interface updated to Optional[int] for limit — good

This matches the PR goal and downstream adapter changes.

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (3)

226-229: Limit now Optional[int] with default=15 — OK

Matches new semantics.


238-244: None-limit handling and non-positive short-circuit — OK

Early return on non-positive limits prevents backend errors. Good.


264-269: Batch search limit Optional[int] — OK

Signature aligns with interface and adapter usage.

cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)

350-358: Limit now Optional[int] with default=15 — OK

Signature reflects new behavior.


389-395: None-limit resolves to collection count and non-positive short-circuit — OK

Consistent with API needs (n_results requires an int).

cognee/tests/test_lancedb.py (1)

99-101: Good assertion for None-limit path

Ensures we didn’t accidentally fall back to a default limit.

@@ -0,0 +1,1772 @@
CHAPTER I.
Down the Rabbit-Hole
Collaborator


Can we reuse the Alice example from the examples data folder? So we don't store two copies

Collaborator

@dexters1 dexters1 Sep 23, 2025


Actually let's not use this file at all for tests as it's big and expensive to process. Use something small like the NLP text from the test_data folder in the tests directory

Contributor Author

@siillee siillee Sep 23, 2025


I didn't want to make a nasty path to the Alice file in examples, so I just put it closer, and I was going to ask about that.

And okay, I'll use a smaller example, although I wanted a bigger one just to see whether it returns all results when there are more than a couple of rows.
Also, Alice is not the greatest example, since it fails with Azure OpenAI because Azure flags it as inappropriate content. I forgot about this and saw that the tests failed on CI :)

EDIT: The extra copy of Alice is actually a mistake. I changed the tests while working on this and forgot to delete it; it is not actually used anywhere, my bad.

@Vasilije1990 Vasilije1990 merged commit 235f28a into main Sep 25, 2025
123 of 128 checks passed
@Vasilije1990 Vasilije1990 deleted the feature/cog-2837-rework-limit0-for-vector-adapters branch September 25, 2025 19:13