feat: implement rerank step in collection search API #1162

Shushankranjan · 2025-07-24T06:33:44Z

Issue - [#1158]

Enhanced the create_search method in collection_service.py:
- Added reranking after all index operations (vector, fulltext, graph) are completed
- Implemented the rerank model selection logic to find models with the "default_for_rerank" tag
- Added error handling to gracefully fall back if reranking fails
Implemented the _get_default_rerank_model method:
- This method finds rerank models with the "default_for_rerank" tag
- Returns the first model found, or None if no model is available
Added the fallback strategy when no rerank model is available:
- Places graph search results first (as they typically have better quality)
- Sorts remaining vector and fulltext results by score in descending order

The implementation leverages the existing rerank functionality from the flow system, ensuring consistency across
the application. When a search is performed, the system will now:

Execute the search operations (vector, fulltext, graph)
Find a suitable rerank model with the "default_for_rerank" tag
Apply reranking to improve result quality
Fall back to a score-based ordering if reranking isn't available

This enhancement will significantly improve search result quality by leveraging the existing rerank
infrastructure in the direct search API.
Closes #1158

earayu

Hi, @Shushankranjan
You should use RerankNodeRunner.
You can find it in aperag/flow package, take rag_flow2.yaml as an example

Shushankranjan · 2025-07-24T06:41:48Z

Hi, @Shushankranjan You should use RerankNodeRunner.

Sure , i will change it to reranknoderunner

earayu

After finding the model, you need to verify if it has an API key configured. It's highly possible that an LLM provider might have tagged models without a configured API key.
You shouldn't create a separate FlowInstance just to run the rerank node. Instead, you should add the rerank node and its edges to the existing FlowInstance above.

earayu · 2025-07-25T05:41:16Z

aperag/service/collection_service.py

+                        # Re-execute the flow with the added rerank node
+                        result, _ = await engine.execute_flow(flow, {"query": query, "user": user})


You shouldn't re-execute the flow. Instead, you should check if there's a rerank node; if so, include it in the execution. If there's no rerank node, then use the default rerank strategy.

earayu · 2025-07-25T05:42:11Z

aperag/service/collection_service.py

+                else:
+                    # Log warning that the model doesn't have an API key configured
+                    import logging
+                    logger = logging.getLogger(__name__)
+                    logger.warning(f"Rerank model {rerank_model['model']} with provider {rerank_model['provider']} has no API key configured. Skipping reranking.")


If there's no API key, the fallback rerank strategy should be used.

earayu · 2025-07-25T05:43:43Z

aperag/service/collection_service.py

+                        flow.nodes[rerank_node_id] = NodeInstance(
+                            id=rerank_node_id,
+                            type="rerank",
+                            input_values={
+                                "model": rerank_model["model"],
+                                "model_service_provider": rerank_model["provider"],
+                                "custom_llm_provider": rerank_model.get("custom_llm_provider", rerank_model["provider"]),
+                                "docs": docs
+                            }
+                        )
+
+                        # Add an edge from the merge node to the rerank node
+                        flow.edges.append(Edge(source=end_node_id, target=rerank_node_id))


The rerank node shouldn't just be added; it should form a DAG after the merge node. Our flow process has actually implemented the combination of search + rerank; you can refer to rag_flow2.yaml.

earayu

@Shushankranjan Thanks for your contribution, but this implementation has several critical issues that need to be addressed:

1. Fundamental Logic Error - Duplicate DAG Execution

Your current approach executes the entire DAG twice:

First execution: vector_search → fulltext_search → graph_search → merge
Then you check for rerank models and re-execute the entire flow again

This is incorrect and wasteful. The correct approach should be:

Check rerank availability before building the flow
Build the appropriate flow structure once (with or without rerank node)
Execute engine.execute_flow() only once

2. Code Quality Issues

Duplicate Code: The fallback sorting logic is repeated 3 times (no API key, no rerank model, exception handling)
Excessive Nesting: Deep if-else nesting makes the code hard to maintain

if docs:
    if rerank_model:
        if api_key:
            try:
                # rerank logic
            except:
                # fallback logic (duplicate 1)
        else:
            # fallback logic (duplicate 2)
    else:
        # fallback logic (duplicate 3)

3. Correct Implementation Direction

The flow should be:

Check rerank model availability and API key upfront
Build flow structure based on availability:
- With rerank: search_nodes → merge → rerank
- Without rerank: search_nodes → merge (with fallback sorting in merge node)
Execute flow once and get results from the final node

Please refactor to eliminate duplicate execution, extract the fallback logic into a separate method, and reduce nesting complexity.

github-actions · 2025-09-15T00:26:05Z

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment

Implement rerank step in collection search API

29597c2

Shushankranjan requested a review from iziang as a code owner July 24, 2025 06:33

apecloud-bot added the size/M Denotes a PR that changes 30-99 lines. label Jul 24, 2025

Shushankranjan changed the title ~~Implement rerank step in collection search API~~ feat: implement rerank step in collection search API Jul 24, 2025

earayu requested changes Jul 24, 2025

View reviewed changes

Shushankranjan added 2 commits July 24, 2025 12:12

Changing it to RerankNodeRunner

fd48539

Changing it to RerankNodeRunner

e78c569

Shushankranjan requested a review from earayu July 24, 2025 06:57

earayu requested changes Jul 24, 2025

View reviewed changes

reranking functionality

23949b1

Shushankranjan requested a review from earayu July 24, 2025 13:17

earayu requested changes Jul 25, 2025

View reviewed changes

changes made

6237035

apecloud-bot added size/L Denotes a PR that changes 100-499 lines. and removed size/M Denotes a PR that changes 30-99 lines. labels Jul 25, 2025

Shushankranjan requested a review from earayu July 25, 2025 12:49

earayu requested changes Jul 28, 2025

View reviewed changes

github-actions bot added the Stale label Sep 15, 2025

earayu closed this Sep 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: implement rerank step in collection search API #1162

feat: implement rerank step in collection search API #1162

Uh oh!

Shushankranjan commented Jul 24, 2025 •

edited

Loading

Uh oh!

earayu left a comment •

edited

Loading

Uh oh!

Shushankranjan commented Jul 24, 2025

Uh oh!

earayu left a comment

Uh oh!

earayu Jul 25, 2025

Uh oh!

earayu Jul 25, 2025

Uh oh!

earayu Jul 25, 2025

Uh oh!

earayu left a comment

Uh oh!

github-actions bot commented Sep 15, 2025

Uh oh!

Uh oh!

		# Re-execute the flow with the added rerank node
		result, _ = await engine.execute_flow(flow, {"query": query, "user": user})

feat: implement rerank step in collection search API #1162

feat: implement rerank step in collection search API #1162

Uh oh!

Conversation

Shushankranjan commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

earayu left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Shushankranjan commented Jul 24, 2025

Uh oh!

earayu left a comment

Choose a reason for hiding this comment

Uh oh!

earayu Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

earayu Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

earayu Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

earayu left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Sep 15, 2025

Uh oh!

Uh oh!

Shushankranjan commented Jul 24, 2025 •

edited

Loading

earayu left a comment •

edited

Loading