Add cuvs-bench-elastic: HTTP backend for Elasticsearch GPU vector search by afourniernv · Pull Request #1907 · rapidsai/cuvs

afourniernv · 2026-03-10T22:31:01Z

Introduce cuvs-bench-elastic as an optional plugin for cuvs-bench that provides an Elasticsearch backend. The backend communicates with Elasticsearch via HTTP and supports HNSW indexing with optional GPU acceleration when using the Elasticsearch GPU image (cuVS-accelerated vector search).

- Add cuvs_bench_elastic package with backend and config loader entry points
- Extend cuvs_bench registry and search spaces for pluggable backends
- Add elastic and integration optional dependencies to cuvs-bench
- Add modularization tests and integration test scaffolding (disabled until CI has ES GPU image, cuVS libs, and GPU runner)

copy-pr-bot · 2026-03-10T22:31:05Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

jnke2016 · 2026-03-11T00:08:04Z

python/cuvs_bench/cuvs_bench/backends/registry.py

+                ep.load()()
+            except ImportError as e:
+                if "elasticsearch" in str(e).lower() or "elasticsearch" in str(e):
+                    raise ImportError(


Nice. This is what I discussed with @cjnolet pertaining to lazy import and used Milvus as an example

    class MilvusBackend(BenchmarkBackend):
        def __init__(self, config: Dict[str, Any]):
            super().__init__(config)
            try:
                from pymilvus import connections, Collection
            except ImportError:
                raise ImportError(
                    "pymilvus is required for MilvusBackend. "
                    "Install with: pip install pymilvus"
                )
            connections.connect(host=config["host"], port=config["port"])


- Document run_build, run_search, run_benchmark convenience API
- Document ELASTIC constant and orchestrator usage
- Add username/password support in config loader (converts to basic_auth)

Signed-off-by: Alex Fournier <afournier@nvidia.com>
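As a rough illustration of the username/password conversion mentioned in the commit above, the sketch below folds the two keys into a single `basic_auth` tuple. The function name `normalize_auth` and the rule that an explicit `basic_auth` entry takes precedence are assumptions made for this example, not details taken from the PR.

```python
from typing import Any, Dict


def normalize_auth(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with username/password folded into basic_auth.

    Hypothetical helper: the name and the precedence rule are assumptions.
    """
    result = dict(config)
    username = result.pop("username", None)
    password = result.pop("password", None)
    if username is not None and password is not None:
        # An explicit basic_auth entry, if present, is left untouched.
        result.setdefault("basic_auth", (username, password))
    elif username is not None or password is not None:
        raise ValueError("username and password must be provided together")
    return result
```

The 8.x `elasticsearch` Python client accepts a `basic_auth=(user, password)` tuple, so a dict normalized this way could plausibly be passed on to the client constructor.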

cjnolet · 2026-03-25T21:22:17Z

python/cuvs_bench/cuvs_bench/tests/integration/conftest.py

+
+
+@pytest.fixture(scope="module")
+def elasticsearch_container():


Why a container? We should just be able to assume an elasticsearch cluster is ready to accept requests, right? cuvs-bench doesn't need to be self-contained, just be able to send the proper requests to an existing elasticsearch cluster. Or am I missing a big detail here?

This just seems unnecessary.

cjnolet · 2026-03-25T21:24:33Z

python/cuvs_bench/cuvs_bench/backends/elasticsearch.py

+class ElasticBackend(BenchmarkBackend):
+    """Elasticsearch GPU backend for vector benchmarking."""
+
+    def __init__(self, config: Dict[str, Any]):


Oh, I see. Is the container just for the integration tests?

cjnolet · 2026-04-09T16:15:38Z

python/cuvs_bench_elastic/pyproject.toml

@@ -0,0 +1,50 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026, NVIDIA CORPORATION.
+# SPDX-License-Identifier: Apache-2.0
+


We can't have different packages deployed for every backend. This significantly adds to the maintenance burden, as every new package needs to be versioned, deployed, and audited for dependency trails. Let's consolidate this into the existing cuvs-bench package. All dependencies on anything Elasticsearch or cuvs-lucene should be soft dependencies in Python (that is, the code should test whether it can import the package and, if it can't, throw a warning and fail gracefully).
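A minimal sketch of the soft-dependency pattern described in this comment might look like the following. The helper name `optional_import` and the warning text are assumptions for illustration; the `cuvs-bench[elastic]` extra is the one named elsewhere in this PR.

```python
import importlib
import warnings
from types import ModuleType
from typing import Optional


def optional_import(module_name: str, extra: str) -> Optional[ModuleType]:
    """Try to import module_name; warn and return None if it is missing.

    Hypothetical helper, not part of cuvs-bench.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        warnings.warn(
            f"{module_name!r} is not installed, so the backend that needs it "
            f"is disabled. Install it with: pip install 'cuvs-bench[{extra}]'",
            UserWarning,
            stacklevel=2,
        )
        return None
```

A backend module could then call `optional_import("elasticsearch", "elastic")` once at import time and skip its own registration when the result is `None`, so the rest of the package keeps working.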

- New `cuvs_bench_elastic` package with HTTP backend for Elasticsearch GPU vector search (HNSW, int8_hnsw, int4_hnsw, bbq_hnsw index types)
- Supports `pip install cuvs-bench[elastic]` without a separate PyPI publish: `cuvs_bench` bundles the plugin via setuptools packages.find
- Plugin registers via entry points (`cuvs_bench.backends` / `cuvs_bench.config_loaders`); no changes to core cuvs-bench required
- `ElasticConfigLoader` reads shared `datasets.yaml` from cuvs_bench and `elastic.yaml` from the plugin config; supports sweep and tune modes
- `build()` checks index existence before file validation so `force=False` returns immediately without requiring the base file on disk
- Removed testcontainers-based integration tests; added unit tests for pre-flight failure, force=False skip, dry-run, helper functions
- `elasticsearch` client is an optional dep (`cuvs-bench[elastic]` extra)
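The ordering described for `build()` (index-existence check first, file validation second) can be sketched roughly as below. The function name `should_build` and its signature are assumptions for this example. The only client call used is `client.indices.exists(index=...)`, which the `elasticsearch` Python client provides.

```python
import os
from typing import Any


def should_build(client: Any, index_name: str, base_file: str, force: bool) -> bool:
    """Decide whether an index build should run (hypothetical helper).

    The index-existence check comes first, so an existing index with
    force=False short-circuits before the base file is ever looked at.
    """
    if not force and client.indices.exists(index=index_name):
        return False
    # Only reached when a build will actually happen.
    if not os.path.isfile(base_file):
        raise FileNotFoundError(f"base vectors file not found: {base_file}")
    return True
```

With this ordering, a search-only run against an index that already exists does not need the base vectors on the local disk.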

The separate cuvs_bench_elastic package required bundling via packages.find and complicated the build. Instead, keep the backend inside cuvs_bench and use entry points pointing back into the same package so the backend only registers when elasticsearch is installed.

- git mv backend.py to cuvs_bench/backends/elasticsearch.py
- git mv elastic.yaml to cuvs_bench/config/algos/
- Fix imports to relative paths
- Fix _get_elastic_config_path() to use ../config from backends/
- Update pyproject.toml: entry points -> cuvs_bench.backends.elasticsearch:register
- Remove packages.find (no longer needed)
- Remove cuvs_bench_elastic/ package entirely

DX unchanged: pip install cuvs-bench[elastic]. One package, one publish pipeline.

Signed-off-by: Alex Fournier <afournier@nvidia.com>
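The idea of entry points that only register a backend when its client library is importable can be sketched as follows. This is an illustration under stated assumptions: `load_backends` is a hypothetical name, and the `register` hook here receives the registry as an argument, whereas the diff excerpt earlier in this thread calls `ep.load()()` with no arguments.

```python
from typing import Any, Callable, Dict, Iterable, List


def load_backends(entry_points: Iterable[Any], registry: Dict[str, Callable]) -> List[str]:
    """Call each entry point's register hook; return the names that were skipped.

    Hypothetical sketch: each item needs a .name attribute and a .load() method.
    """
    skipped = []
    for ep in entry_points:
        try:
            register = ep.load()   # imports the module that defines the hook
            register(registry)     # the hook adds its backend(s) to the registry
        except ImportError:
            # The optional client library is missing; leave this backend out.
            skipped.append(ep.name)
    return skipped
```

On Python 3.10 and later, `importlib.metadata.entry_points(group="cuvs_bench.backends")` returns objects with the `name` attribute and `load()` method used here; the group name is the one quoted in the commit messages above.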

…ch.py Restores the high-level API that was previously in cuvs_bench_elastic/__init__.py so existing demo scripts continue to work after the module consolidation. Signed-off-by: Alex Fournier <afournier@nvidia.com>

github-project-automation bot added this to Unstructured Data Processing Mar 10, 2026

cjnolet assigned afourniernv Mar 10, 2026

cjnolet added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Mar 10, 2026

cjnolet moved this to In Progress in Unstructured Data Processing Mar 10, 2026

jnke2016 reviewed Mar 11, 2026

afourniernv added 3 commits March 20, 2026 09:39

Fix bulk indexing format for Elasticsearch

80c104f

Use single-doc format (_index, _id, vector_field) instead of two-part NDJSON (index action + source) so ES accepts the bulk request. Signed-off-by: Alex Fournier <afournier@nvidia.com>
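One reading of the "single-doc format" in this commit is that each document becomes a single dict carrying its metadata (`_index`, `_id`) and its vector field together. That shape is the action format accepted by `elasticsearch.helpers.bulk`. The sketch below only builds such dicts. The function name, the default field name `vector`, and the use of the row position as `_id` are assumptions, and whether the PR goes through `helpers.bulk` is not stated in the thread.

```python
from typing import Any, Dict, Iterable, Iterator, Sequence


def iter_bulk_actions(
    index_name: str,
    vectors: Iterable[Sequence[float]],
    vector_field: str = "vector",
) -> Iterator[Dict[str, Any]]:
    """Yield one action dict per vector (hypothetical helper).

    Metadata and document body live in the same dict, instead of the
    two-line "action, then source" pairs of raw NDJSON.
    """
    for doc_id, vec in enumerate(vectors):
        yield {
            "_index": index_name,
            "_id": str(doc_id),
            vector_field: list(vec),
        }
```

For example, `list(iter_bulk_actions("bench", [[0.1, 0.2]]))` yields one dict with the keys `_index`, `_id`, and `vector`.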

Add run_build, run_search, run_benchmark API for elastic backend

7b25521

Expose ELASTIC constant and convenience wrappers for build-only, search-only, or full benchmark runs. Signed-off-by: Alex Fournier <afournier@nvidia.com>

afourniernv force-pushed the fea-1856-cuvs-lucene-backend branch from ec0b3e3 to 7b25521 on March 20, 2026 16:45

cjnolet reviewed Mar 25, 2026

cjnolet reviewed Apr 9, 2026

jrbourbeau mentioned this pull request Apr 10, 2026

[REVIEW] Add OpenSearch backend to cuvs-bench #2012

Open

afourniernv added 2 commits April 12, 2026 18:27

Revert unrelated change to get_dataset/__main__.py

087289d

afourniernv force-pushed the fea-1856-cuvs-lucene-backend branch from dca9c6f to 087289d on April 13, 2026 01:46

afourniernv added 2 commits April 13, 2026 13:49

afourniernv wants to merge 8 commits into rapidsai:main from afourniernv:fea-1856-cuvs-lucene-backend

3 participants


