Commit 65fa715

Updated documentation for the example
1 parent d14c270 commit 65fa715

1 file changed

+26
-15
lines changed

examples/vector_search.py

Lines changed: 26 additions & 15 deletions
@@ -101,19 +101,6 @@
101101
The SimilaritySearch aggregator is used to perform a Knn vector search on a
102102
cache in the same way that normal Coherence aggregators are used.
103103
104-
HNSW Indexing
105-
=============
106-
107-
Coherence includes an implementation of the HNSW index that can be used to
108-
speed up searches. The hierarchical navigable small world (HNSW) algorithm is
109-
a graph-based approximate nearest neighbor search technique
110-
111-
An index is added to a cache in Coherence by calling the add_index method on
112-
the cache. In this example, a HNSWIndex is created with a ValueExtractor that
113-
will extract the vector field from the cache value and an int parameter that
114-
specifies the number of dimensions the vector has.
115-
116-
117104
"""
118105

119106

@@ -218,7 +205,8 @@ async def search(self, search_text: str, count: int, filter: Filter = Filters.al
218205
vector: FloatVector = self.vectorize(search_text)
219206
# create the SimilaritySearch aggregator using the above vector and count
220207
search: SimilaritySearch = SimilaritySearch(self.VALUE_EXTRACTOR, vector, count)
221-
# perform the k-nn search using the above aggregator and optional filter
208+
# perform the k-nn search using the above aggregator and optional filter and
209+
# return a list of QueryResults
222210
return await self.movies.aggregate(search, filter=filter)
223211

224212

@@ -228,19 +216,42 @@ async def search(self, search_text: str, count: int, filter: Filter = Filters.al
228216

229217
async def do_run() -> None:
230218

219+
# Create a new session to the Coherence server using the default host and
220+
# port, i.e. localhost:1408
231221
session: Session = await Session.create()
222+
# Create a NamedMap called movies with keys of type str and values of type dict
232223
movie_db: NamedMap[str, dict] = await session.get_map("movies")
233224
try:
225+
# An instance of class MovieRepository is created, passing the above
226+
# NamedMap as a parameter
234227
movies_repo = MovieRepository(movie_db)
235-
# await movie_db.add_index(HnswIndex(MovieRepository.VALUE_EXTRACTOR, MovieRepository.EMBEDDING_DIMENSIONS))
236228

229+
# All of the movie data from the file MOVIE_JSON_FILENAME is
230+
# processed and loaded into the movies_repo
237231
await movies_repo.load(MOVIE_JSON_FILENAME)
232+
233+
# The search method is called on the movies_repo instance of class
234+
# MovieRepository. Its first parameter, search_text, is the text that
235+
# is converted to a vector and used to search the movie plots for
236+
# the nearest matches. The second parameter is a count of the number
237+
# of nearest neighbours to search for.
238+
#
239+
# Below, a search is done for the five movies nearest to
240+
# "star travel and space ships"
238241
results = await movies_repo.search("star travel and space ships", 5)
239242
print("Search results:")
240243
print("================")
241244
for e in results:
242245
print(f"key = {e.key}, distance = {e.distance}, plot = {e.value.get('plot')}")
243246

247+
# The search method on the movies_repo instance can also take a filter
248+
# to restrict the cache entries used in the nearest neighbours
249+
# (k-nn) search.
250+
#
251+
# Below, movies with a plot similar to "star travel and space
252+
# ships" are searched for. In addition, a Filter narrows the search
253+
# to movies that starred "Harrison Ford". The filter
254+
# is applied to the cast field of the JsonObject.
244255
cast_extractor = Extractors.extract("cast")
245256
filter = Filters.contains(cast_extractor, "Harrison Ford")
246257
results = await movies_repo.search("star travel and space ships", 2, filter)
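
The comments in this diff describe a filtered k-nn search: entries are first narrowed by a filter, then ranked by vector distance, and the nearest `count` are returned. The sketch below illustrates that idea in plain Python only. It does not use the Coherence client, and `cosine_distance`, `knn_search`, the toy `movies` data, and the two-dimensional vectors are all made up for illustration; the distance metric actually used by `SimilaritySearch` is not stated in this diff, so cosine distance here is an assumption.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_search(
    entries: Dict[str, dict],
    query: Vector,
    count: int,
    predicate: Optional[Callable[[dict], bool]] = None,
) -> List[Tuple[str, float]]:
    """Brute-force k-nn: filter entries, rank by distance, keep the nearest `count`."""
    candidates = [
        (key, cosine_distance(value["vector"], query))
        for key, value in entries.items()
        if predicate is None or predicate(value)
    ]
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:count]


# Toy data (hypothetical): tiny 2-d vectors stand in for real plot embeddings.
movies = {
    "movie-a": {"cast": ["Harrison Ford"], "vector": [1.0, 0.0]},
    "movie-b": {"cast": ["Another Actor"], "vector": [0.9, 0.1]},
    "movie-c": {"cast": ["Harrison Ford"], "vector": [0.0, 1.0]},
}

query_vector = [1.0, 0.0]

# Unfiltered search: the two nearest entries overall.
for key, distance in knn_search(movies, query_vector, 2):
    print(f"key = {key}, distance = {distance:.4f}")

# Filtered search: only entries whose cast contains "Harrison Ford" are ranked.
for key, distance in knn_search(
    movies, query_vector, 2, lambda m: "Harrison Ford" in m["cast"]
):
    print(f"key = {key}, distance = {distance:.4f}")
```

Note that the filter changes which entries count as "nearest": with the predicate applied, an entry that is far from the query can still be returned if closer entries were filtered out.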
