 The SimilaritySearch aggregator is used to perform a Knn vector search on a
 cache in the same way that normal Coherence aggregators are used.

-HNSW Indexing
-=============
-
-Coherence includes an implementation of the HNSW index that can be used to
-speed up searches. The hierarchical navigable small world (HNSW) algorithm is
-a graph-based approximate nearest neighbor search technique
-
-An index is added to a cache in Coherence by calling the add_index method on
-the cache. In this example, a HNSWIndex is created with a ValueExtractor that
-will extract the vector field from the cache value and an int parameter that
-specifies the number of dimensions the vector has.
-
-
 """


@@ -218,7 +205,8 @@ async def search(self, search_text: str, count: int, filter: Filter = Filters.al
         vector: FloatVector = self.vectorize(search_text)
         # create the SimilaritySearch aggregator using the above vector and count
         search: SimilaritySearch = SimilaritySearch(self.VALUE_EXTRACTOR, vector, count)
-        # perform the k-nn search using the above aggregator and optional filter
+        # perform the k-nn search using the above aggregator and optional filter,
+        # returning a list of QueryResults
         return await self.movies.aggregate(search, filter=filter)


@@ -228,19 +216,42 @@ async def search(self, search_text: str, count: int, filter: Filter = Filters.al

 async def do_run() -> None:

+    # Create a new session to the Coherence server using the default host and
+    # port, i.e. localhost:1408
     session: Session = await Session.create()
+    # Create a NamedMap called movies with a key of str and a value of dict
     movie_db: NamedMap[str, dict] = await session.get_map("movies")
     try:
+        # An instance of class MovieRepository is created, passing the above
+        # NamedMap as a parameter
         movies_repo = MovieRepository(movie_db)
-        # await movie_db.add_index(HnswIndex(MovieRepository.VALUE_EXTRACTOR, MovieRepository.EMBEDDING_DIMENSIONS))

+        # All of the movie data from the file MOVIE_JSON_FILENAME is
+        # processed and loaded into the movies_repo
         await movies_repo.load(MOVIE_JSON_FILENAME)
+
+        # The search method is called on the movies_repo instance of class
+        # MovieRepository. Its first parameter, search_text, is the text that
+        # is converted to a vector and used to search the movie plots for
+        # the nearest matches. The second parameter is a count of the number
+        # of nearest neighbours to search for.
+        #
+        # Below, a search is performed for five movies roughly based on
+        # "star travel and space ships"
         results = await movies_repo.search("star travel and space ships", 5)
         print("Search results:")
         print("================")
         for e in results:
             print(f"key = {e.key}, distance = {e.distance}, plot = {e.value.get('plot')}")

+        # The search method on the movies_repo instance can also take a filter
+        # to reduce the cache entries used to perform the nearest neighbours
+        # (k-nn) search.
+        #
+        # Below, movies with a plot similar to "star travel and space ships"
+        # are searched for. In addition, a Filter is used to narrow down the
+        # search to movies that starred "Harrison Ford". The filter
+        # will be applied to the cast field of the JsonObject.
         cast_extractor = Extractors.extract("cast")
         filter = Filters.contains(cast_extractor, "Harrison Ford")
         results = await movies_repo.search("star travel and space ships", 2, filter)
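
For readers who want to see what a filtered k-nn search does conceptually, the following is a minimal pure-Python sketch. It is not the Coherence implementation and does not use the Coherence client: it assumes a brute-force scan, cosine distance as the metric, and a plain dict standing in for the cache. The names `cosine_distance`, `knn_search`, `sample_movies` and the tiny two-dimensional vectors are hypothetical and exist only for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_search(
    entries: Dict[str, dict],
    query: List[float],
    count: int,
    predicate: Callable[[dict], bool] = lambda value: True,
) -> List[Tuple[str, float]]:
    """Brute-force k-nn: filter first, then rank the survivors by distance."""
    # Only entries accepted by the predicate take part in the search,
    # mirroring how a filter reduces the entries that are considered.
    candidates = [
        (key, cosine_distance(value["vector"], query))
        for key, value in entries.items()
        if predicate(value)
    ]
    # Smallest distance first, then keep the `count` nearest.
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:count]


# Hypothetical sample data: each value has a vector and a cast list.
sample_movies = {
    "a": {"vector": [1.0, 0.0], "cast": ["Harrison Ford"]},
    "b": {"vector": [0.0, 1.0], "cast": ["Carrie Fisher"]},
    "c": {"vector": [1.0, 1.0], "cast": ["Harrison Ford", "Carrie Fisher"]},
}

# Unfiltered search for the two nearest entries.
nearest = knn_search(sample_movies, [1.0, 0.1], 2)

# Filtered search: only entries whose cast contains "Carrie Fisher".
nearest_filtered = knn_search(
    sample_movies, [1.0, 0.1], 1, lambda value: "Carrie Fisher" in value["cast"]
)
```

Applying the predicate before ranking means the requested count is filled only from matching entries, which is the behaviour the filtered search in the diff relies on. An approximate index such as HNSW trades this exhaustive scan for speed.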