Commit 7a229c8

docs: Add comprehensive stopwords documentation to API reference

- Add note to TextQuery docstring about index-level vs query-time stopwords
- Add stopwords section to schema.rst with configuration examples
- Add note to query.rst about stopwords interaction
- Link stopwords_interaction_guide.md from 11_advanced_queries.ipynb

These updates improve discoverability and help users understand the interaction between index-level and query-time stopwords configuration.

1 parent eff8b4f commit 7a229c8

4 files changed: +71 -9 lines changed
docs/api/query.rst

Lines changed: 6 additions & 0 deletions
@@ -61,6 +61,12 @@ TextQuery
    :show-inheritance:
    :exclude-members: add_filter,get_args,highlight,return_field,summarize
 
+.. note::
+   The ``stopwords`` parameter in :class:`TextQuery` controls query-time stopword filtering (client-side).
+   For index-level stopwords configuration (server-side), see :class:`redisvl.schema.IndexInfo.stopwords`.
+   Using query-time stopwords with index-level ``STOPWORDS 0`` is counterproductive.
+   See the `Stopwords Interaction Guide <../stopwords_interaction_guide.html>`_ for details.
+
 
 FilterQuery
 ===========

docs/api/schema.rst

Lines changed: 41 additions & 0 deletions
@@ -31,6 +31,47 @@ IndexSchema
    :exclude-members: generate_fields,validate_and_create_fields,redis_fields
 
 
+Index-Level Stopwords Configuration
+====================================
+
+The :class:`IndexInfo` class supports index-level stopwords configuration through
+the ``stopwords`` field. This controls which words are filtered during indexing
+(server-side), as opposed to query-time filtering (client-side).
+
+**Configuration Options:**
+
+- ``None`` (default): Use Redis default stopwords (~300 common words)
+- ``[]`` (empty list): Disable stopwords completely (``STOPWORDS 0``)
+- Custom list: Specify your own stopwords (e.g., ``["the", "a", "an"]``)
+
+**Example:**
+
+.. code-block:: python
+
+    from redisvl.schema import IndexSchema
+
+    # Disable stopwords to search for phrases like "Bank of America"
+    schema = IndexSchema.from_dict({
+        "index": {
+            "name": "company-idx",
+            "prefix": "company",
+            "stopwords": []  # STOPWORDS 0
+        },
+        "fields": [
+            {"name": "name", "type": "text"}
+        ]
+    })
+
+**Important Notes:**
+
+- Index-level stopwords affect what gets indexed (server-side)
+- Query-time stopwords (in :class:`TextQuery`) affect what gets searched (client-side)
+- Using query-time stopwords with index-level ``STOPWORDS 0`` is counterproductive
+
+For detailed information about stopwords configuration and best practices, see the
+`Stopwords Interaction Guide <../stopwords_interaction_guide.html>`_.
+
+
 Defining Fields
 ===============
 
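
The three configuration options added to schema.rst above can be read as three shapes of the `STOPWORDS` clause in a RediSearch `FT.CREATE` command. The following is a minimal sketch of that mapping, not redisvl's actual implementation: `stopwords_clause` is a hypothetical helper introduced only for illustration, and it assumes that omitting the clause leaves the server's built-in list in effect.

```python
from typing import List, Optional


def stopwords_clause(stopwords: Optional[List[str]]) -> List[str]:
    """Hypothetical helper: build the STOPWORDS arguments for FT.CREATE.

    None         -> no clause, so the server's default list applies
    []           -> "STOPWORDS 0", which disables stopword filtering
    custom list  -> "STOPWORDS <count> <word> ..."
    """
    if stopwords is None:
        return []
    return ["STOPWORDS", str(len(stopwords)), *stopwords]


for option in (None, [], ["the", "a", "an"]):
    clause = " ".join(stopwords_clause(option)) or "(no clause, server default)"
    print(f"{option!r:>22} -> {clause}")
```

The distinction between `None` and `[]` is the part that is easy to miss: both are falsy in Python, so the sketch checks `is None` explicitly instead of relying on truthiness.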

docs/user_guide/11_advanced_queries.ipynb

Lines changed: 18 additions & 8 deletions
@@ -740,7 +740,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      " Loaded 5 companies\n"
+      "\u2713 Loaded 5 companies\n"
      ]
     }
    ],
@@ -757,7 +757,7 @@
     "for i, company in enumerate(companies):\n",
     " company_index.load([company], keys=[f\"company:{i}\"])\n",
     "\n",
-    "print(f\" Loaded {len(companies)} companies\")"
+    "print(f\"\u2713 Loaded {len(companies)} companies\")"
    ]
   },
   {
@@ -805,9 +805,9 @@
     "\n",
     "If we had used the default stopwords (not specifying `stopwords` in the schema), the word \"of\" would be filtered out during indexing. This means:\n",
     "\n",
-    "- Searching for `\"Bank of America\"` might not find exact matches\n",
-    "- The phrase would be indexed as `\"Bank America\"` (without \"of\")\n",
-    "- With `STOPWORDS 0`, all words including \"of\" are indexed\n",
+    "- \u274c Searching for `\"Bank of America\"` might not find exact matches\n",
+    "- \u274c The phrase would be indexed as `\"Bank America\"` (without \"of\")\n",
+    "- \u2705 With `STOPWORDS 0`, all words including \"of\" are indexed\n",
     "\n",
     "**Custom Stopwords Example:**\n",
     "\n",
@@ -886,6 +886,16 @@
     "```"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### \ud83d\udcda Additional Resources\n",
+    "\n",
+    "For a comprehensive guide on stopwords configuration and best practices, see:\n",
+    "- [Stopwords Interaction Guide](../stopwords_interaction_guide.md) - Detailed explanation of index-level vs query-time stopwords"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 17,
@@ -902,14 +912,14 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      " Cleaned up company_index\n"
+      "\u2713 Cleaned up company_index\n"
      ]
     }
    ],
    "source": [
     "# Cleanup\n",
     "company_index.delete(drop=True)\n",
-    "print(\" Cleaned up company_index\")"
+    "print(\"\u2713 Cleaned up company_index\")"
    ]
   },
   {
@@ -1572,4 +1582,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}

redisvl/query/query.py

Lines changed: 6 additions & 1 deletion
@@ -1061,10 +1061,15 @@ def __init__(
             params (Optional[Dict[str, Any]], optional): The parameters for the query.
                 Defaults to None.
             stopwords (Optional[Union[str, Set[str]]): The set of stop words to remove
-                from the query text. If a language like 'english' or 'spanish' is provided
+                from the query text (client-side filtering). If a language like 'english' or 'spanish' is provided
                 a default set of stopwords for that language will be used. Users may specify
                 their own stop words by providing a List or Set of words. if set to None,
                 then no words will be removed. Defaults to 'english'.
+
+                Note: This parameter controls query-time stopword filtering (client-side).
+                For index-level stopwords configuration (server-side), see IndexInfo.stopwords.
+                Using query-time stopwords with index-level STOPWORDS 0 is counterproductive.
+                See docs/stopwords_interaction_guide.md for details.
             text_weights (Optional[Dict[str, float]]): The importance weighting of individual words
                 within the query text. Defaults to None, as no modifications will be made to the
                 text_scorer score.
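
The docstring note calls query-time stopwords with index-level `STOPWORDS 0` counterproductive. A rough sketch of why, using only the standard library: `filter_query_text` and `ASSUMED_STOPWORDS` are hypothetical stand-ins for client-side filtering, not the code or word list that `TextQuery` actually uses, and the whitespace tokenization is a simplifying assumption.

```python
from typing import Optional, Set

# Assumed word list for illustration only; not the set TextQuery ships with.
ASSUMED_STOPWORDS: Set[str] = {"of", "the", "a", "an"}


def filter_query_text(text: str, stopwords: Optional[Set[str]]) -> str:
    """Hypothetical client-side filter: drop stopwords before the query is sent.

    Passing None mirrors the documented behavior that no words are removed.
    """
    if stopwords is None:
        return text
    kept = [token for token in text.split() if token.lower() not in stopwords]
    return " ".join(kept)


# The index was built with STOPWORDS 0, so "of" is stored on the server.
# With query-time filtering on, the client never sends "of" at all.
print(filter_query_text("Bank of America", ASSUMED_STOPWORDS))

# With stopwords=None the query text reaches the server unchanged.
print(filter_query_text("Bank of America", None))
```

In this sketch the server-side effort to index "of" is wasted whenever the client strips it first, which is the interaction the note warns about; passing `stopwords=None` keeps the two settings consistent.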
