Commit 1845374

Merge pull request #1045 from Teradata/Chatbot_Teradata_Vector_Store
Fixed issue on VS chunking
2 parents bdb25f9 + 127c005 commit 1845374

1 file changed

+104
-68
lines changed

VantageCloud_Lake/UseCases/Chatbot_Teradata_Vector_Store/Chatbot_Teradata_Vector_Store.ipynb

Lines changed: 104 additions & 68 deletions
@@ -164,6 +164,15 @@
164164
" print(\"Please contact the support team.\")"
165165
]
166166
},
167+
{
168+
"cell_type": "markdown",
169+
"id": "2b13bdef-f8ae-4e8c-9460-1debe9f1a342",
170+
"metadata": {},
171+
"source": [
172+
"<p style = 'font-size:18px;font-family:Arial;'><b>2.2 Set your Authentication Token for this session.</b></p>\n",
173+
"<p style = 'font-size:16px;font-family:Arial;'>Load the required values from the included .env file.</p>"
174+
]
175+
},
167176
{
168177
"cell_type": "code",
169178
"execution_count": null,
@@ -186,53 +195,44 @@
186195
},
187196
{
188197
"cell_type": "markdown",
189-
"id": "ed339c42-2306-4236-82bb-437f2276cd3b",
198+
"id": "69e7fe8f-eede-48ec-bb15-85d64fa8d06e",
190199
"metadata": {},
191200
"source": [
192-
"<p style = 'font-size:18px;font-family:Arial;'><b>2.2 Check the connectivity to our Vector Store Database</b></p>\n",
193-
"<p style = 'font-size:16px;font-family:Arial;'>Execute this statement to test the connection.</p>"
201+
"<hr style=\"height:2px;border:none\">"
194202
]
195203
},
196204
{
197-
"cell_type": "code",
198-
"execution_count": null,
199-
"id": "3b8b5b67-82cd-4158-851a-a14cb17d9387",
205+
"cell_type": "markdown",
206+
"id": "09ae9a2d-f964-4f2f-a269-1ade3579a1a7",
200207
"metadata": {},
201-
"outputs": [],
202208
"source": [
203-
"VSManager.health()"
209+
"<div class=\"alert alert-block alert-info\">\n",
210+
" <p style = 'font-size:18px;font-family:Arial;'><b>*** This section is OPTIONAL ***</b></p>\n",
211+
" <p style = 'font-size:20px;font-family:Arial'><b>3. Upload your own PDF files!</b></p>\n",
212+
" <p style = 'font-size:16px;font-family:Arial;'><i>This section is not required to continue.</i> This will create a button that will open your File Explorer and allow you to select one or more PDF files.</p>"
204213
]
205214
},
206215
{
207216
"cell_type": "markdown",
208-
"id": "aeb01b06-6ed5-4e05-92f8-f40d5a2825d7",
217+
"id": "f92196dd-ae9a-415e-a652-6ed280211b67",
209218
"metadata": {},
210219
"source": [
211220
"<hr style=\"height:2px;border:none\">\n",
212-
"<b style = 'font-size:20px;font-family:Arial'>3. Initializing the Vector Store</b>\n",
213-
"<p style = 'font-size:16px;font-family:Arial'>Here, we initialize the Vector Store, which will store the document embeddings. This vector store will be used to index and search the uploaded documents efficiently..</p>"
221+
"<p style = 'font-size:18px;font-family:Arial'><b>3.1 File Upload Setup</b></p>\n",
222+
"\n",
223+
"<p style = 'font-size:16px;font-family:Arial'>We initialize the Panel extension to create a user interface that allows document uploads. The panel interface enables users to select and upload documents.</p>"
214224
]
215225
},
216226
{
217227
"cell_type": "code",
218228
"execution_count": null,
219-
"id": "58d08ae3-a9aa-44f9-b987-658b4223eb9a",
220-
"metadata": {},
229+
"id": "b1b67d58-6aae-4683-a741-1a1d5eda6117",
230+
"metadata": {
231+
"tags": []
232+
},
221233
"outputs": [],
222234
"source": [
223-
"# Create the vector store\n",
224-
"document_vector_store = VectorStore(env_vars.get(\"username\"))"
225-
]
226-
},
227-
{
228-
"cell_type": "markdown",
229-
"id": "a0fc4794-3dfa-427e-85da-dcbdb2db3bc5",
230-
"metadata": {},
231-
"source": [
232-
"<hr style=\"height:2px;border:none\">\n",
233-
"<p style = 'font-size:18px;font-family:Arial'><b>3.1 File Upload Setup</b></p>\n",
234-
"\n",
235-
"<p style = 'font-size:16px;font-family:Arial'>We initialize the Panel extension to create a user interface that allows document uploads. The panel interface enables users to select and upload documents.</p>"
235+
"from IPython.display import display, HTML "
236236
]
237237
},
238238
{
@@ -248,7 +248,7 @@
248248
},
249249
{
250250
"cell_type": "markdown",
251-
"id": "4f50a631-49e6-47eb-94cb-46d5ed1b9426",
251+
"id": "d669ba41-1f25-4ce7-8362-f825156a03c2",
252252
"metadata": {},
253253
"source": [
254254
"<hr style=\"height:2px;border:none\">\n",
@@ -271,12 +271,12 @@
271271
},
272272
{
273273
"cell_type": "markdown",
274-
"id": "a594e173-867b-43a4-bc5c-0bd38aa15f0a",
274+
"id": "1face215-ecfc-4eb0-98b8-734895f74ebe",
275275
"metadata": {},
276276
"source": [
277277
"<hr style=\"height:2px;border:none\">\n",
278-
"<b style = 'font-size:20px;font-family:Arial'>4. File Input Widget</b>\n",
279-
"<p style = 'font-size:16px;font-family:Arial'>We create a File Input widget, allowing users to select multiple document files for upload. Supported file types include PDF.</p>"
278+
"<b style = 'font-size:18px;font-family:Arial'>3.3 File Input Widget</b>\n",
279+
"<p style = 'font-size:16px;font-family:Arial'>This section is OPTIONAL. Execute this section if you want to upload your own PDF files. This will create a File Input widget which will allow you to select multiple document files for upload. Supported file types include PDF.</p>"
280280
]
281281
},
282282
{
@@ -345,27 +345,6 @@
345345
" \"\"\"))"
346346
]
347347
},
348-
{
349-
"cell_type": "markdown",
350-
"id": "69e7fe8f-eede-48ec-bb15-85d64fa8d06e",
351-
"metadata": {},
352-
"source": [
353-
"<hr style=\"height:2px;border:none\">\n",
354-
"<p style = 'font-size:18px;font-family:Arial'><b>4.1 **Optional** Upload your own PDF files!</b></p>\n",
355-
"\n",
356-
"<p style = 'font-size:16px;font-family:Arial'>This section is not required to continue. If provides a button that will open your File Explorer and allow you to select one or more PDF files. Continue to 4.2 to use the provided PDF file.</p>"
357-
]
358-
},
359-
{
360-
"cell_type": "code",
361-
"execution_count": null,
362-
"id": "b1b67d58-6aae-4683-a741-1a1d5eda6117",
363-
"metadata": {},
364-
"outputs": [],
365-
"source": [
366-
"from IPython.display import display, HTML "
367-
]
368-
},
369348
{
370349
"cell_type": "code",
371350
"execution_count": null,
@@ -399,15 +378,25 @@
399378
"display(ui)"
400379
]
401380
},
381+
{
382+
"cell_type": "markdown",
383+
"id": "5afbdd88-66ab-403e-8ec3-fdec251d2836",
384+
"metadata": {},
385+
"source": [
386+
"<hr style=\"height:2px;border:none\">\n",
387+
"<div class=\"alert alert-block alert-info\">\n",
388+
" <p style = 'font-size:18px;font-family:Arial;'><b>Resume notebook execution here.</b></p>\n",
389+
"</div>"
390+
]
391+
},
402392
{
403393
"cell_type": "markdown",
404394
"id": "fd62e972-c00a-413a-a5c3-82a359406c81",
405395
"metadata": {},
406396
"source": [
407397
"<hr style=\"height:2px;border:none\">\n",
408-
"<p style = 'font-size:18px;font-family:Arial'><b>4.2 Validate the PDF files</b></p>\n",
409-
"\n",
410-
"<p style = 'font-size:16px;font-family:Arial'>Scan the files in the included /data folder and set the Project Directory.</p>"
398+
"<p style = 'font-size:20px;font-family:Arial'><b>4. Validate the PDF files</b></p>\n",
399+
"<p style = 'font-size:16px;font-family:Arial'>Scan the files in the local /data folder and set the Project Directory. This is where you resume the notebook if you skipped section 3.</p>"
411400
]
412401
},
413402
{
@@ -449,7 +438,6 @@
449438
"metadata": {},
450439
"outputs": [],
451440
"source": [
452-
"\n",
453441
"data_folder = os.path.join(PROJECT_DIR, \"data\")\n",
454442
"supported_patterns = [\"*.pdf\"]\n",
455443
"files = []\n",
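Reviewer note: the hunk cuts off before the rest of this cell, so the actual collection logic is not visible here. As a minimal sketch of how a pattern list such as `supported_patterns` can be expanded into a file list with the standard library, assuming a flat `data` folder (the helper name `find_documents` is hypothetical and not taken from the notebook):

```python
import glob
import os
import tempfile


def find_documents(data_folder, patterns=("*.pdf",)):
    """Return the sorted paths in data_folder that match any glob pattern."""
    files = []
    for pattern in patterns:
        # glob.glob expands the wildcard against the folder contents
        files.extend(glob.glob(os.path.join(data_folder, pattern)))
    return sorted(files)


# Demonstration on a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "b.pdf", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in find_documents(tmp)]

print(found)
```

Returning an empty list when nothing matches lets the later `Create` step branch on `if files:` rather than failing on a missing folder.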
@@ -472,8 +460,48 @@
472460
"<hr style=\"height:2px;border:none\">\n",
473461
"<p style = 'font-size:20px;font-family:Arial'><b>5. Creating the Vector Store</b></p>\n",
474462
"\n",
475-
"<p style = 'font-size:18px;font-family:Arial'><b>5.1 Create</b></p>\n",
476-
"<p style = 'font-size:18px;font-family:Arial'>Use the <code>Create</code> function to initialize and configure the <b>Teradata Vector Store</b> with the required parameters. This is the core step where we set up the vector store with the relevant models, algorithms, and document files. The Vector Store will index the uploaded documents and prepare them for fast retrieval using similarity search.</p>"
463+
"<p style = 'font-size:18px;font-family:Arial;'><b>5.1 Check the connectivity to our Vector Store Database</b></p>\n",
464+
"<p style = 'font-size:16px;font-family:Arial;'>Execute this statement to test the connection.</p>"
465+
]
466+
},
467+
{
468+
"cell_type": "code",
469+
"execution_count": null,
470+
"id": "75f4044b-bfb8-4abe-a999-7e9542cbd84d",
471+
"metadata": {},
472+
"outputs": [],
473+
"source": [
474+
"VSManager.health()"
475+
]
476+
},
477+
{
478+
"cell_type": "markdown",
479+
"id": "4bbc49a4-a9ff-439b-a7e0-7caaa197bf0d",
480+
"metadata": {},
481+
"source": [
482+
"<hr style=\"height:2px;border:none\">\n",
483+
"<b style = 'font-size:18px;font-family:Arial'>5.2 Initializing the Vector Store</b>\n",
484+
"<p style = 'font-size:16px;font-family:Arial'>Here, we initialize the Vector Store, which will store the document embeddings. This vector store will be used to index and search the uploaded documents efficiently.</p>"
485+
]
486+
},
487+
{
488+
"cell_type": "code",
489+
"execution_count": null,
490+
"id": "d7fcfbbb-fc89-4f04-9546-35be776f18a3",
491+
"metadata": {},
492+
"outputs": [],
493+
"source": [
494+
"# Create the vector store\n",
495+
"document_vector_store = VectorStore(env_vars.get(\"username\"))"
496+
]
497+
},
498+
{
499+
"cell_type": "markdown",
500+
"id": "2006c254-7822-4070-8e33-abdee4658a46",
501+
"metadata": {},
502+
"source": [
503+
"<p style = 'font-size:18px;font-family:Arial'><b>5.3 Create</b></p>\n",
504+
"<p style = 'font-size:16px;font-family:Arial'>Use the <code>Create</code> function to initialize and configure the <b>Teradata Vector Store</b> with the required parameters. This is the core step where we set up the vector store with the relevant models, algorithms, and document files. The Vector Store will index the uploaded documents and prepare them for fast retrieval using similarity search.</p>"
477505
]
478506
},
479507
{
@@ -493,8 +521,8 @@
493521
" object_names=\"tbl_testing\",\n",
494522
" data_columns=[\"chunks\"],\n",
495523
" vector_column=\"VectorIndex\",\n",
496-
" chunk_size=100,\n",
497-
" optimized_chunking=False,\n",
524+
" chunk_size=500,\n",
525+
" optimized_chunking=True,\n",
498526
" document_files=files,\n",
499527
" )\n",
500528
"else:\n",
@@ -503,15 +531,17 @@
503531
},
504532
{
505533
"cell_type": "markdown",
506-
"id": "73a680b0-95a4-4724-8d74-614287b0fa59",
534+
"id": "745e10d4-79ed-423a-ba86-f3f2a8376df0",
507535
"metadata": {},
508536
"source": [
537+
"<hr style=\"height:2px;border:none\">\n",
538+
"<b style = 'font-size:18px;font-family:Arial'>5.4 Check the Status</b>\n",
509539
"<p style = 'font-size:16px;font-family:Arial'>Check the current status of the <b>Teradata Vector Store</b> after it has been created. This step ensures that the Vector Store has been successfully initialized and is ready for processing queries. <br>\n",
510540
"</p>\n",
511541
"\n",
512542
"<p style = 'font-size:16px;font-family:Arial'>\n",
513543
"This cell will loop every 15 seconds to check the status. Move on to the next cell when the status shows <b>\"READY\"</b>.\n",
514-
"</p>"
544+
"</p></p>"
515545
]
516546
},
517547
{
@@ -526,12 +556,15 @@
526556
"df = document_vector_store.status()\n",
527557
"\n",
528558
"while True:\n",
529-
" if df.loc[0, 'status'] == 'READY':\n",
530-
" break\n",
559+
" if df is not None:\n",
560+
" if df.loc[0, 'status'] == 'READY':\n",
561+
" break\n",
562+
" else:\n",
563+
" print(f\"Current status: {df.loc[0, 'status']}. Waiting 15 seconds...\")\n",
564+
" time.sleep(15)\n",
565+
" df = document_vector_store.status()\n",
531566
" else:\n",
532-
" print(f\"Current status: {df.loc[0, 'status']}. Waiting 15 seconds...\")\n",
533-
" time.sleep(30)\n",
534-
" df = document_vector_store.status()\n",
567+
" time.sleep(15)\n",
535568
"\n",
536569
"print(f\"The Vector Store Database: {df.loc[0,'vs_name']} is {df.loc[0, 'status']}!\")\n"
537570
]
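Reviewer note: as far as this hunk shows, the updated loop refreshes `df` only inside the `df is not None` branch. If the first `status()` call returns `None`, the `else` branch sleeps and loops again without re-querying, so the cell would never finish. A minimal sketch of one possible variant that re-queries on every pass and gives up after a timeout follows. The names `wait_until_ready` and `fetch_status` and the 30-minute default are assumptions for illustration, not part of the notebook or of any Teradata API.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    fetch_status: Callable[[], Optional[str]],
    interval: float = 15.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns 'READY' or the timeout elapses."""
    waited = 0.0
    while True:
        # Re-query on every pass, including after a None result.
        status = fetch_status()
        if status == "READY":
            return status
        if waited >= timeout:
            raise TimeoutError(
                f"Not READY after {waited:.0f}s (last status: {status})"
            )
        print(f"Current status: {status}. Waiting {interval:.0f} seconds...")
        sleep(interval)
        waited += interval


# Demonstration with canned responses instead of a live vector store.
_responses = iter([None, "CREATING", "READY"])
result = wait_until_ready(
    lambda: next(_responses), interval=0.0, sleep=lambda s: None
)
print(result)
```

In the notebook, `fetch_status` could wrap `document_vector_store.status()` and return `None` when the call yields no DataFrame, or `df.loc[0, 'status']` otherwise.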
@@ -542,7 +575,7 @@
542575
"metadata": {},
543576
"source": [
544577
"<hr style=\"height:2px;border:none\">\n",
545-
"<p style = 'font-size:18px;font-family:Arial'><b>5.2 Run_Query</b></p>\n",
578+
"<p style = 'font-size:18px;font-family:Arial'><b>5.5 Run_Query</b></p>\n",
546579
"\n",
547580
"<p style = 'font-size:16px;font-family:Arial'>The <code>Run_Query</code> function is designed to process and answer user queries based on the document content stored in the Teradata Vector Store. This function leverages the embeddings created from the uploaded documents to retrieve relevant information and provide answers.</p>"
548581
]
@@ -566,7 +599,7 @@
566599
"metadata": {},
567600
"source": [
568601
"<hr style=\"height:2px;border:none\">\n",
569-
"<p style = 'font-size:18px;font-family:Arial'><b>5.3 Callback</b></p>\n",
602+
"<p style = 'font-size:18px;font-family:Arial'><b>5.6 Callback</b></p>\n",
570603
"\n",
571604
"<p style = 'font-size:16px;font-family:Arial'>The <code>Callback</code> function is responsible for handling the chat messages from the user and providing appropriate responses. It acts as the core mechanism for processing user input and querying the <b>Teradata Vector Store</b> to generate responses based on the uploaded document content.</p></p>"
572605
]
@@ -650,6 +683,9 @@
650683
"outputs": [],
651684
"source": [
652685
"# Using Panel's ChatInterface for the chatbot UI\n",
686+
"# File upload functionality using Panel\n",
687+
"pn.extension()\n",
688+
"\n",
653689
"pn.chat.ChatInterface(\n",
654690
" callback=callback,\n",
655691
" show_rerun=False, # Hide rerun button\n",
