Merge pull request #127 from Azure-Samples/generate-error

pamelafox · web-flow · commit 197977bc7a9c · 2025-02-07T17:18:11.000-08:00
Deprecate azure-ai-generative partially
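
The guarded import that this change adds to `src/evaltools/gen/generate.py` can be sketched in isolation. This is a minimal illustration, not the repository's code: the helper name `load_optional` and its return convention are hypothetical. It shows one way to log the failure and stop early, so the caller never reaches code that needs the missing names.

```python
import importlib
import logging

logger = logging.getLogger("evaltools")


def load_optional(module_name: str, attr_names: list[str]):
    """Return the requested attributes of module_name as a tuple,
    or None if the module cannot be imported.

    Hypothetical helper, used only for illustration.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a package
        # that is simply not installed is also caught here.
        logger.error("%s is unavailable, so this functionality is disabled.", module_name)
        return None
    return tuple(getattr(module, name) for name in attr_names)


# A caller checks for None and returns before using the optional names.
missing = load_optional("package_that_does_not_exist_xyz", ["Anything"])
present = load_optional("json", ["dumps"])
```

Returning a sentinel means the caller has to check for the missing dependency explicitly, rather than hitting a `NameError` later in the function.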
diff --git a/README.md b/README.md
@@ -105,8 +105,8 @@ We recommend at least 200 QA pairs if possible.
 There are a few ways to get this data:
 
 1. Manually curate a set of questions and answers that you consider to be ideal. This is the most accurate, but also the most time-consuming. Make sure your answers include citations in the expected format. This approach requires domain expertise in the data.
-2. Use the generator script to generate a set of questions and answers. This is the fastest, but may also be the least accurate. See below for details on how to run the generator script.
-3. Use the generator script to generate a set of questions and answers, and then manually curate them, rewriting any answers that are subpar and adding missing citations. This is a good middle ground, and is what we recommend.
+2. Use a generator script to generate a set of questions and answers, and use them directly. This is the fastest, but may also be the least accurate.
+3. Use a generator script to generate a set of questions and answers, and then manually curate them, rewriting any answers that are subpar and adding missing citations. This is a good middle ground, and is what we recommend.
 
 <details>
  <summary>Additional tips for ground truth data generation</summary>
@@ -117,43 +117,6 @@ There are a few ways to get this data:
 
 </details>
 
-### Running the generator script
-
-This repo includes a script for generating questions and answers from documents stored in Azure AI Search.
-
-> [!IMPORTANT]
-> The generator script can only generate English Q/A pairs right now, due to [limitations in the azure-ai-generative SDK](https://github.com/Azure/azure-sdk-for-python/issues/34099).
-
-1. Create `.env` file by copying `.env.sample`
-2. Fill in the values for your Azure AI Search instance:
-
-    ```shell
-    AZURE_SEARCH_ENDPOINT="https://<service-name>.search.windows.net"
-    AZURE_SEARCH_INDEX="<index-name>"
-    AZURE_SEARCH_KEY=""
-    ```
-
-    The key may not be necessary if it's configured for keyless access from your account.
-    If providing a key, it's best to provide a query key since the script only requires that level of access.
-
-3. Run the generator script:
-
-    ```shell
-    python -m evaltools generate --output=example_input/qa.jsonl --persource=5 --numquestions=200
-    ```
-
-    That script will generate 200 questions and answers, and store them in `example_input/qa.jsonl`. We've already provided an example based off the sample documents for this app.
-
-    To further customize the generator beyond the `numquestions` and `persource` parameters, modify `scripts/generate.py`.
-
-    Optional:
-
-    By default this script assumes your index citation field is named `sourcepage`, if your search index contains a different citation field name use the `citationfieldname` option to specify the correct name
-
-    ```shell
-    python -m evaltools generate --output=example_input/qa.jsonl --persource=5 --numquestions=200 --citationfieldname=filepath
-    ```
-
 ## Running an evaluation
 
 We provide a script that loads in the current `azd` environment's variables, installs the requirements for the evaluation, and runs the evaluation against the local app. Run it like this:
diff --git a/src/evaltools/gen/generate.py b/src/evaltools/gen/generate.py
@@ -5,7 +5,6 @@
 from collections.abc import Generator
 from pathlib import Path
 
-from azure.ai.generative.synthetic.qa import QADataGenerator, QAType
 from azure.search.documents import SearchClient
 
 from evaltools import service_setup
@@ -22,6 +21,14 @@ def generate_test_qa_data(
     source_to_text: callable,
     answer_formatter: callable,
 ):
+    try:
+        from azure.ai.generative.synthetic.qa import QADataGenerator, QAType
+    except ImportError:
+        logger.error(
+            "Azure AI Generative package is deprecated and no longer working, so this functionality is disabled."
+        )
+        return
+
     logger.info(
         "Generating %d questions total, %d per source, based on search results",
         num_questions_total,