README.md: 2 additions & 39 deletions (+2 -39)
@@ -105,8 +105,8 @@ We recommend at least 200 QA pairs if possible.
 There are a few ways to get this data:
 
 1. Manually curate a set of questions and answers that you consider to be ideal. This is the most accurate, but also the most time-consuming. Make sure your answers include citations in the expected format. This approach requires domain expertise in the data.
-2. Use the generator script to generate a set of questions and answers. This is the fastest, but may also be the least accurate. See below for details on how to run the generator script.
-3. Use the generator script to generate a set of questions and answers, and then manually curate them, rewriting any answers that are subpar and adding missing citations. This is a good middle ground, and is what we recommend.
+2. Use a generator script to generate a set of questions and answers, and use them directly. This is the fastest, but may also be the least accurate.
+3. Use a generator script to generate a set of questions and answers, and then manually curate them, rewriting any answers that are subpar and adding missing citations. This is a good middle ground, and is what we recommend.
 
 <details>
 <summary>Additional tips for ground truth data generation</summary>
@@ -117,43 +117,6 @@ There are a few ways to get this data:
 
 </details>
 
-### Running the generator script
-
-This repo includes a script for generating questions and answers from documents stored in Azure AI Search.
-
-> [!IMPORTANT]
-> The generator script can only generate English Q/A pairs right now, due to [limitations in the azure-ai-generative SDK](https://github.com/Azure/azure-sdk-for-python/issues/34099).
-
-1. Create `.env` file by copying `.env.sample`
-2. Fill in the values for your Azure AI Search instance:
-That script will generate 200 questions and answers, and store them in `example_input/qa.jsonl`. We've already provided an example based off the sample documents for this app.
-
-To further customize the generator beyond the `numquestions` and `persource` parameters, modify `scripts/generate.py`.
-
-Optional:
-
-By default this script assumes your index citation field is named `sourcepage`, if your search index contains a different citation field name use the `citationfieldname` option to specify the correct name
We provide a script that loads in the current `azd` environment's variables, installs the requirements for the evaluation, and runs the evaluation against the local app. Run it like this: