docs/en/notes/guide/operators/text_process.md
33 additions & 1 deletion
@@ -9,7 +9,7 @@ permalink: /en/guide/mq07gwz4/

 ## Overview

-DataFlow currently supports text data processing at the data point level, categorized into three types: refiners, deduplicators, and filters.
+DataFlow currently supports text data processing at the data point level, categorized into four types: refiners, deduplicators, generators and filters.

 <table class="tg">
 <thead>
@@ -30,6 +30,11 @@ DataFlow currently supports text data processing at the data point level, catego
 <td class="tg-0pky">6</td>
 <td class="tg-0pky">Removes duplicate data points using methods such as hashing.</td>
 </tr>
+<tr>
+<td class="tg-0pky">Generators</td>
+<td class="tg-0pky">2</td>
+<td class="tg-0pky">Generates data in a specific format based on seed documents.</td>
+</tr>
 <tr>
 <td class="tg-0pky">Filters</td>
 <td class="tg-0pky">42</td>
@@ -194,6 +199,33 @@ DataFlow currently supports text data processing at the data point level, catego
 </tbody>
 </table>

+## Generators
+
+<table class="tg">
+<thead>
+<tr>
+<th class="tg-0pky">Name</th>
+<th class="tg-0pky">Applicable Type</th>
+<th class="tg-0pky">Description</th>
+<th class="tg-0pky">Repository or Paper</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tg-0pky">PretrainGenerator</td>
+<td class="tg-0pky">Pretrain</td>
+<td class="tg-0pky">Synthesizes phi-4 question-and-answer data pairs from pretraining document data, retelling each document in QA format.</td>