Commit e3f58cd
feat: Add openai support for semantic parse_pdf (#253)
### TL;DR
Added PDF file support for OpenAI models with proper token counting and
estimation.
### What changed?
- Made `max_completion_tokens` optional in OpenAI chat completions
  requests
- Implemented PDF file token counting for OpenAI models:
  - Added methods to count tokens for PDF files in input and output
  - Updated token counter to handle PDF files with proper token estimation
- Added support for PDF parsing to various OpenAI models in the model
  catalog
- Refactored OpenAI token counting logic
- Mini refactor: separated the `_max_output_tokens` user limit concept from
  `_estimate_output_tokens` for cost estimation and throttling
  - Added to OpenAI and updated Gemini
  - Max tokens (put in request)
    - Use max tokens provided by semantic_operator if it exists
    - **OR, for page parsing specifically, use an upper limit based on the
      output limit of our smallest supported VLM (8000 tokens)**
    - Add expected reasoning effort
  - Estimate output tokens (for cost estimate and throttling)
    - Use max tokens provided by semantic_operator if it exists
    - OR estimate file output tokens
    - Add expected reasoning effort
- Added OpenAI models to semantic_parse_pdf tests
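
The split between the request limit and the cost estimate can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the `Request` dataclass, its field names, and the two free functions are hypothetical stand-ins for the `_max_output_tokens` and `_estimate_output_tokens` concepts, and returning `None` when no limit applies is an assumption based on `max_completion_tokens` now being optional. Only the 8000-token page parsing cap comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Cap for page parsing, taken from the commit description
# (output limit of the smallest supported VLM).
PAGE_PARSING_MAX_OUTPUT_TOKENS = 8000


@dataclass
class Request:
    """Hypothetical request shape, for illustration only."""
    max_completion_tokens: Optional[int] = None  # limit passed by the semantic operator
    is_page_parsing: bool = False                # True for PDF page parsing requests
    estimated_file_output_tokens: int = 0        # hypothetical precomputed PDF estimate


def max_output_tokens(request: Request, reasoning_tokens: int) -> Optional[int]:
    """Limit to put in the API request, or None to omit the field (assumption)."""
    if request.max_completion_tokens is not None:
        base = request.max_completion_tokens
    elif request.is_page_parsing:
        base = PAGE_PARSING_MAX_OUTPUT_TOKENS
    else:
        return None
    return base + reasoning_tokens


def estimate_output_tokens(request: Request, reasoning_tokens: int) -> int:
    """Estimate used for cost estimation and throttling, never sent to the API."""
    if request.max_completion_tokens is not None:
        base = request.max_completion_tokens
    else:
        base = request.estimated_file_output_tokens
    return base + reasoning_tokens
```

In this sketch a page parsing request with no operator-provided limit is capped at 8000 tokens plus the reasoning allowance in the request, while its cost estimate falls back to the file-based figure.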
### Out of scope
Token estimation should happen at the semantic operator level, since
the operator has the context of what it is expecting from the model.
Currently, the semantic operator only passes a 'max token' limit to the
client, and we use that upper limit in our estimates. As a future
improvement, we should refactor so that the semantic operator decides on
the output token limit for the request.
### How to test?
1. Run the new token counter tests:
   `pytest tests/_inference/test_openai_token_counter.py`
2. Test PDF parsing with OpenAI models:
   `pytest tests/_backends/local/functions/test_semantic_parse_pdf.py`
3. Verify that token estimation works correctly with PDF files by using
   a model that supports PDF parsing

1 parent c99b9ac commit e3f58cd
File tree
20 files changed, +256 -104 lines changed
- src/fenic
  - _backends/local
    - semantic_operators
    - transpiler
  - _inference
    - anthropic
    - cohere
    - common_openai
    - google
    - openai
    - openrouter
  - api/functions
  - core
    - _inference
    - _logical_plan/expressions
- tests
  - _backends/local/functions
  - _inference