feat: Make timeouts configurable in semantic LLM operations (#270)
### TL;DR
Added a configurable request timeout for all semantic operators in Fenic.
### What changed?
- Added an optional `request_timeout` parameter to all semantic
operators and extensions (map, extract, predicate, reduce, classify,
analyze_sentiment, summarize, parse_pdf, join, sim_join)
- Added validation that `request_timeout` is greater than 0 and does not
exceed the system maximum
- Updated all LLM provider clients to use HTTP clients with a large
10-minute timeout
- Modified the inference layer to respect this timeout parameter when
making LLM requests
- Updated the model client to use the request-specific timeout, or
fall back to the previous 120-second default
- Propagated the timeout parameter through all relevant classes in the
execution path
- Added a tool for testing long LLM requests
- Embedding request objects do not have a timeout
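
The validation and fallback rules listed above can be sketched roughly as follows. This is an illustration only, not Fenic's actual code: the function and constant names are hypothetical, and it assumes the system maximum equals the 10-minute HTTP client timeout, which this description does not confirm.

```python
from typing import Optional

# Hypothetical names. The 120-second default comes from the description above;
# treating the 10-minute HTTP client timeout as the system maximum is an assumption.
DEFAULT_REQUEST_TIMEOUT_S = 120.0
MAX_REQUEST_TIMEOUT_S = 600.0


def resolve_request_timeout(request_timeout: Optional[float]) -> float:
    """Return the timeout, in seconds, to apply to a single LLM request."""
    if request_timeout is None:
        # No per-operator override: fall back to the previous default.
        return DEFAULT_REQUEST_TIMEOUT_S
    if request_timeout <= 0:
        raise ValueError("request_timeout must be greater than 0")
    if request_timeout > MAX_REQUEST_TIMEOUT_S:
        raise ValueError(
            f"request_timeout must not exceed {MAX_REQUEST_TIMEOUT_S} seconds"
        )
    return float(request_timeout)
```

Resolving the value once, at the operator boundary, would let the rest of the execution path pass a plain `float` around instead of re-checking `None` at every layer.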
### How to test?
1. Test with a standard request:
```python
import fenic as fn
df = fn.DataFrame({"text": ["This is a long document..."]})
result = df.select(fn.semantic.map("Summarize this text", text=fn.col("text")))
```
2. Test with a custom timeout:
```python
# Set a longer timeout for complex operations
result = df.select(
    fn.semantic.map(
        "Summarize this text",
        text=fn.col("text"),
        request_timeout=300.0,
    )
)
# Set a shorter timeout for quick operations
result = df.select(
    fn.semantic.analyze_sentiment(fn.col("text"), request_timeout=30.0)
)
```
3. Verify timeout behavior by intentionally setting a very short
timeout:
```python
# This should trigger a timeout and retry.
# MySchema is a placeholder for your own extraction schema.
result = df.select(
    fn.semantic.extract(fn.col("text"), MySchema, request_timeout=0.1)
)
```
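
For intuition about the "timeout and retry" behavior in step 3, here is a minimal standalone sketch using only `asyncio`. It is not Fenic's inference layer: `call_with_timeout`, `max_attempts`, and `slow_request` are hypothetical names, and the real retry policy may differ.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_timeout(
    make_request: Callable[[], Awaitable[T]],
    request_timeout: float,
    max_attempts: int = 3,
) -> T:
    """Await make_request(), retrying whenever an attempt exceeds request_timeout."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            # wait_for cancels the attempt and raises TimeoutError on expiry.
            return await asyncio.wait_for(make_request(), timeout=request_timeout)
        except asyncio.TimeoutError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the timeout to the caller.
    raise AssertionError("unreachable")


async def slow_request() -> str:
    """Stand-in for an LLM call that takes about 0.2 seconds."""
    await asyncio.sleep(0.2)
    return "done"


if __name__ == "__main__":
    # A generous timeout succeeds on the first attempt.
    print(asyncio.run(call_with_timeout(slow_request, request_timeout=1.0)))
```

With `request_timeout=0.1`, every attempt in this sketch would time out and the final `asyncio.TimeoutError` would propagate, which mirrors what the very short timeout in step 3 is meant to provoke.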
### Why make this change?
Different semantic operations may require different timeout thresholds:
- Complex operations like parsing large PDFs or extracting structured
data from long documents may need longer timeouts
- Simple operations like sentiment analysis might benefit from shorter
timeouts
- Users with specific latency requirements can now customize timeouts to
match their needs
This change improves flexibility and user control over request handling,
especially for workloads with varying complexity and time sensitivity.
src/fenic/_backends/local/semantic_operators/parse_pdf.py
2 additions, 0 deletions

@@ -50,6 +50,7 @@ def __init__(
         describe_images: bool = False,
         model_alias: Optional[ResolvedModelAlias] = None,
         max_output_tokens: Optional[int] = None,
+        request_timeout: Optional[float] = None,
     ):
         self.page_separator = page_separator
         self.describe_images = describe_images
@@ -68,6 +69,7 @@ def __init__(
             max_output_tokens=max_output_tokens,
             temperature=1.0,  # Use a higher temperature so gemini flash models can handle complex table formatting. For more info see the conversation here: https://discuss.ai.google.dev/t/gemini-2-0-flash-has-a-weird-bug/65119/26