Multi-Document extraction bleed (only last result captured)

Hey, I've been trying to use this library with multiple text documents, but I consistently find that only the last document in the list gets analyzed. Every result carries that document's extractions, even though each result's `text` field shows the correct source document. Here's a minimal reproduction and its output.

```python
import langextract as lx
from rich import print
from dotenv import load_dotenv
load_dotenv()


# Define your prompt
prompt = """
Extract information about Person and Company from the document.
For each Person, extract name, age, role.
For each Company, extract name, location, revenue.
Use exact text from the document; do not invent or paraphrase.
If a value is not present, output null or omit.
"""

# Example annotations
examples = [
    lx.data.ExampleData(
        text="Alice, aged 35, works as a Manager at Acme Corp based in London. Their revenue was $5M last year.",
        extractions=[
            lx.data.Extraction(
                extraction_class="Person",
                extraction_text="Alice",
                attributes={"age": "35", "role": "Manager"}
            ),
            lx.data.Extraction(
                extraction_class="Company",
                extraction_text="Acme Corp",
                attributes={"location": "London", "revenue": "$5M"}
            ),
        ],
    ),
    # possibly more examples
]

# Suppose we have multiple documents
documents = [
    lx.data.Document(document_id='1', text="Bob, 28, is a Developer at Beta LLC located in Berlin. Revenue: €2 million."),
    lx.data.Document(document_id='2', text="Charlie, 42, Senior Engineer at Gamma Inc, USA. Gamma Inc revenue last year was $10M."),
    # ...
]

# Run extraction
results = lx.extract(
    text_or_documents=documents,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=2,
    max_workers=4,
    # optionally chunking params etc.
)

# Process results
print(list(results))
```

Output:
```python
[
    AnnotatedDocument(
        extractions=[
            Extraction(extraction_class='Person', extraction_text='Charlie', char_interval=CharInterval(start_pos=0, end_pos=7), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=1, group_index=0, description=None, attributes={'age': '42', 'role': 'Senior Engineer'}),
            Extraction(extraction_class='Company', extraction_text='Gamma Inc', char_interval=CharInterval(start_pos=32, end_pos=41), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=2, group_index=1, description=None, attributes={'location': 'USA', 'revenue': '$10M'})
        ],
        text='Bob, 28, is a Developer at Beta LLC located in Berlin. Revenue: €2 million.'
    ),
    AnnotatedDocument(
        extractions=[
            Extraction(extraction_class='Person', extraction_text='Charlie', char_interval=CharInterval(start_pos=0, end_pos=7), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=1, group_index=0, description=None, attributes={'age': '42', 'role': 'Senior Engineer'}),
            Extraction(extraction_class='Company', extraction_text='Gamma Inc', char_interval=CharInterval(start_pos=32, end_pos=41), alignment_status=<AlignmentStatus.MATCH_EXACT: 'match_exact'>, extraction_index=2, group_index=1, description=None, attributes={'location': 'USA', 'revenue': '$10M'})
        ],
        text='Charlie, 42, Senior Engineer at Gamma Inc, USA. Gamma Inc revenue last year was $10M.'
    )
]
```

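Notice that both results report Charlie and Gamma Inc, even though the first `AnnotatedDocument`'s `text` is document 1 (Bob at Beta LLC). The character intervals (0-7 and 32-41) also index into Charlie's text, not Bob's, so the extractions from the last document are being attached to every result.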
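A quick way to confirm the bleed without eyeballing the output is to slice each document's `text` with every extraction's `char_interval` and compare it to `extraction_text`. The sketch below uses hypothetical stand-in dataclasses that mirror only the fields visible in the printed output (they are not langextract's own classes), filled with the exact values reported above. `find_misaligned` is a hypothetical helper, not part of the library.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins mirroring only the fields shown in the printed output;
# these are NOT langextract's classes.
@dataclass
class CharInterval:
    start_pos: int
    end_pos: int

@dataclass
class Extraction:
    extraction_class: str
    extraction_text: str
    char_interval: Optional[CharInterval]

@dataclass
class AnnotatedDocument:
    text: str
    extractions: List[Extraction] = field(default_factory=list)

def find_misaligned(doc):
    """Return (extraction_text, actual_slice) pairs whose interval does not match the doc text."""
    bad = []
    for ext in doc.extractions:
        if ext.char_interval is None:  # unaligned extractions carry no interval; skip them
            continue
        span = doc.text[ext.char_interval.start_pos:ext.char_interval.end_pos]
        if span != ext.extraction_text:
            bad.append((ext.extraction_text, span))
    return bad

# The same two extractions were reported for BOTH documents in the output above.
charlie_extractions = [
    Extraction("Person", "Charlie", CharInterval(0, 7)),
    Extraction("Company", "Gamma Inc", CharInterval(32, 41)),
]

results = [
    AnnotatedDocument(
        text="Bob, 28, is a Developer at Beta LLC located in Berlin. Revenue: €2 million.",
        extractions=charlie_extractions,
    ),
    AnnotatedDocument(
        text="Charlie, 42, Senior Engineer at Gamma Inc, USA. Gamma Inc revenue last year was $10M.",
        extractions=charlie_extractions,
    ),
]

for i, doc in enumerate(results, start=1):
    print(i, find_misaligned(doc))
```

With the values above, both extractions are flagged for document 1 (its text at 0-7 is `Bob, 28`, at 32-41 is `LLC locat`) and none for document 2. Assuming the real result objects expose the same `text`, `extractions`, and `char_interval.start_pos`/`end_pos` attributes that their repr suggests, the same helper could be run over `list(results)` from an actual `lx.extract` call.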

Multi-Document extraction bleed (only last result captured) #260

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Multi-Document extraction bleed (only last result captured) #260

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions