
Querying Multiple Documents Returns Error #113

@Menna-create


Issue Summary

When I ask a question across multiple uploaded documents using the PageIndex chat completions API, I get an error-style reply saying the documents are not accessible, instead of an answer based on my uploaded documents.

Setup Information

API Key

PageIndex-API-KEY

Uploaded Documents

I have successfully uploaded 5 documents to PageIndex:

  1. pi-cmlqm8gpx00690io97n36u5ji
  2. pi-cmlqm8a0900670io9iol2h3x5
  3. pi-cmlqm84aa010h0fo98sy9a8ls
  4. pi-cmlqm7c9500fd0lo90o3ot5tm
  5. pi-cmlqm71i800xd08o9ma5a57o0

Code Used to Upload Documents

import requests

api_key = "PageIndex-API-KEY"
file_path = "./documents/Western Power Distribution RIIO-ED2 Business Plan 2023-2028: Strategic Vision, Commitments, and Investment for a Net Zero Energy Future.pdf"

with open(file_path, "rb") as file:
    response = requests.post(
        "https://api.pageindex.ai/doc/",
        headers={"api_key": api_key},
        files={"file": file}
    )
print(f"Status Code: {response.status_code}")
print(f"Response: {response.text}")
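If several PDFs from the `./documents` folder need to be uploaded, the single-file upload above can be wrapped in a loop. This is a minimal sketch, not an official PageIndex client: `upload_folder` is a hypothetical helper that reuses only the endpoint and `api_key` header from the snippet above, and it records each file's HTTP status so that failed uploads are easy to spot. The `post` argument defaults to `requests.post` and can be replaced with a stub for offline testing.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

UPLOAD_URL = "https://api.pageindex.ai/doc/"


def upload_folder(folder: str, api_key: str,
                  post: Optional[Callable] = None) -> Dict[str, int]:
    """Upload every PDF in `folder` (hypothetical helper).

    Returns a mapping of file name -> HTTP status code.
    """
    if post is None:
        import requests  # imported lazily so the helper can be tested offline
        post = requests.post

    statuses: Dict[str, int] = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        with pdf.open("rb") as fh:
            # Same endpoint and header as the single-file upload above
            resp = post(UPLOAD_URL, headers={"api_key": api_key},
                        files={"file": fh})
        statuses[pdf.name] = resp.status_code
    return statuses
```

For example, `upload_folder("./documents", "PageIndex-API-KEY")` returns one status per file. Any status outside the 2xx range points to a file worth re-checking before you query it.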

Code Used to Verify Uploaded Documents

import requests

api_key = "PageIndex-API-KEY"

response = requests.get(
    "https://api.pageindex.ai/docs",
    headers={"api_key": api_key},
    params={"limit": 10, "offset": 0}
)
print("Status Code:", response.status_code)

data = response.json()
print("\nAll Documents:")
for doc in data.get("documents", []):
    print(f"{doc['id']} - {doc['name']}")

# Find duplicates by base filename (without version suffix like _5.pdf)
import re

base_names = {}
for doc in data.get("documents", []):
    name = doc['name']
    # Remove version suffix (e.g., "_5.pdf" -> ".pdf")
    base_name = re.sub(r'_\d+\.pdf$', '.pdf', name)
    if base_name not in base_names:
        base_names[base_name] = []
    base_names[base_name].append((doc['id'], name))

# Print duplicates
print("\nDuplicates (by base filename):")
duplicates_found = False
duplicate_ids = []

for base_name, docs in base_names.items():
    if len(docs) > 1:
        duplicates_found = True
        print(f"\n{base_name}")
        for doc_id, full_name in docs:
            print(f"  - {doc_id} ({full_name})")
        # Keep the first ID listed in each group (assumes the API lists the newest version first)
        duplicate_ids.append(docs[0][0])

if not duplicates_found:
    print("No duplicates found")
else:
    print("\n\nDuplicate IDs to keep (one from each group):")
    print(duplicate_ids)
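The listing above fetches only the first 10 documents (`limit=10, offset=0`), so duplicates beyond that page are never seen. The "keep the first ID" rule also depends on the order in which the API returns documents. The sketch below addresses both problems under stated assumptions. It assumes `GET /docs` honors `limit`/`offset` and returns a `documents` list, as in the snippet above, and that a page shorter than `limit` is the last one. `list_all_docs` and `latest_per_base_name` are hypothetical helpers. The second one keeps the copy with the highest numeric suffix (`_6.pdf` beats `_5.pdf`; an unsuffixed name counts as version 0) instead of relying on list order.

```python
import re
from typing import Callable, Dict, List, Tuple

VERSION_RE = re.compile(r"_(\d+)\.pdf$")


def list_all_docs(fetch_page: Callable[[int, int], dict],
                  limit: int = 10) -> List[dict]:
    """Collect every document by paging with limit/offset.

    `fetch_page(limit, offset)` should return the parsed JSON of GET /docs.
    """
    docs: List[dict] = []
    offset = 0
    while True:
        page = fetch_page(limit, offset).get("documents", [])
        docs.extend(page)
        if len(page) < limit:  # assumption: a short page is the last page
            return docs
        offset += limit


def latest_per_base_name(docs: List[dict]) -> Dict[str, str]:
    """Map each base file name to the id of its highest-versioned copy."""
    best: Dict[str, Tuple[int, str]] = {}
    for doc in docs:
        match = VERSION_RE.search(doc["name"])
        version = int(match.group(1)) if match else 0
        base = VERSION_RE.sub(".pdf", doc["name"])
        if base not in best or version > best[base][0]:
            best[base] = (version, doc["id"])
    return {base: doc_id for base, (_, doc_id) in best.items()}
```

With `requests`, `fetch_page` could be `lambda limit, offset: requests.get("https://api.pageindex.ai/docs", headers={"api_key": api_key}, params={"limit": limit, "offset": offset}).json()`.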

Code Used to Query Documents (Tokens.py)

import requests

api_key = "PageIndex-API-KEY"

# Pass all five document IDs as a list in doc_id (guessed from RAG API patterns)
response = requests.post(
    "https://api.pageindex.ai/chat/completions",
    headers={
        "api_key": api_key,
        "Content-Type": "application/json"
    },
    json={
        "messages": [
            {"role": "user", "content": "According to Electricity North West's 2023-2028 Business Plan, How much investment will be made to help customers connect low-carbon technologies?"}
        ],
        "doc_id": ['pi-cmlqm8gpx00690io97n36u5ji', 'pi-cmlqm8a0900670io9iol2h3x5', 'pi-cmlqm84aa010h0fo98sy9a8ls', 'pi-cmlqm7c9500fd0lo90o3ot5tm', 'pi-cmlqm71i800xd08o9ma5a57o0']
    }
)

# Print the status first so it is visible even if the body is not valid JSON
print("Status code:", response.status_code)
result = response.json()
print("Full response:", result)

# Check if the response contains 'choices'
if "choices" in result:
    print(result["choices"][0]["message"]["content"])
else:
    print("Error: Response does not contain 'choices' field")
    if "error" in result:
        print("Error details:", result["error"])

Actual Response Received

I'll help you find the investment amount for connecting low-carbon technologies in Electricity North West's business plan. {"doc_name": "Electricity North West's 2023-2028 Business Plan to Lead the North West to Net Zero_6.pdf"} It seems the documents are not currently accessible in the system. The Electricity North West document you're asking about appears to have been removed or is no longer available.

To answer your question, I would need you to re-upload the document using a PDF URL. Do you have access to the Electricity North West 2023-2028 Business Plan PDF that I can process?
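The reply says the Electricity North West document "appears to have been removed". The `doc_name` it references ends in `_6.pdf`, which is the kind of versioned copy the duplicate check above looks for. Before re-uploading, it may be worth confirming that every ID sent in `doc_id` still appears in `GET /docs`. The following is a minimal check. `missing_ids` is a hypothetical helper, and it assumes that listing entries carry `id` and `name`, as in the verification script.

```python
from typing import Iterable, List


def missing_ids(requested: Iterable[str], listed_docs: List[dict]) -> List[str]:
    """Return the requested document IDs that the listing no longer contains."""
    available = {doc["id"] for doc in listed_docs}
    return [doc_id for doc_id in requested if doc_id not in available]
```

Call it with the five IDs from the query and `data["documents"]` from the verification script. Use the complete listing rather than only the first page of 10.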
