Issue Summary
When I ask a question across multiple uploaded documents using the PageIndex chat completions API, the reply says the documents are not accessible instead of answering from my uploaded documents.
Setup Information
API Key
PageIndex-API-KEY
Uploaded Documents
I have successfully uploaded 5 documents to PageIndex:
- `pi-cmlqm8gpx00690io97n36u5ji`
- `pi-cmlqm8a0900670io9iol2h3x5`
- `pi-cmlqm84aa010h0fo98sy9a8ls`
- `pi-cmlqm7c9500fd0lo90o3ot5tm`
- `pi-cmlqm71i800xd08o9ma5a57o0`
Code Used to Upload Documents
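The script I actually used follows this sketch. It uploads one hard-coded file per run. To upload a whole folder in one pass and log each server response, a loop along these lines could work. `upload_folder` is a hypothetical helper that reuses only the endpoint and `api_key` header shown in that script. The injectable `post` parameter is there so the loop can be tried without network access.

```python
import glob
import os
from typing import Callable, Dict, Optional

UPLOAD_URL = "https://api.pageindex.ai/doc/"  # endpoint taken from the upload script


def upload_folder(folder: str, api_key: str,
                  post: Optional[Callable] = None) -> Dict[str, int]:
    """Upload every PDF in `folder` once and return {filename: HTTP status code}.

    `post` defaults to requests.post; passing a stand-in lets the loop run offline.
    """
    if post is None:
        import requests  # imported lazily so the helper can be exercised without it
        post = requests.post
    statuses = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.pdf"))):
        with open(path, "rb") as fh:
            resp = post(UPLOAD_URL, headers={"api_key": api_key}, files={"file": fh})
        name = os.path.basename(path)
        statuses[name] = resp.status_code
        # Log the raw body so any returned document ID can be copied from it
        print(f"{name}: {resp.status_code} {resp.text}")
    return statuses
```

Calling it as `upload_folder("./documents", api_key)` would send each PDF exactly once per call. Running it again would still re-upload everything, because it does not check what is already stored.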
```python
import requests

api_key = "PageIndex-API-KEY"
file_path = "./documents/Western Power Distribution RIIO-ED2 Business Plan 2023-2028: Strategic Vision, Commitments, and Investment for a Net Zero Energy Future.pdf"

# Upload a single PDF to PageIndex
with open(file_path, "rb") as file:
    response = requests.post(
        "https://api.pageindex.ai/doc/",
        headers={"api_key": api_key},
        files={"file": file},
    )

print(f"Status Code: {response.status_code}")
print(f"Response: {response.text}")
```

Code Used to Verify Uploaded Documents
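The verification script below has two gaps. Its listing call requests only the first page (`limit=10`, `offset=0`). Its duplicate check keeps whichever entry appears first instead of comparing version numbers. A minimal sketch that handles both explicitly follows. `fetch_all_docs` and `latest_per_base_name` are hypothetical helper names. The sketch assumes, as the script does, that the listing response keeps entries under `"documents"` with `id` and `name` fields. It also assumes that a page shorter than `limit` marks the end of the list. Like the script's regex, it reads any trailing `_<n>` before `.pdf` as a version, so a name ending in `_2023.pdf` would be treated the same way.

```python
import re
from typing import Callable, Dict, List, Tuple

# Optional "_<n>" version suffix before ".pdf", e.g. "Plan_6.pdf" -> base "Plan", version 6
_SUFFIX = re.compile(r"^(?P<base>.*?)(?:_(?P<ver>\d+))?\.pdf$")


def fetch_all_docs(fetch_page: Callable[[int, int], dict],
                   page_size: int = 10) -> List[dict]:
    """Collect every listed document by advancing `offset` until a short page returns.

    Assumption: a page with fewer than `page_size` entries is the last one.
    """
    docs, offset = [], 0
    while True:
        page = fetch_page(page_size, offset).get("documents", [])
        docs.extend(page)
        if len(page) < page_size:
            return docs
        offset += page_size


def latest_per_base_name(docs: List[dict]) -> Dict[str, Tuple[str, str]]:
    """Map each base filename to the (id, name) with the highest version suffix.

    A name without a suffix counts as version 0.
    """
    best: Dict[str, Tuple[int, str, str]] = {}
    for doc in docs:
        m = _SUFFIX.match(doc["name"])
        if m is None:  # not a .pdf name: group it under its own full name
            base, ver = doc["name"], 0
        else:
            base, ver = m.group("base") + ".pdf", int(m.group("ver") or 0)
        if base not in best or ver > best[base][0]:
            best[base] = (ver, doc["id"], doc["name"])
    return {base: (doc_id, name) for base, (_, doc_id, name) in best.items()}
```

With the real API, `fetch_page` could be `lambda limit, offset: requests.get("https://api.pageindex.ai/docs", headers={"api_key": api_key}, params={"limit": limit, "offset": offset}).json()`. Passing it in as a function keeps the paging logic separate from the HTTP call.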
```python
import re

import requests

api_key = "PageIndex-API-KEY"

# limit/offset page through the list; these parameters fetch at most the first 10 documents
response = requests.get(
    "https://api.pageindex.ai/docs",
    headers={"api_key": api_key},
    params={"limit": 10, "offset": 0},
)
print("Status Code:", response.status_code)
data = response.json()

print("\nAll Documents:")
for doc in data.get("documents", []):
    print(f"{doc['id']} - {doc['name']}")

# Group documents by base filename, stripping a version suffix (e.g. "_5.pdf" -> ".pdf")
base_names = {}
for doc in data.get("documents", []):
    base_name = re.sub(r"_\d+\.pdf$", ".pdf", doc["name"])
    base_names.setdefault(base_name, []).append((doc["id"], doc["name"]))

# Print duplicates
print("\nDuplicates (by base filename):")
duplicates_found = False
duplicate_ids = []
for base_name, docs in base_names.items():
    if len(docs) > 1:
        duplicates_found = True
        print(f"\n{base_name}")
        for doc_id, full_name in docs:
            print(f" - {doc_id} ({full_name})")
        # Keep the first ID in listing order; this assumes the API lists the most
        # recent upload first, which the code does not check against the version number
        duplicate_ids.append(docs[0][0])

if not duplicates_found:
    print("No duplicates found")
else:
    print("\n\nDuplicate IDs to keep (one from each group):")
    print(duplicate_ids)
```

Code Used to Query Documents (Tokens.py)
```python
import requests

api_key = "PageIndex-API-KEY"

# Ask one question across all five uploaded documents by passing their IDs as a list in "doc_id"
response = requests.post(
    "https://api.pageindex.ai/chat/completions",
    headers={
        "api_key": api_key,
        "Content-Type": "application/json",
    },
    json={
        "messages": [
            {"role": "user", "content": "According to Electricity North West's 2023-2028 Business Plan, How much investment will be made to help customers connect low-carbon technologies?"}
        ],
        "doc_id": [
            "pi-cmlqm8gpx00690io97n36u5ji",
            "pi-cmlqm8a0900670io9iol2h3x5",
            "pi-cmlqm84aa010h0fo98sy9a8ls",
            "pi-cmlqm7c9500fd0lo90o3ot5tm",
            "pi-cmlqm71i800xd08o9ma5a57o0",
        ],
    },
)

print("Status code:", response.status_code)
result = response.json()
print("Full response:", result)

# Print the answer if the response contains 'choices'; otherwise show any error details
if "choices" in result:
    print(result["choices"][0]["message"]["content"])
else:
    print("Error: Response does not contain 'choices' field")
    if "error" in result:
        print("Error details:", result["error"])
```

Actual Response Received
I'll help you find the investment amount for connecting low-carbon technologies in Electricity North West's business plan. {"doc_name": "Electricity North West's 2023-2028 Business Plan to Lead the North West to Net Zero_6.pdf"} It seems the documents are not currently accessible in the system. The Electricity North West document you're asking about appears to have been removed or is no longer available.
To answer your question, I would need you to re-upload the document using a PDF URL. Do you have access to the Electricity North West 2023-2028 Business Plan PDF that I can process?
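The reply above suggests the lookup failed for at least the Electricity North West document. One way to narrow this down is to send the same question once per document, with `doc_id` as a single string instead of a list, and see which IDs return an answer. This is a debugging sketch built on assumptions. This report does not confirm whether `doc_id` accepts a list, a string, or both. `build_payload` and `probe_each_doc` are hypothetical names. The reply shape (`choices[0].message.content`) is taken from the query script.

```python
from typing import Callable, Dict, List, Optional

CHAT_URL = "https://api.pageindex.ai/chat/completions"  # endpoint from the query script


def build_payload(question: str, doc_id: str) -> dict:
    """Same body as the query script, but with one doc_id string instead of a list."""
    return {"messages": [{"role": "user", "content": question}], "doc_id": doc_id}


def probe_each_doc(question: str, doc_ids: List[str], api_key: str,
                   post: Optional[Callable] = None) -> Dict[str, str]:
    """Ask `question` against each document separately; return {doc_id: answer or error}."""
    if post is None:
        import requests  # lazy import so the helpers can be tested offline
        post = requests.post
    results = {}
    for doc_id in doc_ids:
        resp = post(CHAT_URL, headers={"api_key": api_key},
                    json=build_payload(question, doc_id))
        body = resp.json()
        if "choices" in body:
            results[doc_id] = body["choices"][0]["message"]["content"]
        else:
            results[doc_id] = f"HTTP {resp.status_code}: {body.get('error', body)}"
    return results
```

If every single-ID call answers but the list form does not, the problem likely lies in how multiple IDs are passed. If a particular ID fails on its own, that document's processing status is the more likely cause.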