Summary
POST /partition/{partition}/workspaces/{workspace_id}/files accepts arbitrary file_ids and returns 200 OK {"status": "added"} even when the referenced files do not exist in the files table. Ghost entries are inserted into workspace_files, polluting the workspace membership and potentially causing confusing downstream behaviour.
Root cause
add_files_to_workspace() in openrag/components/indexer/vectordb/utils.py:653 inserts each file_id into workspace_files (INSERT ... ON CONFLICT DO NOTHING) without first checking that the file_id exists in the files table:
def add_files_to_workspace(self, workspace_id: str, file_ids: list[str]):
    with self.Session() as session:
        for fid in file_ids:
            stmt = pg_insert(WorkspaceFile).values(workspace_id=workspace_id, file_id=fid)
            stmt = stmt.on_conflict_do_nothing(constraint="uix_workspace_file")
            session.execute(stmt)
        session.commit()
There is no foreign-key constraint from workspace_files.file_id to files.file_id, so invalid IDs are stored without error.
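The effect of the missing constraint can be reproduced in isolation. The following is a minimal sketch, not the project's code: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the two-table schema is an assumption that only mirrors the table and column names mentioned above. `INSERT OR IGNORE` is used as a rough SQLite counterpart of `ON CONFLICT DO NOTHING`.

```python
import sqlite3

# In-memory SQLite as a stand-in for PostgreSQL; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (file_id TEXT PRIMARY KEY);
    CREATE TABLE workspace_files (
        workspace_id TEXT NOT NULL,
        file_id TEXT NOT NULL,          -- no REFERENCES files(file_id)
        UNIQUE (workspace_id, file_id)  -- plays the role of uix_workspace_file
    );
    """
)

ghost_id = "00000000-0000-0000-0000-000000000000"

# Insert the same membership twice: the unique constraint absorbs the
# duplicate, but nothing checks that the file actually exists.
for _ in range(2):
    conn.execute(
        "INSERT OR IGNORE INTO workspace_files (workspace_id, file_id) VALUES (?, ?)",
        ("my-ws", ghost_id),
    )
conn.commit()

files_count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
ghost_rows = conn.execute(
    "SELECT file_id FROM workspace_files WHERE workspace_id = ?", ("my-ws",)
).fetchall()
print(files_count, ghost_rows)
```

The files table stays empty while workspace_files holds one row for the non-existent ID, which is the ghost entry described in the summary.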
Steps to reproduce
curl -X POST http://<host>/partition/default/workspaces/my-ws/files \
  -H "Content-Type: application/json" \
  -d '{"file_ids": ["00000000-0000-0000-0000-000000000000"]}'
# Returns 200 {"status": "added"}
curl http://<host>/partition/default/workspaces/my-ws/files
# Returns ["00000000-0000-0000-0000-000000000000"] — ghost entry
Expected behaviour
The endpoint should return 404 Not Found (or 422 Unprocessable Entity) for any file_id that does not exist in the files table for that partition.
Suggested fix
Two complementary approaches:
- Application-level validation in the router or add_files_to_workspace: query files for each ID and reject unknown ones before inserting.
- Database-level constraint: add a foreign key from workspace_files.file_id to files.file_id (with ON DELETE CASCADE) so the DB enforces referential integrity.
The DB constraint is the safer long-term option; the application check provides a clear error message to the client.
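The two approaches can be sketched together as follows. This is an illustration under stated assumptions, not a patch: it uses the standard-library sqlite3 module with an in-memory database instead of the project's SQLAlchemy session and PostgreSQL, the schema is guessed from the table and column names above, and `UnknownFileIds` is a hypothetical exception that the router would translate into a 404 or 422. The partition filter mentioned under "Expected behaviour" is left out for brevity; a real check would also restrict the lookup to the request's partition.

```python
import sqlite3


class UnknownFileIds(Exception):
    """Hypothetical error the router could translate into a 404 or 422."""

    def __init__(self, missing):
        super().__init__(f"unknown file_ids: {sorted(missing)}")
        self.missing = sorted(missing)


def add_files_to_workspace(conn, workspace_id, file_ids):
    """Insert memberships only if every requested file_id exists in files."""
    requested = set(file_ids)
    if not requested:
        return
    placeholders = ",".join("?" * len(requested))
    rows = conn.execute(
        f"SELECT file_id FROM files WHERE file_id IN ({placeholders})",
        tuple(requested),
    ).fetchall()
    missing = requested - {row[0] for row in rows}
    if missing:
        raise UnknownFileIds(missing)  # reject the whole request, insert nothing
    conn.executemany(
        "INSERT OR IGNORE INTO workspace_files (workspace_id, file_id) VALUES (?, ?)",
        [(workspace_id, fid) for fid in sorted(requested)],
    )
    conn.commit()


# Assumed schema, this time with the foreign key in place.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE files (file_id TEXT PRIMARY KEY);
    CREATE TABLE workspace_files (
        workspace_id TEXT NOT NULL,
        file_id TEXT NOT NULL REFERENCES files(file_id) ON DELETE CASCADE,
        UNIQUE (workspace_id, file_id)
    );
    INSERT INTO files (file_id) VALUES ('real-file');
    """
)

ghost_id = "00000000-0000-0000-0000-000000000000"

# 1. Application check: the caller gets a precise list of unknown IDs.
try:
    add_files_to_workspace(conn, "my-ws", ["real-file", ghost_id])
    app_error = None
except UnknownFileIds as exc:
    app_error = exc.missing

# 2. Database constraint: a write that bypasses the check is still rejected.
try:
    conn.execute(
        "INSERT INTO workspace_files (workspace_id, file_id) VALUES (?, ?)",
        ("my-ws", ghost_id),
    )
    db_rejected = False
except sqlite3.IntegrityError:
    db_rejected = True
conn.rollback()

# A request containing only existing files goes through as before.
add_files_to_workspace(conn, "my-ws", ["real-file"])
members = [row[0] for row in conn.execute("SELECT file_id FROM workspace_files")]
print(app_error, db_rejected, members)
```

In this sketch the request is rejected as a whole when any ID is unknown. Whether the endpoint should instead add the valid IDs and report the invalid ones is a design decision the issue leaves open. Note also that adding the constraint to an existing database would first require removing any ghost rows already stored.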
Affected files
openrag/routers/workspaces.py — router handler for add_files_to_workspace
openrag/components/indexer/vectordb/utils.py:653 — add_files_to_workspace()
openrag/components/indexer/vectordb/models.py — WorkspaceFile model (missing FK constraint)