Skip to content

Conversation

@tmcneil-mdb
Copy link
Collaborator

@tmcneil-mdb tmcneil-mdb commented Nov 25, 2025

What does this PR do?

This PR refactors the Python backend to standardize error handling, optimize aggregation performance, and stabilize the CI/CD pipeline.

CI/CD Pipeline & Test Stabilization

It resolves persistent integration test failures in GitHub Actions caused by environment discrepancies between local development and the Atlas CLI runner.

  • Fixed Replica Set Connectivity: Updated run-python-tests.yml to append ?directConnection=true to the MONGO_URI. The Atlas CLI spins up a local Replica Set. That set has a different canonical address than our local host. The ?directionConnection=true prevents redirection and allows the driver to connect to localhost instead, preventing connection timeouts.
  • Provisioned Missing Search Indexes: mongorestore restores B-Tree indexes but excludes Atlas Search (Lucene) configurations. I updated the workflow to explicitly create both Search and Vector Search indexes via the Atlas CLI prior to the test run. This resolves 500 errors in search-related integration tests (e.g., test_search_existing_movies_by_plot).

Performance Improvements

  • Aggregation Optimization: Added ensure_standard_index() to the application startup. This checks for and creates a standard index on comments.movie_id.
    • Impact: Significantly reduces execution time for $lookup stages in the reportingByComments aggregation, preventing request timeouts.

Refactoring

  • Standardized Error Handling: Deprecated custom error classes (ErrorResponse, create_error_response) in favor of FastAPI’s native HTTPException. Ensuring consistent 4xx/5xx status codes across endpoints.

Testing Strategy

Automated Testing (Primary)

  • CI Status: Verify that the GitHub Actions workflow Run Python Tests passes (Green).
  • Local Regression: Run pytest locally to ensure no regressions in unit or integration logic.
    • Note: The environment fixes revealed several integration tests that were failing silently or returning incorrect status codes. These have been patched.

Manual Verification (Secondary)

  • Verify /api/movies/aggregations/reportingByComments returns successfully (validating the new index performance). (I discovered this error testing the frontend. The comments would never load on the aggregation page)
  • Verify basic Search functionality via /api/movies/search to confirm the Lucene index creation hook is working.

Removed Python custom error handler and replaced with HTTPException.

Removed all files and instances related to custom Python error handler

Fixed Python bug in ReportingByComment
@tmcneil-mdb tmcneil-mdb requested a review from dacharyc November 26, 2025 00:26
Copy link
Collaborator

@dacharyc dacharyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making these updates, Taylor! Back to you for a couple of small fixups, and also a handful of suggestions for you to consider.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed the integration tests are passing locally on my machine, too, which means this is a CI issue. Claude suggests it may be a bug related to the process at line 57 and the pipe buffer filling up on Linux/the CI box, so suggests outputting to devnull should fix the integration test failures. If you replace the code starting in line 56 with the following change, this should hopefully fix things up 🤞

    # Start the server process
    # Use subprocess.DEVNULL to avoid pipe buffer filling up and blocking the server
    # This prevents the server from hanging when it tries to write output
    process = subprocess.Popen(
        [sys.executable, "-m", "uvicorn", "main:app", "--host", "127.0.0.1", "--port", str(test_port)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        cwd=server_python_dir
    )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this fix didnt seem to solve it. I will come tomorrow and try again. Going to setup docker, so I can see if I can reproduce the issue locally.

limit: Maximum number of results to return
Returns:
SuccessResponse containing a list of movies with similarity scores
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment directly on the lines above where we've implemented this endpoint, but it's possible to add response documentation to add this 503 to the possible responses. If you update the router decorator (starting at line 304) to add a responses key, you can add these there, similar to:

# Vector Search Endpoint
@router.get(
    "/vector-search",
    response_model=SuccessResponse[List[VectorSearchResult]],
    responses={
        503: {
            "description": "Service Unavailable - VOYAGE_API_KEY not configured",
            "content": {
                "application/json": {
                    "example": {
                        "detail": "Vector search unavailable: VOYAGE_API_KEY not configured. Please add your API key to your .env file."
                    }
                }
            }
        },
        500: {
            "description": "Internal Server Error - Vector search operation failed",
            "content": {
                "application/json": {
                    "example": {
                        "detail": "An error occurred during vector search: <error message>"
                    }
                }
            }
        }
    }
)

This will add these responses to the FastAPI-generated OpenAPI spec.

We could also apply this pattern to annotate the other endpoints that have specific error cases - for example, endpoints that validate ObjectIds could document their 400 responses:

@router.get(
    "/{id}",
    responses={
        400: {"description": "Invalid ObjectId format"},
        404: {"description": "Movie not found"}
    }
)

Copy link
Collaborator Author

@tmcneil-mdb tmcneil-mdb Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I see pros and cons to this.

I think this improves the documentation of the API and makes to more in alignment with OpenAPI specs. However, I think this might be extremely verbose. I think we would have to make a helper file that contains all the responses and pulls from there.

I am open to this change, but I think perhaps, it might need to be another ticket? This PR might be doing too much?

@tmcneil-mdb tmcneil-mdb requested a review from dacharyc December 10, 2025 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants