Tanvir/add websites routes #4498
asyncio.create_task(
    _crawl_website_job(
        job_id=job_id,
        domain=domain,
        config=body,
        db=db,
    )
)
Bug: _crawl_website_job receives a request-scoped db session that will be closed before its database operations are executed.
Severity: CRITICAL | Confidence: 1.00
🔍 Detailed Analysis
The _crawl_website_job function receives a request-scoped db: AsyncSession parameter. Since this function is executed as an asyncio.create_task(), the db session will be closed by the time any database operations within _crawl_website_job are attempted. This will result in a "Session is closed" error when the currently commented-out database logic is implemented, as the session will no longer be active.
💡 Suggested Fix
Modify _crawl_website_job to not accept db: AsyncSession as a parameter. Instead, it should create its own AsyncSession using async_session_maker() within the function's scope for all database interactions.
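A minimal sketch of that fix, assuming async_session_maker is importable from fai.db (the review names only the factory, not its module) and leaving the crawl and persistence logic as placeholders:
from fai.db import async_session_maker  # assumed import path; the review names only the factory


async def _crawl_website_job(job_id: str, domain: str, config) -> None:
    # Own the session for the lifetime of this task instead of borrowing the
    # request-scoped one, which FastAPI closes once the HTTP response is sent.
    async with async_session_maker() as db:
        try:
            # ... crawl `domain` using `config` and persist results via `db` ...
            await db.commit()
        except Exception:
            await db.rollback()
            raise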
🤖 Prompt for AI Agent
Fix this bug. In servers/fai/src/fai/routes/website.py at lines 81-88: The
`_crawl_website_job` function receives a request-scoped `db: AsyncSession` parameter.
Since this function is executed as an `asyncio.create_task()`, the `db` session will be
closed by the time any database operations within `_crawl_website_job` are attempted.
This will result in a "Session is closed" error when the currently commented-out
database logic is implemented, as the session will no longer be active.
        config=body,
        db=db,
    )
)
Bug: Background task _crawl_website_job receives a request-scoped db session that will be closed before use, causing runtime errors.
Severity: CRITICAL | Confidence: 1.00
🔍 Detailed Analysis
The asyncio.create_task call in index_website passes a request-scoped db (AsyncSession) to the background task _crawl_website_job. When the HTTP request completes, FastAPI's dependency system cleans up and closes this session. If _crawl_website_job attempts database operations using this closed session, it will result in a runtime error. This issue is currently latent because database operations within _crawl_website_job are commented out.
💡 Suggested Fix
Modify _crawl_website_job to create its own AsyncSession using async_session_maker() or use job_manager.execute_job() which handles session lifecycle.
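The caller-side shape of that change, sketched against the snippet above: the request-scoped session is simply no longer handed to the task.
asyncio.create_task(
    _crawl_website_job(
        job_id=job_id,
        domain=domain,
        config=body,
        # no db argument: the task opens and closes its own session
    )
)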
🤖 Prompt for AI Agent
Fix this bug. In servers/fai/src/fai/routes/website.py at line 88: The
`asyncio.create_task` call in `index_website` passes a request-scoped `db`
(AsyncSession) to the background task `_crawl_website_job`. When the HTTP request
completes, FastAPI's dependency system cleans up and closes this session. If
`_crawl_website_job` attempts database operations using this closed session, it will
result in a runtime error. This issue is currently latent because database operations
within `_crawl_website_job` are commented out.
job_id = await job_manager.create_job(db)

# TODO: Start the crawling job similar to index_website
# asyncio.create_task(_crawl_website_job(...))
Bug: reindex_website creates a job but fails to launch the background task, leaving the job stuck in PENDING status.
Severity: CRITICAL | Confidence: 1.00
🔍 Detailed Analysis
The reindex_website endpoint creates a job using job_manager.create_job but fails to launch the corresponding background task _crawl_website_job because the asyncio.create_task call is commented out. This leaves the created job stuck indefinitely in PENDING status, as no background worker exists to process it, thus breaking the API contract for job tracking.
💡 Suggested Fix
Uncomment asyncio.create_task(_crawl_website_job(...)) in reindex_website to ensure the job is properly launched and processed.
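A sketch of that launch, mirroring index_website and assuming _crawl_website_job has been reworked to open its own session (a later comment on this PR argues the original crawl configuration should be reused rather than a base_url-only request):
job_id = await job_manager.create_job(db)

# Fire-and-forget, as in index_website, so the job can move past PENDING.
asyncio.create_task(
    _crawl_website_job(
        job_id=job_id,
        domain=domain,
        config=IndexWebsiteRequest(base_url=body.base_url),
    )
)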
🤖 Prompt for AI Agent
Fix this bug. In servers/fai/src/fai/routes/website.py at line 297: The
`reindex_website` endpoint creates a job using `job_manager.create_job` but fails to
launch the corresponding background task `_crawl_website_job` because the
`asyncio.create_task` call is commented out. This leaves the created job stuck
indefinitely in `PENDING` status, as no background worker exists to process it, thus
breaking the API contract for job tracking.
Additional Comments:
servers/fai/alembic/env.py (line 24):
WebsiteDb model is missing from the Alembic environment configuration. While the manual migration file exists and will work, Alembic won't be aware of this model, which can cause issues with future auto-migrations and model tracking.
📝 Patch Details
diff --git a/servers/fai/alembic/env.py b/servers/fai/alembic/env.py
index 9e0cc00b1..84f1a91bc 100644
--- a/servers/fai/alembic/env.py
+++ b/servers/fai/alembic/env.py
@@ -8,6 +8,7 @@ from fai.db import (
Base,
engine,
)
+from fai.models.db.conversation_report_db import ConversationReportDb # noqa: F401
from fai.models.db.discord_integration_db import DiscordIntegrationDb # noqa: F401
from fai.models.db.discord_message_cache_db import DiscordMessageCacheDb # noqa: F401
from fai.models.db.document_db import DocumentDb # noqa: F401
@@ -22,6 +23,7 @@ from fai.models.db.slack_context_db import SlackContextDb # noqa: F401
from fai.models.db.slack_integration_db import SlackIntegrationDb # noqa: F401
from fai.models.db.slack_message_cache_db import SlackMessageCacheDb # noqa: F401
from fai.models.db.slack_message_classification_db import SlackMessageClassificationDb # noqa: F401
+from fai.models.db.website_db import WebsiteDb # noqa: F401
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
diff --git a/servers/fai/src/utils/init_db.py b/servers/fai/src/utils/init_db.py
index e14391bc6..705647a29 100644
--- a/servers/fai/src/utils/init_db.py
+++ b/servers/fai/src/utils/init_db.py
@@ -18,6 +18,7 @@ from fai.models.db.slack_context_db import SlackContextDb # noqa: F401
from fai.models.db.slack_integration_db import SlackIntegrationDb # noqa: F401
from fai.models.db.slack_message_cache_db import SlackMessageCacheDb # noqa: F401
from fai.models.db.slack_message_classification_db import SlackMessageClassificationDb # noqa: F401
+from fai.models.db.website_db import WebsiteDb # noqa: F401
async def init() -> None:
Analysis
Missing model imports in Alembic environment configuration break auto-migration detection
What fails: Alembic's env.py is missing imports for WebsiteDb and ConversationReportDb models, preventing auto-migration detection from tracking these tables
How to reproduce:
cd servers/fai
# Run alembic revision --autogenerate (would miss WebsiteDb/ConversationReportDb changes)
Result: Alembic's Base.metadata.tables won't include the websites or conversation_reports tables, causing auto-migration detection to either:
- Try to recreate existing tables
- Miss schema changes to these models in future migrations
- Generate incorrect migration diffs
Expected: All active database models should be imported in alembic/env.py to ensure complete metadata registration, matching the pattern used for 13 other models (DiscordIntegrationDb, DocumentDb, etc.)
Root cause: WebsiteDb has manual migration but wasn't added to env.py imports. ConversationReportDb exists only in init_db.py imports but not in Alembic configuration.
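A quick way to see the effect in a REPL (a sketch; the websites table name and import paths are taken from the patch and analysis above):
from fai.db import Base

print("websites" in Base.metadata.tables)       # False: the model was never imported, so Alembic cannot see the table

from fai.models.db.website_db import WebsiteDb  # noqa: F401  importing the model registers it on Base.metadata

print("websites" in Base.metadata.tables)       # True once the import is in place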
    job_id,
    index_source.id,
    domain,
    IndexWebsiteRequest(base_url=body.base_url),
The reindex_website endpoint creates a minimal IndexWebsiteRequest with only the base_url, discarding all original crawl configuration parameters that should be preserved from the existing index source.
📝 Patch Details
diff --git a/servers/fai/src/fai/routes/website.py b/servers/fai/src/fai/routes/website.py
index 5b74d9631..17f4c6e7f 100644
--- a/servers/fai/src/fai/routes/website.py
+++ b/servers/fai/src/fai/routes/website.py
@@ -432,6 +432,12 @@ async def reindex_website(
await db.commit()
# Start the crawling job
+ # Preserve original crawl configuration if index source exists
+ if index_source and hasattr(index_source, 'config') and index_source.config:
+ crawl_config = IndexWebsiteRequest(**index_source.config)
+ else:
+ crawl_config = IndexWebsiteRequest(base_url=body.base_url)
+
asyncio.create_task(
job_manager.execute_job(
job_id,
@@ -439,7 +445,7 @@ async def reindex_website(
job_id,
index_source.id,
domain,
- IndexWebsiteRequest(base_url=body.base_url),
+ crawl_config,
db,
)
)
Analysis
Configuration loss in reindex_website endpoint when preserving crawl parameters
What fails: reindex_website() in servers/fai/src/fai/routes/website.py creates a minimal IndexWebsiteRequest(base_url=body.base_url) at line 442, losing the original crawl configuration stored in index_source.config
How to reproduce:
- Index a website with custom parameters: POST /sources/website/example.com/index with chunk_size=500, delay=2.0, max_pages=100
- Call the reindex endpoint: POST /sources/website/example.com/reindex with the same base_url
- Original config (chunk_size=500, delay=2.0, max_pages=100) gets replaced with defaults (chunk_size=1000, delay=1.0, max_pages=null)
Result: Reindex crawl uses different parameters than original indexing, creating inconsistent chunks and crawl behavior
Expected: Reindex should preserve original crawl configuration from index_source.config for consistent re-crawling with identical parameters as initial indexing
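For illustration, a hedged reconstruction of the request model (field names and defaults are taken from the reproduction steps above; the actual model in servers/fai may differ), showing how rebuilding it from base_url alone silently resets everything else:
from pydantic import BaseModel


class IndexWebsiteRequest(BaseModel):
    base_url: str
    chunk_size: int = 1000      # defaults as cited in the reproduction steps
    delay: float = 1.0
    max_pages: int | None = None


original = IndexWebsiteRequest(base_url="https://example.com", chunk_size=500, delay=2.0, max_pages=100)
reindexed = IndexWebsiteRequest(base_url=original.base_url)
print(reindexed.chunk_size, reindexed.delay, reindexed.max_pages)  # 1000 1.0 None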
Fixes FER-
Short description of the changes made
Adds routes for website datasources (not implemented yet).
How has this PR been tested?
Ran locally and verified that I can hit the endpoints.