Add server-side PostHog tracking for AI agent analytics #752

Open
nearestnabors wants to merge 3 commits into main from feature/posthog-ai-agent-tracking

Conversation

@nearestnabors
Contributor

Summary

  • Disable bot filtering in client-side PostHog (opt_out_useragent_filter: true) to capture AI agent traffic
  • Add posthog-node for server-side tracking of markdown API requests
  • Create AI agent classification system with 40+ patterns covering:
    • OpenAI (GPTBot, ChatGPT-User, OAI-SearchBot)
    • Anthropic (ClaudeBot, Claude-User, Claude-SearchBot)
    • Perplexity, Google, Amazon, Meta, Cohere
    • Developer tools (Cursor, GitHub Copilot, Codeium, Tabnine)
    • Agent frameworks (LangChain, LlamaIndex, AutoGPT)
    • Doc AI tools (Kapa.ai, Mendable, Inkeep, Glean)
  • Track markdown API requests with classification properties:
    • is_ai_agent (boolean)
    • ai_agent_type (e.g., "ClaudeBot")
    • ai_agent_provider (e.g., "Anthropic")

This enables tracking AI agent traffic that doesn't execute client-side JavaScript (e.g., agents hitting cached CDN content).
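The classification described above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the diff shows only the opening of `AI_AGENT_CLASSIFIERS`, so the entry shape, the `EXAMPLE_CLASSIFIERS` subset, and the `classifyUserAgent` helper are all assumptions made for this example. Only the three output property names come from the summary.

```typescript
// Hypothetical entry shape; the real AI_AGENT_CLASSIFIERS type is truncated in the diff.
type AgentClassifier = {
  pattern: RegExp;
  type: string;
  provider: string;
};

// Illustrative subset of the user-agent tokens named in the summary.
const EXAMPLE_CLASSIFIERS: AgentClassifier[] = [
  { pattern: /GPTBot/i, type: "GPTBot", provider: "OpenAI" },
  { pattern: /ChatGPT-User/i, type: "ChatGPT-User", provider: "OpenAI" },
  { pattern: /ClaudeBot/i, type: "ClaudeBot", provider: "Anthropic" },
  { pattern: /Claude-User/i, type: "Claude-User", provider: "Anthropic" },
];

// Property names match those listed in the PR summary.
type AgentClassification = {
  is_ai_agent: boolean;
  ai_agent_type: string | null;
  ai_agent_provider: string | null;
};

// Returns the first matching classifier, or a "not an agent" result.
function classifyUserAgent(
  userAgent: string | null | undefined
): AgentClassification {
  const ua = userAgent ?? "";
  const match = EXAMPLE_CLASSIFIERS.find((c) => c.pattern.test(ua));
  if (!match) {
    return { is_ai_agent: false, ai_agent_type: null, ai_agent_provider: null };
  }
  return {
    is_ai_agent: true,
    ai_agent_type: match.type,
    ai_agent_provider: match.provider,
  };
}
```

The patterns deliberately omit the `g` flag, so `RegExp.prototype.test` stays stateless across requests.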

Related

Based on Q1 Agent-Led Growth (ALG) proposal and conversation with PostHog support (Lucas Ricoy).

Test plan

  • Verify build passes
  • Test locally with AI user-agent: curl -H "User-Agent: ClaudeBot" http://localhost:3000/en/home
  • Check PostHog dashboard for is_ai_agent events after deployment
  • Create cohorts for "AI Agents" vs "Humans" in PostHog

🤖 Generated with Claude Code

- Disable bot filtering in client-side PostHog to capture AI agent traffic
- Add posthog-node for server-side tracking
- Create AI agent classification system with 40+ patterns (OpenAI, Anthropic, Perplexity, etc.)
- Track markdown API requests with agent classification properties:
  - is_ai_agent (boolean)
  - ai_agent_type (e.g., "ClaudeBot")
  - ai_agent_provider (e.g., "Anthropic")

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@vercel

vercel bot commented Feb 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Feb 13, 2026 11:20am |

}

// AI agent detection patterns with classification
const AI_AGENT_CLASSIFIERS: Array<{
Contributor

Keeping this list up-to-date ourselves seems tedious. Is there a package we can install from someone else who is maintaining this list?

Contributor

... and if not, we should publish and own that package.

Contributor Author

The list doesn't change much; it's relatively stable. I think publishing such a package is a great idea! But perhaps we should wait to do so until the list needs updating?

Contributor

ok, cool - then move this into its own data file and we can keep it up-to-date more simply
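One possible shape for that split, sketched under assumptions: the records below would live in their own data file (the name `ai-agents.data.ts` is hypothetical), stored as plain strings so that updating the list never touches matching logic. `AgentRecord`, `AI_AGENT_RECORDS`, `escapeRegExp`, and `compileRecords` are illustrative names, not identifiers from the PR.

```typescript
// --- hypothetical data file, e.g. ai-agents.data.ts ---
// Plain strings only, so the list could also be kept as JSON.
type AgentRecord = { token: string; provider: string };

const AI_AGENT_RECORDS: AgentRecord[] = [
  { token: "GPTBot", provider: "OpenAI" },
  { token: "OAI-SearchBot", provider: "OpenAI" },
  { token: "ClaudeBot", provider: "Anthropic" },
  { token: "Claude-SearchBot", provider: "Anthropic" },
];

// --- logic file ---
// Escape regex metacharacters so tokens are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build case-insensitive matchers once, at module load.
function compileRecords(
  records: AgentRecord[]
): Array<AgentRecord & { pattern: RegExp }> {
  return records.map((record) => ({
    ...record,
    pattern: new RegExp(escapeRegExp(record.token), "i"),
  }));
}

const COMPILED_AGENTS = compileRecords(AI_AGENT_RECORDS);
```

Keeping tokens as strings rather than `RegExp` literals means a reviewer of a list update only has to check names and providers.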


posthog.capture({
  distinctId: event.distinctId,
  event: "$pageview",
Contributor

If these agents are using browsers, will we get 2 events for each pageview (one from the frontend and one from the backend)? Perhaps a different event name would clear that up (backend_pageview?)

Contributor Author

Changed the event name from $pageview to server_markdown_request.

This way:

  • Client-side: $pageview (from posthog-js when JS executes)
  • Server-side: server_markdown_request (from posthog-node when markdown is served)

No double-counting, and it's clear in dashboards which is which.
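A rough sketch of the server-side capture with the renamed event, under stated assumptions: `CaptureClient` is a structural stand-in for the posthog-node client (only its `capture({ distinctId, event, properties })` call is relied on), and `trackMarkdownRequest` and the `path` property are hypothetical names for this example. The event name and the three classification properties come from the discussion above.

```typescript
// Structural stand-in for the posthog-node client; only `capture` is used.
interface CaptureClient {
  capture(message: {
    distinctId: string;
    event: string;
    properties?: Record<string, unknown>;
  }): void;
}

// Distinct from the client-side "$pageview" so the two are never conflated.
const SERVER_EVENT_NAME = "server_markdown_request";

type AgentProperties = {
  is_ai_agent: boolean;
  ai_agent_type: string | null;
  ai_agent_provider: string | null;
};

// Hypothetical helper: sends one server-side event per markdown response.
function trackMarkdownRequest(
  client: CaptureClient,
  distinctId: string,
  path: string, // assumed property name, not taken from the PR
  classification: AgentProperties
): void {
  client.capture({
    distinctId,
    event: SERVER_EVENT_NAME,
    properties: { path, ...classification },
  });
}
```

Because the helper accepts any object with a compatible `capture` method, it can be exercised in tests with a fake client and no network access.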

Comment on lines 99 to 101
} catch {
  // Silently fail - tracking errors should not affect the response
}
Contributor

log the error so we can fix it!
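One way this suggestion might be applied, as a sketch rather than the change that was actually pushed: wrap the tracking call so a failure is logged but never propagates into the response. `safeTrack` and its injectable `log` parameter are hypothetical names for this example.

```typescript
// Hypothetical wrapper: runs a tracking callback, logging any failure
// instead of letting it affect the response being served.
function safeTrack(
  track: () => void,
  log: (message: string, error: unknown) => void = console.error
): void {
  try {
    track();
  } catch (error) {
    // Tracking errors stay non-fatal, but are now visible in server logs.
    log("PostHog tracking failed:", error);
  }
}
```

Passing the logger as a parameter keeps the default behavior (`console.error`) while letting tests or a structured logger substitute their own sink.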

Contributor Author

Try this!

Using a distinct event name instead of $pageview prevents confusion
when AI agents that use headless browsers trigger both client-side
and server-side events.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Contributor

@evantahler left a comment

A nit about where to store the browserlist, otherwise, let's try it!

2 participants