
@gabinfay

Title: feat(content): Add LLM-specific content manifest files


Description

This pull request introduces two new text files, llms.txt and llms-full.txt, to the /public directory. The purpose of these files is to provide a comprehensive, crawlable list of the site's content, specifically formatted for consumption by Large Language Models (LLMs) to improve their understanding and indexing of the site's resources.
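
For illustration, here is a minimal sketch of how a curated manifest like llms.txt could be rendered. It assumes the layout from the llms.txt proposal (an H1 title, a `>` summary line, then H2 sections of Markdown links). `renderLlmsTxt` and the example pages are hypothetical, and the PR's own generator may differ.

```javascript
// Minimal sketch: render an llms.txt-style manifest from a curated page list.
// Assumes the llms.txt proposal's layout (H1 title, ">" summary, H2 sections
// of "- [title](url)" links). renderLlmsTxt and the page data are hypothetical.

/**
 * @param {{title: string, summary: string,
 *          sections: Record<string, {title: string, url: string, note?: string}[]>}} site
 * @returns {string}
 */
function renderLlmsTxt(site) {
  const lines = [`# ${site.title}`, "", `> ${site.summary}`, ""];
  for (const [heading, pages] of Object.entries(site.sections)) {
    lines.push(`## ${heading}`, "");
    for (const page of pages) {
      // Each entry is a Markdown link, optionally followed by ": note".
      const note = page.note ? `: ${page.note}` : "";
      lines.push(`- [${page.title}](${page.url})${note}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}

// Illustrative usage with two English pages.
const example = renderLlmsTxt({
  title: "ethereum.org",
  summary: "Hypothetical summary line for illustration.",
  sections: {
    Learn: [
      { title: "What is Ethereum?", url: "https://ethereum.org/en/what-is-ethereum/" },
      { title: "Wallets", url: "https://ethereum.org/en/wallets/", note: "choosing a wallet" },
    ],
  },
});
console.log(example);
```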

Key Changes:

  • Added llms.txt with a curated list of primary English-language pages.
  • Added llms-full.txt with a more exhaustive, automatically generated list of all content.
  • Fixed several links in the initial version of llms.txt that pointed to incorrect or non-existent pages.
  • Verified every link in the final llms.txt by running the local development server and using a script to confirm that each URL returns 200 OK.
  • Removed links to translated content from the primary llms.txt to narrow the scope of this initial implementation and focus on the core English content.
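
The link check described above could look roughly like this. It is a sketch, not the script used in the PR. It assumes Markdown-style links in llms.txt, a dev server on `http://localhost:3000` (the `next dev` default), and Node 18+ for the global `fetch`. `extractUrls`, `toLocal`, and `findBrokenLinks` are hypothetical names, and `fetchImpl` is injectable so the check can be exercised without a running server.

```javascript
// Sketch of a 200-OK link check against a locally running dev server.
// Assumptions: "[title](url)" links, server at http://localhost:3000,
// Node 18+ global fetch. All names here are hypothetical.

const LINK_RE = /\]\((https?:\/\/[^)\s]+)\)/g;

// Collect every absolute URL used as a Markdown link target.
function extractUrls(llmsTxt) {
  return [...llmsTxt.matchAll(LINK_RE)].map((m) => m[1]);
}

// Rewrite a production URL so it points at the local server instead.
function toLocal(url, base = "http://localhost:3000") {
  const { pathname, search } = new URL(url);
  return new URL(pathname + search, base).toString();
}

// Returns every URL whose final response status is not exactly 200.
async function findBrokenLinks(llmsTxt, fetchImpl = fetch) {
  const broken = [];
  for (const url of extractUrls(llmsTxt)) {
    try {
      // fetch follows redirects by default, so only the final status is checked.
      const res = await fetchImpl(toLocal(url));
      if (res.status !== 200) broken.push({ url, status: res.status });
    } catch (err) {
      // Network errors (e.g. server not running) are reported, not thrown.
      broken.push({ url, status: `error: ${err.message}` });
    }
  }
  return broken;
}
```

Since redirects are followed, a page that redirects to a 200 response passes; whether redirects should count as failures is a design choice for the real script.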

Related Issue

This pull request addresses a new feature request to enhance the site's content accessibility for AI agents and LLMs. No specific issue is linked, but this work lays the foundation for better machine-readable content discovery on ethereum.org.

gabinfay added 3 commits June 11, 2025 16:15
This commit adds the llms.txt and llms-full.txt files to the public directory.

To ensure these files are included in the production build, the following lines must be removed from the outputFileTracingExcludes array in next.config.js:

- 'public/**/*.txt'
- 'public/content'
@netlify

netlify bot commented Jun 11, 2025

Deploy Preview for ethereumorg ready!

🔨 Latest commit: 5974021
🔍 Latest deploy log: https://app.netlify.com/projects/ethereumorg/deploys/6856484427eb7d0008934cd7
😎 Deploy Preview: https://deploy-preview-15654--ethereumorg.netlify.app

Lighthouse: 7 paths audited
Performance: 45 (🔴 down 13 from production)
Accessibility: 95 (🟢 up 1 from production)
Best Practices: 89 (🔴 down 10 from production)
SEO: 99 (no change from production)
PWA: 59 (no change from production)

@github-actions github-actions bot added the config ⚙️ Changes to configuration files label Jun 11, 2025

@pettinarip (Member) left a comment

@gabinfay thanks for the PR! I haven't analyzed this in depth yet, but it looks pretty good.

I'm curious about how you generated it. It would be great if we could establish a process to keep it updated, since the site content changes frequently. We could perhaps add it to the weekly release process.

gabinfay added 3 commits June 21, 2025 11:06
- Add scripts/llms/ directory with 3 core scripts:
  - generate_all.js: Combined generation script (replaces 6 separate scripts)
  - test_llms_validation.js: Unit test suite (21 tests, 100% coverage)
  - validate_urls_static.js: Static URL validation (no server required)

- Add GitHub Actions workflow (.github/workflows/validate-llms.yml):
  - Triggers on content changes in public/content/ or .md files
  - Runs generation + validation pipeline
  - Posts PR comments with validation results
  - Uploads artifacts for review

- Add npm scripts to package.json:
  - llms:generate, llms:test, llms:test:static, llms:validate, llms:ci

- Generate production-ready LLMS files:
  - public/llms.txt: 32KB URL directory (262 content URLs)
  - public/llms-full.txt: 1.05MB full content (150k+ words)

- Comprehensive validation coverage:
  - 21 unit tests covering structure, content, URLs, consistency
  - Static validation of 253 URLs (100% success rate)
  - Content quality standards (proper categorization, fresh timestamps)
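
The commit does not list the 21 tests. As a rough illustration of the structure and URL checks it mentions, here is a sketch; the specific rules and the `checkLlmsTxt` name are assumptions.

```javascript
// Sketch of a few structural checks in the spirit of the described test suite.
// The rules (H1 first line, at least one link, absolute https URLs, no
// duplicate URLs) are assumptions, not the PR's actual 21 tests.
function checkLlmsTxt(text) {
  const problems = [];
  if (!text.split("\n")[0].startsWith("# ")) {
    problems.push("first line is not an H1 title");
  }

  const seen = new Set();
  // Capture the target of every Markdown link "[title](target)".
  for (const [, target] of text.matchAll(/\]\(([^)\s]+)\)/g)) {
    let parsed;
    try {
      parsed = new URL(target); // throws for relative paths such as "/en/c/"
    } catch {
      problems.push(`not an absolute URL: ${target}`);
      continue;
    }
    if (parsed.protocol !== "https:") problems.push(`not https: ${target}`);
    if (seen.has(target)) problems.push(`duplicate: ${target}`);
    seen.add(target);
  }
  if (seen.size === 0) problems.push("no links found");
  return problems;
}
```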

This enables AI systems to access ethereum.org content easily, while automated
CI/CD validation keeps quality in check. All tests pass.
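
GitHub Actions applies `paths` filters on the workflow trigger itself, so this is only a plain-JavaScript restatement of the trigger condition described in the workflow bullet (useful, for example, as a local pre-check). `shouldValidateLlms` is a hypothetical name, and matching any `.md` file anywhere in the repository is an assumption about what ".md files" means here.

```javascript
// Restates the described trigger: run validation when a change touches
// public/content/ or any Markdown file. Hypothetical helper; in the workflow
// this would be expressed as an on.pull_request.paths filter instead.
function shouldValidateLlms(changedFiles) {
  return changedFiles.some(
    (file) => file.startsWith("public/content/") || file.endsWith(".md")
  );
}

console.log(shouldValidateLlms(["src/components/Nav.tsx"]));
console.log(shouldValidateLlms(["public/content/wallets/index.md"]));
```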
- Generated llms.txt (32KB, 262 URLs) and llms-full.txt (1.05MB, 151k words)
- Implemented 21 comprehensive tests with 100% pass rate
- Added CI/CD automation with smart content change detection
- Created static URL validation with 253/253 URLs validated
- Removed unnecessary tempFile generation for cleaner implementation
- Fixed path mapping issues for reliable validation
- Added npm scripts for easy development workflow
- Comprehensive documentation and error handling
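
The commit messages mention static validation and path-mapping fixes without showing the mapping. Here is a minimal sketch, assuming English pages are Markdown files at `public/content/<path>/index.md`; pages rendered from React components rather than Markdown would need separate handling. `urlToContentFile` and `validateUrlsStatic` are hypothetical names.

```javascript
// Sketch of server-free URL validation: map each URL path to the Markdown
// file assumed to render it, then check that the file exists on disk.
const fs = require("node:fs");
const path = require("node:path");

// Assumed layout: https://ethereum.org/en/<slug>/ -> public/content/<slug>/index.md
function urlToContentFile(url, repoRoot) {
  const slug = new URL(url).pathname
    .replace(/^\/en(?=\/|$)/, "") // drop an optional leading "/en" locale segment
    .replace(/^\/+|\/+$/g, ""); // trim leading and trailing slashes
  return path.join(repoRoot, "public", "content", slug, "index.md");
}

function validateUrlsStatic(urls, repoRoot) {
  const missing = urls.filter((u) => !fs.existsSync(urlToContentFile(u, repoRoot)));
  return { total: urls.length, valid: urls.length - missing.length, missing };
}
```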

Files ready for production deployment with full automation.
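
For comparison with the curated llms.txt, one plausible way to assemble llms-full.txt is to concatenate each page's Markdown body with its front matter stripped, then count words as a sanity check on size. The `---` front-matter convention, the `# Source:` separator, and all function names here are assumptions, not details taken from the PR.

```javascript
// Sketch of assembling an llms-full.txt-style file from page Markdown.
// Assumed conventions: YAML front matter delimited by "---" lines, and a
// "# Source: <url>" header before each page. Names are hypothetical.

// Remove a leading front-matter block, if present.
function stripFrontMatter(markdown) {
  return markdown.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");
}

// Concatenate pages, each preceded by a header naming its URL.
function buildFullText(pages) {
  return pages
    .map(({ url, markdown }) => `# Source: ${url}\n\n${stripFrontMatter(markdown).trim()}\n`)
    .join("\n");
}

// Rough word count: whitespace-separated tokens.
function countWords(text) {
  return text.split(/\s+/).filter(Boolean).length;
}
```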
@github-actions github-actions bot added dependencies 📦 Changes related to project dependencies tooling 🔧 Changes related to tooling of the project labels Jun 21, 2025

@wackerow (Member)

@pettinarip, given some of the recent changes, any suggestions on how to proceed here?

@gabinfay gabinfay closed this by deleting the head repository Aug 1, 2025

Labels

config ⚙️ Changes to configuration files
dependencies 📦 Changes related to project dependencies
tooling 🔧 Changes related to tooling of the project


3 participants