Feature/llm accessibility #15654
This commit adds the llms.txt and llms-full.txt files to the public directory. To ensure these files are included in the production build, the following entries must be removed from the outputFileTracingExcludes array in next.config.js:
- 'public/**/*.txt'
- 'public/content'
✅ Deploy Preview for ethereumorg ready!
pettinarip left a comment
@gabinfay thanks for the PR! I haven't analyzed this in depth yet, but it looks pretty good.
I'm curious about how you generated it. It would be great if we could establish a process to keep it updated, since the site content changes frequently. We could perhaps add it to the weekly release process.
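One way to keep the files current is to regenerate them whenever a change touches site content. Below is a minimal sketch of that decision as a path filter. It assumes the same trigger rule that the follow-up commit gives for the validation workflow (anything under public/content/ or any .md file). `shouldRegenerateLlms` is a hypothetical name and is not part of the PR:

```typescript
// Hypothetical helper (not from the PR): returns true when any changed path
// should trigger regeneration of public/llms.txt and public/llms-full.txt.
// Rule assumed from the workflow description: files under public/content/
// or any Markdown (.md) file.
function shouldRegenerateLlms(changedFiles: string[]): boolean {
  return changedFiles.some((file) => {
    // Normalize Windows-style separators so the prefix check is portable.
    const normalized = file.replace(/\\/g, "/")
    return normalized.startsWith("public/content/") || normalized.endsWith(".md")
  })
}

// Example: a content edit triggers regeneration, a component edit does not.
console.log(shouldRegenerateLlms(["public/content/staking/index.md"])) // true
console.log(shouldRegenerateLlms(["src/components/Nav.tsx"])) // false
```

In CI, the same rule could live in the workflow's path filters. A function like this is mostly useful as a quick local check before a weekly release.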
- Add scripts/llms/ directory with 3 core scripts:
  - generate_all.js: Combined generation script (eliminates 6 separate scripts)
  - test_llms_validation.js: Unit test suite (21 tests, 100% coverage)
  - validate_urls_static.js: Static URL validation (no server required)
- Add GitHub Actions workflow (.github/workflows/validate-llms.yml):
  - Triggers on content changes in public/content/ or .md files
  - Runs generation + validation pipeline
  - Posts PR comments with validation results
  - Uploads artifacts for review
- Add npm scripts to package.json:
  - llms:generate, llms:test, llms:test:static, llms:validate, llms:ci
- Generate production-ready LLMS files:
  - public/llms.txt: 32KB URL directory (262 content URLs)
  - public/llms-full.txt: 1.05MB full content (150k+ words)
- Comprehensive validation coverage:
  - 21 unit tests covering structure, content, URLs, consistency
  - Static validation of 253 URLs (100% success rate)
  - Content quality standards (proper categorization, fresh timestamps)

This enables AI systems to easily access Ethereum.org content while ensuring quality through automated CI/CD validation. All tests pass with 100% success rate.
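The commit does not spell out the layout of the generated index. The sketch below shows how llms.txt could be built from categorized pages, assuming generate_all.js follows the llmstxt.org convention: an H1 title, a blockquote summary, then H2 sections of Markdown links. The `Page` shape and `buildLlmsTxt` are hypothetical:

```typescript
// Sketch only: Page and buildLlmsTxt are illustrative names, and the layout
// follows the llmstxt.org convention rather than the PR's actual generator.
interface Page {
  title: string
  url: string
  category: string
  description?: string
}

function buildLlmsTxt(siteName: string, summary: string, pages: Page[]): string {
  // Group pages by category, keeping categories in first-seen order.
  const byCategory = new Map<string, Page[]>()
  for (const page of pages) {
    const list = byCategory.get(page.category) ?? []
    list.push(page)
    byCategory.set(page.category, list)
  }

  // Header: H1 with the site name, then a one-line blockquote summary.
  const lines: string[] = [`# ${siteName}`, "", `> ${summary}`]

  // One H2 section per category, each page as "- [title](url): description".
  byCategory.forEach((list, category) => {
    lines.push("", `## ${category}`, "")
    for (const { title, url, description } of list) {
      lines.push(description ? `- [${title}](${url}): ${description}` : `- [${title}](${url})`)
    }
  })

  return lines.join("\n") + "\n"
}
```

Grouping by category keeps the index readable section by section. llms-full.txt presumably concatenates the page bodies instead, which would explain the size gap (1.05MB versus 32KB).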
- Generated llms.txt (32KB, 262 URLs) and llms-full.txt (1.05MB, 151k words)
- Implemented 21 comprehensive tests with 100% pass rate
- Added CI/CD automation with smart content change detection
- Created static URL validation with 253/253 URLs validated
- Removed unnecessary tempFile generation for cleaner implementation
- Fixed path mapping issues for reliable validation
- Added npm scripts for easy development workflow
- Comprehensive documentation and error handling

Files ready for production deployment with full automation.
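The static validation in validate_urls_static.js ("no server required") presumably resolves each URL to a source file rather than requesting it. The sketch below shows that idea under one assumption: an English URL such as https://ethereum.org/en/staking/ is backed by public/content/staking/index.md. The PR does not confirm this mapping, and the function names are hypothetical:

```typescript
import { existsSync } from "node:fs"
import * as path from "node:path"

// Sketch of a server-free URL check. Assumption (not confirmed by the PR):
// an English URL like https://ethereum.org/en/staking/ is backed by
// public/content/staking/index.md. Adjust urlToContentFile to the real layout.

// Pull link targets out of Markdown links such as "- [Title](url): note".
function extractUrls(llmsTxt: string): string[] {
  return Array.from(llmsTxt.matchAll(/\[[^\]]*\]\(([^)\s]+)\)/g), (m) => m[1])
}

// Map a public URL to the Markdown file expected to back it.
function urlToContentFile(url: string, contentRoot: string): string {
  const segments = new URL(url).pathname.split("/").filter(Boolean)
  if (segments[0] === "en") segments.shift() // drop an optional "/en" locale prefix
  return path.join(contentRoot, ...segments, "index.md")
}

// Return every URL whose backing file is missing. The existence check is
// injectable so the logic can be tested without touching the filesystem.
function findMissing(
  llmsTxt: string,
  contentRoot = "public/content",
  fileExists: (p: string) => boolean = existsSync,
): string[] {
  return extractUrls(llmsTxt).filter((url) => !fileExists(urlToContentFile(url, contentRoot)))
}
```

Path-mapping mistakes like the ones the commit mentions fixing would surface here as false "missing" reports. That is why the mapping is isolated in a single function.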
@pettinarip with some of the recent changes, any suggestions on how to proceed here?

Title:
feat(content): Add LLM-specific content manifest files

Description
This pull request introduces two new text files, llms.txt and llms-full.txt, to the /public directory. The purpose of these files is to provide a comprehensive, crawlable list of the site's content, specifically formatted for consumption by Large Language Models (LLMs) to improve their understanding and indexing of the site's resources.

Key Changes:
- llms.txt with a curated list of primary English-language pages.
- llms-full.txt with a more exhaustive, automatically generated list of all content.
- … llms.txt that were pointing to incorrect or non-existent pages.
- … llms.txt were verified by running the local development server and using a script to confirm that each URL returns a 200 OK status.
- … llms.txt to narrow the scope of this initial implementation and focus on the core English content.

Related Issue
This pull request addresses a new feature request to enhance the site's content accessibility for AI agents and LLMs. No specific issue is linked, but this work lays the foundation for better machine-readable content discovery on ethereum.org.
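The live verification described under Key Changes can be sketched together with the step that narrows llms.txt to English content. This is not the script used in the PR. The http://localhost:3000 base URL, the locale subset, and the function names are assumptions. fetch is injectable, so the logic can be exercised without a running server:

```typescript
// Sketch of the live check described under Key Changes. Assumptions: the dev
// server listens on http://localhost:3000, and the locale list below is an
// illustrative subset, not ethereum.org's real configuration.
type FetchLike = (url: string) => Promise<{ status: number }>

const NON_ENGLISH_LOCALES = new Set(["es", "fr", "de", "ja", "zh"])

// Keep English pages only: drop URLs whose first path segment is a
// non-English locale code.
function englishOnly(urls: string[]): string[] {
  return urls.filter((url) => {
    const first = new URL(url).pathname.split("/").filter(Boolean)[0]
    return first === undefined || !NON_ENGLISH_LOCALES.has(first)
  })
}

// Request each page from the local server (sequentially, to keep load on the
// dev server low) and report anything that does not end in 200 OK.
// Note: fetch follows redirects by default, so a redirect to a 200 page passes.
async function findNon200(
  urls: string[],
  baseUrl = "http://localhost:3000",
  fetchFn: FetchLike = fetch,
): Promise<string[]> {
  const failures: string[] = []
  for (const url of urls) {
    // Keep only the path and replay it against the local server.
    const localUrl = new URL(new URL(url).pathname, baseUrl).toString()
    try {
      const res = await fetchFn(localUrl)
      if (res.status !== 200) failures.push(`${url} -> ${res.status}`)
    } catch (err) {
      failures.push(`${url} -> ${err instanceof Error ? err.message : String(err)}`)
    }
  }
  return failures
}
```

Against a running dev server, `findNon200(englishOnly(urls))` resolves to an empty array when every listed page answers 200 OK. If redirects should count as failures, pass a fetch wrapper that sets `redirect: "manual"`.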