
Commit 5115d40

Merge pull request #1 from angular/docs
Polish documentation for open-source release
2 parents 48a8b0e + 6f0fcb3

5 files changed (+174, -85 lines)

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2025 Angular
+Copyright (c) 2025 Google LLC

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 113 additions & 34 deletions
# Web Codegen Scorer

**Web Codegen Scorer** is a tool for evaluating the quality of web code generated by Large Language Models (LLMs).

You can use this tool to make evidence-based decisions about AI-generated code. For example:

* 🔄 Iterate on a system prompt to find the most effective instructions for your project.
* ⚖️ Compare the quality of code produced by different models.
* 📈 Monitor generated code quality over time as models and agents evolve.

Web Codegen Scorer is different from other code benchmarks in that it focuses specifically on _web_ code and relies primarily on well-established measures of code quality.

## Features

* ⚙️ Configure your evaluations with different models, frameworks, and tools.
* ✍️ Specify system instructions and add MCP servers.
* 📋 Use built-in checks for build success, runtime errors, accessibility, security, LLM rating, and coding best practices. (More built-in checks coming soon!)
* 🔧 Automatically attempt to repair issues detected during code generation.
* 📊 View and compare results with an intuitive report viewer UI.

## Setup

1. **Install the package:**

   ```bash
   npm install -g web-codegen-scorer
   ```

2. **Set up your API keys:**

   To run an eval, you have to specify API keys for the relevant providers as environment variables:

   ```bash
   export GEMINI_API_KEY="YOUR_API_KEY_HERE" # If you're using Gemini models
   export OPENAI_API_KEY="YOUR_API_KEY_HERE" # If you're using OpenAI models
   export ANTHROPIC_API_KEY="YOUR_API_KEY_HERE" # If you're using Anthropic models
   ```

3. **Run an eval:**

   You can run your first eval using our Angular example with the following command:

   ```bash
   web-codegen-scorer eval --env=angular-example
   ```

4. (Optional) **Set up your own eval:**

   If you want to set up a custom eval instead of using our built-in examples, you can run the following command, which will guide you through the process:

   ```bash
   web-codegen-scorer init
   ```

You can customize the `web-codegen-scorer eval` script with the following flags:

- `--env=<path>` (alias: `--environment`): (**Required**) Specifies the path from which to load the environment config.
  - Example: `web-codegen-scorer eval --env=foo/bar/my-env.js`

- `--model=<name>`: Specifies the model to use when generating code. Defaults to the value of `DEFAULT_MODEL_NAME`.
  - Example: `web-codegen-scorer eval --model=gemini-2.5-flash --env=<config path>`

- `--runner=<name>`: Specifies the runner to use to execute the eval. Supported runners are `genkit` (default) or `gemini-cli`.

- `--local`: Runs the script in local mode for the initial code generation request. Instead of calling the LLM, it will attempt to read the initial code from a corresponding file in the `.web-codegen-scorer/llm-output` directory (e.g., `.web-codegen-scorer/llm-output/todo-app.ts`). This is useful for re-running assessments or debugging the build/repair process without incurring LLM costs for the initial generation.
  - **Note:** You typically need to run `web-codegen-scorer eval` once without `--local` to generate the initial files in `.web-codegen-scorer/llm-output`.
  - The `web-codegen-scorer eval:local` script is a shortcut for `web-codegen-scorer eval --local`.

- `--limit=<number>`: Specifies the number of application prompts to process. Defaults to `5`.
  - Example: `web-codegen-scorer eval --limit=10 --env=<config path>`

- `--output-directory=<name>` (alias: `--output-dir`): Specifies the directory to output the generated code under, which is useful for debugging. By default, the code will be generated in a temporary directory.
  - Example: `web-codegen-scorer eval --output-dir=test-output --env=<config path>`

- `--concurrency=<number>`: Sets the maximum number of concurrent AI API requests. Defaults to `5` (as defined by `DEFAULT_CONCURRENCY` in `src/config.ts`).
  - Example: `web-codegen-scorer eval --concurrency=3 --env=<config path>`

- `--report-name=<name>`: Sets the name for the generated report directory. Defaults to a timestamp (e.g., `2023-10-27T10-30-00-000Z`). The name will be sanitized (non-alphanumeric characters replaced with hyphens).
  - Example: `web-codegen-scorer eval --report-name=my-custom-report --env=<config path>`

- `--rag-endpoint=<url>`: Specifies a custom RAG (Retrieval-Augmented Generation) endpoint URL. The URL must contain a `PROMPT` substring, which will be replaced with the user prompt.
  - Example: `web-codegen-scorer eval --rag-endpoint="http://localhost:8080/my-rag-endpoint?query=PROMPT" --env=<config path>`

- `--prompt-filter=<name>`: String used to filter which prompts should be run. By default, a random sample (controlled by `--limit`) will be taken from the prompts in the current environment. Setting this can be useful for debugging a specific prompt.
  - Example: `web-codegen-scorer eval --prompt-filter=tic-tac-toe --env=<config path>`

- `--skip-screenshots`: Whether to skip taking screenshots of the generated app. Defaults to `false`.
  - Example: `web-codegen-scorer eval --skip-screenshots --env=<config path>`

- `--labels=<label1> <label2>`: Metadata labels that will be attached to the run.
  - Example: `web-codegen-scorer eval --labels my-label another-label --env=<config path>`

- `--mcp`: Whether to start an MCP server for the evaluation. Defaults to `false`.
  - Example: `web-codegen-scorer eval --mcp --env=<config path>`

- `--help`: Prints out usage information about the script.
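
For illustration, several of these flags can be combined in a single run. The sketch below reuses the example values from the flag descriptions above and assumes the `angular-example` environment from the Setup section:

```bash
# First run: generate code with a specific model, keep the generated output
# for inspection, limit the number of prompts, and attach labels to the run.
web-codegen-scorer eval \
  --env=angular-example \
  --model=gemini-2.5-flash \
  --limit=10 \
  --output-dir=test-output \
  --labels my-label another-label

# Later: re-run the assessment/repair steps without calling the LLM again,
# reading the initial code from .web-codegen-scorer/llm-output.
web-codegen-scorer eval --env=angular-example --local
```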

### Additional configuration options

- [Environment config reference](./docs/environment-reference.md)
- [How to set up a new model?](./docs/model-setup.md)

## Local development

If you've cloned this repo and want to work on the tool, you have to install its dependencies by running `pnpm install`. Once they're installed, you can run the following commands:

* `pnpm run release-build` - Builds the package in the `dist` directory for publishing to npm.
* `pnpm run eval` - Runs an eval from source.
* `pnpm run report` - Runs the report app from source.
* `pnpm run init` - Runs the init script from source.
* `pnpm run format` - Formats the source code using Prettier.
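
As a rough sketch of a local development loop (this assumes the `pnpm run eval` script accepts the same flags as the published CLI, which is not spelled out above):

```bash
# Install dependencies after cloning the repo.
pnpm install

# Run an eval from source against the built-in Angular example
# (flags after `--` are forwarded to the eval script).
pnpm run eval -- --env=angular-example

# Start the report viewer app to inspect the results.
pnpm run report
```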

## FAQ

### Who built this tool?

This tool is built by the Angular team at Google.

### Does this tool only work for Angular code or Google models?

No! You can use this tool with any web library or framework (or none at all), as well as any model.

### Why did you build this tool?

As more and more developers reach for LLM-based tools to create and modify code, we wanted to be able to empirically measure the effect of different factors on the quality of generated code. While many LLM coding benchmarks exist, we found that they were often too broad and didn't measure the specific quality metrics we cared about.

In the absence of such a tool, we found that many developers based their judgments about codegen with different models, frameworks, and tools on loosely structured trial and error. In contrast, Web Codegen Scorer gives us a platform to measure codegen across different configurations with consistency and repeatability.

### Will you add more features over time?

Yes! We plan to expand both the number of built-in checks and the variety of codegen scenarios.

Our roadmap includes:

* Including _interaction testing_ in the rating, to ensure the generated code performs any requested behaviors.
* Measuring Core Web Vitals.
* Measuring the effectiveness of LLM-driven edits on an existing codebase.
