Commit 68b2089

kyleconroy and claude authored
feat(postgresql): add analyzerv2 experiment for database-only analysis (#4237)
* feat(postgresql): add accurate analyzer mode for database-only analysis

Add an optional `analyzer.accurate: true` mode for PostgreSQL that bypasses the internal catalog and uses only database-backed analysis.

Key features:
- Uses database PREPARE for all type resolution (columns, parameters)
- Uses expander package for SELECT * and RETURNING * expansion
- Queries pg_catalog to build catalog structures for code generation
- Skips internal catalog building from schema files

Configuration:

```yaml
sql:
  - engine: postgresql
    database:
      uri: "postgres://..."  # or managed: true
    analyzer:
      accurate: true
```

This mode requires a database connection and the schema must exist in the database. It provides more accurate type information for complex queries.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* test: add end-to-end tests for accurate analyzer mode

Add three end-to-end test cases for the accurate analyzer mode:

1. accurate_star_expansion - Tests SELECT *, INSERT RETURNING *, UPDATE RETURNING *, DELETE RETURNING *
2. accurate_enum - Tests enum type introspection from pg_catalog
3. accurate_cte - Tests CTE (Common Table Expression) with star expansion

All tests use the managed-db context which requires Docker to run PostgreSQL containers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(tests): update expected output for accurate mode end-to-end tests

Update expected output files to match actual sqlc generate output:
- Fix parameter naming (Column1, Column2, dollar_1)
- Fix nullability types (sql.NullString, sql.NullInt32)
- Fix CTE formatting (single line)
- Fix query semicolons

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* test(e2e): add accurate mode test for CTE with VALUES clause

Tests CTE using VALUES clause with column aliasing to verify accurate analyzer handles inline table expressions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(ast): fix VALUES clause formatting to output multiple rows

The VALUES clause was incorrectly formatting multiple rows as a single row with multiple columns. For example:

  VALUES ('A'), ('B'), ('C')

was being formatted as:

  VALUES ('A', 'B', 'C')

This caused the star expander to think the VALUES table had 3 columns instead of 1, resulting in incorrect SELECT * expansion.

The fix properly iterates over each row in ValuesLists and wraps each in parentheses.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: rename accurate mode to analyzer.database: only with analyzerv2 experiment

This change refactors the "accurate analyzer mode" feature:

1. Rename config option from `analyzer.accurate: true` to `analyzer.database: only` - a third option in addition to true/false
2. Gate the feature behind the `analyzerv2` experiment flag. The feature is only enabled when:
   - `analyzer.database: only` is set in the config
   - `SQLCEXPERIMENT=analyzerv2` environment variable is set
3. Update JSON schemas to support boolean or "only" for analyzer.database
4. Add experiment tests for analyzerv2 flag
5. Update end-to-end test configs and expected outputs

The database-only mode skips building the internal catalog from schema files and instead relies entirely on the database for type resolution and star expansion.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* feat: add SQLite support for database-only mode (analyzer.database: only)

This extends the database-only analyzer mode to support SQLite in addition to PostgreSQL:

1. Add EnsureConn, GetColumnNames, and IntrospectSchema methods to the SQLite analyzer for database-only mode functionality
2. Update compiler to handle SQLite database-only mode:
   - Add sqliteAnalyzer field to Compiler struct
   - Initialize SQLite analyzer when database-only mode is enabled
   - Build catalog from SQLite database via PRAGMA table_info
3. Add SQLite end-to-end test case for database-only mode

The SQLite database-only mode uses PRAGMA table_info to introspect tables and columns, and prepares queries to get column names for star expansion.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* refactor: use analyzer interface for database-only mode

- Add EnsureConn and GetColumnNames methods to Analyzer interface
- Remove engine-specific pgAnalyzer and sqliteAnalyzer fields from compiler
- Use unified analyzer interface for database connection initialization
- Keep parsing schema files to build catalog, only use database for star expansion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* feat: parse schema for syntax validation only in database-only mode

In database-only mode, parse the schema migrations to validate syntax and collect them for the database connection, but skip updating the catalog. The database will be the source of truth for schema information.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
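
The VALUES formatting fix described in the commit message can be illustrated with a small standalone sketch. The `formatValues` helper and its `[][]string` input are hypothetical: sqlc's actual formatter walks AST nodes rather than pre-rendered strings, so this only shows the row-wrapping idea.

```go
package main

import (
	"fmt"
	"strings"
)

// formatValues renders a VALUES clause from rows of already-formatted
// expressions. Hypothetical helper for illustration only.
// Each row is wrapped in its own parentheses, so three one-column rows
// are not flattened into one three-column row.
func formatValues(rows [][]string) string {
	parts := make([]string, 0, len(rows))
	for _, row := range rows {
		parts = append(parts, "("+strings.Join(row, ", ")+")")
	}
	return "VALUES " + strings.Join(parts, ", ")
}

func main() {
	// Three rows with one column each.
	fmt.Println(formatValues([][]string{{"'A'"}, {"'B'"}, {"'C'"}}))
}
```

Keeping one parenthesized group per row preserves the column count that star expansion relies on.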
1 parent 53b12f9 commit 68b2089

42 files changed: +1304 -81 lines changed


internal/analyzer/analyzer.go

Lines changed: 14 additions & 0 deletions
@@ -110,7 +110,21 @@ func (c *CachedAnalyzer) Close(ctx context.Context) error {
 	return c.a.Close(ctx)
 }
 
+func (c *CachedAnalyzer) EnsureConn(ctx context.Context, migrations []string) error {
+	return c.a.EnsureConn(ctx, migrations)
+}
+
+func (c *CachedAnalyzer) GetColumnNames(ctx context.Context, query string) ([]string, error) {
+	return c.a.GetColumnNames(ctx, query)
+}
+
 type Analyzer interface {
 	Analyze(context.Context, ast.Node, string, []string, *named.ParamSet) (*analysis.Analysis, error)
 	Close(context.Context) error
+	// EnsureConn initializes the database connection with the given migrations.
+	// This is required for database-only mode where we need to connect before analyzing queries.
+	EnsureConn(ctx context.Context, migrations []string) error
+	// GetColumnNames returns the column names for a query by preparing it against the database.
+	// This is used for star expansion in database-only mode.
+	GetColumnNames(ctx context.Context, query string) ([]string, error)
 }
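
To show how a `GetColumnNames`-style method might feed star expansion, here is a minimal sketch. Everything in it is an assumption for illustration: `ColumnGetter`, `fakeAnalyzer`, and `expandStar` are hypothetical names, the fake returns canned column lists instead of preparing the query against a database, and the string-prefix rewrite is far simpler than an AST-based expander.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// ColumnGetter mirrors the shape of the GetColumnNames method added to
// the Analyzer interface in this diff.
type ColumnGetter interface {
	GetColumnNames(ctx context.Context, query string) ([]string, error)
}

// fakeAnalyzer is a hypothetical stand-in for a database-backed analyzer.
// It looks up canned results instead of preparing the query.
type fakeAnalyzer struct {
	columns map[string][]string
}

func (f fakeAnalyzer) GetColumnNames(_ context.Context, query string) ([]string, error) {
	cols, ok := f.columns[query]
	if !ok {
		return nil, fmt.Errorf("cannot prepare query %q", query)
	}
	return cols, nil
}

// expandStar rewrites a leading "SELECT * " using the column names
// reported by the getter. Queries without that prefix pass through.
func expandStar(ctx context.Context, g ColumnGetter, query string) (string, error) {
	const prefix = "SELECT * "
	if !strings.HasPrefix(query, prefix) {
		return query, nil
	}
	cols, err := g.GetColumnNames(ctx, query)
	if err != nil {
		return "", err
	}
	return "SELECT " + strings.Join(cols, ", ") + " " + query[len(prefix):], nil
}

// demo wires the fake analyzer to expandStar for one query.
func demo() (string, error) {
	db := fakeAnalyzer{columns: map[string][]string{
		"SELECT * FROM authors": {"id", "name", "bio"},
	}}
	return expandStar(context.Background(), db, "SELECT * FROM authors")
}

func main() {
	out, err := demo()
	fmt.Println(out, err)
}
```

Depending on a narrow interface rather than a concrete analyzer is what lets the same expansion logic sit on top of either the PostgreSQL or the SQLite analyzer.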

internal/cmd/generate.go

Lines changed: 1 addition & 1 deletion
@@ -295,7 +295,7 @@ func remoteGenerate(ctx context.Context, configPath string, conf *config.Config,
 
 func parse(ctx context.Context, name, dir string, sql config.SQL, combo config.CombinedSettings, parserOpts opts.Parser, stderr io.Writer) (*compiler.Result, bool) {
 	defer trace.StartRegion(ctx, "parse").End()
-	c, err := compiler.NewCompiler(sql, combo)
+	c, err := compiler.NewCompiler(sql, combo, parserOpts)
 	defer func() {
 		if c != nil {
 			c.Close(ctx)

internal/compiler/compile.go

Lines changed: 20 additions & 0 deletions
@@ -1,6 +1,7 @@
 package compiler
 
 import (
+	"context"
 	"errors"
 	"fmt"
 	"io"
@@ -39,11 +40,20 @@ func (c *Compiler) parseCatalog(schemas []string) error {
 		}
 		contents := migrations.RemoveRollbackStatements(string(blob))
 		c.schema = append(c.schema, contents)
+
+		// In database-only mode, we parse the schema to validate syntax
+		// but don't update the catalog - the database will be the source of truth
 		stmts, err := c.parser.Parse(strings.NewReader(contents))
 		if err != nil {
 			merr.Add(filename, contents, 0, err)
 			continue
 		}
+
+		// Skip catalog updates in database-only mode
+		if c.databaseOnlyMode {
+			continue
+		}
+
 		for i := range stmts {
 			if err := c.catalog.Update(stmts[i], c); err != nil {
 				merr.Add(filename, contents, stmts[i].Pos(), err)
@@ -58,6 +68,15 @@ func (c *Compiler) parseCatalog(schemas []string) error {
 }
 
 func (c *Compiler) parseQueries(o opts.Parser) (*Result, error) {
+	ctx := context.Background()
+
+	// In database-only mode, initialize the database connection before parsing queries
+	if c.databaseOnlyMode && c.analyzer != nil {
+		if err := c.analyzer.EnsureConn(ctx, c.schema); err != nil {
+			return nil, fmt.Errorf("failed to initialize database connection: %w", err)
+		}
+	}
+
 	var q []*Query
 	merr := multierr.New()
 	set := map[string]struct{}{}
@@ -113,6 +132,7 @@ func (c *Compiler) parseQueries(o opts.Parser) (*Result, error) {
 	if len(q) == 0 {
 		return nil, fmt.Errorf("no queries contained in paths %s", strings.Join(c.conf.Queries, ","))
 	}
+
 	return &Result{
 		Catalog: c.catalog,
 		Queries: q,

internal/compiler/engine.go

Lines changed: 50 additions & 7 deletions
@@ -14,6 +14,7 @@ import (
 	sqliteanalyze "github.com/sqlc-dev/sqlc/internal/engine/sqlite/analyzer"
 	"github.com/sqlc-dev/sqlc/internal/opts"
 	"github.com/sqlc-dev/sqlc/internal/sql/catalog"
+	"github.com/sqlc-dev/sqlc/internal/x/expander"
 )
 
 type Compiler struct {
@@ -27,23 +28,49 @@ type Compiler struct {
 	selector selector
 
 	schema []string
+
+	// databaseOnlyMode indicates that the compiler should use database-only analysis
+	// and skip building the internal catalog from schema files (analyzer.database: only)
+	databaseOnlyMode bool
+	// expander is used to expand SELECT * and RETURNING * in database-only mode
+	expander *expander.Expander
 }
 
-func NewCompiler(conf config.SQL, combo config.CombinedSettings) (*Compiler, error) {
+func NewCompiler(conf config.SQL, combo config.CombinedSettings, parserOpts opts.Parser) (*Compiler, error) {
 	c := &Compiler{conf: conf, combo: combo}
 
 	if conf.Database != nil && conf.Database.Managed {
 		client := dbmanager.NewClient(combo.Global.Servers)
 		c.client = client
 	}
 
+	// Check for database-only mode (analyzer.database: only)
+	// This feature requires the analyzerv2 experiment to be enabled
+	databaseOnlyMode := conf.Analyzer.Database.IsOnly() && parserOpts.Experiment.AnalyzerV2
+
 	switch conf.Engine {
 	case config.EngineSQLite:
-		c.parser = sqlite.NewParser()
+		parser := sqlite.NewParser()
+		c.parser = parser
 		c.catalog = sqlite.NewCatalog()
 		c.selector = newSQLiteSelector()
-		if conf.Database != nil {
-			if conf.Analyzer.Database == nil || *conf.Analyzer.Database {
+
+		if databaseOnlyMode {
+			// Database-only mode requires a database connection
+			if conf.Database == nil {
+				return nil, fmt.Errorf("analyzer.database: only requires database configuration")
+			}
+			if conf.Database.URI == "" && !conf.Database.Managed {
+				return nil, fmt.Errorf("analyzer.database: only requires database.uri or database.managed")
+			}
+			c.databaseOnlyMode = true
+			// Create the SQLite analyzer (implements Analyzer interface)
+			sqliteAnalyzer := sqliteanalyze.New(*conf.Database)
+			c.analyzer = analyzer.Cached(sqliteAnalyzer, combo.Global, *conf.Database)
+			// Create the expander using the analyzer as the column getter
+			c.expander = expander.New(c.analyzer, parser, parser)
+		} else if conf.Database != nil {
+			if conf.Analyzer.Database.IsEnabled() {
 				c.analyzer = analyzer.Cached(
 					sqliteanalyze.New(*conf.Database),
 					combo.Global,
@@ -56,11 +83,27 @@ func NewCompiler(conf config.SQL, combo config.CombinedSettings) (*Compiler, err
 		c.catalog = dolphin.NewCatalog()
 		c.selector = newDefaultSelector()
 	case config.EnginePostgreSQL:
-		c.parser = postgresql.NewParser()
+		parser := postgresql.NewParser()
+		c.parser = parser
 		c.catalog = postgresql.NewCatalog()
 		c.selector = newDefaultSelector()
-		if conf.Database != nil {
-			if conf.Analyzer.Database == nil || *conf.Analyzer.Database {
+
+		if databaseOnlyMode {
+			// Database-only mode requires a database connection
+			if conf.Database == nil {
+				return nil, fmt.Errorf("analyzer.database: only requires database configuration")
+			}
+			if conf.Database.URI == "" && !conf.Database.Managed {
+				return nil, fmt.Errorf("analyzer.database: only requires database.uri or database.managed")
+			}
+			c.databaseOnlyMode = true
+			// Create the PostgreSQL analyzer (implements Analyzer interface)
+			pgAnalyzer := pganalyze.New(c.client, *conf.Database)
+			c.analyzer = analyzer.Cached(pgAnalyzer, combo.Global, *conf.Database)
+			// Create the expander using the analyzer as the column getter
+			c.expander = expander.New(c.analyzer, parser, parser)
+		} else if conf.Database != nil {
+			if conf.Analyzer.Database.IsEnabled() {
 				c.analyzer = analyzer.Cached(
 					pganalyze.New(c.client, *conf.Database),
 					combo.Global,
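
The gating and validation that `NewCompiler` performs for database-only mode can be summarized in a standalone sketch. The `dbConfig` type and `resolveDatabaseOnly` function are hypothetical simplifications; only the two error messages and the `IsOnly() && AnalyzerV2` condition are taken from the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// dbConfig is a simplified, hypothetical stand-in for the database
// section of the sqlc config.
type dbConfig struct {
	URI     string
	Managed bool
}

// resolveDatabaseOnly reports whether database-only mode is active.
// Both the "only" setting and the analyzerv2 experiment must be set;
// once active, a usable database configuration is mandatory.
func resolveDatabaseOnly(isOnly, analyzerV2 bool, db *dbConfig) (bool, error) {
	if !isOnly || !analyzerV2 {
		return false, nil
	}
	if db == nil {
		return false, errors.New("analyzer.database: only requires database configuration")
	}
	if db.URI == "" && !db.Managed {
		return false, errors.New("analyzer.database: only requires database.uri or database.managed")
	}
	return true, nil
}

func main() {
	// Setting and experiment both present, managed database configured.
	fmt.Println(resolveDatabaseOnly(true, true, &dbConfig{Managed: true}))
	// Experiment flag missing: the mode stays off and no error is raised.
	fmt.Println(resolveDatabaseOnly(true, false, nil))
	// Mode requested but no database section: configuration error.
	fmt.Println(resolveDatabaseOnly(true, true, nil))
}
```

One consequence worth noting from the diff: when `analyzer.database: only` is set without the experiment flag, control falls to the `else if` branch, and since `IsEnabled()` returns true for `"only"`, the regular database analyzer appears to be used whenever a database is configured.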

internal/compiler/parse.go

Lines changed: 50 additions & 1 deletion
@@ -71,7 +71,56 @@ func (c *Compiler) parseQuery(stmt ast.Node, src string, o opts.Parser) (*Query,
 	}
 
 	var anlys *analysis
-	if c.analyzer != nil {
+	if c.databaseOnlyMode && c.expander != nil {
+		// In database-only mode, use the expander for star expansion
+		// and rely entirely on the database analyzer for type resolution
+		expandedQuery, err := c.expander.Expand(ctx, rawSQL)
+		if err != nil {
+			return nil, fmt.Errorf("star expansion failed: %w", err)
+		}
+
+		// Parse named parameters from the expanded query
+		expandedStmts, err := c.parser.Parse(strings.NewReader(expandedQuery))
+		if err != nil {
+			return nil, fmt.Errorf("parsing expanded query failed: %w", err)
+		}
+		if len(expandedStmts) == 0 {
+			return nil, errors.New("no statements in expanded query")
+		}
+		expandedRaw := expandedStmts[0].Raw
+
+		// Use the analyzer to get type information from the database
+		result, err := c.analyzer.Analyze(ctx, expandedRaw, expandedQuery, c.schema, nil)
+		if err != nil {
+			return nil, err
+		}
+
+		// Convert the analyzer result to the internal analysis format
+		var cols []*Column
+		for _, col := range result.Columns {
+			cols = append(cols, convertColumn(col))
+		}
+		var params []Parameter
+		for _, p := range result.Params {
+			params = append(params, Parameter{
+				Number: int(p.Number),
+				Column: convertColumn(p.Column),
+			})
+		}
+
+		// Determine the insert table if applicable
+		var table *ast.TableName
+		if insert, ok := expandedRaw.Stmt.(*ast.InsertStmt); ok {
+			table, _ = ParseTableName(insert.Relation)
+		}
+
+		anlys = &analysis{
+			Table:      table,
+			Columns:    cols,
+			Parameters: params,
+			Query:      expandedQuery,
+		}
+	} else if c.analyzer != nil {
 		inference, _ := c.inferQuery(raw, rawSQL)
 		if inference == nil {
 			inference = &analysis{}

internal/config/config.go

Lines changed: 68 additions & 1 deletion
@@ -122,8 +122,75 @@ type SQL struct {
 	Analyzer Analyzer `json:"analyzer" yaml:"analyzer"`
 }
 
+// AnalyzerDatabase represents the database analyzer setting.
+// It can be a boolean (true/false) or the string "only" for database-only mode.
+type AnalyzerDatabase struct {
+	value  *bool // nil means not set, true/false for boolean values
+	isOnly bool  // true when set to "only"
+}
+
+// IsEnabled returns true if the database analyzer should be used.
+// Returns true for both `true` and `"only"` settings.
+func (a AnalyzerDatabase) IsEnabled() bool {
+	if a.isOnly {
+		return true
+	}
+	return a.value == nil || *a.value
+}
+
+// IsOnly returns true if the analyzer is set to "only" mode.
+func (a AnalyzerDatabase) IsOnly() bool {
+	return a.isOnly
+}
+
+func (a *AnalyzerDatabase) UnmarshalJSON(data []byte) error {
+	// Try to unmarshal as boolean first
+	var b bool
+	if err := json.Unmarshal(data, &b); err == nil {
+		a.value = &b
+		a.isOnly = false
+		return nil
+	}
+
+	// Try to unmarshal as string
+	var s string
+	if err := json.Unmarshal(data, &s); err == nil {
+		if s == "only" {
+			a.isOnly = true
+			a.value = nil
+			return nil
+		}
+		return errors.New("analyzer.database must be true, false, or \"only\"")
+	}
+
+	return errors.New("analyzer.database must be true, false, or \"only\"")
+}
+
+func (a *AnalyzerDatabase) UnmarshalYAML(unmarshal func(interface{}) error) error {
+	// Try to unmarshal as boolean first
+	var b bool
+	if err := unmarshal(&b); err == nil {
+		a.value = &b
+		a.isOnly = false
+		return nil
+	}
+
+	// Try to unmarshal as string
+	var s string
+	if err := unmarshal(&s); err == nil {
+		if s == "only" {
+			a.isOnly = true
+			a.value = nil
+			return nil
+		}
+		return errors.New("analyzer.database must be true, false, or \"only\"")
+	}
+
+	return errors.New("analyzer.database must be true, false, or \"only\"")
+}
+
 type Analyzer struct {
-	Database *bool `json:"database" yaml:"database"`
+	Database AnalyzerDatabase `json:"database" yaml:"database"`
 }
 
 // TODO: Figure out a better name for this
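
The boolean-or-`"only"` decoding can be tried in isolation with a condensed sketch. The `dbSetting`, `analyzerConf`, and `describe` names are hypothetical stand-ins for the types in this diff, and only the JSON path is covered; the logic follows the same "try bool, then string" order as `AnalyzerDatabase.UnmarshalJSON`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// dbSetting is a condensed, hypothetical stand-in for AnalyzerDatabase.
type dbSetting struct {
	value  *bool // nil means the key was not set
	isOnly bool  // true when the key was the string "only"
}

// UnmarshalJSON accepts a JSON boolean or the string "only".
func (s *dbSetting) UnmarshalJSON(data []byte) error {
	var b bool
	if err := json.Unmarshal(data, &b); err == nil {
		s.value, s.isOnly = &b, false
		return nil
	}
	var str string
	if err := json.Unmarshal(data, &str); err == nil && str == "only" {
		s.value, s.isOnly = nil, true
		return nil
	}
	return errors.New(`analyzer.database must be true, false, or "only"`)
}

// IsEnabled treats an unset key as enabled, matching the diff.
func (s dbSetting) IsEnabled() bool { return s.isOnly || s.value == nil || *s.value }

// IsOnly reports whether the setting was the string "only".
func (s dbSetting) IsOnly() bool { return s.isOnly }

type analyzerConf struct {
	Database dbSetting `json:"database"`
}

// describe decodes an analyzer block and summarizes the outcome.
func describe(doc string) string {
	var a analyzerConf
	if err := json.Unmarshal([]byte(doc), &a); err != nil {
		return "error"
	}
	return fmt.Sprintf("enabled=%t only=%t", a.Database.IsEnabled(), a.Database.IsOnly())
}

func main() {
	for _, doc := range []string{
		`{"database": true}`,
		`{"database": false}`,
		`{"database": "only"}`,
		`{}`,
		`{"database": "sometimes"}`,
	} {
		fmt.Printf("%-28s %s\n", doc, describe(doc))
	}
}
```

Because the field is now a value type rather than `*bool`, an omitted key leaves the zero value in place, which `IsEnabled` treats the same way the old `nil` pointer check did.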

internal/config/v_one.json

Lines changed: 4 additions & 1 deletion
@@ -79,7 +79,10 @@
 "type": "object",
 "properties": {
   "database": {
-    "type": "boolean"
+    "oneOf": [
+      {"type": "boolean"},
+      {"const": "only"}
+    ]
   }
 }
 },
},

internal/config/v_two.json

Lines changed: 4 additions & 1 deletion
@@ -82,7 +82,10 @@
 "type": "object",
 "properties": {
   "database": {
-    "type": "boolean"
+    "oneOf": [
+      {"type": "boolean"},
+      {"const": "only"}
+    ]
   }
 }
 },
},

internal/endtoend/endtoend_test.go

Lines changed: 3 additions & 2 deletions
@@ -263,8 +263,9 @@ func TestReplay(t *testing.T) {
 
 			opts := cmd.Options{
 				Env: cmd.Env{
-					Debug:    opts.DebugFromString(args.Env["SQLCDEBUG"]),
-					NoRemote: true,
+					Debug:      opts.DebugFromString(args.Env["SQLCDEBUG"]),
+					Experiment: opts.ExperimentFromString(args.Env["SQLCEXPERIMENT"]),
+					NoRemote:   true,
 				},
 				Stderr:       &stderr,
 				MutateConfig: testctx.Mutate(t, path),
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+{
+  "contexts": ["managed-db"],
+  "env": {
+    "SQLCEXPERIMENT": "analyzerv2"
+  }
+}
