Merged
39 commits
8a76e4e
deploy github pages by action
tanzaku Dec 1, 2024
d4fb317
make patch
tanzaku Oct 14, 2024
b7e7d68
update generated files
tanzaku Oct 14, 2024
67b8654
original scan.l
tanzaku Oct 14, 2024
2019b2d
update Makefile
tanzaku Oct 14, 2024
a1ca156
update scan.l
tanzaku Oct 14, 2024
118a1c6
update
tanzaku Oct 14, 2024
d2216b8
fix: comment
tanzaku Oct 25, 2024
f56055c
fix: makefile
tanzaku Mar 15, 2025
2ea7246
Merge pull request #10 from tanzaku/refactor
tanzaku Mar 15, 2025
eb3a921
fix: broken github actions
tanzaku Mar 15, 2025
efd1e0d
update: support PostgreSQL v17 (#16)
tanzaku Mar 15, 2025
9b977b2
Fix some bugs (#17)
tanzaku Mar 15, 2025
1fcb0ef
Resolve #13,#14,#18 (#19)
tanzaku Mar 16, 2025
e20a341
Refactor and porting scan.l almost completely (#20)
tanzaku Mar 16, 2025
1133f14
fix: testcase
tanzaku Mar 17, 2025
54a43d0
fix: package-lock.json
tanzaku Mar 17, 2025
011fa79
fix: update package
tanzaku Mar 17, 2025
addaafc
Update README (#22)
tanzaku Mar 18, 2025
607eef4
update: README
tanzaku Mar 18, 2025
a382874
update: Cargo.toml
tanzaku Mar 18, 2025
7e8802e
update: move ported code
tanzaku Mar 19, 2025
3f3699a
update: README.md
tanzaku Mar 19, 2025
f518395
update: description
tanzaku Mar 19, 2025
3886ecb
copy README.md
tanzaku Mar 19, 2025
c48a319
refactor
tanzaku Mar 19, 2025
8dfd5eb
update: README
tanzaku Mar 21, 2025
121625d
#23 improve performance (#24)
tanzaku Apr 2, 2025
056d474
chore: add CHANGELOG and modify version
tanzaku Apr 3, 2025
1428f6a
chore: exclude benches
tanzaku Apr 3, 2025
fa0c59c
build(deps-dev): bump vite from 6.2.2 to 6.2.5 in /demo (#28)
dependabot[bot] Apr 5, 2025
a91ee28
#29 fix clippy command in github actions and resolve clippy warning (…
tanzaku Apr 5, 2025
833986b
Merge remote-tracking branch 'upstream/main' into merge_upstream
tanzaku Jul 15, 2025
7fe41a4
fix: clippy
tanzaku Jul 15, 2025
ccf737e
fix: clippy
tanzaku Jul 15, 2025
0d314d2
fix: clippy
tanzaku Jul 15, 2025
637d352
fix: clippy
tanzaku Jul 15, 2025
2216fec
fix: clippy
tanzaku Jul 15, 2025
ed6eed4
fix: workflow
tanzaku Jul 15, 2025
57 changes: 57 additions & 0 deletions .github/workflows/deploy-gh-pages.yml
@@ -0,0 +1,57 @@
+# yaml-language-server: $schema=https://json.schemastore.org/github-workflow.json
+name: Deploy GitHub Pages for branches
+
+on:
+  push:
+    branches:
+      - main
+
+# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+jobs:
+  build:
+    if: github.repository == 'tanzaku/postgresql-cst-parser'
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "22"
+
+      - name: Install Rust toolchain
+        run: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+
+      - name: Add cargo to PATH
+        run: echo "$HOME/.cargo/bin" >> $GITHUB_PATH
+
+      - name: Install wasm-pack
+        run: cargo install wasm-pack
+
+      - name: Build
+        run: bash ./update-gh-page.sh
+
+      - name: Upload artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: ./docs
+
+  deploy:
+    if: github.repository == 'tanzaku/postgresql-cst-parser'
+    needs: build
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    steps:
+      - name: Deploy to GitHub Pages
+        if: ${{ !env.ACT }}
+        id: deployment
+        uses: actions/deploy-pages@v4
32 changes: 20 additions & 12 deletions .github/workflows/rust.yml
@@ -2,25 +2,33 @@ name: Rust

 on:
   push:
-    branches: [ "main" ]
+    branches: ["main"]
   pull_request:
-    branches: [ "main" ]

 env:
   CARGO_TERM_COLOR: always

 jobs:
   build:
-
     runs-on: ubuntu-latest

     steps:
-    - uses: actions/checkout@v4
-    - name: Build
-      run: cargo build --verbose
-    - name: Run tests
-      run: cargo test --verbose
-    - name: clippy
-      uses: actions-rs/clippy-check@v1
-      with:
-        token: ${{ secrets.GITHUB_TOKEN }}
+      - uses: actions/checkout@v4
+      - name: Build
+        run: cargo build --verbose
+      - name: Run tests
+        run: cargo test --verbose
+      - name: Run tests with features
+        run: cargo test --verbose --features regex-match
+      - name: Run benchmarks
+        run: cargo bench
+      - name: Run benchmarks with features
+        run: cargo bench --features regex-match
+      - name: Clippy (automata)
+        run: cargo clippy -p automata -- -D warnings
+      - name: Clippy (lexer-generator)
+        run: cargo clippy -p lexer-generator -- -D warnings
+      - name: Clippy (parser-generator)
+        run: cargo clippy -p parser-generator -- -D warnings
+      - name: Clippy (postgresql-cst-parser)
+        run: cargo clippy -p postgresql-cst-parser -- -D warnings
14 changes: 14 additions & 0 deletions CHANGELOG
@@ -0,0 +1,14 @@
+# CHANGELOG
+
+## [0.2.0] - 2025-04-04
+
+### Improvements
+- Significant performance enhancement
+
+## [0.1.0] - 2025-03-21
+
+### Added
+- Initial release
+- PostgreSQL 17 syntax support
+- CST generation functionality following [gram.y](https://github.com/postgres/postgres/blob/REL_17_0/src/backend/parser/gram.y) structure
+- Pure Rust implementation enabling WebAssembly compilation via wasm-bindgen
3 changes: 2 additions & 1 deletion Cargo.toml
@@ -2,6 +2,7 @@
 resolver = "2"

 members = [
+    "crates/automata",
     "crates/lexer-generator",
     "crates/parser-generator",
     "crates/postgresql-cst-parser",
@@ -11,7 +12,7 @@ members = [
 default-members = ["crates/postgresql-cst-parser"]

 [workspace.package]
-exclude = ["crates/lexer-generator", "crates/parser-generator", "crates/postgresql-cst-parser-wasm"]
+exclude = ["crates/automata", "crates/lexer-generator", "crates/parser-generator", "crates/postgresql-cst-parser-wasm"]

 [profile.release.package.postgresql-cst-parser-wasm]
 opt-level = "s"
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) [year] [fullname]
+Copyright (c) 2024 tanzaku

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
26 changes: 26 additions & 0 deletions Makefile
@@ -0,0 +1,26 @@
+ROOT_DIR := $(dir $(realpath $(lastword $(MAKEFILE_LIST))))
+
+.PHONY: clean prepare-source prepare-original-source make-patch
+
+prepare-source: copy-from-original-source
+	patch -p1 < $(ROOT_DIR)/patches/patch_scan.patch
+
+prepare-original-source: clean
+	mkdir -p ./tmp
+	cd ./tmp; git clone -b 17-6.0.0 --depth=1 git@github.com:pganalyze/libpg_query.git
+	sed -i 's/sed -i ""/sed -i/g' ./tmp/libpg_query/Makefile
+	cd ./tmp/libpg_query; make $(ROOT_DIR)tmp/libpg_query/tmp/postgres
+	mkdir -p ./crates/lexer-generator/resources
+	mkdir -p ./crates/parser-generator/resources
+
+copy-from-original-source: prepare-original-source
+	cp $(ROOT_DIR)tmp/libpg_query/tmp/postgres/src/backend/parser/scan.l $(ROOT_DIR)crates/lexer-generator/resources
+	cp $(ROOT_DIR)tmp/libpg_query/tmp/postgres/src/include/parser/kwlist.h $(ROOT_DIR)crates/lexer-generator/resources
+	cp $(ROOT_DIR)tmp/libpg_query/tmp/postgres/src/backend/parser/parser.c $(ROOT_DIR)crates/lexer-generator/resources
+	cp $(ROOT_DIR)tmp/libpg_query/tmp/postgres/src/backend/parser/gram.y $(ROOT_DIR)crates/parser-generator/resources
+
+make-patch: prepare-original-source
+	diff -u ./tmp/libpg_query/tmp/postgres/src/backend/parser/scan.l ./crates/lexer-generator/resources/scan.l > ./patches/patch_scan.patch || true
+
+clean:
+	-@ rm -rf ./tmp
151 changes: 137 additions & 14 deletions README.ja.md
@@ -1,35 +1,158 @@
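 # postgresql-cst-parser

+[![Crates.io](https://img.shields.io/crates/v/postgresql-cst-parser.svg)](https://crates.io/crates/postgresql-cst-parser)
+
+**Note: This parser is not an official PostgreSQL project; it is an independent, unofficial tool.**
+
 ## Overview

-`postgresql-cst-parser` is a Lossless Syntax Tree (CST) parser dedicated to PostgreSQL, developed in Pure Rust. This document describes the parser's features, motivation, usage, and implementation details.
+`postgresql-cst-parser` is a Concrete Syntax Tree (CST) parser dedicated to PostgreSQL, developed in Pure Rust. This document describes the parser's features, the motivation for its development, its usage, and its implementation details.

 ## Key Features

-- **Automatically generated CST parser**: It is generated automatically from the PostgreSQL grammar, so it covers a wide range of syntax.
-- **Partial limitations**: Parts of the scanner implementation are incomplete, so not all syntax is supported.
+- **PostgreSQL 17 support**: Supports the latest PostgreSQL 17 syntax.
+- **Structured CST output**: The generated CST strictly follows the structure defined in PostgreSQL's [gram.y](https://github.com/postgres/postgres/blob/REL_17_0/src/backend/parser/gram.y) file.
+- **Uses `cstree`**: The `cstree` crate is used to build the syntax tree.
+- **Works with wasm-bindgen**: Because it is written in Pure Rust, it can be used together with wasm-bindgen.
+- **PL/pgSQL**: Not currently supported.

-## Motivation for Development
+## Development Motivation

-1. There is a shortage of PostgreSQL CST parsers that can be used from Rust and support a wide range of syntax.
-2. [pg_query.rs](https://github.com/pganalyze/pg_query.rs) is an excellent library, but it cannot build a CST and cannot be built for WebAssembly (wasm).
+This library was developed because we needed one that can be used from Rust, supports all syntax, and (being written in Pure Rust) works with wasm-bindgen.

 ## Usage

-Use it as shown in the following code example.
+It can be used as follows: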

 ```rust
-let resolved_root = postgresql_cst_parser::parse("SELECT 1;");
-dbg!(resolved_root);
+use postgresql_cst_parser::{parse, syntax_kind::SyntaxKind};
+
+fn main() {
+    // Parse SQL query and get the syntax tree
+    let sql = "SELECT tbl.a as a, tbl.b from TBL tbl WHERE tbl.a > 0;";
+    let root = parse(sql).unwrap();
+
+    // Example 1: Extract all column references from the query
+    let column_refs: Vec<String> = root
+        .descendants()
+        .filter(|node| node.kind() == SyntaxKind::columnref)
+        .map(|node| node.text().to_string())
+        .collect();
+
+    println!("Column references: {:?}", column_refs); // ["tbl.a", "tbl.b", "tbl.a"]
+
+    // Example 2: Find the WHERE condition
+    if let Some(where_clause) = root
+        .descendants()
+        .find(|node| node.kind() == SyntaxKind::where_clause)
+    {
+        println!("WHERE condition: {}", where_clause.text());
+    }
+
+    // Example 3: Get the selected table name
+    if let Some(relation_expr) = root
+        .descendants()
+        .find(|node| node.kind() == SyntaxKind::relation_expr)
+    {
+        if let Some(name_node) = relation_expr
+            .descendants()
+            .find(|node| node.kind() == SyntaxKind::ColId)
+        {
+            println!("Table name: {}", name_node.text());
+        }
+    }
+
+    // Example 4: Parse complex SQL and extract specific nodes
+    let complex_sql = "WITH data AS (SELECT id, value FROM source WHERE value > 10)
+                       SELECT d.id, d.value, COUNT(*) OVER (PARTITION BY d.id)
+                       FROM data d JOIN other o ON d.id = o.id
+                       ORDER BY d.value DESC LIMIT 10;";
+
+    let complex_root = parse(complex_sql).unwrap();
+
+    // Extract CTEs (Common Table Expressions)
+    let ctes: Vec<_> = complex_root
+        .descendants()
+        .filter(|node| node.kind() == SyntaxKind::common_table_expr)
+        .collect();
+
+    // Extract window functions
+    let window_funcs: Vec<_> = complex_root
+        .descendants()
+        .filter(|node| node.kind() == SyntaxKind::over_clause)
+        .collect();
+
+    println!("Number of CTEs: {}", ctes.len());
+    println!("Number of window functions: {}", window_funcs.len());
+}
 ```

+Example of the generated syntax tree:
+
+```sql
+SELECT tbl.a as a from TBL tbl;
+```
+
-If you would like to try this parser yourself, you can do so directly online [here](https://tanzaku.github.io/postgresql-cst-parser/). Enter real code and see how the parser behaves.
+```
+Root@0..31
+  parse_toplevel@0..31
+    stmtmulti@0..31
+      stmtmulti@0..30
+        toplevel_stmt@0..30
+          stmt@0..30
+            SelectStmt@0..30
+              select_no_parens@0..30
+                simple_select@0..30
+                  SELECT@0..6 "SELECT"
+                  Whitespace@6..7 " "
+                  opt_target_list@7..17
+                    target_list@7..17
+                      target_el@7..17
+                        a_expr@7..12
+                          c_expr@7..12
+                            columnref@7..12
+                              ColId@7..10
+                                IDENT@7..10 "tbl"
+                              indirection@10..12
+                                indirection_el@10..12
+                                  Dot@10..11 "."
+                                  attr_name@11..12
+                                    ColLabel@11..12
+                                      IDENT@11..12 "a"
+                        Whitespace@12..13 " "
+                        AS@13..15 "as"
+                        Whitespace@15..16 " "
+                        ColLabel@16..17
+                          IDENT@16..17 "a"
+                  Whitespace@17..18 " "
+                  from_clause@18..30
+                    FROM@18..22 "from"
+                    Whitespace@22..23 " "
+                    from_list@23..30
+                      table_ref@23..30
+                        relation_expr@23..26
+                          qualified_name@23..26
+                            ColId@23..26
+                              IDENT@23..26 "TBL"
+                        Whitespace@26..27 " "
+                        opt_alias_clause@27..30
+                          alias_clause@27..30
+                            ColId@27..30
+                              IDENT@27..30 "tbl"
+      Semicolon@30..31 ";"
+```

+## Online Demo
+
+You can try the parser directly [here](https://tanzaku.github.io/postgresql-cst-parser/). Enter an SQL query and see the generated syntax tree in real time.

-## Implementation Method
+## Implementation

-The implementation uses PostgreSQL's [scan.l](https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/parser/scan.l) and [gram.y](https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/parser/gram.y) with the [libpg_query patches](https://github.com/pganalyze/libpg_query/tree/16-latest/patches) applied. `scan.l` has been rewritten for Rust, and a parsing table is built from `scan.l` and `gram.y` to construct the parser.
+This implementation uses PostgreSQL's [scan.l](https://github.com/postgres/postgres/blob/REL_17_0/src/backend/parser/scan.l) and [gram.y](https://github.com/postgres/postgres/blob/REL_17_0/src/backend/parser/gram.y) files with the [libpg_query](https://github.com/pganalyze/libpg_query/tree/17-6.0.0/patches) patches applied. `scan.l` is additionally rewritten for Rust, and a parsing table is then built from `scan.l` and `gram.y` to construct the parser.

 ## License

-`kwlist.h`, `scan.l`, and `gram.y` are under the PostgreSQL License.
-All other files are published under the MIT License.
+- `kwlist.h`, `parser.c`, `scan.l`, and `gram.y` are under the PostgreSQL License.
+- `lexer_ported.rs` and `generated.rs` contain code ported from PostgreSQL, so the ported portions are under the PostgreSQL License.
+- This project applies the [libpg_query](https://github.com/pganalyze/libpg_query) patches to `scan.l` and `gram.y`, but the patches themselves are not included in this repository.
+- All other files are published under the MIT License.
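
Reviewer's note (not part of the diff): in the README's example tree, every leaf carries its exact source text, whitespace included, which is what makes the tree lossless. The sketch below illustrates that property with the standard library only. It is a hypothetical toy tokenizer written for this note, not the crate's lexer (which is ported from `scan.l`); the names `Kind`, `Token`, `classify`, and `tokenize` are assumptions, and its three token kinds are far coarser than the crate's `SyntaxKind`. It prints each token as `Kind@start..end "text"`, where `start..end` is a byte range into the input.

```rust
// Hypothetical, std-only illustration of a lossless token stream.
// This is NOT the lexer used by postgresql-cst-parser.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Word,
    Whitespace,
    Punct,
}

#[derive(Debug, PartialEq, Eq)]
struct Token<'a> {
    kind: Kind,
    start: usize, // byte offset of the first byte
    end: usize,   // byte offset one past the last byte
    text: &'a str,
}

fn classify(c: char) -> Kind {
    if c.is_whitespace() {
        Kind::Whitespace
    } else if c.is_alphanumeric() || c == '_' {
        Kind::Word
    } else {
        Kind::Punct
    }
}

/// Split `src` into tokens that together cover every byte of the input.
fn tokenize(src: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    let mut iter = src.char_indices().peekable();
    while let Some((start, c)) = iter.next() {
        let kind = classify(c);
        let mut end = start + c.len_utf8();
        // Punctuation is emitted one character at a time; runs of word or
        // whitespace characters are merged into a single token.
        if kind != Kind::Punct {
            while let Some(&(i, next)) = iter.peek() {
                if classify(next) != kind {
                    break;
                }
                end = i + next.len_utf8();
                iter.next();
            }
        }
        tokens.push(Token { kind, start, end, text: &src[start..end] });
    }
    tokens
}

fn main() {
    let sql = "SELECT tbl.a as a from TBL tbl;";
    let tokens = tokenize(sql);
    for t in &tokens {
        println!("{:?}@{}..{} {:?}", t.kind, t.start, t.end, t.text);
    }
    // Lossless: concatenating the token texts reproduces the input exactly.
    let rebuilt: String = tokens.iter().map(|t| t.text).collect();
    assert_eq!(rebuilt, sql);
}
```

Because whitespace and punctuation are kept as tokens rather than skipped, a tool built on such a stream (a formatter or linter, for example) can rewrite part of a query and leave every other byte untouched.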