Add documentation for Tree-sitter-based language modules
Updated the language module documentation to clarify core components and added sections for ANTLR and Tree-sitter parsing technologies. Included detailed examples for setting up language modules with Tree-sitter, emphasizing its implementation specifics.
@@ -73,8 +73,14 @@ A language module consists of these parts:
 | TokenType class |`de.jplag.TokenType`| contains the language-specific token types |**implement new**|
 ||||
 | Lexer and Parser | - | transform code into AST | depends on technology |
-| ParserAdapter class |`de.jplag.AbstractParser`| sets up Parser and calls Traverser | depends on technology |
-| Traverser/<br>TraverserListener classes | - | creates tokens traversing the AST | depends on technology |
+| ParserAdapter class | varies by technology | sets up Parser and calls Traverser | depends on technology |
+| Traverser/<br>TraverserListener classes | varies by technology | creates tokens traversing the AST | depends on technology |
+
+## Parser Technology Options
+
+JPlag supports two main parsing technologies for implementing language modules:
+
+### ANTLR-based Language Modules
 
 For example, if ANTLR is used, the setup is as follows:
 
@@ -95,6 +101,28 @@ As the table shows, much of a language module can be reused, especially when usi
 - It should still be rather easy to implement the ParserAdapter from the library documentation.
 - Instead of using a listener pattern, the library may require you to do the token extraction in a _Visitor subclass_. In that case, there is only one method call per element, called e.g. `traverseClassDeclaration`. The advantage of this version is that the traversal of the subtrees can be controlled freely. See the Scala language module for an example.
 
+
+### Tree-sitter-based Language Modules
+
+Tree-sitter provides an alternative parsing approach with native performance and community-maintained grammars. For Tree-sitter-based modules, the setup is as follows:
+
+| Tree-sitter specific parts/files | Superclass/Interface | Function | How to get there |
+|---|---|---|---|
+| Native grammar libraries | - | Platform-specific parsing libraries | Built from Tree-sitter grammar repositories |
+| Tree-sitter Language class |`de.jplag.treesitter.TreeSitterLanguage`| Loads native grammar |**implement new**|
+| Parser class |`de.jplag.treesitter.AbstractTreeSitterParser`| Sets up parser and calls visitor | copy with small adjustments |
+| TokenCollector class |`de.jplag.treesitter.TreeSitterVisitor`| Extracts tokens from AST using visitor pattern |**implement new**|
+
+As with ANTLR modules, much can be reused. The parts to implement specifically for each Tree-sitter language module are:
+- the Tree-sitter Language class (grammar loader)
+- the TokenCollector (visitor implementation), and
+- the TokenTypes.
+
+**Note** about Tree-sitter advantages:
+- Native parsing performance with robust error recovery
+- Community-maintained grammars that stay current with language evolution
+- Cross-platform native library distribution
+
 ## Setting up a new language module with ANTLR
 
 JPlag provides a small framework to make it easier to implement language modules with ANTLR
@@ -218,6 +246,127 @@ Additional features for rules:
 2. Semantics - Can be passed by using withSemantics after the map call (see CPP language module for examples)
 3. Delegate - To have more precise control over the token position and length a delegated visitor can be used (see Go language module for examples)
 
+## Setting up a new language module with Tree-sitter
+
+JPlag provides a comprehensive framework for implementing Tree-sitter-based language modules with built-in handler infrastructure and native library management.
+
+### Create the Tree-sitter Language class
+
+Create a class that extends `TreeSitterLanguage` to load your language's native grammar:
+The handler-based approach eliminates boilerplate code and provides a clean, declarative way to map Tree-sitter node types to JPlag tokens. The abstract base class handles traversal and token collection automatically.
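
The handler-based idea described in the last added line can be illustrated with a small, self-contained sketch. It is not JPlag's actual API: `HandlerSketch`, `SketchTokenType`, `Node`, `SketchCollector`, and `ToyCollector` are hypothetical stand-ins invented for this example, and the node type strings are only illustrative. The sketch assumes that a base class owns the traversal and token collection, while the language-specific subclass only declares a mapping from node types to token types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types belong to JPlag or Tree-sitter. */
public class HandlerSketch {

    /** Stand-in for the language-specific token types of a module. */
    enum SketchTokenType { CLASS_DECL, METHOD_DECL, RETURN }

    /** Stand-in for a syntax-tree node: a grammar node type plus its children. */
    record Node(String type, List<Node> children) {
        static Node of(String type, Node... children) {
            return new Node(type, List.of(children));
        }
    }

    /** Base class that owns traversal and collection; subclasses only declare the mapping. */
    abstract static class SketchCollector {
        private final List<SketchTokenType> tokens = new ArrayList<>();

        /** Declarative mapping from node type to token type. */
        protected abstract Map<String, SketchTokenType> handlers();

        List<SketchTokenType> collect(Node root) {
            tokens.clear();
            visit(root);
            return List.copyOf(tokens);
        }

        // Pre-order walk: emit a token if the node type has a handler, then descend.
        private void visit(Node node) {
            SketchTokenType type = handlers().get(node.type());
            if (type != null) {
                tokens.add(type);
            }
            for (Node child : node.children()) {
                visit(child);
            }
        }
    }

    /** The only language-specific part in this sketch: the mapping table. */
    static class ToyCollector extends SketchCollector {
        private static final Map<String, SketchTokenType> HANDLERS = Map.of(
                "class_declaration", SketchTokenType.CLASS_DECL,
                "method_declaration", SketchTokenType.METHOD_DECL,
                "return_statement", SketchTokenType.RETURN);

        @Override
        protected Map<String, SketchTokenType> handlers() {
            return HANDLERS;
        }
    }

    /** Builds a tiny hand-made tree and collects its tokens. */
    static List<SketchTokenType> demo() {
        Node tree = Node.of("class_declaration",
                Node.of("identifier"), // no handler registered, so no token
                Node.of("method_declaration", Node.of("return_statement")));
        return new ToyCollector().collect(tree);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [CLASS_DECL, METHOD_DECL, RETURN]
    }
}
```

In this sketch, supporting a new construct means adding one entry to the map; the traversal code stays untouched. The real base classes named in the diff (`AbstractTreeSitterParser`, `TreeSitterVisitor`) may differ in shape and should be checked against the JPlag sources.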