A comprehensive recursive descent parser for a C-style programming language, built in C++. This parser works in conjunction with a custom lexical analyzer to parse source code and generate an Abstract Syntax Tree (AST).
- Data Types:
int
,float
,double
,char
,bool
,void
- Type Qualifiers:
const
- Control Flow:
if/else
,while
,do-while
,for
,switch/case/default
- Flow Control:
break
,continue
,return
- Functions: Function declarations with parameters and return types
- Variables: Variable declarations with optional initialization
- Preprocessor:
#include
,#define
(with macro support) - Enumerations:
enum
declarations
- Arithmetic:
+
,-
,*
,/
,%
(including modulo) - Comparison:
==
,!=
,<
,>
,<=
,>=
- Logical:
&&
,||
,!
- Bitwise:
&
,|
,^
,~
,<<
,>>
- Assignment:
=
- Increment/Decrement:
++
,--
(both prefix and postfix) - Unary:
+
,-
,!
,~
- Operator Precedence: Proper handling of operator precedence and associativity
- Complex Expressions: Support for nested and complex expressions
- Function Calls: Function invocation with argument lists
- Nested Control Flow: Support for deeply nested control structures
- Type Combinations: Simple type specifiers like
const int
- Macro Definitions: Both object-like and function-like macros
The parser uses a recursive descent parsing strategy with the following key components:
- LexerState Integration: Works directly with the lexer's token stream
- AST Generation: Builds a comprehensive Abstract Syntax Tree
- Error Handling: Robust error reporting with line and column information
- Memory Management: Uses smart pointers for automatic memory management
// Declarations
- FnDecl (Function Declaration)
- VarDecl (Variable Declaration)
- EnumDecl (Enumeration Declaration)
// Statements
- IfStmt, WhileStmt, DoWhileStmt, ForStmt, SwitchStmt
- ReturnStmt, BreakStmt, ContinueStmt
- AssignStmt, ExprStmt, CaseStmt
// Expressions
- BinaryExpr, UnaryExpr, CallExpr
- LiteralExpr, IdentifierExpr
// Preprocessor
- IncludeDirective, DefineDirective
The parser follows a clear hierarchy of parsing methods:
declaration() → statement() → expression()
↓ ↓ ↓
fnDecl() ifStmt() assignment()
varDecl() whileStmt() logicalOr()
enumDecl() forStmt() logicalAnd()
... bitwiseOr()
...
primary()
g++ -o parser parser.cpp
The parser includes comprehensive tests covering all language features:
./parser
#include "parser.cpp" // Includes lexer.cpp automatically
// Create lexer state
LexerState state = createLexerState(sourceCode);
// Create parser and parse
Parser parser(state);
vector<unique_ptr<AstNode>> ast = parser.parse();
// Process AST nodes
for (const auto& node : ast) {
cout << node->toString() << endl;
}
// Basic types
int x = 10;
float pi = 3.14;
bool flag = true;
// With const qualifier
const int constantVar = 42;
const double value = 3.14159;
// Simple function
int add(int a, int b) {
return a + b;
}
// Conditional
if (x > 0) {
processPositive(x);
} else {
processNegative(x);
}
// Loops
while (running) {
update();
}
for (int i = 0; i < count; i++) {
if (i % 2 == 0) continue;
processOdd(i);
}
// Switch statement
switch (option) {
case 1:
handleOption1();
break;
case 2:
handleOption2();
break;
default:
handleDefault();
break;
}
// Arithmetic with proper precedence
int result = (a + b) * c - d / 2 % 3;
// Logical operations
bool condition = (x > 5) && (y < 10) || !flag;
// Bitwise operations
int bits = a & b | c ^ d;
int shifted = value << 2 >> 1;
int negated = ~mask;
// Complex expressions
bool complex = ((a & 2) << 8) | (b >> 4) && !(c % 2) || (d >= e);
// Include directives
#include <stdio.h>
#include "myheader.h"
// Macro definitions
#define PI 3.14159
#define MAX_SIZE 100
#define SQUARE(x) ((x) * (x))
#define MIN(a, b) ((a) < (b))
enum Color {
RED,
GREEN,
BLUE
};
enum Status {
ACTIVE,
INACTIVE,
PENDING
};
The parser generates human-readable AST representations. For example:
Source Code:
int factorial(int n) {
if (n <= 1) {
return 1;
}
return n * factorial(n - 1);
}
The parser provides detailed error messages with location information:
Parse Error: Expected ';' after variable declaration at line 5, column 12
Parse Error: Expected ')' after condition at line 8, column 15
Parse Error: Unexpected token at line 10, column 3
The parser includes comprehensive test suites covering:
- Arithmetic operators with modulo
- Logical operators
- Bitwise operators
- While loops
- Do-while loops
- Switch-case statements
- Continue statements
- Type qualifiers (const)
- Simple type combinations
- Functions with simple const types
- Nested control flow
- Complex expressions
- All unary operators
- Include directives
- Define directives
- Mixed preprocessor and code
- Enum declarations
- Error handling and edge cases
- Lexer: Requires
lexer.cpp
for tokenization - C++ Standard: C++11 or later (uses smart pointers, auto, etc.)
- Compiler: GCC, Clang, or any C++11 compatible compiler
- Memory Efficient: Streams tokens one at a time instead of loading all into memory
- Modular Design: Clear separation between lexing and parsing phases
- Extensible: Easy to add new language constructs
- Robust: Comprehensive error handling and recovery
- Fast: Efficient recursive descent parsing
- Standard Compliant: Follows established C language grammar patterns
This parser is part of a custom compiler project and serves as the syntax analysis phase of the compilation pipeline.