Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to complete 1D array IR generation support across the frontend AST/parser and the IR codegen pipeline.
Changes:
- Added array-related AST/type support (e.g.,
ArrayDefAST,ArrayType, indexedLValAST). - Extended the SysY grammar to parse 1D array definitions, initializer lists, and indexed lvalues.
- Implemented/updated IR codegen for array allocation/initialization and indexed load/store; adjusted test input to exercise
getarray(arr).
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| project/tests/hello.c | Updates test program to declare an array and call getarray(arr). |
| project/src/main.cpp | Generates RISC-V backend output only when -riscv is selected. |
| project/src/ir/codegen.cpp | Adds array definition codegen and indexed lvalue load/store support. |
| project/src/ir/ast.cpp | Adds dump support for new AST node types and renames scalar def dump. |
| project/src/frontend/sysy.y | Extends grammar for array defs, init lists, and indexed LVal. |
| project/include/ir/type.cppm | Adds ArrayType and Type::is_array() support. |
| project/include/ir/ast.cppm | Adds ArrayDefAST, ScalarDefAST, and indexed LValAST representation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
project/src/ir/codegen.cpp
Outdated
| */ | ||
| auto LValAST::CalcValue(ir::KoopaBuilder &builder) const -> int { | ||
| auto sym = builder.symtab().lookup(ident); | ||
| Log::trace(ident); |
There was a problem hiding this comment.
Log::trace(ident); in LValAST::CalcValue looks like leftover debug logging and will spam output during constant folding. Remove it or guard it behind an explicit debug flag if you need the trace.
| Log::trace(ident); |
project/src/ir/codegen.cpp
Outdated
| //! will consistently use ident without the prefix. | ||
| if (ssize(initialize_list) == 0) { | ||
| builder.append( | ||
| fmt::format("global @{} = alloc [i32, {}], zeroinit\n", ident, len)); |
There was a problem hiding this comment.
In the global-scope array definition, the symbol table is only updated in the initializer-list branch. When initialize_list is empty you emit the global @... line but never defineGlobal(...), so later references to this global array will be reported as undefined. Define the symbol in the symbol table in both branches (including the zeroinit case).
| fmt::format("global @{} = alloc [i32, {}], zeroinit\n", ident, len)); | |
| fmt::format("global @{} = alloc [i32, {}], zeroinit\n", ident, len)); | |
| builder.symtab().defineGlobal( | |
| ident, "@" + ident, type::ArrayType::get(type::IntType::get(), len), | |
| SymbolKind::Var, is_const); |
project/src/ir/codegen.cpp
Outdated
| for (const auto &[i, elem] : std::views::enumerate(initialize_list)) { | ||
| builder.append(fmt::format("{}", elem->CalcValue(builder))); | ||
| if (i < len - 1) { | ||
| builder.append(", "); | ||
| } | ||
| } |
There was a problem hiding this comment.
initialize_list is iterated without checking it fits the declared array length (len). If the initializer has more elements than len, the emitted IR will be malformed (missing separators) and semantically incorrect. Add a semantic check (panic) when initialize_list.size() > len, or clamp/ignore extras consistently with the language spec.
project/src/ir/codegen.cpp
Outdated
| if (lval->index) { | ||
| auto tmp_reg = builder.newReg(); | ||
| builder.append(fmt::format(" {} = getelemptr {}, {}\n", tmp_reg, | ||
| sym->irName, lval->index->codeGen(builder))); | ||
| builder.append(fmt::format(" store {}, {}\n", val_reg, tmp_reg)); | ||
| } else { | ||
| builder.append(fmt::format(" store {}, {}\n", val_reg, sym->irName)); | ||
| } |
There was a problem hiding this comment.
getelemptr is emitted whenever lval->index is present, but there is no type check that the LHS symbol is actually an array. As written, code like int x; x[0]=1; would generate invalid IR rather than a semantic error. Add a check (e.g., sym->type->is_array()) and fail fast if indexing a non-array.
project/src/ir/codegen.cpp
Outdated
| reg = builder.newReg(); | ||
|
|
||
| builder.append(fmt::format(" {} = load {}\n", reg, old_reg)); | ||
| } else { |
There was a problem hiding this comment.
LValAST::codeGen always emits a load for non-const lvalues without an index. For array variables used as expressions (e.g. passing arr to getarray(arr) / putarray), this should produce the address (pointer to first element), not load the whole array. Add special handling for sym->type->is_array() when index == nullptr to return an element pointer (e.g. getelemptr sym->irName, 0) or otherwise pass the correct pointer value.
| } else { | |
| } else { | |
| // For array variables used as expressions (e.g. function arguments), | |
| // we should pass a pointer to the first element instead of loading | |
| // the whole array value. | |
| if (sym->type->is_array()) { | |
| std::string elem_ptr_reg = builder.newReg(); | |
| builder.append(fmt::format(" {} = getelemptr {}, 0\n", elem_ptr_reg, | |
| sym->irName)); | |
| return elem_ptr_reg; | |
| } |
project/src/frontend/sysy.y
Outdated
| $$->push_back(std::unique_ptr<DefAST>(static_cast<ScalarDefAST *>($1))); | ||
| }; | ||
| | VarDefList ',' VarDef { | ||
| $$ = $1; | ||
| $$->push_back(std::unique_ptr<DefAST>(static_cast<DefAST *>($3))); | ||
| $$->push_back(std::unique_ptr<DefAST>(static_cast<ScalarDefAST *>($3))); |
There was a problem hiding this comment.
VarDefList also assumes every VarDef is a ScalarDefAST via static_cast<ScalarDefAST *>($1) / static_cast<ScalarDefAST *>($3), but VarDef now has array variants that construct ArrayDefAST, so when those branches are used this performs an invalid downcast from BaseAST* to ScalarDefAST* and then wraps it in unique_ptr<DefAST>, leading to undefined behavior and potential heap corruption when the pointer is deleted. A malicious SysY input that declares non-const arrays (with or without initializers) will hit this path and can crash or corrupt the compiler process. The fix is to treat all VarDef nodes polymorphically as DefAST instances instead of casting them to ScalarDefAST.
project/src/frontend/sysy.y
Outdated
| $$->push_back(std::unique_ptr<DefAST>(static_cast<ScalarDefAST *>($1))); | ||
| } | ||
| | ConstDefList ',' ConstDef { | ||
| $$ = $1; | ||
| $$->push_back(std::unique_ptr<DefAST>(static_cast<DefAST *>($3))); | ||
| $$->push_back(std::unique_ptr<DefAST>(static_cast<ScalarDefAST *>($3))); |
There was a problem hiding this comment.
ConstDefList assumes every ConstDef is a ScalarDefAST by doing static_cast<ScalarDefAST *>($1) / static_cast<ScalarDefAST *>($3), but ConstDef now has an array branch that creates ArrayDefAST, so these casts can downcast a BaseAST* that actually points to an ArrayDefAST to ScalarDefAST*, causing undefined behavior and heap corruption when the resulting unique_ptr<DefAST> is destroyed. An attacker-controlled SysY source file that declares a const array (e.g., const int a[...];) will drive this code path and can trigger memory corruption in the compiler process. To avoid this, store ConstDef results polymorphically as DefAST (or use a safe checked cast) rather than forcing them through ScalarDefAST.
- Removed unused count_name variable in ir_builder.cppm. - Cleaned up formatting in symbol_table.cppm for better readability. - Changed debug() method to toKoopa() in type.cppm and its derived classes for consistency in type representation. - Updated grammar rules in sysy.y to support array initialization and improved handling of parameter arrays. - Enhanced AST code generation in ast.cpp and codegen.cpp to support new initialization semantics and improved memory management for arrays. - Added flattening mechanism for initializer lists in InitValStmtAST to handle complex array initializations. - Improved error handling and semantic checks throughout the codebase.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 36 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
project/src/frontend/sysy.y
Outdated
| delete $3; | ||
| } | ||
| | CONST Btype IDENT ParamArraySuffix { | ||
| $$ = new FuncParamAST(std::move(*$3), std::move(*$3), true, true, std::move(*$4)); |
There was a problem hiding this comment.
In the CONST Btype IDENT ParamArraySuffix production, the FuncParamAST constructor is given std::move(*$3) for both the parameter type and identifier. This makes the parameter's btype incorrect (it becomes the identifier), and it also drops the actual Btype value (you delete $2 without moving it into the AST). Use $2 for the type and $3 for the identifier here (matching the non-const case).
| $$ = new FuncParamAST(std::move(*$3), std::move(*$3), true, true, std::move(*$4)); | |
| $$ = new FuncParamAST(std::move(*$2), std::move(*$3), true, true, std::move(*$4)); |
project/src/ir/codegen.cpp
Outdated
| -> std::string { | ||
| //! No node should call `InitValStmtAST`'s `codeGen` function, as it is only | ||
| //! responsible for expanding the initialize list, not for generating code. | ||
| Log::panic(""); |
There was a problem hiding this comment.
InitValStmtAST::codeGen calls Log::panic("") with an empty message. If this is ever hit (even due to a bug elsewhere), it will be very hard to diagnose. Please provide a clear error message indicating that InitValStmtAST::codeGen is not supposed to be invoked (and ideally include node/context info).
| Log::panic(""); | |
| Log::panic("InitValStmtAST::codeGen was called unexpectedly. This node is " | |
| "only used to expand initializer lists and must not generate " | |
| "IR directly."); |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 42 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| auto src = fmt::format("a{}", i); | ||
| emitSw(buffer, src, "sp", offset); | ||
| } else { | ||
| // Parameters passed on stack by caller are located ABOVE the current SP. |
There was a problem hiding this comment.
The handling of function parameters passed on the stack (lines 336-340) assumes all parameters are 4 bytes in size when calculating the offset. However, array parameters should be pointers (which are 4 bytes), so this is actually correct, but the code comment should clarify this assumption. If pointer parameters could be of variable size in the future, this would need to be updated.
| // Parameters passed on stack by caller are located ABOVE the current SP. | |
| // Parameters passed on stack by caller are located ABOVE the current SP. | |
| // | |
| // IMPORTANT ABI ASSUMPTION: | |
| // - We assume each argument passed on the stack occupies exactly 4 bytes. | |
| // - This matches 32-bit integers and pointers on RV32; array parameters | |
| // are passed as pointers, which are also 4 bytes under this target. | |
| // - If we ever support wider types or a target where pointers are not | |
| // 4 bytes, this offset calculation (and related stack layout logic) | |
| // must be updated accordingly. |
|
|
||
| if (cur_type->is_array()) { | ||
| cur_type = std::static_pointer_cast<type::ArrayType>(cur_type)->base; | ||
| } |
There was a problem hiding this comment.
Bug: When taking the first branch (i == 0 && sym->type->is_ptr()) and using getptr, the type is not updated because the type update code (lines 818-820) is inside the else block. This means cur_type will be incorrect for subsequent iterations. The type update should be moved outside the if-else block, similar to how it's done in AssignStmtAST::codeGen (line 591-593). This could lead to incorrect type tracking when accessing multi-dimensional array parameters.
| if (cur_type->is_array()) { | |
| cur_type = std::static_pointer_cast<type::ArrayType>(cur_type)->base; | |
| } | |
| } | |
| if (cur_type->is_array()) { | |
| cur_type = std::static_pointer_cast<type::ArrayType>(cur_type)->base; |
| Log::panic(fmt::format("Semantic Error: Excess elements in array " | ||
| "initializer (expected {}, got {}).", | ||
| arr_type->base->toKoopa(), | ||
| ssize(node->initialize_list))); |
There was a problem hiding this comment.
Confusing error message: The error message at lines 518-521 says "expected {}, got {}" where the first placeholder is filled with the type string (e.g., "i32") rather than the expected count. This will produce messages like "expected i32, got 5" instead of something more helpful like "expected 4 elements, got 5". Consider changing to show the actual capacity of the sub-type.
| Log::panic(fmt::format("Semantic Error: Excess elements in array " | |
| "initializer (expected {}, got {}).", | |
| arr_type->base->toKoopa(), | |
| ssize(node->initialize_list))); | |
| Log::panic(fmt::format( | |
| "Semantic Error: Excess elements in array initializer for " | |
| "type {} (got {} elements).", | |
| arr_type->base->toKoopa(), | |
| ssize(node->initialize_list))); |
| auto is_ptr() const -> bool override { return true; } | ||
|
|
||
| static auto get(std::shared_ptr<Type> t) -> std::shared_ptr<PtrType> { | ||
| return std::make_shared<PtrType>(t); |
There was a problem hiding this comment.
Inconsistency: ArrayType::get() uses caching to ensure that identical array types share the same instance (lines 138-146), but PtrType::get() (line 119-121) creates a new instance every time. For consistency and to avoid potential equality comparison issues, consider implementing the same caching pattern for PtrType. This would also reduce memory usage when the same pointer type is used multiple times.
| return std::make_shared<PtrType>(t); | |
| static std::map<std::shared_ptr<Type>, std::shared_ptr<PtrType>> cache; | |
| if (cache.contains(t)) { | |
| return cache[t]; | |
| } | |
| auto instance = std::make_shared<PtrType>(t); | |
| return cache[t] = instance; |
| outputs: | ||
| artifact_name: compiler-binary | ||
| steps: | ||
| - name: Check out repository code | ||
| uses: actions/checkout@v5 | ||
| - name: test koopa | ||
| - uses: actions/checkout@v5 | ||
|
|
||
| - name: Install Ninja | ||
| run: | | ||
| autotest -w debug -koopa -s lv1 . | ||
| autotest -w debug -koopa -s lv3 . | ||
| autotest -w debug -koopa -s lv4 . | ||
| autotest -w debug -koopa -s lv5 . | ||
| autotest -w debug -koopa -s lv6 . | ||
| autotest -w debug -koopa -s lv7 . | ||
| autotest -w debug -koopa -s lv8 . | ||
| - name: test riscv | ||
| apt-get update && apt-get install -y ninja-build | ||
| ninja --version | ||
|
|
||
| - name: Build Binary | ||
| run: make | ||
|
|
||
| test_compiler: | ||
| needs: build_compiler | ||
| runs-on: ubuntu-latest | ||
| container: | ||
| image: maxxing/compiler-dev | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| test_type: [koopa, riscv] | ||
| defaults: | ||
| run: | ||
| working-directory: ./project | ||
| steps: | ||
| - uses: actions/checkout@v5 | ||
|
|
||
| - name: Install Ninja | ||
| run: | | ||
| autotest -w debug -riscv -s lv1 . | ||
| autotest -w debug -riscv -s lv3 . | ||
| autotest -w debug -riscv -s lv4 . | ||
| autotest -w debug -riscv -s lv5 . | ||
| autotest -w debug -riscv -s lv6 . | ||
| autotest -w debug -riscv -s lv7 . | ||
| apt-get update && apt-get install -y ninja-build | ||
| ninja --version | ||
|
|
||
| - name: Run Test | ||
| run: autotest -w debug -${{ matrix.test_type }} . |
There was a problem hiding this comment.
The test_compiler job depends on build_compiler but doesn't actually use the built artifact. The build job declares an output artifact_name: compiler-binary but never uploads the artifact, and the test job doesn't download it. This means the compiler will be rebuilt in the test job. Either remove the dependency and outputs declaration from build_compiler, or implement artifact upload/download to avoid rebuilding in the test job.
| std::shared_ptr<type::Type> param_type; | ||
| if (is_ptr) { | ||
| std::shared_ptr<type::Type> base_type = type::IntType::get(); | ||
| for (const auto &dim : | ||
| indices | reverse | filter([](auto &p) { return p != nullptr; })) { | ||
| base_type = type::ArrayType::get(base_type, dim->CalcValue(builder)); | ||
| } | ||
| param_type = type::PtrType::get(base_type); | ||
| } else { | ||
| std::string addr = builder.newReg(); | ||
| builder.append(fmt::format(" {} = alloc i32\n", addr)); | ||
| builder.symtab().define(ident, addr, type::IntType::get(), SymbolKind::Var, | ||
| false); | ||
| builder.append(fmt::format(" store @{}, {}\n", ident, addr)); | ||
| param_type = type::IntType::get(); | ||
| } |
There was a problem hiding this comment.
Code duplication detected: The type construction logic in FuncParamAST::codeGen (lines 79-89) and FuncParamAST::toKoopa (lines 104-115) is identical. Consider extracting this into a private helper method getParamType() to avoid duplication and make the code more maintainable.
No description provided.