Skip to content

Array support#7

Merged
anfsity merged 18 commits intomainfrom
array-support
Feb 12, 2026
Merged

Array support#7
anfsity merged 18 commits intomainfrom
array-support

Conversation

@anfsity
Copy link
Owner

@anfsity anfsity commented Jan 29, 2026

No description provided.

@anfsity anfsity requested a review from Copilot January 29, 2026 07:18
@anfsity anfsity self-assigned this Jan 29, 2026
@anfsity anfsity linked an issue Jan 29, 2026 that may be closed by this pull request
@anfsity anfsity added the enhancement New feature or request label Jan 29, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to complete 1D array IR generation support across the frontend AST/parser and the IR codegen pipeline.

Changes:

  • Added array-related AST/type support (e.g., ArrayDefAST, ArrayType, indexed LValAST).
  • Extended the SysY grammar to parse 1D array definitions, initializer lists, and indexed lvalues.
  • Implemented/updated IR codegen for array allocation/initialization and indexed load/store; adjusted test input to exercise getarray(arr).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
project/tests/hello.c Updates test program to declare an array and call getarray(arr).
project/src/main.cpp Generates RISC-V backend output only when -riscv is selected.
project/src/ir/codegen.cpp Adds array definition codegen and indexed lvalue load/store support.
project/src/ir/ast.cpp Adds dump support for new AST node types and renames scalar def dump.
project/src/frontend/sysy.y Extends grammar for array defs, init lists, and indexed LVal.
project/include/ir/type.cppm Adds ArrayType and Type::is_array() support.
project/include/ir/ast.cppm Adds ArrayDefAST, ScalarDefAST, and indexed LValAST representation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

*/
auto LValAST::CalcValue(ir::KoopaBuilder &builder) const -> int {
auto sym = builder.symtab().lookup(ident);
Log::trace(ident);
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log::trace(ident); in LValAST::CalcValue looks like leftover debug logging and will spam output during constant folding. Remove it or guard it behind an explicit debug flag if you need the trace.

Suggested change
Log::trace(ident);

Copilot uses AI. Check for mistakes.
//! will consistently use ident without the prefix.
if (ssize(initialize_list) == 0) {
builder.append(
fmt::format("global @{} = alloc [i32, {}], zeroinit\n", ident, len));
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the global-scope array definition, the symbol table is only updated in the initializer-list branch. When initialize_list is empty you emit the global @... line but never defineGlobal(...), so later references to this global array will be reported as undefined. Define the symbol in the symbol table in both branches (including the zeroinit case).

Suggested change
fmt::format("global @{} = alloc [i32, {}], zeroinit\n", ident, len));
fmt::format("global @{} = alloc [i32, {}], zeroinit\n", ident, len));
builder.symtab().defineGlobal(
ident, "@" + ident, type::ArrayType::get(type::IntType::get(), len),
SymbolKind::Var, is_const);

Copilot uses AI. Check for mistakes.
Comment on lines +176 to +181
for (const auto &[i, elem] : std::views::enumerate(initialize_list)) {
builder.append(fmt::format("{}", elem->CalcValue(builder)));
if (i < len - 1) {
builder.append(", ");
}
}
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initialize_list is iterated without checking it fits the declared array length (len). If the initializer has more elements than len, the emitted IR will be malformed (missing separators) and semantically incorrect. Add a semantic check (panic) when initialize_list.size() > len, or clamp/ignore extras consistently with the language spec.

Copilot uses AI. Check for mistakes.
Comment on lines +347 to +354
if (lval->index) {
auto tmp_reg = builder.newReg();
builder.append(fmt::format(" {} = getelemptr {}, {}\n", tmp_reg,
sym->irName, lval->index->codeGen(builder)));
builder.append(fmt::format(" store {}, {}\n", val_reg, tmp_reg));
} else {
builder.append(fmt::format(" store {}, {}\n", val_reg, sym->irName));
}
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getelemptr is emitted whenever lval->index is present, but there is no type check that the LHS symbol is actually an array. As written, code like int x; x[0]=1; would generate invalid IR rather than a semantic error. Add a check (e.g., sym->type->is_array()) and fail fast if indexing a non-array.

Copilot uses AI. Check for mistakes.
reg = builder.newReg();

builder.append(fmt::format(" {} = load {}\n", reg, old_reg));
} else {
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LValAST::codeGen always emits a load for non-const lvalues without an index. For array variables used as expressions (e.g. passing arr to getarray(arr) / putarray), this should produce the address (pointer to first element), not load the whole array. Add special handling for sym->type->is_array() when index == nullptr to return an element pointer (e.g. getelemptr sym->irName, 0) or otherwise pass the correct pointer value.

Suggested change
} else {
} else {
// For array variables used as expressions (e.g. function arguments),
// we should pass a pointer to the first element instead of loading
// the whole array value.
if (sym->type->is_array()) {
std::string elem_ptr_reg = builder.newReg();
builder.append(fmt::format(" {} = getelemptr {}, 0\n", elem_ptr_reg,
sym->irName));
return elem_ptr_reg;
}

Copilot uses AI. Check for mistakes.
Comment on lines +258 to +262
$$->push_back(std::unique_ptr<DefAST>(static_cast<ScalarDefAST *>($1)));
};
| VarDefList ',' VarDef {
$$ = $1;
$$->push_back(std::unique_ptr<DefAST>(static_cast<DefAST *>($3)));
$$->push_back(std::unique_ptr<DefAST>(static_cast<ScalarDefAST *>($3)));
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VarDefList also assumes every VarDef is a ScalarDefAST via static_cast<ScalarDefAST *>($1) / static_cast<ScalarDefAST *>($3), but VarDef now has array variants that construct ArrayDefAST, so when those branches are used this performs an invalid downcast from BaseAST* to ScalarDefAST* and then wraps it in unique_ptr<DefAST>, leading to undefined behavior and potential heap corruption when the pointer is deleted. A malicious SysY input that declares non-const arrays (with or without initializers) will hit this path and can crash or corrupt the compiler process. The fix is to treat all VarDef nodes polymorphically as DefAST instances instead of casting them to ScalarDefAST.

Copilot uses AI. Check for mistakes.
Comment on lines +205 to +209
$$->push_back(std::unique_ptr<DefAST>(static_cast<ScalarDefAST *>($1)));
}
| ConstDefList ',' ConstDef {
$$ = $1;
$$->push_back(std::unique_ptr<DefAST>(static_cast<DefAST *>($3)));
$$->push_back(std::unique_ptr<DefAST>(static_cast<ScalarDefAST *>($3)));
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConstDefList assumes every ConstDef is a ScalarDefAST by doing static_cast<ScalarDefAST *>($1) / static_cast<ScalarDefAST *>($3), but ConstDef now has an array branch that creates ArrayDefAST, so these casts can downcast a BaseAST* that actually points to an ArrayDefAST to ScalarDefAST*, causing undefined behavior and heap corruption when the resulting unique_ptr<DefAST> is destroyed. An attacker-controlled SysY source file that declares a const array (e.g., const int a[...];) will drive this code path and can trigger memory corruption in the compiler process. To avoid this, store ConstDef results polymorphically as DefAST (or use a safe checked cast) rather than forcing them through ScalarDefAST.

Copilot uses AI. Check for mistakes.
- Removed unused count_name variable in ir_builder.cppm.
- Cleaned up formatting in symbol_table.cppm for better readability.
- Changed debug() method to toKoopa() in type.cppm and its derived classes for consistency in type representation.
- Updated grammar rules in sysy.y to support array initialization and improved handling of parameter arrays.
- Enhanced AST code generation in ast.cpp and codegen.cpp to support new initialization semantics and improved memory management for arrays.
- Added flattening mechanism for initializer lists in InitValStmtAST to handle complex array initializations.
- Improved error handling and semantic checks throughout the codebase.
@anfsity anfsity changed the title Finish one-d array ir generation Array support Feb 10, 2026
@anfsity anfsity requested a review from Copilot February 10, 2026 16:41
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 36 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

delete $3;
}
| CONST Btype IDENT ParamArraySuffix {
$$ = new FuncParamAST(std::move(*$3), std::move(*$3), true, true, std::move(*$4));
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the CONST Btype IDENT ParamArraySuffix production, the FuncParamAST constructor is given std::move(*$3) for both the parameter type and identifier. This makes the parameter's btype incorrect (it becomes the identifier), and it also drops the actual Btype value (you delete $2 without moving it into the AST). Use $2 for the type and $3 for the identifier here (matching the non-const case).

Suggested change
$$ = new FuncParamAST(std::move(*$3), std::move(*$3), true, true, std::move(*$4));
$$ = new FuncParamAST(std::move(*$2), std::move(*$3), true, true, std::move(*$4));

Copilot uses AI. Check for mistakes.
-> std::string {
//! No node should call `InitValStmtAST`'s `codeGen` function, as it is only
//! responsible for expanding the initialize list, not for generating code.
Log::panic("");
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InitValStmtAST::codeGen calls Log::panic("") with an empty message. If this is ever hit (even due to a bug elsewhere), it will be very hard to diagnose. Please provide a clear error message indicating that InitValStmtAST::codeGen is not supposed to be invoked (and ideally include node/context info).

Suggested change
Log::panic("");
Log::panic("InitValStmtAST::codeGen was called unexpectedly. This node is "
"only used to expand initializer lists and must not generate "
"IR directly.");

Copilot uses AI. Check for mistakes.
@anfsity anfsity marked this pull request as ready for review February 12, 2026 12:18
@anfsity anfsity requested a review from Copilot February 12, 2026 12:19
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 42 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

auto src = fmt::format("a{}", i);
emitSw(buffer, src, "sp", offset);
} else {
// Parameters passed on stack by caller are located ABOVE the current SP.
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handling of function parameters passed on the stack (lines 336-340) assumes all parameters are 4 bytes in size when calculating the offset. However, array parameters should be pointers (which are 4 bytes), so this is actually correct, but the code comment should clarify this assumption. If pointer parameters could be of variable size in the future, this would need to be updated.

Suggested change
// Parameters passed on stack by caller are located ABOVE the current SP.
// Parameters passed on stack by caller are located ABOVE the current SP.
//
// IMPORTANT ABI ASSUMPTION:
// - We assume each argument passed on the stack occupies exactly 4 bytes.
// - This matches 32-bit integers and pointers on RV32; array parameters
// are passed as pointers, which are also 4 bytes under this target.
// - If we ever support wider types or a target where pointers are not
// 4 bytes, this offset calculation (and related stack layout logic)
// must be updated accordingly.

Copilot uses AI. Check for mistakes.
Comment on lines +817 to +820

if (cur_type->is_array()) {
cur_type = std::static_pointer_cast<type::ArrayType>(cur_type)->base;
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: When taking the first branch (i == 0 && sym->type->is_ptr()) and using getptr, the type is not updated because the type update code (lines 818-820) is inside the else block. This means cur_type will be incorrect for subsequent iterations. The type update should be moved outside the if-else block, similar to how it's done in AssignStmtAST::codeGen (line 591-593). This could lead to incorrect type tracking when accessing multi-dimensional array parameters.

Suggested change
if (cur_type->is_array()) {
cur_type = std::static_pointer_cast<type::ArrayType>(cur_type)->base;
}
}
if (cur_type->is_array()) {
cur_type = std::static_pointer_cast<type::ArrayType>(cur_type)->base;

Copilot uses AI. Check for mistakes.
Comment on lines +518 to +521
Log::panic(fmt::format("Semantic Error: Excess elements in array "
"initializer (expected {}, got {}).",
arr_type->base->toKoopa(),
ssize(node->initialize_list)));
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusing error message: The error message at lines 518-521 says "expected {}, got {}" where the first placeholder is filled with the type string (e.g., "i32") rather than the expected count. This will produce messages like "expected i32, got 5" instead of something more helpful like "expected 4 elements, got 5". Consider changing to show the actual capacity of the sub-type.

Suggested change
Log::panic(fmt::format("Semantic Error: Excess elements in array "
"initializer (expected {}, got {}).",
arr_type->base->toKoopa(),
ssize(node->initialize_list)));
Log::panic(fmt::format(
"Semantic Error: Excess elements in array initializer for "
"type {} (got {} elements).",
arr_type->base->toKoopa(),
ssize(node->initialize_list)));

Copilot uses AI. Check for mistakes.
auto is_ptr() const -> bool override { return true; }

static auto get(std::shared_ptr<Type> t) -> std::shared_ptr<PtrType> {
return std::make_shared<PtrType>(t);
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistency: ArrayType::get() uses caching to ensure that identical array types share the same instance (lines 138-146), but PtrType::get() (line 119-121) creates a new instance every time. For consistency and to avoid potential equality comparison issues, consider implementing the same caching pattern for PtrType. This would also reduce memory usage when the same pointer type is used multiple times.

Suggested change
return std::make_shared<PtrType>(t);
static std::map<std::shared_ptr<Type>, std::shared_ptr<PtrType>> cache;
if (cache.contains(t)) {
return cache[t];
}
auto instance = std::make_shared<PtrType>(t);
return cache[t] = instance;

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +53
outputs:
artifact_name: compiler-binary
steps:
- name: Check out repository code
uses: actions/checkout@v5
- name: test koopa
- uses: actions/checkout@v5

- name: Install Ninja
run: |
autotest -w debug -koopa -s lv1 .
autotest -w debug -koopa -s lv3 .
autotest -w debug -koopa -s lv4 .
autotest -w debug -koopa -s lv5 .
autotest -w debug -koopa -s lv6 .
autotest -w debug -koopa -s lv7 .
autotest -w debug -koopa -s lv8 .
- name: test riscv
apt-get update && apt-get install -y ninja-build
ninja --version

- name: Build Binary
run: make

test_compiler:
needs: build_compiler
runs-on: ubuntu-latest
container:
image: maxxing/compiler-dev
strategy:
fail-fast: false
matrix:
test_type: [koopa, riscv]
defaults:
run:
working-directory: ./project
steps:
- uses: actions/checkout@v5

- name: Install Ninja
run: |
autotest -w debug -riscv -s lv1 .
autotest -w debug -riscv -s lv3 .
autotest -w debug -riscv -s lv4 .
autotest -w debug -riscv -s lv5 .
autotest -w debug -riscv -s lv6 .
autotest -w debug -riscv -s lv7 .
apt-get update && apt-get install -y ninja-build
ninja --version

- name: Run Test
run: autotest -w debug -${{ matrix.test_type }} .
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test_compiler job depends on build_compiler but doesn't actually use the built artifact. The build job declares an output artifact_name: compiler-binary but never uploads the artifact, and the test job doesn't download it. This means the compiler will be rebuilt in the test job. Either remove the dependency and outputs declaration from build_compiler, or implement artifact upload/download to avoid rebuilding in the test job.

Copilot uses AI. Check for mistakes.
Comment on lines +79 to 89
std::shared_ptr<type::Type> param_type;
if (is_ptr) {
std::shared_ptr<type::Type> base_type = type::IntType::get();
for (const auto &dim :
indices | reverse | filter([](auto &p) { return p != nullptr; })) {
base_type = type::ArrayType::get(base_type, dim->CalcValue(builder));
}
param_type = type::PtrType::get(base_type);
} else {
std::string addr = builder.newReg();
builder.append(fmt::format(" {} = alloc i32\n", addr));
builder.symtab().define(ident, addr, type::IntType::get(), SymbolKind::Var,
false);
builder.append(fmt::format(" store @{}, {}\n", ident, addr));
param_type = type::IntType::get();
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code duplication detected: The type construction logic in FuncParamAST::codeGen (lines 79-89) and FuncParamAST::toKoopa (lines 104-115) is identical. Consider extracting this into a private helper method getParamType() to avoid duplication and make the code more maintainable.

Copilot uses AI. Check for mistakes.
@anfsity anfsity merged commit 94b09c3 into main Feb 12, 2026
12 checks passed
@anfsity anfsity deleted the array-support branch February 12, 2026 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Support for Array

2 participants