This document explains the complete AstroScript compilation flow from source code to execution, then traces exactly how the frontend playground triggers and presents that flow.
It is grounded in the current implementation in:
- backend/compiler/main.cpp
- backend/compiler/lexer/lexer.l
- backend/compiler/parser/parser.y
- backend/compiler/semantic/symbol_table.h
- backend/compiler/semantic/symbol_table.cpp
- backend/compiler/ir/tac.h
- backend/compiler/ir/tac.cpp
- web/src/app/api/compile/route.ts
- web/src/app/api/health/route.ts
- web/src/lib/compilerPath.ts
- web/src/app/playground/page.tsx
- web/src/components/playground/CodeEditorPanel.tsx
- web/src/components/playground/OutputPanel.tsx
- web/scripts/generate-language-reference.mjs
AstroScript uses a classic educational compiler pipeline:
- Lexing (Flex): source text -> token stream.
- Parsing + semantic actions (Bison): token stream -> grammar validation + TAC emission.
- Symbol checks: declarations, duplicate names, and use of undeclared names.
- IR generation: three-address code (TAC) instructions.
- TAC optimization: constant folding, algebraic simplification, redundant move removal.
- C-like projection: structured learning-oriented C output from optimized TAC.
- TAC execution: interpreter runs optimized TAC.
- Output projection: printable lines (`PRINT:`), TAC dump, C-like translation, errors.
Frontend playground integration:
- User writes code in the Monaco editor.
- Frontend calls `/api/compile`.
- API writes a temporary `.as` file and executes the compiler binary.
- API parses compiler stdout/stderr into `output`, `ir`, `tokens`, `diagnostics`.
- UI displays runtime output, intermediate code, and clickable diagnostics.
`main.cpp` is the orchestrator:
- Prints a startup banner.
- Opens the source file named by the CLI argument and assigns it to `yyin` for lexer input.
- Calls `yyparse()`.
- If the parse returns `0` (success), it runs, in order:
  - `tacGenerator.optimize()`
  - `tacGenerator.printCode("Optimized Three Address Code")`
  - `tacGenerator.printCTranslation("C-Like Translation")`
  - `tacGenerator.execute()`
  - `symbolTable.printTable()`
- Returns the parse status.
Important behavior:
- Semantic errors are currently printed during parse actions but do not automatically stop the parse, so a program with semantic errors can still reach optimization and execution.
- Runtime errors are emitted during TAC execution as `RUNTIME ERROR: ...`.
The lexer:
- Matches mission keywords (`mission`, `launch`, `success`, ...).
- Matches declarations/types (`telemetry`, `count`, `real`, ...).
- Matches control flow (`verify`, `orbit`, `scenario`, ...).
- Matches operators (`add`, `minus`, `mul`, `divide`, `mod`, `**`, relations, assignment).
- Emits punctuation tokens (`{}`, `()`, `[]`, `.`, `,`, `:`).
- Parses literals:
  - Integer (`INT_LITERAL`)
  - Float (`FLOAT_LITERAL`)
  - String (`STRING_LITERAL`)
- Ignores comments:
  - Single-line: `$$ ...`
  - Multi-line: `$* ... *$`
- Reports unknown characters as `LEXICAL ERROR at line N: ...`
Notes:
- `+ - * /` and keyword operators are normalized into token classes (`ADD`, `MINUS`, etc.).
- All lexer keywords now have parser productions and active runtime behavior.
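To make the normalization and comment rules concrete, here is a small hand-written tokenizer sketch. It is not a translation of `lexer.l`: it covers only identifiers, integers, `+ - * /` and their keyword spellings, and it assumes the token-class names `MUL`, `DIV`, `IDENT` (only `ADD` and `MINUS` are named above). The wording after the `LEXICAL ERROR at line N:` prefix is also an assumption.

```typescript
// Hypothetical token classes: only ADD and MINUS are named in the docs.
type TokenClass = "ADD" | "MINUS" | "MUL" | "DIV" | "IDENT" | "INT_LITERAL";

interface Token {
  cls: TokenClass;
  text: string;
  line: number;
}

// Symbolic and keyword spellings collapse into the same token class.
const OPERATOR_CLASSES = new Map<string, TokenClass>([
  ["+", "ADD"], ["add", "ADD"],
  ["-", "MINUS"], ["minus", "MINUS"],
  ["*", "MUL"], ["mul", "MUL"],
  ["/", "DIV"], ["divide", "DIV"],
]);

function tokenize(src: string): { tokens: Token[]; errors: string[] } {
  const tokens: Token[] = [];
  const errors: string[] = [];
  let line = 1;
  let i = 0;
  while (i < src.length) {
    const ch = src.charAt(i);
    if (ch === "\n") { line++; i++; continue; }
    if (/\s/.test(ch)) { i++; continue; }
    // Single-line comment: $$ ... up to the end of the line.
    if (src.startsWith("$$", i)) {
      while (i < src.length && src.charAt(i) !== "\n") i++;
      continue;
    }
    // Multi-line comment: $* ... *$ (newlines inside still advance `line`).
    if (src.startsWith("$*", i)) {
      const close = src.indexOf("*$", i + 2);
      const stop = close === -1 ? src.length : close + 2;
      for (let k = i; k < stop; k++) if (src.charAt(k) === "\n") line++;
      i = stop;
      continue;
    }
    const m = /^(?:[A-Za-z_][A-Za-z0-9_]*|\d+|[+\-*\/])/.exec(src.slice(i));
    if (m === null) {
      errors.push(`LEXICAL ERROR at line ${line}: unexpected character '${ch}'`);
      i++;
      continue;
    }
    const text = m[0];
    const cls: TokenClass =
      OPERATOR_CLASSES.get(text) ?? (/^\d/.test(text) ? "INT_LITERAL" : "IDENT");
    tokens.push({ cls, text, line });
    i += text.length;
  }
  return { tokens, errors };
}
```

Both `add` and `+` come out as `ADD`, which is what lets the grammar treat the two spellings identically.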
The parser both validates syntax and emits TAC on-the-fly.
- Program: `mission ... launch { ... } success`
- Statement categories include:
  - declaration
  - assignment
  - print/input
  - control flow
  - loops
  - switch-style scenarios
  - function/module definitions
  - wait
  - return
  - abort
Implemented forms include:
- scalar declaration
- scalar declaration with initialization
- limit declaration (constant-like but currently mutable at runtime)
- arrays (`telemetry type name[size].`)
- mode definitions with `trajectory`/`fallback` entries
- alias declarations, which are metadata mappings
- fleet declarations, which emit executable typed array declarations (`fleet type name[size].`, or unsized `fleet type name.` with a default capacity)
- Scalar assignment emits `=` TAC.
- Array assignment emits `store` TAC.
- Undeclared names produce semantic error messages.
Supported expression families:
- arithmetic: `+ - * / % **`
- relational: `< > <= >= == !=`
- logical: `AND OR XOR NOT`
- literals and identifiers
- array load expressions
- function call expressions
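A sketch of how a binary expression could lower into TAC with fresh temporaries, in the spirit of `emitBinary`. The `Expr` tree, the `Emitter` class and the temp names `t1`, `t2`, ... are assumptions for illustration; the real parser emits TAC on the fly during parsing rather than walking a separate tree.

```typescript
// Hypothetical expression tree; the real parser builds no such structure.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "var"; name: string }
  | { kind: "bin"; op: string; left: Expr; right: Expr };

interface TAC { op: string; arg1: string; arg2: string; result: string; }

class Emitter {
  code: TAC[] = [];
  private tempCount = 0;

  newTemp(): string {
    this.tempCount += 1;
    return `t${this.tempCount}`;
  }

  // Emits TAC for `e` and returns the operand (literal, name, or temp) holding its value.
  lower(e: Expr): string {
    if (e.kind === "num") return String(e.value);
    if (e.kind === "var") return e.name;
    const a = this.lower(e.left);   // operands first, left to right
    const b = this.lower(e.right);
    const t = this.newTemp();       // one fresh temp per binary node
    this.code.push({ op: e.op, arg1: a, arg2: b, result: t });
    return t;
  }
}

// Renders an instruction in the `result = arg1 op arg2` style used in this document.
const show = (i: TAC): string => `${i.result} = ${i.arg1} ${i.op} ${i.arg2}`;
```

For the tree of `a + b * c`, this yields `t1 = b * c` followed by `t2 = a + t1`, with `t2` returned as the value of the whole expression.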
Semantic action highlights:
- Function arity check is now enforced when the function signature is known, with the message `Semantic Error at line N: function f expects X arguments but got Y`.
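The arity check can be sketched as a lookup in a table of known signatures. The `signatures` map and `declareFunction` helper are hypothetical; only the message format comes from the text. Calls to functions with no recorded signature are skipped, matching "when the function signature is known".

```typescript
// Hypothetical signature table: function name -> declared parameter count.
const signatures = new Map<string, number>();

function declareFunction(name: string, paramCount: number): void {
  signatures.set(name, paramCount);
}

// Returns the semantic error text, or null when the call is fine
// or the signature is not known (yet).
function checkArity(name: string, argCount: number, line: number): string | null {
  const expected = signatures.get(name);
  if (expected === undefined || expected === argCount) return null;
  return `Semantic Error at line ${line}: function ${name} expects ${expected} arguments but got ${argCount}`;
}
```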
- `verify` / `else_verify` / `otherwise` emits labels + conditional gotos.
- `orbit (...)` and `orbit while (...)` emit loop labels.
- `orbit times (i : start : end)` emits assignment, condition, and increment TAC.
- `scenario` / `trajectory` / `fallback` emits switch-like compare-and-jump TAC.
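One plausible TAC shape for `orbit times (i : start : end)`, following the assignment/condition/increment description. The inclusive bound (`i <= end`), the step of 1, the label and temp names, and placing jump targets in `result` are all assumptions; `parser.y` is the authority on the real lowering.

```typescript
interface TAC { op: string; arg1: string; arg2: string; result: string; }

// Hypothetical lowering of `orbit times (v : start : end) { body }`.
function lowerOrbitTimes(
  v: string, start: string, end: string, body: TAC[],
  labelId: number, tempId: number,
): TAC[] {
  const top = `L${labelId}`;        // loop head
  const exit = `L${labelId + 1}`;   // first instruction after the loop
  const cond = `t${tempId}`;
  return [
    { op: "=", arg1: start, arg2: "", result: v },          // v = start
    { op: "label", arg1: "", arg2: "", result: top },
    { op: "<=", arg1: v, arg2: end, result: cond },         // cond = v <= end
    { op: "ifFalse", arg1: cond, arg2: "", result: exit },  // leave when false
    ...body,
    { op: "+", arg1: v, arg2: "1", result: v },             // v = v + 1
    { op: "goto", arg1: "", arg2: "", result: top },        // back edge
    { op: "label", arg1: "", arg2: "", result: exit },
  ];
}
```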
- Module syntax is parsed with an optional `extends` clause.
- Members can include declarations and functions.
- Functions emit:
  - `func_begin`
  - `param_def` for each parameter
  - body TAC
  - an implicit or explicit `return`
  - `func_end`
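The emission order for a function can be sketched as follows. Putting the function or parameter name in `arg1` is an assumption; the order (`func_begin`, one `param_def` per parameter, body, implicit `return` when the body does not end with one, `func_end`) follows the list above.

```typescript
interface TAC { op: string; arg1: string; arg2: string; result: string; }

function emitFunction(name: string, params: string[], body: TAC[]): TAC[] {
  const code: TAC[] = [{ op: "func_begin", arg1: name, arg2: "", result: "" }];
  for (const p of params) {
    code.push({ op: "param_def", arg1: p, arg2: "", result: "" });
  }
  code.push(...body);
  // Add an implicit return only when the body does not already end with one.
  const last: TAC | undefined = body[body.length - 1];
  if (last === undefined || last.op !== "return") {
    code.push({ op: "return", arg1: "", arg2: "", result: "" });
  }
  code.push({ op: "func_end", arg1: name, arg2: "", result: "" });
  return code;
}
```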
Current symbol table responsibilities:
- Insert/check symbols by name.
- Store declared type text.
- Store numeric value slot (lightweight).
- Report table at end of run.
Most scope and compatibility checks (scope stacks, overload checks, module inheritance/member access checks) are implemented directly in parser.y helper logic rather than symbol_table.cpp.
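A minimal sketch of the responsibilities listed above: name-keyed insert/check, declared type text, a lightweight numeric value slot, and an end-of-run dump. The class shape and field names are assumptions, and scope stacks are deliberately left out since they live in `parser.y` helpers.

```typescript
interface SymbolEntry { name: string; type: string; value: number; }

class SymbolTable {
  private entries = new Map<string, SymbolEntry>();

  // Returns false on a duplicate declaration so the caller can report it.
  insert(name: string, type: string): boolean {
    if (this.entries.has(name)) return false;
    this.entries.set(name, { name, type, value: 0 });
    return true;
  }

  exists(name: string): boolean {
    return this.entries.has(name);
  }

  typeOf(name: string): string | undefined {
    return this.entries.get(name)?.type;
  }

  // End-of-run report, one tab-separated line per symbol.
  printTable(): string[] {
    return Array.from(this.entries.values()).map((e) => `${e.name}\t${e.type}\t${e.value}`);
  }
}
```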
`TACInstruction` fields: `op`, `arg1`, `arg2`, `result`.
Runtime data model:
- `RuntimeValue = variant<double, string>`
- Frame stack (a `variables` map per frame)
- Global array storage map
- Function boundary metadata
- Parameter stack for function calls
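A TypeScript mirror of this model, useful for reasoning about call frames. `RuntimeValue` maps `variant<double, string>` to `number | string`; the `Frame` layout, the `returnPc` field and the error texts are assumptions, while `kMaxCallDepth = 1024` is the documented limit.

```typescript
type RuntimeValue = number | string;

interface TACInstruction { op: string; arg1: string; arg2: string; result: string; }

interface Frame {
  variables: Map<string, RuntimeValue>;
  returnPc: number;  // hypothetical: where `return` resumes in the caller
}

const kMaxCallDepth = 1024;

class Runtime {
  program: TACInstruction[] = [];
  frames: Frame[] = [{ variables: new Map(), returnPc: -1 }];  // global frame
  arrays = new Map<string, RuntimeValue[]>();                  // global array storage
  paramStack: RuntimeValue[] = [];                             // filled by `param`

  pushFrame(returnPc: number): void {
    if (this.frames.length >= kMaxCallDepth) {
      throw new Error("RUNTIME ERROR: stack overflow (call depth limit reached)");
    }
    this.frames.push({ variables: new Map(), returnPc });
  }

  popFrame(): Frame {
    // Returning from the global frame is an invalid frame state.
    if (this.frames.length <= 1) {
      throw new Error("RUNTIME ERROR: invalid frame state on return");
    }
    return this.frames.pop() as Frame;
  }

  current(): Frame {
    return this.frames[this.frames.length - 1];
  }
}
```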
Key emitters:
- declarations: `decl`, `decl_arr`
- movement: `=`
- arithmetic/comparison/logical: via `emitBinary`
- control: `label`, `goto`, `ifFalse`
- I/O: `print`, `input`
- arrays: `store`, `load`
- functions: `func_begin`, `param_def`, `param`, `call`, `return`, `func_end`
C-like projection:
- Generated from optimized TAC by `printCTranslation` in `backend/compiler/ir/tac.cpp`.
- Preserves side-effecting call/method-call instructions even when their temporary results are not reused.
- Carries visible function/method arguments into translated call sites.
- Uses runtime stubs (`astro_new_object`, `astro_get_field`, `astro_set_field`, `astro_call_methodv`) for object semantics in the learning view.
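A sketch of the two call-site rules above: buffered `param` arguments are carried into the call, and a call whose temp result is never read is still emitted as a bare statement. The C spellings chosen here (labels as `L1:;`, `if (!t) goto L;`) are assumptions, not the exact output of `printCTranslation`, and object ops are not covered.

```typescript
interface TAC { op: string; arg1: string; arg2: string; result: string; }

function toCLike(code: TAC[]): string[] {
  // Every operand that is read somewhere; decides whether a call's temp matters.
  const read = new Set<string>();
  for (const i of code) {
    if (i.arg1 !== "") read.add(i.arg1);
    if (i.arg2 !== "") read.add(i.arg2);
  }
  const lines: string[] = [];
  let pendingArgs: string[] = [];
  for (const i of code) {
    switch (i.op) {
      case "param":  // buffer arguments until the matching call
        pendingArgs.push(i.arg1);
        break;
      case "call": {
        const callText = `${i.arg1}(${pendingArgs.join(", ")})`;
        pendingArgs = [];
        // Keep the call for its side effects even when its result is never read.
        lines.push(read.has(i.result) ? `${i.result} = ${callText};` : `${callText};`);
        break;
      }
      case "=":
        lines.push(`${i.result} = ${i.arg1};`);
        break;
      case "label":
        lines.push(`${i.result}:;`);
        break;
      case "goto":
        lines.push(`goto ${i.result};`);
        break;
      case "ifFalse":
        lines.push(`if (!${i.arg1}) goto ${i.result};`);
        break;
      default:  // binary operators
        lines.push(`${i.result} = ${i.arg1} ${i.op} ${i.arg2};`);
    }
  }
  return lines;
}
```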
TAC optimization passes:
- Constant folding
  - Folds numeric binary operations when safe.
  - Division/modulo by zero are not folded away.
- Algebraic simplification
  - Examples: `x+0 -> x`, `x*1 -> x`, `x*0 -> 0`.
- Redundant move elimination
  - Removes `x = x`.
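The three passes can be sketched as independent list-to-list functions chained in order. Treating numeric literals as strings inside operands and leaving non-arithmetic ops untouched are assumptions; the rules themselves (no folding of division/modulo by zero, `x+0`, `x*1`, `x/1`, `x*0`, dropping `x = x`) follow the text.

```typescript
interface TAC { op: string; arg1: string; arg2: string; result: string; }

const isNum = (s: string): boolean => s.trim() !== "" && Number.isFinite(Number(s));
const move = (src: string, result: string): TAC => ({ op: "=", arg1: src, arg2: "", result });

function evalBinary(op: string, a: number, b: number): number | null {
  switch (op) {
    case "+": return a + b;
    case "-": return a - b;
    case "*": return a * b;
    case "/": return b === 0 ? null : a / b;  // never fold a division by zero
    case "%": return b === 0 ? null : a % b;  // never fold a modulo by zero
    default: return null;                     // other ops are left alone here
  }
}

// Pass 1: constant folding.
function foldConstants(code: TAC[]): TAC[] {
  return code.map((i) => {
    if (!isNum(i.arg1) || !isNum(i.arg2)) return i;
    const v = evalBinary(i.op, Number(i.arg1), Number(i.arg2));
    return v !== null && Number.isFinite(v) ? move(String(v), i.result) : i;
  });
}

// Pass 2: algebraic simplification (x+0, x*1, x/1 -> x; x*0 -> 0).
function simplifyAlgebra(code: TAC[]): TAC[] {
  return code.map((i) => {
    if ((i.op === "+" && i.arg2 === "0") || ((i.op === "*" || i.op === "/") && i.arg2 === "1")) {
      return move(i.arg1, i.result);
    }
    if (i.op === "*" && (i.arg1 === "0" || i.arg2 === "0")) return move("0", i.result);
    return i;
  });
}

// Pass 3: redundant move elimination (drop x = x).
function removeRedundantMoves(code: TAC[]): TAC[] {
  return code.filter((i) => !(i.op === "=" && i.arg1 === i.result));
}

function optimize(code: TAC[]): TAC[] {
  return removeRedundantMoves(simplifyAlgebra(foldConstants(code)));
}

const show = (i: TAC): string =>
  i.op === "=" ? `${i.result} = ${i.arg1}` : `${i.result} = ${i.arg1} ${i.op} ${i.arg2}`;
```

Each pass is purely local (one instruction at a time), which is what keeps this a lightweight optimizer rather than a data-flow-based one.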
Execution steps through the TAC instructions with a program counter, handling each op in turn.
It now includes runtime diagnostics for critical failures:
- division by zero
- modulo by zero
- bad array reference (undeclared array)
- array index out of bounds
- invalid jump target
- invalid conditional jump target
- undefined function call
- stack overflow guard (`kMaxCallDepth = 1024`)
- invalid frame states on return/end
- out-of-memory during array allocation
- negative array size (clamped with error)
Runtime error format: `RUNTIME ERROR: <message>`
This format is consumed by frontend diagnostics parsing.
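A compact sketch of a program-counter loop that reports failures in this format. It handles only a few numeric ops (`=`, `+`, `<`, `/`, `print`, `label`, `goto`, `ifFalse`), resolves labels before running and stops at the first error. Those choices, the message wording after the prefix, and the operand layout are assumptions rather than a port of `tacGenerator.execute()`, which also supports strings, arrays and calls.

```typescript
interface TAC { op: string; arg1: string; arg2: string; result: string; }

function run(code: TAC[]): { printed: string[]; errors: string[] } {
  const vars = new Map<string, number>();
  const printed: string[] = [];
  const errors: string[] = [];
  // Map label names to instruction indices so jump targets can be validated.
  const labels = new Map<string, number>();
  code.forEach((ins, idx) => {
    if (ins.op === "label") labels.set(ins.result, idx);
  });

  // Numeric literals evaluate to themselves; names read the variable map.
  const val = (s: string): number => {
    const n = Number(s);
    if (s.trim() !== "" && Number.isFinite(n)) return n;
    return vars.get(s) ?? 0;  // assumption: unset names read as 0
  };

  let pc = 0;
  while (pc < code.length) {
    const ins = code[pc];
    switch (ins.op) {
      case "=": vars.set(ins.result, val(ins.arg1)); break;
      case "+": vars.set(ins.result, val(ins.arg1) + val(ins.arg2)); break;
      case "<": vars.set(ins.result, val(ins.arg1) < val(ins.arg2) ? 1 : 0); break;
      case "/": {
        const divisor = val(ins.arg2);
        if (divisor === 0) {
          errors.push("RUNTIME ERROR: division by zero");
          return { printed, errors };  // stop at the first runtime error
        }
        vars.set(ins.result, val(ins.arg1) / divisor);
        break;
      }
      case "print": printed.push(String(val(ins.arg1))); break;
      case "label": break;
      case "goto":
      case "ifFalse": {
        // ifFalse only jumps when its condition evaluates to 0.
        if (ins.op === "ifFalse" && val(ins.arg1) !== 0) break;
        const target = labels.get(ins.result);
        if (target === undefined) {
          errors.push(`RUNTIME ERROR: invalid jump target ${ins.result}`);
          return { printed, errors };
        }
        pc = target;
        continue;
      }
      default:
        errors.push(`RUNTIME ERROR: unsupported op ${ins.op}`);
        return { printed, errors };
    }
    pc++;
  }
  return { printed, errors };
}
```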
`resolveCompilerPath()` (in `web/src/lib/compilerPath.ts`) tries, in order:
- the `ASTROSCRIPT_COMPILER_PATH` env var
- `../backend/compiler/build/astroscript.exe`
- `../backend/compiler/build/astroscript`
This keeps the playground portable across Windows, Linux, and macOS builds.
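A sketch of that lookup order with file-system access injected, so the logic can be exercised without real binaries. The `PathDeps` shape, resolving the relative candidates against the current working directory, and checking the env-var path for existence are assumptions; the real function's signature may differ.

```typescript
interface PathDeps {
  env: Record<string, string | undefined>;
  cwd: string;
  exists: (p: string) => boolean;
  join: (...parts: string[]) => string;
}

// Returns the first existing candidate, or null so the caller can report "compiler missing".
function resolveCompilerPath(deps: PathDeps): string | null {
  const candidates: string[] = [];
  const fromEnv = deps.env["ASTROSCRIPT_COMPILER_PATH"];
  if (fromEnv) candidates.push(fromEnv);
  candidates.push(
    deps.join(deps.cwd, "../backend/compiler/build/astroscript.exe"),  // Windows build
    deps.join(deps.cwd, "../backend/compiler/build/astroscript"),      // Linux/macOS build
  );
  for (const c of candidates) {
    if (deps.exists(c)) return c;
  }
  return null;
}
```

In a Node route this could be called as `resolveCompilerPath({ env: process.env, cwd: process.cwd(), exists: fs.existsSync, join: path.resolve })`, with `fs` and `path` imported from `node:fs` and `node:path`.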
`GET /api/health`:
- Resolves the compiler path.
- Returns `compilerReady: true/false` with metadata.
- Used by the UI status badges and the run guard.
`POST /api/compile` flow:
- Validate the request JSON, which must contain a `code` string.
- Resolve the compiler path; fail with an actionable message if it is missing.
- Write the code to a temporary `.as` file under the OS temp dir.
- Execute the compiler with a timeout and an output buffer limit.
- Parse stdout/stderr into the UI payload:
  - `output`: lines prefixed `PRINT:`
  - `ir`: the section under `--- Optimized Three Address Code ---`
  - `tokens`: token-related lines, if any (often empty for this build)
  - `diagnostics`: lexical/syntax/semantic/runtime classification
- Add human-friendly suggestions (`humanMessage`, `fixHint`).
- Return:
  - `200` when there are no diagnostics and no stderr
  - `422` when compile/runtime diagnostics exist
  - `500` for infrastructure failures
- Clean up the temp file in `finally`.
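A sketch of the stdout/stderr parsing and status choice. The payload field names and the IR header come from the text; trimming after `PRINT:`, ending the IR section at the next `--- ... ---` header, and leaving `tokens` empty are assumptions. The `500` path is not shown because it belongs to the route's surrounding error handling.

```typescript
interface CompilePayload {
  output: string[];
  ir: string[];
  tokens: string[];
  diagnostics: string[];
}

const IR_HEADER = "--- Optimized Three Address Code ---";
const DIAGNOSTIC_PREFIX = /^(LEXICAL ERROR|SYNTAX ERROR|Semantic Error|RUNTIME ERROR)/;

function parseCompilerOutput(stdout: string, stderr: string): CompilePayload {
  // `tokens` stays empty here; the text notes it is often empty for this build.
  const payload: CompilePayload = { output: [], ir: [], tokens: [], diagnostics: [] };
  let inIr = false;
  for (const raw of `${stdout}\n${stderr}`.split(/\r?\n/)) {
    const line = raw.replace(/\s+$/, "");
    if (line === IR_HEADER) { inIr = true; continue; }
    // Assumption: any other "--- ... ---" header closes the IR section.
    if (/^---.*---$/.test(line)) { inIr = false; continue; }
    if (line.startsWith("PRINT:")) payload.output.push(line.slice("PRINT:".length).trim());
    else if (DIAGNOSTIC_PREFIX.test(line)) payload.diagnostics.push(line);
    else if (inIr && line !== "") payload.ir.push(line);
  }
  return payload;
}

// 200 when clean, 422 when compile/runtime diagnostics (or stderr) are present.
function statusFor(payload: CompilePayload, stderr: string): 200 | 422 {
  return payload.diagnostics.length === 0 && stderr.trim() === "" ? 200 : 422;
}
```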
The diagnostic classifier recognizes these patterns:
- `LEXICAL ERROR ...`
- `SYNTAX ERROR ...`
- `Semantic Error ...`
- `RUNTIME ERROR ...`
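A sketch of a classifier over those four prefixes that also extracts a line number for editor jumps. The `Diagnostic` shape is an assumption; the `at line N` wording matches the lexical and semantic formats quoted earlier, while the documented runtime format carries no line number.

```typescript
type DiagnosticKind = "lexical" | "syntax" | "semantic" | "runtime";

interface Diagnostic { kind: DiagnosticKind; line: number | null; message: string; }

const PATTERNS: Array<[RegExp, DiagnosticKind]> = [
  [/^LEXICAL ERROR/, "lexical"],
  [/^SYNTAX ERROR/, "syntax"],
  [/^Semantic Error/, "semantic"],
  [/^RUNTIME ERROR/, "runtime"],
];

// Returns null for lines that are not diagnostics (e.g. PRINT: output).
function classify(text: string): Diagnostic | null {
  for (const [pattern, kind] of PATTERNS) {
    if (!pattern.test(text)) continue;
    const m = /at line (\d+)/.exec(text);
    return { kind, line: m ? Number(m[1]) : null, message: text };
  }
  return null;
}
```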
Main responsibilities of the playground page (`web/src/app/playground/page.tsx`):
- Holds editor code and compile-result state.
- Runs health checks on mount and on demand.
- Calls `/api/compile` on Run or `Ctrl/Cmd + Enter`.
- Routes results to output tabs (`output`, `tokens`, `ir`, `errors`).
- Tracks cursor line/column and run duration.
`CodeEditorPanel.tsx`:
- Registers a custom AstroScript language.
- Defines syntax highlighting for mission keywords/operators/comments.
- Displays diagnostics as Monaco markers.
- Supports jump-to-line from error panel.
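A sketch of turning parsed diagnostics into Monaco marker data. `MarkerLike` mirrors the fields of Monaco's `IMarkerData`, and `8` is the value of `monaco.MarkerSeverity.Error`; highlighting the whole line and pinning line-less diagnostics to line 1 are assumptions, not necessarily what `CodeEditorPanel.tsx` does.

```typescript
interface MarkerLike {
  startLineNumber: number;
  startColumn: number;
  endLineNumber: number;
  endColumn: number;
  message: string;
  severity: number;
}

interface Diag { line: number | null; message: string; }

const SEVERITY_ERROR = 8;  // value of monaco.MarkerSeverity.Error

function toMarkers(
  diags: Diag[],
  lineCount: number,
  lineMaxColumn: (line: number) => number,
): MarkerLike[] {
  return diags.map((d) => {
    // Clamp into the document; diagnostics without a line are pinned to line 1.
    const line = Math.min(Math.max(d.line ?? 1, 1), lineCount);
    return {
      startLineNumber: line,
      startColumn: 1,
      endLineNumber: line,
      endColumn: lineMaxColumn(line),  // underline the whole line
      message: d.message,
      severity: SEVERITY_ERROR,
    };
  });
}
```

In the editor component the result would be passed to `monaco.editor.setModelMarkers(model, owner, markers)`, with `model.getLineCount()` and `model.getLineMaxColumn(line)` supplying the two callbacks.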
`OutputPanel.tsx`:
- Tabbed display of runtime output, tokens, IR, and errors.
- Error tab renders diagnostic cards with line jump support.
- Shows user-friendly summary and fix hints from compile API.
End-to-end flow:
- User writes mission code in Monaco.
- User clicks Run.
- Frontend checks the compiler health state.
- Frontend sends the code to `/api/compile`.
- API writes a temp file and executes the binary.
- Compiler: lexer -> parser + semantic actions -> TAC optimize -> C-like projection -> TAC execute.
- Compiler emits:
  - IR section
  - `PRINT:` lines
  - semantic/runtime errors (if any)
- API parses and enriches diagnostics.
- Frontend renders output/IR/errors and editor markers.
During this pass, the backend was hardened for consistency and safer failure modes:
- Added a function call arity mismatch semantic check in the parser.
- Added runtime guards/errors in the TAC interpreter for:
  - divide/modulo by zero
  - array bounds and undeclared array references
  - invalid labels/jumps
  - undefined function calls
  - stack overflow
  - memory allocation failures
- Added executable support in parser/runtime for the documented operators `mod`, `**`, `AND`, `OR`, `XOR`, `NOT`.
- Activated previously partial tokens and semantics:
  - `deploy`, `this`, `broadcast`, `alarm`
  - math built-ins: `root`, `flr`, `ceil`, `abs`, `logarithm`, `sine`, `cosine`, `tan`, `asine`, `acosine`, `atan`, `prime`
  - loop/switch controls: `stage_sep`, `coast`
  - declaration tokens: `alias`, `fleet`
- Added stronger static checks:
  - assignment/initialization type compatibility
  - return type validation
  - constant reassignment prevention
This reduces drift between docs and implementation while improving robustness under edge conditions.
These limitations matter for an accurate explanation to an instructor:
- Static type checks are now significantly stronger, but not yet equivalent to a full production-grade static type system.
- Object-oriented runtime remains lightweight: module/deploy/this are active, but full class object memory/method dispatch semantics are intentionally simple.
- Advanced exceptions such as Java-style null/class-cast are mostly not applicable to the current runtime model.
Use this short flow while presenting:
- "AstroScript is compiled by Flex+Bison into TAC, then interpreted."
- "The parser is not only validating syntax, it also emits IR instructions directly."
- "Semantic checks cover symbol existence, call arity, type compatibility, return types, and constant reassignment, while runtime checks catch unsafe execution cases."
- "The web playground calls an API route that executes the compiler binary and converts raw output into structured diagnostics and tabs."
- "So students can inspect both final output and intermediate representation in one place, which is ideal for learning compiler internals."
This section is a direct answer to the common instructor question: "What is your IR/optimization layer doing in practice?"
The compiler lowers AstroScript programs into Three-Address Code (TAC).
- Format: `op, arg1, arg2, result`
- Representation: linear instruction list
- Execution model: instruction pointer over TAC, with labels and jumps
The TAC layer includes operations for:
- Arithmetic and comparisons (`+`, `-`, `*`, `/`, `%`, `**`, relational ops)
- Boolean logic (`AND`, `OR`, `XOR`, `ifFalse`)
- Variables and assignment (`decl`, `=`)
- Arrays (`decl_arr`, `store`, `load`)
- Functions (`func_begin`, `param_def`, `param`, `call`, `return`, `func_end`)
- Objects/modules (`obj_new`, `field_set`, `field_get`, `mcall`)
- Built-ins (`root`, `logarithm`, `sine`, `prime`, etc.)
Parser semantic actions emit TAC while parsing. Example patterns:
- `verify (...) { ... }` lowers into conditional jump blocks with labels.
- `orbit` loops lower into explicit loop labels + back edges (`goto`).
- Function declarations emit `func_begin`/`func_end` and parameter definitions.
- Calls push arguments (`param`) before `call` or `mcall`.
So high-level syntax is converted into explicit control flow and data flow.
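A sketch of the `verify` / `else_verify` / `otherwise` pattern as labels and conditional jumps: each arm tests its condition, jumps to the next arm on failure, and jumps to a shared exit label after its body. The `Arm` shape, the label generator, and placing jump targets in `result` are assumptions; the parser's actual label numbering may differ.

```typescript
interface TAC { op: string; arg1: string; arg2: string; result: string; }

// One verify / else_verify arm: condition code, the temp holding its result, and the body.
interface Arm { condCode: TAC[]; condTemp: string; body: TAC[]; }

function lowerVerify(arms: Arm[], otherwise: TAC[], newLabel: () => string): TAC[] {
  const end = newLabel();  // shared exit label
  const code: TAC[] = [];
  for (const arm of arms) {
    const next = newLabel();  // where control goes if this arm's test fails
    code.push(...arm.condCode);
    code.push({ op: "ifFalse", arg1: arm.condTemp, arg2: "", result: next });
    code.push(...arm.body);
    code.push({ op: "goto", arg1: "", arg2: "", result: end });  // skip remaining arms
    code.push({ op: "label", arg1: "", arg2: "", result: next });
  }
  code.push(...otherwise);  // otherwise block (may be empty)
  code.push({ op: "label", arg1: "", arg2: "", result: end });
  return code;
}
```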
`optimize()` runs three passes before execution:
- Constant folding
  - Precomputes pure constant expressions and constant unary built-ins.
- Algebraic simplification
  - Removes identity operations (`x+0`, `x*1`, `x/1`) and trivial zero cases (`x*0`).
- Redundant move elimination
  - Removes assignments that do nothing (`a = a`).
This is intentionally a lightweight local optimizer (not a global SSA optimizer).
Conceptual lowering for a declaration with initialization:
- Source: `telemetry count x := 5 add 3.`
- Unoptimized TAC idea:
  - `decl COUNT x`
  - `t1 = 5 + 3`
  - `x = t1`
- After constant folding:
  - `decl COUNT x`
  - `t1 = 8`
  - `x = t1`
Fewer runtime computations are needed, and IR becomes easier to inspect.
Why this IR layer helps:
- Correctness architecture
  - Parsing/semantic validation and execution are separated by a stable intermediate form.
- Runtime simplicity
  - The interpreter only needs to execute TAC ops, not full grammar-level constructs.
- Observability
  - Optimized TAC is printable and traceable, which helps debugging and grading.
- Performance improvement
  - Basic compile-time simplifications reduce instruction count and repeated math at runtime.
- Extensibility
  - New language features can be added by defining their TAC lowering and TAC runtime op behavior.
Direct AST execution can work, but TAC gives a cleaner compiler boundary:
- uniform instruction semantics,
- explicit control-flow graph via labels/jumps,
- reusable optimization stage,
- reusable backend for interpreter and C-like projection.
In one line: IR is the canonical machine-like form of AstroScript in this project, and optimization is the safe pre-execution simplification stage that improves clarity and runtime cost.