
EOF token is returned only once in recursive grammar #405

@dsogari

Description


I have the following grammar:

// test.jison
%lex

%%

\s+ // skip whitespace
\w+ return 'IDENTIFIER';
\:  return 'BEGIN_BLOCK';
$   { console.log('EOF'); return 'EOF'; }
.   return 'INVALID';

/lex

%start DOCUMENT

%%

DOCUMENT: STATEMENT EOF;

STATEMENT: IDENTIFIER STMT_BLOCK;

STMT_BLOCK: /**/ | BEGIN_BLOCK DOCUMENT;
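STMT_BLOCK recurses into DOCUMENT, and every DOCUMENT ends with EOF. As a result, an input with N nested blocks needs N + 1 EOF tokens. The small hand-written recursive-descent recognizer below shows this for the same grammar. It is an illustrative sketch, not how jison parses (jison generates a table-driven LALR parser). parseDocument is a hypothetical name, and the token arrays are passed in by hand, with no lexer involved.

```javascript
// Illustrative recursive-descent recognizer for the grammar above (hypothetical
// helper, not jison code). Each call to documentRule consumes exactly one EOF.
function parseDocument(tokens) {
  let pos = 0;

  // Consume the expected token or fail with a jison-like message.
  const expect = (name) => {
    if (tokens[pos] !== name) {
      throw new Error(`Expecting '${name}', got '${tokens[pos]}'`);
    }
    pos++;
  };

  // DOCUMENT: STATEMENT EOF
  function documentRule() { statementRule(); expect('EOF'); }

  // STATEMENT: IDENTIFIER STMT_BLOCK
  function statementRule() { expect('IDENTIFIER'); stmtBlockRule(); }

  // STMT_BLOCK: /* empty */ | BEGIN_BLOCK DOCUMENT
  function stmtBlockRule() {
    if (tokens[pos] === 'BEGIN_BLOCK') { pos++; documentRule(); }
  }

  documentRule();
  if (pos !== tokens.length) throw new Error('Unexpected trailing tokens');
  return true;
}

// 'level1: level2' is one DOCUMENT nested in another, so it needs two EOFs.
console.log(parseDocument(['IDENTIFIER', 'BEGIN_BLOCK', 'IDENTIFIER', 'EOF', 'EOF'])); // true

// If the second EOF arrives as the number 1, the outer DOCUMENT rejects it.
try {
  parseDocument(['IDENTIFIER', 'BEGIN_BLOCK', 'IDENTIFIER', 'EOF', 1]);
} catch (err) {
  console.log(err.message); // Expecting 'EOF', got '1'
}
```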

This is the test script:

// test.js
import jison from 'jison';
import fs from 'fs';

const grammar = fs.readFileSync('test.jison', 'utf8');
const parser = jison.Parser(grammar);
try {
    parser.parse(process.argv[2]);
} catch(err) {
    console.log(err.message);
}

The command node test.js 'level1' runs without errors and prints EOF.

Since 'level1: level2' contains a nested DOCUMENT, and every DOCUMENT ends with EOF, we should expect node test.js 'level1: level2' to print EOF twice, but it prints this instead:

EOF
Parse error on line 1:
level1: level2
--------------^
Expecting 'EOF', got '1'

The reason is that the lexer returns the EOF token only once, when the nested DOCUMENT ends. After that, it returns the 1 token (the parser value for end-of-file) instead. Unfortunately, this special token cannot be referenced from the grammar, which makes it impossible to parse this particular language. :(
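The following toy model of the end-of-input handling makes the token sequence concrete. It is a simplified sketch written for illustration, not jison-lex's source. makeToyLexer, firstTokens and LEXER_EOF are hypothetical names. The model assumes, as described in this report, that the lexer marks itself done as soon as the input is exhausted, lets the $ rule fire once, and then returns the numeric value 1 on every later call.

```javascript
// Toy model of the reported behavior (hypothetical, simplified; not jison-lex code).
const LEXER_EOF = 1; // numeric end-of-file value reported in the parse error

function makeToyLexer(input) {
  let rest = input;
  let done = false;
  return {
    lex() {
      for (;;) {
        if (done) return LEXER_EOF; // the $ rule is never tried again
        if (rest === '') {
          done = true;              // flag is set when the input runs out...
          return 'EOF';             // ...so the $ rule matches exactly once
        }
        const ws = /^\s+/.exec(rest);
        if (ws) { rest = rest.slice(ws[0].length); continue; } // skip whitespace
        const id = /^\w+/.exec(rest);
        if (id) { rest = rest.slice(id[0].length); return 'IDENTIFIER'; }
        const tok = rest[0] === ':' ? 'BEGIN_BLOCK' : 'INVALID';
        rest = rest.slice(1);
        return tok;
      }
    },
  };
}

// Collect the first n tokens a parser would request from the lexer.
function firstTokens(input, n) {
  const lexer = makeToyLexer(input);
  return Array.from({ length: n }, () => lexer.lex());
}

console.log(firstTokens('level1: level2', 6).join(' '));
// IDENTIFIER BEGIN_BLOCK IDENTIFIER EOF 1 1
```

In this model the inner DOCUMENT receives EOF, and the outer DOCUMENT receives 1, which lines up with the "Expecting 'EOF', got '1'" error above.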

To fix it, I believe the $ rule (or the equivalent <<EOF>>) should keep matching on every subsequent call once the end of input is reached, rather than firing only once. Alternatively, jison could provide a way to reference the 1 token directly in the grammar.
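The first proposed fix can be sketched with the same kind of toy model. This is hypothetical code covering only the lexer side (makeStickyEofLexer is an illustrative name, not a jison API). Once the input is empty, every call returns the $ rule's token, so each nesting level of DOCUMENT can consume its own EOF.

```javascript
// Hypothetical "sticky" end-of-input lexer (lexer-side sketch only, not jison code).
function makeStickyEofLexer(input) {
  let rest = input;
  return {
    lex() {
      for (;;) {
        if (rest === '') return 'EOF'; // $ keeps matching at end of input
        const ws = /^\s+/.exec(rest);
        if (ws) { rest = rest.slice(ws[0].length); continue; } // skip whitespace
        const id = /^\w+/.exec(rest);
        if (id) { rest = rest.slice(id[0].length); return 'IDENTIFIER'; }
        const tok = rest[0] === ':' ? 'BEGIN_BLOCK' : 'INVALID';
        rest = rest.slice(1);
        return tok;
      }
    },
  };
}

const lexer = makeStickyEofLexer('level1: level2');
console.log(Array.from({ length: 6 }, () => lexer.lex()).join(' '));
// IDENTIFIER BEGIN_BLOCK IDENTIFIER EOF EOF EOF
```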
