@andr-dots

@andr-dots andr-dots commented Jul 11, 2025

The state system lets programmers write parsers for complex grammars, including languages with unusual features.

The PEG action system makes it possible to handle features such as templates/generics in programming languages like Python and Golang, or labels in languages such as C (labels are linked automatically).

The parsing state system makes it possible to handle non-linear parsing scenarios, such as cases where syntactic constructs overlap and depend on constructs found earlier in the parse.

Together, the state and action systems let programmers solve many known and potential problems that the original parser could not handle. Previously, such problems were most likely solved in the semantic layer of the program, with much greater complexity and much less readable code.

The downside of the new system is that parsing becomes a little slower when the state system is used, because a state snapshot has to be taken before every match that can fail.
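To illustrate where that cost comes from, here is a minimal, self-contained sketch of snapshot-and-restore around a backtracking ordered choice. It does not use Arpeggio or the code in this PR: `ParseState`, `match_literal`, `match_sequence`, and `ordered_choice` are hypothetical names made up for the example, and the real implementation may differ.

```python
import copy


class ParseState:
    """Hypothetical parse state: an input position plus user data."""

    def __init__(self):
        self.position = 0
        self.labels = {}  # token text -> position where it was matched

    def snapshot(self):
        # Deep copy so later mutations cannot leak into the saved copy.
        return (self.position, copy.deepcopy(self.labels))

    def restore(self, saved):
        self.position, self.labels = saved


def match_literal(text, state, literal):
    """Match `literal` at the current position and record it in the state."""
    if text.startswith(literal, state.position):
        state.labels[literal] = state.position
        state.position += len(literal)
        return True
    return False


def match_sequence(text, state, literals):
    """Match all literals in order; may leave partial side effects on failure."""
    return all(match_literal(text, state, lit) for lit in literals)


def ordered_choice(text, state, alternatives):
    """Try alternatives in order, rolling the state back after each failure."""
    for alternative in alternatives:
        saved = state.snapshot()  # paid on every attempt, even successful ones
        if match_sequence(text, state, alternative):
            return True
        state.restore(saved)  # undo position and label changes
    return False


state = ParseState()
ok = ordered_choice("ac", state, [["a", "b"], ["a", "c"]])
print(ok, state.position, state.labels)  # True 2 {'a': 0, 'c': 1}
```

The first alternative matches `a` and then fails on `b`; without the restore, the label recorded for `a` and the advanced position would leak into the second attempt. The snapshot is taken before the outcome is known, which is why the overhead applies even to matches that end up succeeding.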

Updated on 2025-07-20:

To make the tests easier to write, I extended the PEG grammar notation with a separator operator, `%`, which makes repetitions with a separator much easier to express. I considered using `,` instead, but the resulting grammars looked too odd.
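For readers unfamiliar with the idea, a repetition with a separator is shorthand for the pattern `item (sep item)*`. The sketch below shows only those semantics in plain Python with the standard `re` module; it is not the grammar syntax or implementation from this PR, and `parse_separated` is a hypothetical helper written for the example.

```python
import re


def parse_separated(text, item_pattern, sep_pattern):
    """Match `item (sep item)*` at the start of `text`.

    Returns (items, end_position), or None if not even one item matches.
    """
    item_re = re.compile(item_pattern)
    sep_re = re.compile(sep_pattern)

    first = item_re.match(text, 0)
    if first is None:
        return None
    items = [first.group()]
    pos = first.end()

    while True:
        sep = sep_re.match(text, pos)
        if sep is None:
            break
        item = item_re.match(text, sep.end())
        if item is None:
            break  # a trailing separator is left unconsumed
        if item.end() == pos:
            break  # no progress was made; avoid looping forever
        items.append(item.group())
        pos = item.end()

    return items, pos


print(parse_separated("1, 2, 3", r"\d+", r"\s*,\s*"))  # (['1', '2', '3'], 7)
```

Note that the separator is only consumed when another item follows it, so an input like `1,2,` stops after `2` and leaves the trailing comma for the surrounding rule.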

Code review checklist

  • Pull request represents a single change (i.e. not fixing disparate/unrelated things in a single PR)
  • Title summarizes what is changing
  • Commit messages are meaningful (see [this][commit messages] for details)
  • Tests have been included and/or updated
  • Docstrings have been included and/or updated, as appropriate
  • Standalone docs have been updated accordingly
  • Changelog(s) has/have been updated, as needed (see CHANGELOG.md, no need
    to update for typo fixes and such).

@andr-dots
Author

andr-dots commented Jul 11, 2025

Also, I'd like to add that many things could be moved into the state class in the future, and that the state object, rather than the parser itself, should be passed to the parser expression's parse function.

@andr-dots
Author

andr-dots commented Jul 15, 2025

In a recent commit, I added support for the suppress action. This feature is crucial in the PEG parser notation; without it, the semantic layer ends up with overly complex workarounds.

@andr-dots andr-dots force-pushed the peg-actions-and-states branch 2 times, most recently from b6b3810 to b8a5492 Compare July 17, 2025 18:13
@andr-dots andr-dots marked this pull request as draft July 18, 2025 12:43
@andr-dots andr-dots force-pushed the peg-actions-and-states branch 2 times, most recently from e74d29d to d37be8e Compare July 20, 2025 19:56
andr-dots added 23 commits July 21, 2025 19:34
@andr-dots andr-dots force-pushed the peg-actions-and-states branch from 0dc8eba to c667daf Compare July 27, 2025 10:02
@andr-dots andr-dots changed the title The state system and support for the actions and states in PEG parser notation The state system and support for the actions and states in the PEG parser notation Jul 27, 2025
@andr-dots andr-dots marked this pull request as ready for review July 27, 2025 10:21
@andr-dots andr-dots force-pushed the peg-actions-and-states branch 3 times, most recently from 180f3aa to e3ee6ab Compare August 6, 2025 13:47
@andr-dots andr-dots force-pushed the peg-actions-and-states branch from 3685c8d to 4dd394c Compare August 6, 2025 15:59
@andr-dots andr-dots force-pushed the peg-actions-and-states branch from 4dd394c to 931e0b5 Compare August 6, 2025 18:23
@igordejanovic
Member

Hi @andr-dots. Thank you for your substantial work on this PR, and sorry it took me so long to respond. While the explicit state handling you've implemented offers powerful capabilities for context-sensitive languages, it also introduces significant complexity and performance considerations, as you've noted.

Given Arpeggio's design philosophy of remaining a simple PEG parser, I don't believe this rework aligns with the project's direction. Furthermore, I don't feel comfortable having to maintain this rework.

I'd suggest forking this work into a separate library where you could maintain full control over its development. This would allow users needing more advanced parsing capabilities to benefit from your work, while Arpeggio continues serving those who need a straightforward PEG parser.


2 participants