@mcy mcy commented Oct 8, 2025

This PR allows our lexer to parse the following CEL productions:

  • Suffixed numbers, such as 42u.
  • Prefixed strings, such as r"", b"", and rb"".
  • Triple-quoted strings, such as """foo""".
  • < and > as separate tokens (which the parser now fuses in some cases).
  • in as a keyword.
  • All CEL punctuation, such as arithmetic operators.
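
To make the list above concrete, here is a minimal regex-based sketch of a tokenizer that recognizes these productions. It is not the lexer from this PR: the names TOKEN_SPEC, KEYWORDS, and tokenize are hypothetical, the punctuation set is a small subset, and escapes are handled the same way in raw and non-raw strings, which a real CEL lexer would not do.

```python
import re

# Hypothetical keyword set; the PR only says that `in` becomes a keyword.
KEYWORDS = {"in"}

# Hypothetical token patterns. Order matters: the string pattern comes first
# so that a prefix such as `rb` is not lexed as an identifier.
TOKEN_SPEC = [
    # Optional r / b / rb prefix, then a triple-quoted or single-line string.
    # Simplification: backslash escapes are treated alike in raw strings.
    ("string", r'(?:rb|r|b)?(?:"""(?:\\[\s\S]|[^\\])*?"""|"(?:\\.|[^"\\\n])*")'),
    # Integer with an optional unsigned suffix, such as 42u.
    ("number", r'\d+[uU]?'),
    ("ident", r'[A-Za-z_]\w*'),
    # `<` and `>` are always single tokens; fusing them is left to the parser.
    ("punct", r'[<>+\-*/%=;]'),
    ("space", r'\s+'),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(src):
    """Split src into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"
        if kind != "space":
            tokens.append((kind, text))
        pos = m.end()
    return tokens
```

Calling tokenize('x in 42u < rb"a"') yields an identifier, the keyword in, the number 42u, the punctuation token <, and the prefixed string rb"a".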

The lexer accepts all of these, and the legalizer then reports a diagnostic for each. I have also moved some constraint checks out of the lexer and into the legalizer, which makes those constraints simpler to verify.

This makes all of the changes necessary so that we can support full CEL expressions anywhere in Protobuf in the future. Doing this now is ideal, because making this work (especially with <...> matching) will only get harder as time goes on.

This change also causes us to reject some pathological Protobuf code involving implicitly concatenated strings. For example, option foo = """"; is valid Protobuf (two adjacent empty strings, "" and "", concatenated), but the lexer now reads the leading """ as the start of a triple-quoted string that is never closed. Rejecting this is almost certainly fine, because such code very likely does not exist in the open source corpus.
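
The ambiguity can be reproduced with a small sketch. This is not the code from this PR: lex_strings is a hypothetical helper, it ignores escape sequences, and it assumes the lexer commits to a triple-quoted string as soon as it sees an opening """.

```python
def lex_strings(src, triple_quotes):
    """Lex a run of adjacent string literals from src.

    When triple_quotes is True, an opening triple quote commits the lexer to
    a triple-quoted string (assumed behavior). Escapes are ignored for brevity.
    """
    out, pos = [], 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        if src[pos] != '"':
            break
        # Pick the delimiter: three quotes if enabled and present, else one.
        quote = '"""' if triple_quotes and src.startswith('"""', pos) else '"'
        end = src.find(quote, pos + len(quote))
        if end == -1:
            raise ValueError(f"unterminated {quote} string at offset {pos}")
        out.append(src[pos:end + len(quote)])
        pos = end + len(quote)
    return out
```

Without triple-quote support, lex_strings('""""', False) returns two empty string literals, which Protobuf concatenates. With it enabled, the same input raises an unterminated-string error, while the spaced form '"" ""' still lexes as two literals.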

@mcy mcy requested review from doriable and emcfarlane October 8, 2025 22:29
@mcy mcy merged commit e47b8c2 into main Oct 14, 2025
7 checks passed
@mcy mcy deleted the mcy/cel-lexer branch October 14, 2025 19:59