Enrich the lexer to support the entire CEL lexicon #592
This PR allows our lexer to parse the following CEL productions:

- `42u`.
- `r""`, `b""`, and `rb""`.
- `"""foo"""`.
- `<` and `>` are separate tokens (which the parser now fuses in some cases).
- `in` is now a keyword.

These all generate diagnostics in the legalizer. I have also moved things out of the lexer and into the legalizer to simplify verification of constraints.
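
To make the lexer/legalizer split concrete, here is a minimal Python sketch of the idea: a permissive tokenizer accepts the CEL forms listed above, and a separate pass reports the ones that are not valid in Protobuf. It is not the code in this PR. The names `TOKEN_RE`, `tokenize`, and `legalize`, the token kinds, and the diagnostic wording are all hypothetical, and escape handling is deliberately simplified (raw strings are treated like ordinary ones).

```python
import re

# Hypothetical token grammar, just enough for the forms discussed above.
# The string alternative comes first so that r"", b"" and rb"" are not
# split into an identifier followed by a string.
TOKEN_RE = re.compile(r'''
    (?P<string>(?:rb|r|b)?(?:"""(?:[^\\]|\\.)*?"""|"(?:[^"\\\n]|\\.)*"))
  | (?P<number>\d+[A-Za-z]*)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<punct>[<>.,;=()])
  | (?P<space>\s+)
''', re.VERBOSE | re.DOTALL)

KEYWORDS = {"in"}


def tokenize(src):
    """Split src into (kind, text) pairs, accepting CEL-only forms."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "space":
            continue
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"
        # "<" and ">" stay separate tokens; any fusing is left to a parser.
        tokens.append((kind, text))
    return tokens


def legalize(tokens):
    """Report tokens the lexer accepted but that this sketch assumes are
    CEL-only (string prefixes, triple quotes, the `u` suffix)."""
    diagnostics = []
    for kind, text in tokens:
        if kind == "string" and not text.startswith('"'):
            diagnostics.append(f"string prefix is not allowed here: {text}")
        elif kind == "string" and text.startswith('"""'):
            diagnostics.append(f"triple-quoted string is not allowed here: {text}")
        elif kind == "number" and text.endswith("u"):
            diagnostics.append(f"unsigned suffix is not allowed here: {text}")
    return diagnostics
```

Keeping the tokenizer permissive means every constraint lives in one place (`legalize` in this sketch), which is the simplification the paragraph above describes.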

This makes all of the changes necessary so that we can support full CEL expressions anywhere in Protobuf in the future. Doing this now is ideal, because making this work (especially with `<...>` matching) will only get harder as time goes on.

This change also causes us to reject some pathological Protobuf code involving implicitly-concatenated strings. For example, `option foo = """";` is valid Protobuf, but it looks like an incomplete triple-quoted string. Rejecting this is almost certainly fine, because this very likely does not exist in the open source corpus.
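
The `""""` case can be illustrated with a small Python sketch. It is not the lexer from this PR: `scan_strings` and its `triple_quoted` flag are hypothetical, and escapes are ignored for brevity. With triple-quote support off, four quotes scan as two adjacent empty strings (implicit concatenation); with it on, the first three quotes open a triple-quoted string that is never closed.

```python
def scan_strings(src, triple_quoted):
    """Return the contents of the adjacent string literals in src.

    Hypothetical helper: escapes are not handled, and anything other than
    whitespace and double-quoted literals is rejected.
    """
    pieces = []
    pos = 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        if src[pos] != '"':
            raise ValueError(f"expected a string at offset {pos}")
        # Prefer the longer opening delimiter when triple quotes are enabled.
        quote = '"""' if triple_quoted and src.startswith('"""', pos) else '"'
        end = src.find(quote, pos + len(quote))
        if end < 0:
            raise ValueError(f"unterminated {quote} string at offset {pos}")
        pieces.append(src[pos + len(quote):end])
        pos = end + len(quote)
    return pieces
```

In this sketch, `scan_strings('""""', triple_quoted=False)` returns two empty strings, while the same input with `triple_quoted=True` raises an "unterminated" error. Writing the literals with whitespace between them (`"" ""`) scans the same way in both modes.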