-
Notifications
You must be signed in to change notification settings - Fork 239
feat: Allow nested interpolated strings in lexer #4055
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Possibly another approach is to instead produce ...which would be how we currently produce parentheses (but not how we produce strings — those return a |
I tried this implementation strategy when rewriting from Pest to chumsky, but my bottom line was the same: the parser expects a flat stream of tokens and not a tree. This PR is a good effort, but I don't know how to continue from here. The I like your idea of So the following:
... would be lexed as:
... or maybe like this:
|
// TODO: decide how we want to handle colons in interpolated expressions | ||
// We use rewinds to look ahead and ensure we don't have a closing | ||
// bracket (or colon), before forwarding that to the lexer. | ||
filter(|c| *c != '}' && *c != ':') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you aware that this prevents interpolations from containing }
?
Following is not supported:
f"result: { my_func_that_takes_a_tuple {a=1} }"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I was aware of the issue with colons at least — python also has this problem but doesn't use colons so much.
With closing brackets, that's indeed a more restrictive case, and I agree with your broader point about it being confusing if we don't allow that, given how prominently brackets are used.
One difficulty is that at the lexing stage we don't know about matching opening & closing brackets, so we can't do the thing of "it's the end of the interpolated string if it's the final closing bracket".
We could require escape characters, which seems reasonable if a bit ugly.
), | ||
[], | ||
) | ||
"###); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for the following:
s"LOWER_CASE( {"contents"} )"
and:
s"LOWERCASE( { f"Hello, {"Aljaz"} is my name." } )"
Meh, are nested string interpolation even something that is useful? |
This is a good question! It depends what form of nesting you're asking about. From the linked issue, this seems helpful, and arguably confusing that we don't allow it:
OTOH, having s & f strings inside other s & f strings I agree isn't that helpful. |
I do agree, although there are cases where it would be useful: ...
f"my items are { std.array.join this.items "," }" I worry about language consistency around interpolation:
I'm not against this incremental improvement, but I would like to have a long-term plan on what we plan on supporting. |
Ok, I'm impressed with Pablo's work - Python is able to parse this:
|
I've thought this was hard, not I think this is harder. :D |
But not this!
|
Taking some of PRQL#4055
bce69f9
to
b649696
Compare
An 80% complete attempt to allow having an interpolated string contain expressions — #3279
Supersedes #3280
Currently it successfully lexes
s"{hello + 5}"
into a nested expression of tokens.But I'm not sure how to then parse those nested tokens — chumsky possibly doesn't support this sort of thing, or only a very basic version with https://docs.rs/chumsky/latest/chumsky/stream/struct.Stream.html#method.from_nested. I tried a lot of things and couldn't make progress (this was much harder than I thought it would be...)
It also required refactoring the existing lexer into a function that outputs individual tokens. It uses some
rewind
parsers to know when we've reached the end of the parent.Any ideas for making this work? Have I overcomplicated it?
FYI most of the existing tests pass, because I tried to have the proposed interpolation go back to its string form. There are some corner case failures because of the way that spans work.