-
-
Notifications
You must be signed in to change notification settings - Fork 443
feat: switch to stdlib iterators #1144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few drive-by thoughts.
iterator.go
Outdated
var out []Token | ||
for t := i(); t != EOF; t = i() { | ||
for t := range i { | ||
if t == EOF { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I don't see that this would be necessary: in fact I 'd strongly suggest it's not necessary to define an EOF
token at all. Then this function can be as simple as:
func (i Iterator) Tokens() []Token {
return slices.Collect(i)
}
which to my mind somewhat calls into question whether the Tokens
method is worth having at all.
Maybe Iterator
could be defined just as an alias:
type Iterator = iter.Seq[Token]
But then again, it's quite possibly worth preserving surface API compatibility even if the underlying representation changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This thought had crossed my mind and I don't exactly recall why there's an EOF token TBH, but there was a good reason for it at some point, so I left it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the EOF token definitely seems a bit "surprising" to me. An analogy that springs to mind is that it feels a little like passing around length-delimited strings but still keeping a zero-byte at the end and asking everyone to ignore the last character.
I've not seen an example of this pattern before and I personally would think very carefully through (and explicitly document) the reasons for why it needs to be this way, assuming it really does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It existed before to signify that the stream had reached EOF, which is obviously redundant with Go iterators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I was aware of its previous use/need, but wondered if there was a reason you'd left it around when moving to the new iterator API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. No, this is basically a half hour PoC, not ready for merge. I was mostly pondering whether switching to iterators would be worth a major version bump. But in combination with some other cleanup, like eradicating EOF, it could be worthwhile I think.
return func() Token { | ||
if len(tokens) == 0 { | ||
return EOF | ||
return func(yield func(Token) bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return slices.Values(tokens)
return EOF | ||
} | ||
} | ||
if !yield(token) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is at the end of the function, I think this test is redundant.
// Iterator returns the next Token from the lexer. | ||
func (l *LexerState) Iterator() Token { // nolint: gocognit | ||
// Iterator returns a Go iterator over tokens from the lexer. | ||
func (l *LexerState) Iterator(yield func(Token) bool) { // nolint: gocognit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW it's more conventional to return an actual iterator rather than have the method be an iterator, even if the iterator isn't reusable.
func (l *LexerState) Iterator(yield func(Token) bool) { // nolint: gocognit | |
func (l *LexerState) Iterator() chroma.Iterator { | |
return func(yield func(Token) bool) { | |
etc | |
} | |
} |
n := len(l.iteratorStack) - 1 | ||
t := l.iteratorStack[n]() | ||
// Exhaust the token stack, if any. | ||
for len(l.tokenStack) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked in any depth, but I suspect it might be possible to make this much simpler by taking advantage of the push-based iterator and just writing a recursive function rather than a hand-maintained stack.
No description provided.