Skip to content

Conversation

alecthomas
Copy link
Owner

No description provided.

Copy link

@rogpeppe rogpeppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few drive-by thoughts.

iterator.go Outdated
var out []Token
for t := i(); t != EOF; t = i() {
for t := range i {
if t == EOF {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I don't see that this would be necessary: in fact I 'd strongly suggest it's not necessary to define an EOF token at all. Then this function can be as simple as:

func (i Iterator) Tokens() []Token {
    return slices.Collect(i)
}

which to my mind somewhat calls into question whether the Tokens method is worth having at all.
Maybe Iterator could be defined just as an alias:

type Iterator = iter.Seq[Token]

But then again, it's quite possibly worth preserving surface API compatibility even if the underlying representation changes.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thought had crossed my mind and I don't exactly recall why there's an EOF token TBH, but there was a good reason for it at some point, so I left it in.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the EOF token definitely seems a bit "surprising" to me. An analogy that springs to mind is that it feels a little like passing around length-delimited strings but still keeping a zero-byte at the end and asking everyone to ignore the last character.

I've not seen an example of this pattern before and I personally would think very carefully through (and explicitly document) the reasons for why it needs to be this way, assuming it really does.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It existed before to signify that the stream had reached EOF, which is obviously redundant with Go iterators.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I was aware of its previous use/need, but wondered if there was a reason you'd left it around when moving to the new iterator API.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. No, this is basically a half hour PoC, not ready for merge. I was mostly pondering whether switching to iterators would be worth a major version bump. But in combination with some other cleanup, like eradicating EOF, it could be worthwhile I think.

return func() Token {
if len(tokens) == 0 {
return EOF
return func(yield func(Token) bool) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return slices.Values(tokens)

return EOF
}
}
if !yield(token) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is at the end of the function, I think this test is redundant.

// Iterator returns the next Token from the lexer.
func (l *LexerState) Iterator() Token { // nolint: gocognit
// Iterator returns a Go iterator over tokens from the lexer.
func (l *LexerState) Iterator(yield func(Token) bool) { // nolint: gocognit

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW it's more conventional to return an actual iterator rather than have the method be an iterator, even if the iterator isn't reusable.

Suggested change
func (l *LexerState) Iterator(yield func(Token) bool) { // nolint: gocognit
func (l *LexerState) Iterator() chroma.Iterator {
return func(yield func(Token) bool) {
etc
}
}

n := len(l.iteratorStack) - 1
t := l.iteratorStack[n]()
// Exhaust the token stack, if any.
for len(l.tokenStack) > 0 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked in any depth, but I suspect it might be possible to make this much simpler by taking advantage of the push-based iterator and just writing a recursive function rather than a hand-maintained stack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants