16 changes: 10 additions & 6 deletions src/evented-tokenizer.ts
@@ -266,18 +266,18 @@ export default class EventedTokenizer {
endTagName() {
let char = this.consume();

if (isSpace(char)) {
this.transitionTo(TokenizerState.beforeAttributeName);
this.tagNameBuffer = '';
if (isSpace(char) && isAlpha(this.peek())) {
Contributor

Why do we check if the next char is alpha here? I think the goal is to issue an error if there is white space as the first thing after the </, right?

If so, maybe something like:

if (isSpace(char)) {
  if (this.tagNameBuffer === '') {
    this.delegate.reportSyntaxError('closing tag must only contain tagname');
  }
}

What do you think?

Author

I added the check for whitespace after </ on lines 490-493 of this file:

...
} else {
    this.transitionTo(TokenizerState.endTagName);
    this.delegate.beginEndTag();
    this.delegate.reportSyntaxError('closing tag cannot contain whitespace before tagname');
}

The check on line 269 is specifically looking for attributes after the end tag's tag name. I decided to allow whitespace after the tag name because the HTML spec allows it:

  1. After the tag name, there may be one or more ASCII whitespace.

So I'm specifically looking for whitespace followed by an ASCII alpha character (the start of an attribute), which is invalid syntax.

If you'd prefer to completely disallow any whitespace in closing tags, I'd be happy to update this PR to check for that!
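
For readers following the thread, the rule described in this reply can be sketched as a standalone function. This is only an illustration under stated assumptions, not the PR's code: `checkEndTagBody` and `EndTagCheck` are hypothetical names, the local `isSpace` and `isAlpha` are simplified stand-ins for the tokenizer's helpers, and the input is assumed to be the text between `</` and `>`.

```typescript
// Hypothetical result type for this sketch; not part of the library.
type EndTagCheck =
  | { ok: true; tagName: string }
  | { ok: false; tagName: string; error: string };

// Simplified stand-ins for the tokenizer's character helpers (assumption).
const isSpace = (ch: string): boolean => /[\t\n\f\r ]/.test(ch);
const isAlpha = (ch: string): boolean => /[A-Za-z]/.test(ch);

// Classifies the text between `</` and `>` using the rules from the thread:
// leading whitespace is an error, whitespace followed by a letter is treated
// as the start of an attribute, `/` means self-closing, and trailing
// whitespace is allowed.
function checkEndTagBody(body: string): EndTagCheck {
  let tagName = '';
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i);
    const next = body.charAt(i + 1); // '' when past the end
    if (isSpace(ch) && tagName === '') {
      return { ok: false, tagName, error: 'closing tag cannot contain whitespace before tagname' };
    }
    if (isSpace(ch) && isAlpha(next)) {
      return { ok: false, tagName, error: 'closing tag must only contain tagname' };
    }
    if (ch === '/') {
      return { ok: false, tagName, error: 'closing tag cannot be self-closing' };
    }
    if (!isSpace(ch)) {
      tagName += ch;
    }
  }
  return { ok: true, tagName };
}
```

Under these assumptions, `checkEndTagBody('div \t\n')` is accepted with tag name `div`, while `checkEndTagBody('div foo="bar"')` is rejected as soon as the space before `foo` is seen.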

Contributor

> If you'd prefer to completely disallow any whitespace in closing tags, I'd be happy to update this PR to check for that!

Ya, let's do that.

this.delegate.reportSyntaxError('closing tag must only contain tagname');
} else if (char === '/') {
this.transitionTo(TokenizerState.selfClosingStartTag);
this.tagNameBuffer = '';
this.delegate.reportSyntaxError('closing tag cannot be self-closing');
} else if (char === '>') {
this.delegate.finishTag();
this.transitionTo(TokenizerState.beforeData);
this.tagNameBuffer = '';
} else {
this.appendToTagName(char);
if (!this.delegate.current().syntaxError && !isSpace(char)) {
Contributor

I don’t fully understand this conditional (reviewing on mobile so forgive me if I’ve missed something obvious).

Why do we check if .current().syntaxError?

Author

You're right that this is confusing, and I feel like there may be a better way to do this.

This check is required since we no longer enter the beforeAttributeName or selfClosingStartTag states, which would reset the tagNameBuffer. Without this check I was getting invalid tag names that included the whitespace and/or attributes, e.g. {tagname: 'div foo="bar"'}.

The !isSpace(char) check keeps whitespace out of the tag name, since I had decided to allow trailing whitespace in the closing tag. More on that decision in my reply to your other comment.
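
The effect of the guard described above can be seen in a small simulation. This is a rough sketch and not the tokenizer itself: `scanEndTag`, `EndTagToken`, and the `useGuard` flag are hypothetical, the character helpers are simplified stand-ins, and the input is assumed to start right after `</`.

```typescript
// Hypothetical stand-in for the tokenizer's current token; not the library's Token type.
interface EndTagToken {
  tagName: string;
  syntaxError?: string;
}

// Simplified stand-ins for the tokenizer's character helpers (assumption).
const isSpace = (ch: string): boolean => /[\t\n\f\r ]/.test(ch);
const isAlpha = (ch: string): boolean => /[A-Za-z]/.test(ch);

// Scans the characters after `</` while staying in a single state, mirroring
// the idea that the end tag no longer transitions to other states on error.
// `useGuard` toggles the conditional under discussion.
function scanEndTag(input: string, useGuard: boolean): EndTagToken {
  const token: EndTagToken = { tagName: '' };
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    const next = input.charAt(i + 1); // '' when past the end
    if (isSpace(ch) && isAlpha(next)) {
      token.syntaxError = 'closing tag must only contain tagname';
    } else if (ch === '/') {
      token.syntaxError = 'closing tag cannot be self-closing';
    } else if (ch === '>') {
      break; // end of the tag
    } else if (!useGuard || (!token.syntaxError && !isSpace(ch))) {
      // With the guard, nothing is appended once an error has been recorded,
      // and whitespace never becomes part of the tag name.
      token.tagName += ch;
    }
  }
  return token;
}
```

In this sketch, `scanEndTag('div foo="bar">', true)` keeps the tag name as `div`, whereas passing `false` lets the attribute text leak into the tag name because no state change ever clears or freezes the buffer.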

this.appendToTagName(char);
}
}
},

@@ -487,6 +487,10 @@ export default class EventedTokenizer {
this.tagNameBuffer = '';
this.delegate.beginEndTag();
this.appendToTagName(char);
} else {
this.transitionTo(TokenizerState.endTagName);
this.delegate.beginEndTag();
this.delegate.reportSyntaxError('closing tag cannot contain whitespace before tagname');
}
}
};
1 change: 1 addition & 0 deletions src/types.ts
@@ -64,6 +64,7 @@ export interface TokenMap {
}

export interface TokenizerDelegate {
current(): Token;
reset(): void;
finishData(): void;
tagOpen(): void;
44 changes: 42 additions & 2 deletions tests/tokenizer-tests.ts
@@ -31,9 +31,49 @@ QUnit.test('A simple closing tag', function(assert) {
assert.deepEqual(tokens, [endTag('div')]);
});

QUnit.test('A simple closing tag with trailing spaces', function(assert) {
QUnit.test('A closing tag can contain trailing spaces', function(assert) {
let tokens = tokenize('</div \t\n>');
assert.deepEqual(tokens, [endTag('div')]);
let output = [endTag('div')];

assert.deepEqual(tokens, output);
});

QUnit.test('A closing tag cannot contain leading spaces', function(assert) {
let tokens = tokenize('</ div>');
let output = [withSyntaxError(
'closing tag cannot contain whitespace before tagname',
endTag('')
)];

assert.deepEqual(tokens, output);
});

QUnit.test('A closing tag cannot contain an attribute', function(assert) {
let tokens = tokenize('</div foo="bar">');

assert.deepEqual(tokens, [withSyntaxError(
'closing tag must only contain tagname',
endTag('div')
)]);
});

QUnit.test('A closing tag cannot contain multiple attributes', function(assert) {
let tokens = tokenize('</div foo="bar" foo="baz">');
assert.deepEqual(tokens, [withSyntaxError(
'closing tag must only contain tagname',
endTag('div')
)]);
});

QUnit.test('A closing tag cannot be self-closing', function(assert) {
let tokens = tokenize('</div/>');

let output = [withSyntaxError(
'closing tag cannot be self-closing',
endTag('div')
)];

assert.deepEqual(tokens, output);
});

QUnit.test('A pair of hyphenated tags', function(assert) {