Fix UTF16 byte order + extend utf8 & utf16 test #41

Sigma42 · 2024-10-01T12:19:49Z

This fixes #40 by using ByteOrder.nativeOrder().

I changed InputEncoding.valueOf to accept StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE and StandardCharsets.UTF_16 but ignore the endianness (to stay backwards compatible with e.g. a big-endian machine).
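
As a rough illustration of the idea (not the code from this PR), the sketch below picks the fixed-endian UTF-16 charset that matches `ByteOrder.nativeOrder()` and treats all three UTF-16 charsets as equivalent. The class `NativeUtf16` and the helpers `nativeUtf16` and `isUtf16` are hypothetical names introduced only for this example.

```java
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NativeUtf16 {
    // Hypothetical helper: choose the UTF-16 charset whose byte order
    // matches the machine the JVM is running on.
    static Charset nativeUtf16() {
        return ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN
                ? StandardCharsets.UTF_16LE
                : StandardCharsets.UTF_16BE;
    }

    // Hypothetical helper: accept any of the three UTF-16 charsets,
    // ignoring the endianness the caller asked for.
    static boolean isUtf16(Charset charset) {
        return charset.equals(StandardCharsets.UTF_16)
                || charset.equals(StandardCharsets.UTF_16BE)
                || charset.equals(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // UTF_16LE and UTF_16BE never write a byte-order mark when encoding,
        // so "a" always becomes exactly two bytes.
        byte[] bytes = "a".getBytes(nativeUtf16());
        System.out.printf("%s: %02x %02x%n", nativeUtf16(), bytes[0], bytes[1]);
    }
}
```

On a little-endian machine this encodes with UTF-16LE, on a big-endian machine with UTF-16BE, so the bytes are in native order in both cases.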

I also changed the parseUtf8 and parseUtf16 tests to verify that their small code examples are actually parsed correctly.

ObserverOfTime

Isn't StandardCharsets.UTF_16 enough on its own?

Sigma42 · 2024-10-01T19:21:15Z

Not as a value for the enum variant InputEncoding.UTF_16, because String.getBytes(StandardCharsets.UTF_16) may return any byte order; it just includes a byte-order mark at the start.
(In my testing using OpenJDK on Linux x86 it still returned a big-endian encoded string, just with the added BOM.) And tree-sitter ignores the BOM.
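
The BOM behavior described above can be checked with a small standalone program. This is only a sketch for inspecting the bytes; the class `Utf16Bom` and the helper `hex` are hypothetical names and not part of java-tree-sitter.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Bom {
    // Hypothetical helper: render bytes as space-separated lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // UTF_16 encodes big-endian and prepends the BOM FE FF.
        System.out.println(hex("a".getBytes(StandardCharsets.UTF_16)));   // fe ff 00 61
        // The fixed-endian charsets write no BOM.
        System.out.println(hex("a".getBytes(StandardCharsets.UTF_16BE))); // 00 61
        System.out.println(hex("a".getBytes(StandardCharsets.UTF_16LE))); // 61 00
    }
}
```

A parser that expects native-order UTF-16 without a BOM would therefore misread the `UTF_16` output on a little-endian machine, which is consistent with the observation above.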

But for the InputEncoding.valueOf it might make more sense to only allow StandardCharsets.UTF_16.
I just added StandardCharsets.UTF_16BE so it would not be a breaking change. But I guess it was probably never used anyway because it would have only worked for big-endian machines.

ObserverOfTime · 2024-10-05T08:15:02Z

See tree-sitter/tree-sitter#3740

Fix UTF16 byte order + extend utf8 & utf16 test

5c467db

ObserverOfTime reviewed Oct 1, 2024

ObserverOfTime merged commit 6ce00b5 into tree-sitter:master Oct 7, 2024
1 check failed