Conversation
jgonggrijp
left a comment
There was a problem hiding this comment.
Thanks for contributing @Krinkle!
At first sight, these tests seem totally reasonable. Of course, a Mustache implementation (or any software, for that matter) should ideally support UTF-8 encoded text.
The more I think about it, however, the more I believe that these tests lack a clear rationale. Why exactly do these tests need to exist? What does it mean if an implementation does not pass them?
If an implementation fails, does that actually say anything about the implementation, or perhaps rather about the library that is responsible for reading the file from disk and/or for parsing the YAML/JSON?
Putting it differently: suppose that the standard library for my programming language is capable of reading UTF-8 encoded text from a file and loading it into a standard text datastructure for the programming language (that will be string, String or str in most cases). Furthermore, suppose that my Mustache implementation reads templates from that standard datastructure and that my implementation is already known to support ASCII-encoded templates. Is it even possible for my implementation to fail in that case? After all, UTF-8 is a superset of ASCII. I don't think the octet 0x7b can mean anything other than { in UTF-8. Likewise for } and all the sigils.
Has libmustache ever failed these tests, @jbboehr? If it did, what did you need to do in order to pass them?
I don't think it needs to be in the spec or have tests but I believe this is possible in some languages. I'm not going to reference where I discuss this as I got rather embarrassingly unhinged but apparently if you write say a PHP lambda code in latin1 (what encoding the file is saved in is what the literals become) and then take a section that is UTF-8 because the template is and return a new template and there is a chance of corruption. While the probability of that happening in 2023 is pretty darn low back in the 2000s when I started my career it happened often (albeit usually with XML...).
I guess it is not possible. I thought there were fringe cases with BOM so yes I guess your safe parsing byte by byte. However if someone sets a delimiter above 127 then there will be problems. Maybe that is something that could be specified. Delimiters must be below 127. But again probably not worth it. |
Upstreaming additional test cases written by @jbboehr. These were helpful in developing https://github.com/jbboehr/libmustache.
Original location: https://github.com/jbboehr/mustache-spec/commits/b96be9fd4c6d6984828d93169fe7e86d8a8aec2f