Remove use of utf8_encode (deprecated in PHP 8.2)#598
Remove use of utf8_encode (deprecated in PHP 8.2)#598sammarshallou wants to merge 1 commit intocatalyst:MOODLE_310_STABLEfrom
Conversation
|
I'm doing some more testing on this locally so it might be best to hold off doing anything with this until I finish that next week (will comment again). Haven't found anything wrong, just realised that I don't think my previous testing actually ran the code in question :) |
|
Hi @sammarshallou, HTH, |
|
Wow, this change actually does break it. Ooops. I made a file called こんにちは.jpg Before change it sends me the file with this header: Content-Disposition: inline;filename="ã��ã��ã�«ã�¡ã�¯.jpg" The URL for this file included this parameter: ?response-content-disposition=inline%3Bfilename%3D%22%C3%A3%C2%81%C2%93%C3%A3%C2%82%C2%93%C3%A3%C2%81%C2%AB%C3%A3%C2%81%C2%A1%C3%A3%C2%81%C2%AF.jpg%22 As you can see this is not really valid but it appears to work (Firefox at least shows the right name in header). After change it is like this: ?response-content-disposition=inline%3Bfilename%3D%22%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF.jpg%22 And results in error: So, it looks like the reason the current code works is that by calling utf8_encode it is treating UTF-8 as if it is a longer ISO-8859-1 string (because multiple bytes per character from original UTF-8, now counted as single ISO-8859-1 characters), and then encoding those single characters as UTF-8. Later on, something else converts those characters into URL-encoded format. Then browsers receive the header with ISO-8859-1 characters that are actually UTF-8 and decides to play nice and turn it into UTF-8, because there isn't a filename*. The easiest fix is probably to use a non-deprecated function to do the same hideous encoding (UTF-8 broken into single bytes and treated as ISO-8859-1 then turned to UTF-8) then it will not change behaviour. Might be nice to do filename* as well though... |
44e4ac9 to
9e0101e
Compare
|
Our test server deployment system chose today to throw a massive wobbly, so I haven't actually been able to test my fix yet, but I thought I'd push it anyway. I'll update once properly tested. Note: Needs testing for both CloudFront (which we actually use) and S3 (which we don't but I guess I can switch the setting) URLs, as the result of this function is used for both. |
|
I got roped in to fix deployment, so I've now discovered that the recommended approach with filename* doesn't work because S3/CloudFront still won't let you use non-ISO characters. So, the best option is to go back to the original hack but with a comment to explain it, and using a non-deprecated function for the charset conversion (moodle core_text API). Testing this now. |
9e0101e to
aeea48d
Compare
|
Tested as follows:
In the case of steps 2 and 3 it worked before, this was just verifying that nothing changed. As I only used a different function to do the same thing, wasn't really expecting it to. So in summary, I've actually tested this now and will deploy to our live release in a few weeks. |
To go with my other PHP 8.2 fix PR #586...
There is a use of deprecated function utf8_encode. But, the way it's used appears to be entirely pointless - it is calling utf8_encode to create a header based on a string that already came from a header which should already be in UTF-8 (or possibly ISO-8859-1, but anyway).
@marxjohnson already pointed out this was unnecessary in #455
There do seem to be some problems related to S3 and character encoding re. Content-Disposition but I find it hard to believe that calling utf8_encode would help - in fact it sounds like UTF8 isn't technically allowed there and the right answer seems to be something very complicated with URL-encoding and a filename* parmeter as well a filename, according to this blog https://engineering.resolvergroup.com/2022/02/aws-s3-utf-8-content-disposition/