Description
Problem: The base64.encode function in base64.js does not correctly encode strings containing non-ASCII characters when the input is passed as a string. It uses input.charCodeAt() values directly, which are UTF-16 code units, in bitwise operations designed for 8-bit bytes. This leads to out-of-bounds indexing on _keyStr and incorrect Base64 output.
Affected Code (lib/base64.js):
// ...
if (!isArray) { // String input path
    chr1 = input.charCodeAt(i++);
    // ... chr2, chr3 similarly
}
// ...
enc1 = chr1 >> 2; // If chr1 > 255, enc1 can exceed 63 (the highest valid Base64 index)
// ...
output.push(_keyStr.charAt(enc1) + /* ... */); // _keyStr.charAt(index > 64) returns ""
// ...
Failure Example:
Input: "✓"
input.charCodeAt(0) is 10003 (the UTF-16 code unit for U+2713).
enc1 becomes 10003 >> 2 = 2500.
_keyStr.charAt(2500) results in "".
Output: "w=="
Expected (UTF-8 based): "4pyT"
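The expected value can be cross-checked outside the library, for example with Node's Buffer or a browser's btoa over the raw UTF-8 bytes (shown here only as a reference check, not as code from base64.js):

// Reference check in Node.js: UTF-8 encode the string, then Base64 it
console.log(Buffer.from("✓", "utf8").toString("base64")); // "4pyT"

// Equivalent browser check: btoa over the UTF-8 byte sequence E2 9C 93
console.log(btoa(String.fromCharCode(0xE2, 0x9C, 0x93))); // "4pyT"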
Impact: Direct string inputs to base64.encode containing multi-byte Unicode characters will produce invalid Base64. The array input path functions correctly with pre-converted byte arrays.
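As a caller-side workaround until the string path is fixed, the input can be pre-converted to UTF-8 bytes and routed through the array path. This sketch assumes a TextEncoder-capable environment and that base64 refers to the module exported by lib/base64.js, whose array-input behavior is as described above:

// Workaround sketch: pre-convert the string to UTF-8 bytes, then use the array path
var bytes = Array.prototype.slice.call(new TextEncoder().encode("✓")); // [0xE2, 0x9C, 0x93]
base64.encode(bytes); // "4pyT" via the (working) array input path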
Suggested Improvement: To ensure correct Base64 encoding for all string inputs, the string processing path within base64.encode should first convert the input string to a UTF-8 byte sequence. This byte sequence can then be processed by the existing Base64 logic that handles array inputs (a sketch of this follows below).
Alternatively, if direct string encoding is intended only for ASCII, this limitation should be clearly documented.
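A minimal sketch of the first approach, assuming a helper (here called stringToUtf8Bytes, which does not currently exist in the library) is added to base64.js and invoked at the top of encode before the existing array-handling loop:

// Hypothetical helper: convert a JS string (UTF-16) to an array of UTF-8 bytes.
// The name and placement are illustrative, not existing library code.
function stringToUtf8Bytes(str) {
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        // Combine a surrogate pair into a single code point
        if (code >= 0xD800 && code <= 0xDBFF && i + 1 < str.length) {
            var low = str.charCodeAt(i + 1);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                code = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00);
                i++;
            }
        }
        if (code < 0x80) {
            bytes.push(code);
        } else if (code < 0x800) {
            bytes.push(0xC0 | (code >> 6), 0x80 | (code & 0x3F));
        } else if (code < 0x10000) {
            bytes.push(0xE0 | (code >> 12), 0x80 | ((code >> 6) & 0x3F), 0x80 | (code & 0x3F));
        } else {
            bytes.push(0xF0 | (code >> 18), 0x80 | ((code >> 12) & 0x3F),
                       0x80 | ((code >> 6) & 0x3F), 0x80 | (code & 0x3F));
        }
    }
    return bytes;
}

// Inside encode(input): normalize strings to bytes, then reuse the array path, e.g.
// if (typeof input === "string") { input = stringToUtf8Bytes(input); isArray = true; }

Where TextEncoder is available, it would give the same byte sequence; the manual loop only avoids that dependency for older environments.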