Potential for unexpected output from base64.encode with non-ASCII string inputs #950

@xzyingxiashubro

Description

Problem: When base64.encode in base64.js is given a string, it does not correctly encode non-ASCII characters. It feeds input.charCodeAt() values, which are UTF-16 code units (0 to 65535), directly into bitwise operations designed for 8-bit bytes. Any code unit above 255 produces a Base64 index of 64 or more, which falls outside the 64 data symbols of _keyStr and yields incorrect Base64 output.

Affected Code (lib/base64.js):

```js
// ...
if (!isArray) { // String input path
    chr1 = input.charCodeAt(i++); // UTF-16 code unit, 0 to 65535
    // ... chr2, chr3 similarly
}
// ...
enc1 = chr1 >> 2; // If chr1 > 255, enc1 >= 64, outside the 64 data symbols
// ...
output.push(_keyStr.charAt(enc1) + /* ... */); // _keyStr.charAt(n) returns "" for n > 64
// ...
```
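To reproduce the failure outside the library, the string path from the excerpt can be reimplemented as a standalone function. This is a minimal sketch: `encodeStringPath` is a hypothetical name, and the padding logic (the `remaining` count and index 64 mapping to `=`) is an assumption about the elided parts of `lib/base64.js`, not a verbatim copy.

```javascript
// Base64 alphabet with "=" at index 64 for padding (assumed to match _keyStr).
const KEY_STR =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";

// Hypothetical reimplementation of the string-input path from the excerpt.
// It feeds UTF-16 code units straight into 8-bit bit arithmetic.
function encodeStringPath(input) {
  const output = [];
  const len = input.length;
  let i = 0;
  while (i < len) {
    const remaining = len - i; // code units left before this group
    const chr1 = input.charCodeAt(i++);
    const chr2 = i < len ? input.charCodeAt(i++) : 0;
    const chr3 = i < len ? input.charCodeAt(i++) : 0;

    const enc1 = chr1 >> 2; // >= 64 whenever chr1 > 255
    const enc2 = ((chr1 & 3) << 4) | (chr2 >> 4);
    const enc3 = remaining > 1 ? ((chr2 & 15) << 2) | (chr3 >> 6) : 64;
    const enc4 = remaining > 2 ? chr3 & 63 : 64;

    // charAt() returns "" for indices past the end of KEY_STR.
    output.push(
      KEY_STR.charAt(enc1) + KEY_STR.charAt(enc2) +
      KEY_STR.charAt(enc3) + KEY_STR.charAt(enc4)
    );
  }
  return output.join("");
}

console.log(encodeStringPath("Man"));    // "TWFu" (ASCII is fine)
console.log(encodeStringPath("\u2713")); // "w==" (enc1 = 10003 >> 2 = 2500 is dropped)
```

The result "w==" is only three characters long, so it is not even well-formed padded Base64, whose length is always a multiple of four.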

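Failure Example:
Input: "✓" (U+2713)
input.charCodeAt(0) is 10003.
enc1 becomes 10003 >> 2 = 2500, and _keyStr.charAt(2500) returns "".
enc2 becomes (10003 & 3) << 4 = 48, which maps to "w"; two "=" padding characters follow because only one code unit remains.
Output: "w=="
Expected (UTF-8 based): "4pyT"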
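The expected value follows from encoding the UTF-8 bytes of U+2713, which are E2 9C 93, rather than the single UTF-16 code unit. The 24 bits split into four 6-bit indices into the standard Base64 alphabet:

```latex
\begin{aligned}
\texttt{E2}\;\texttt{9C}\;\texttt{93}
  &= 11100010\;10011100\;10010011_2 \\
  &= 111000\;101001\;110010\;010011_2 \\
  &= 56,\;41,\;50,\;19 \\
  &\mapsto \texttt{4},\;\texttt{p},\;\texttt{y},\;\texttt{T}
\end{aligned}
```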

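Impact: String inputs to base64.encode that contain any UTF-16 code unit above 255 (for example "✓", or the surrogate halves of an emoji) produce malformed Base64. Characters from U+0080 to U+00FF stay in range but are encoded as single Latin-1 bytes rather than as their UTF-8 sequences, so the output differs from UTF-8-based Base64. The array input path works correctly with pre-converted byte arrays.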

Suggested Improvement: To ensure correct Base64 encoding for all string inputs, the string processing path within base64.encode should first convert the input string to a UTF-8 byte sequence. This byte sequence can then be processed by the existing Base64 logic that handles array inputs.
Alternatively, if direct string encoding is intended only for ASCII, this limitation should be clearly documented.
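The first suggestion, converting strings to UTF-8 before encoding, can be sketched as follows. This assumes a runtime that provides the standard `TextEncoder` (modern browsers, Node.js 11+); `encodeBytes` and `utf8Base64` are hypothetical names, and in the library the byte array would presumably be handed to the existing array-input path instead of the inline loop shown here.

```javascript
// Standard Base64 data alphabet (padding "=" is handled separately below).
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode a byte sequence (values 0 to 255) as standard, padded Base64.
function encodeBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i;
    const b1 = bytes[i];
    const b2 = remaining > 1 ? bytes[i + 1] : 0;
    const b3 = remaining > 2 ? bytes[i + 2] : 0;
    out += ALPHABET[b1 >> 2];
    out += ALPHABET[((b1 & 3) << 4) | (b2 >> 4)];
    out += remaining > 1 ? ALPHABET[((b2 & 15) << 2) | (b3 >> 6)] : "=";
    out += remaining > 2 ? ALPHABET[b3 & 63] : "=";
  }
  return out;
}

// Hypothetical string entry point: convert to UTF-8 bytes first,
// so every value reaching the bit arithmetic fits in 8 bits.
function utf8Base64(str) {
  return encodeBytes(new TextEncoder().encode(str));
}

console.log(utf8Base64("\u2713")); // "4pyT"
console.log(utf8Base64("Man"));    // "TWFu"
```

Because `TextEncoder` always emits well-formed UTF-8 (a lone surrogate becomes U+FFFD), every index computed above stays within 0 to 63, so the out-of-range `charAt` lookups from the string path cannot occur.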

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests