Potential for unexpected output from base64.encode with non-ASCII string inputs #950

**Problem**: The base64.encode function in base64.js does not correctly encode strings that contain non-ASCII characters. On the string input path it feeds input.charCodeAt() values, which are 16-bit UTF-16 code units, straight into bitwise operations designed for 8-bit bytes. For any code unit above 255, the computed index falls outside the 64-character alphabet in _keyStr, so the lookup returns an empty string and the Base64 output is wrong.

**Affected Code (lib/base64.js)**:
```
// ...
if (!isArray) { // String input path
    chr1 = input.charCodeAt(i++);
    // ... chr2, chr3 similarly
}
// ...
enc1 = chr1 >> 2; // If chr1 > 255, enc1 is >= 64, outside the 0-63 alphabet range
// ...
output.push(_keyStr.charAt(enc1) + /* ... */); // _keyStr.charAt(n) is "" for n > 64
// ...
```
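**Failure Example**:
- Input: "✓" (U+2713)
- input.charCodeAt(0) is 10003.
- enc1 becomes 10003 >> 2 = 2500.
- _keyStr.charAt(2500) results in "", so the first output character is silently dropped.
- In the standard Base64 bit layout, the second index is (10003 & 3) << 4 = 48, which maps to "w"; the rest is padding.
- Output: "w=="
- Expected (UTF-8 based): "4pyT" (bytes E2 9C 93)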
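
The arithmetic above can be reproduced outside the library. The following is a minimal sketch, not the library's code: `KEY_STR` assumes _keyStr is the standard Base64 alphabet with "=" at index 64, and `encodeSingleCharNaively` is a hypothetical helper that handles only a one-character string.

```javascript
// Assumed layout of _keyStr: standard Base64 alphabet, "=" at index 64.
const KEY_STR = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";

// Hypothetical helper: encodes a one-character string using the raw
// UTF-16 code unit as if it were a byte, as the string path is reported to do.
function encodeSingleCharNaively(input) {
    const chr1 = input.charCodeAt(0);   // UTF-16 code unit, may exceed 255
    const enc1 = chr1 >> 2;             // meant to be the top 6 bits of a byte
    const enc2 = (chr1 & 3) << 4;       // low 2 bits of chr1, next byte is 0
    // A single input unit yields two padding characters.
    const output = KEY_STR.charAt(enc1) + KEY_STR.charAt(enc2) + "==";
    return { chr1, enc1, enc2, output };
}

const result = encodeSingleCharNaively("✓");
console.log(result); // chr1: 10003, enc1: 2500, enc2: 48, output: "w=="
```

Because charAt returns an empty string for an out-of-range index instead of throwing, the failure is silent: the result is three characters long rather than a multiple of four.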

**Impact**: Any string passed directly to base64.encode that contains a character with a code unit above 255 produces invalid Base64: characters are dropped and the output length is no longer a multiple of four. The array input path functions correctly with pre-converted byte arrays.

**Suggested Improvement**: To ensure correct Base64 encoding for all string inputs, the string processing path within base64.encode should first convert the input string to a UTF-8 byte sequence. This byte sequence can then be processed by the existing Base64 logic that handles array inputs.
Alternatively, if direct string encoding is intended only for ASCII, this limitation should be clearly documented.
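
One possible shape for the first option is sketched below. It is a standalone illustration under stated assumptions, not a patch against lib/base64.js: `KEY_STR`, `encodeBytes`, and `encodeStringAsUtf8` are hypothetical names, the alphabet is assumed to match _keyStr, and `TextEncoder` is assumed to be available (modern browsers and current Node.js; older targets would need a manual UTF-8 conversion).

```javascript
// Assumed layout of _keyStr: standard Base64 alphabet, "=" at index 64.
const KEY_STR = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";

// Hypothetical byte-oriented encoder: every input element is in 0..255,
// so every computed index stays within 0..64.
function encodeBytes(bytes) {
    const output = [];
    const len = bytes.length;
    let i = 0;
    while (i < len) {
        const remaining = len - i;
        const chr1 = bytes[i++];
        const chr2 = i < len ? bytes[i++] : 0;
        const chr3 = i < len ? bytes[i++] : 0;

        const enc1 = chr1 >> 2;
        const enc2 = ((chr1 & 3) << 4) | (chr2 >> 4);
        // Index 64 selects "=" when the final group is short.
        const enc3 = remaining > 1 ? (((chr2 & 15) << 2) | (chr3 >> 6)) : 64;
        const enc4 = remaining > 2 ? (chr3 & 63) : 64;

        output.push(
            KEY_STR.charAt(enc1) + KEY_STR.charAt(enc2) +
            KEY_STR.charAt(enc3) + KEY_STR.charAt(enc4)
        );
    }
    return output.join("");
}

// Hypothetical string entry point: convert to UTF-8 bytes first,
// then reuse the byte-oriented path.
function encodeStringAsUtf8(str) {
    return encodeBytes(new TextEncoder().encode(str));
}

console.log(encodeStringAsUtf8("✓"));   // "4pyT"
console.log(encodeStringAsUtf8("abc")); // "YWJj"
```

One design consideration: if any existing callers pass binary strings (one byte per character, values 128 to 255), a UTF-8 conversion would change their output, so such a change may need a separate entry point or an option rather than replacing the current behavior.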