@ShotaroKataoka
Description of Changes

This PR fixes the duplicate document name error that occurred when uploading multiple files whose Japanese filenames have the same number of characters (#1143, #1250).

Root Cause

The previous implementation replaced all non-ASCII characters with 'X', causing different Japanese filenames with the same length to become identical:

  • 県名リスト.xlsx → XXXXX
  • 会員リスト.xlsx → XXXXX

This resulted in AWS Bedrock API rejecting requests with:

ValidationException: Messages can't contain duplicate document names.
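
The collision is easy to reproduce with a minimal sketch. The function name `legacySanitize` and its regular expression are assumptions made for illustration, not the project's original code. The sketch only mimics the behavior described above: drop the extension, then replace each non-ASCII character with 'X'.

```typescript
// Hypothetical reconstruction of the old behavior (not the project's actual code):
// keep the part before the first dot, then replace each non-ASCII character with 'X'.
function legacySanitize(fileName: string): string {
  const base = fileName.split('.')[0];
  return base.replace(/[^\x00-\x7F]/g, 'X');
}

const a = legacySanitize('県名リスト.xlsx');
const b = legacySanitize('会員リスト.xlsx');

// Both base names are five characters long, so both collapse to the same string.
console.log(a, b, a === b);
```

Because the replacement keeps only the length of the name, any two non-ASCII names of equal length end up as the same document name in a single request.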

Solution

  1. Centralized filename sanitization logic - Created fileNameUtils.ts with convertToSafeFilename() function
  2. MD5 hash suffix - Appends 8-character hash only when non-ASCII characters are replaced, ensuring uniqueness:
    • 県名リスト.xlsx → _____46a890b2
    • 会員リスト.xlsx → _____5c4aa342
  3. Preserved readability - ASCII-only filenames remain unchanged without hash suffix
  4. Comprehensive tests - Added test coverage for various filename patterns
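
The approach in the list above could look roughly like the following. This is a sketch under stated assumptions, not the `convertToSafeFilename()` shipped in `fileNameUtils.ts`: the name `toSafeFilenameSketch`, the allowed character set, what is hashed (here the full original filename), and how the hash is joined to the name are all hypothetical, so its output is not expected to match the hashes shown in this PR.

```typescript
import { createHash } from 'node:crypto';

// Illustrative sketch only; the real convertToSafeFilename() may differ.
// Assumptions: the extension is dropped, the allowed characters are
// letters, digits, whitespace, '-', '(', ')', '[' and ']', and every
// other character is replaced with '_'.
function toSafeFilenameSketch(fileName: string): string {
  const base = fileName.split('.')[0];
  const safe = base.replace(/[^a-zA-Z0-9\s\-()[\]]/g, '_');

  // Nothing was replaced: keep the readable name without a suffix.
  if (safe === base) {
    return safe;
  }

  // Something was replaced: append the first 8 hex characters of the MD5
  // digest of the original name so that different inputs stay distinct.
  const hash = createHash('md5').update(fileName, 'utf8').digest('hex').slice(0, 8);
  return `${safe}${hash}`;
}

console.log(toSafeFilenameSketch('report-2024.pdf')); // ASCII-only: no hash suffix
console.log(toSafeFilenameSketch('県名リスト.xlsx'));
console.log(toSafeFilenameSketch('会員リスト.xlsx'));
```

MD5 is used here only as a short, deterministic fingerprint, not for security. An 8-character prefix makes a collision between two files in the same message very unlikely, though not impossible.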

Changes

  • Extract filename conversion logic to fileNameUtils.ts
  • Replace inline implementations in bedrockAgentApi.ts and models.ts
  • Add MD5 hash suffix only when non-ASCII chars are replaced
  • Add comprehensive test coverage for filename conversion

Impact on Existing Users

No breaking changes or compatibility issues.

  • ASCII filenames remain unchanged
  • Japanese filenames now work correctly without errors
  • Existing functionality is preserved and enhanced

Checklist

  • Modified relevant documentation
  • Verified operation in local environment
  • Executed npm run cdk:test and, if there were snapshot differences, executed npm run cdk:test:update-snapshot to update snapshots

Related Issues

Closes #1143
Closes #1250

SPEC: fix/duplicate-ja-doc-bug
Progress: Implementation and testing completed
    const result = convertToSafeFilename('file@#$.pdf');
    expect(result).toBe('file____cf25ced4');
  });
});

Nice test

@tbrand tbrand merged commit 2dcdd90 into aws-samples:main Oct 16, 2025
2 checks passed