-
-
Notifications
You must be signed in to change notification settings - Fork 132
fix(core): handle backspace decomposition #15488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Ensure that when a single backspace decomposes the last NFC character in the app context, the remainder of the 'cluster' is preserved, matching the implication of the CLDR keyboard specification. This addresses the behavior in #15487 where the cached context became out of sync with the app context after deleting an entire NFC cluster such as ê, which caused a loop ending up with the entire context being deleted, at which point the loop exited with a fail-safe. Note that the LDML keyboard tests (ldml.cpp) do not currently exercise the normalization code; this is a gap that should be addressed to ensure that we are testing final application behavior. Fixes: #15487
User Test ResultsTest specification and instructions
Results TemplateTest Artifacts
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix is entirely in this file; other changes are additional unit tests and enabling correct logging of keystrokes in the unit test log files.
|
@rc-swag @sgschantz @ermshiperete given this change is a significant fix to Keyman Core, I would like to request review from all of you. @srl295, feel free to ignore; just as you have capacity to take a look. |
| @@expected: abce | ||
| Comment: #15487: bksp will delete just acute (even though it will have combined to NFC, internally we do NFD bksp) | ||
|
|
||
| TODO: this seems to be working NFD on app context side also, do we need to set a flag? this may break other tests so care is needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: this seems to be working NFD on app context side also, do we need to set a flag? this may break other tests so care is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #15491
Ensure that when a single backspace decomposes the last NFC character in the app context, the remainder of the 'cluster' is preserved, matching the implication of the CLDR keyboard specification. This addresses the behavior in #15487 where the cached context became out of sync with the app context after deleting an entire NFC cluster such as ê, which caused a loop ending up with the entire context being deleted, at which point the loop exited with a fail-safe. Note that the LDML keyboard tests (ldml.cpp) do not currently exercise the normalization code; this is a gap that should be addressed to ensure that we are testing final application behavior. This cherry-pick only adds the direct normalization patch and unit test and skips the additional changes for logging. Fixes: #15487 Cherry-pick-of: #15488
| if(app_context_nfd == cached_context_string.substr(0, app_context_nfd.length())) { | ||
| context_final = cached_context_string.substr(app_context_nfd.length()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe want some more comments here.
What we're trying to say is, if the app context matches the beginning of the cache,
break but remember if there was any content that was added after the cache.
So context_final will now have any 'suffix' that is NOT reflected in the app context.
| */ | ||
|
|
||
| auto output_nfc = output; | ||
| auto output_nfc = context_final + output; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, any of that 'final' suffix that was NOT already in the app context, needs to be prepended to the output.
so the full picture before the insertion point is:
<app_context><context_final><output>|
|
looks good! i think it could be good to add some comments and even put the case into the comments (the example case) to document what |
Co-authored-by: [email protected]
|
@mcdurdin For the E-ACUTE test cases, shouldn't the final unicode value be U+00E9? |
Thank you, good catch -- I had started with A-ACUTE... but then wanted to differentiate. |
When normalizing, we need to stop processing on an NFC boundary, not an NFD boundary, to support normalizations such as in Bengali, where appending `U+09D7` to a context of `U+0995 U+09C7` should result in `U+0995 U+09CC`. The specification is unclear on this; see https://unicode-org.atlassian.net/browse/CLDR-19218 This also updates the ldml keyboard unit test suite to support running in full NFC mode (used in all Engine implementations) as well retaining the NFD mode (now only used by the debugger). Side note: the Bengali normalization failure case was picked up by the improvements to the unit test suite, proving once again that good tests are so valuable. Fixes: #15491 Fixes: #15505 Follows: #15488 Relates-to: CLDR-19218
When normalizing, we need to stop processing on an NFC boundary, not an NFD boundary, to support normalizations such as in Bengali, where appending `U+09D7` to a context of `U+0995 U+09C7` should result in `U+0995 U+09CC`. The specification is unclear on this; see https://unicode-org.atlassian.net/browse/CLDR-19218 This also updates the ldml keyboard unit test suite to support running in full NFC mode (used in all Engine implementations) as well retaining the NFD mode (now only used by the debugger). Side note: the Bengali normalization failure case was picked up by the improvements to the unit test suite, proving once again that good tests are so valuable. Fixes: #15491 Fixes: #15505 Follows: #15488 Relates-to: CLDR-19218
When normalizing, we need to stop processing on an NFC boundary, not an NFD boundary, to support normalizations such as in Bengali, where appending `U+09D7` to a context of `U+0995 U+09C7` should result in `U+0995 U+09CC`. The specification is unclear on this; see https://unicode-org.atlassian.net/browse/CLDR-19218 This also updates the ldml keyboard unit test suite to support running in full NFC mode (used in all Engine implementations) as well retaining the NFD mode (now only used by the debugger). Side note: the Bengali normalization failure case was picked up by the improvements to the unit test suite, proving once again that good tests are so valuable. Fixes: #15491 Fixes: #15505 Follows: #15488 Relates-to: CLDR-19218
| test_actions_normalize( | ||
| "One backspace to delete last NFD character (#15487)", | ||
| /* app context pre transform: */ u"abcê", // NFC | ||
| /* cached context post transform: */ u"abce", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the app context pre transform is the same as l.327 (both have NFC \u00EA),
shouldn't cached context post transform match l.328? (u"abce\u0323\u0302")
Ensure that when a single backspace decomposes the last NFC character in the app context, the remainder of the 'cluster' is preserved, matching the implication of the CLDR keyboard specification.
This addresses the behavior in #15487 where the cached context became out of sync with the app context after deleting an entire NFC cluster such as ê, which caused a loop ending up with the entire context being deleted, at which point the loop exited with a fail-safe.
Note that the LDML keyboard tests (ldml.cpp) do not currently exercise the normalization code; this is a gap that should be addressed to ensure that we are testing final application behavior.
Fixes: #15487
User Testing
Preparation: install the keyboard bksp_ldml.kmp from the archive below.
For these tests, always type the following key sequences:
GROUP_WINDOWS: Test on Windows 11
GROUP_MAC: Test on macOS
GROUP_LINUX: Test on Linux
TEST_COMPLIANT_E_ACUTE_VERIFY_CONTEXT: In a compliant app, type the E-ACUTE sequence. Select the text, copy and paste it into a character viewer and verify that the last character is
U+00E9.TEST_COMPLIANT_E_ACUTE: In a compliant app, type the E-ACUTE sequence. Then press bksp. The result should be
abc e.TEST_NONCOMPLIANT_E_ACUTE_VERIFY_CONTEXT: In a non-compliant app, type the E-ACUTE sequence. Select the text, copy and paste it into a character viewer and verify that the last character is
U+00E9.TEST_NONCOMPLIANT_E_ACUTE: In a non-compliant app, type the E-ACUTE sequence. Then press bksp. The result should be
abc e.TEST_COMPLIANT_D_ACUTE_VERIFY_CONTEXT: In a compliant app, type the D-ACUTE sequence. Select the text, copy and paste it into a character viewer and verify that the last two characters are
U+0064 U+0301.TEST_COMPLIANT_D_ACUTE: In a compliant app, type the D-ACUTE sequence. Then press bksp. The result should be
abc d.TEST_NONCOMPLIANT_D_ACUTE_VERIFY_CONTEXT: In a non-compliant app, type the D-ACUTE sequence. Select the text, copy and paste it into a character viewer and verify that the last two characters are
U+0064 U+0301.TEST_NONCOMPLIANT_D_ACUTE: In a non-compliant app, type the D-ACUTE sequence. Then press bksp. The result should be
abc d.TEST_COMPLIANT_A_UMLAUT_VERIFY_CONTEXT: In a compliant app, type the A-UMLAUT sequence. Select the text, copy and paste it into a character viewer and verify that the last character is
U+00E4.TEST_COMPLIANT_A_UMLAUT: In a compliant app, type the A-UMLAUT sequence. Then press bksp. The result should be
abc a.TEST_NONCOMPLIANT_A_UMLAUT_VERIFY_CONTEXT: In a non-compliant app, type the A-UMLAUT sequence. Select the text, copy and paste it into a character viewer and verify that the last character is
U+00E4.TEST_NONCOMPLIANT_A_UMLAUT: In a non-compliant app, type the A-UMLAUT sequence. Then press bksp. The result should be
abc a.Note on compliant and non-compliant apps