Commit d34a2a2

[browser][non-icu] HybridGlobalization compare (#84249)
* Added support for hybrid globalization comparison.
* Clean-up. Added exception test cases to docs.
* Nit changes from @kg's review.
* Applied @kg's review.
* Revert unintentional change.
* Refactor: all hybrid globalization js methods in one file.
* Color the syntax.
* Underline the fact that this PR is only for WASM.
* Move interop functions to proper location.
1 parent f18c88d commit d34a2a2

15 files changed

+658
-191
lines changed

docs/design/features/hybrid-globalization.md

Lines changed: 154 additions & 0 deletions
@@ -18,3 +18,157 @@ Affected public APIs:
1818
- TextInfo.ToTitleCase.
1919

2020
Case change with invariant culture uses the `toUpperCase` / `toLowerCase` functions, which do not guarantee a full match with the original invariant culture.
21+
22+
**String comparison**
23+
24+
Affected public APIs:
25+
- CompareInfo.Compare,
26+
- String.Compare,
27+
- String.Equals.
28+
29+
The number of `CompareOptions` and `StringComparison` combinations is limited. Originally supported combinations can be found [here for CompareOptions](https://learn.microsoft.com/dotnet/api/system.globalization.compareoptions) and [here for StringComparison](https://learn.microsoft.com/dotnet/api/system.stringcomparison).
30+
31+
- `IgnoreWidth` is not supported because there is no equivalent in Web API. Throws `PlatformNotSupportedException`.
32+
``` JS
33+
let high = String.fromCharCode(65281) // %uff83 = テ
34+
let low = String.fromCharCode(12486) // %u30c6 = テ
35+
high.localeCompare(low, "ja-JP", { sensitivity: "case" }) // -1 ; case: a ≠ b, a = á, a ≠ A; expected: 0
36+
37+
let wide = String.fromCharCode(65345) // %uFF41 = a
38+
let narrow = "a"
39+
wide.localeCompare(narrow, "en-US", { sensitivity: "accent" }) // 0; accent: a ≠ b, a ≠ á, a = A; expected: -1
40+
```
41+
42+
When `sensitivity: "accent"` is used, some character-width differences are always ignored, and this cannot be switched off (see the point about `IgnoreCase`).
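
The `sensitivity` values named in the comments above can be checked with plain Latin characters. The following is a minimal sketch and not part of this change; it assumes a JS engine with full ICU data (a current browser or Node.js), and the helper name `sameUnder` is hypothetical.

```js
// Hypothetical helper: true when localeCompare treats a and b as equal
// under the given Intl.Collator sensitivity.
function sameUnder(sensitivity, a, b) {
    return a.localeCompare(b, "en-US", { sensitivity }) === 0;
}

// Accent difference (a vs á) and case difference (a vs A) per sensitivity.
const report = {};
for (const s of ["base", "accent", "case", "variant"]) {
    report[s] = {
        accentIgnored: sameUnder(s, "a", "á"),
        caseIgnored: sameUnder(s, "a", "A"),
    };
}
console.log(report);
```

Under these assumptions, `"base"` ignores both differences, `"accent"` ignores only case, `"case"` ignores only accents, and `"variant"` ignores neither.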
43+
44+
- `IgnoreKanaType`:
45+
46+
It is always switched on for comparison with locale "ja-JP", even if this comparison option was not set explicitly.
47+
48+
``` JS
49+
let hiragana = String.fromCharCode(12353) // %u3041 = ぁ
50+
let katakana = String.fromCharCode(12449) // %u30A1 = ァ
51+
let enCmp = hiragana.localeCompare(katakana, "en-US") // -1
52+
let jaCmp = hiragana.localeCompare(katakana, "ja-JP") // 0
53+
```
54+
55+
For locales other than "ja-JP", it cannot be used on its own (there is no equivalent in the Web API) and throws `PlatformNotSupportedException`.
56+
57+
- `None`:
58+
59+
No equivalent in Web API for "ja-JP" locale. See previous point about `IgnoreKanaType`. For "ja-JP" it throws `PlatformNotSupportedException`.
60+
61+
- `IgnoreCase`, `CurrentCultureIgnoreCase`, `InvariantCultureIgnoreCase`
62+
63+
For `IgnoreCase | IgnoreKanaType`, argument `sensitivity: "accent"` is used.
64+
65+
``` JS
66+
let hiraganaBig = `${String.fromCharCode(12353)} A` // %u3041 = ぁ
67+
let katakanaSmall = `${String.fromCharCode(12449)} a` // %u30A1 = ァ
68+
hiraganaBig.localeCompare(katakanaSmall, "en-US", { sensitivity: "accent" }) // 0; accent: a ≠ b, a ≠ á, a = A
69+
```
70+
71+
Known exceptions:
72+
73+
| **character 1** | **character 2** | **CompareOptions** | **hybrid globalization** | **icu** | **comments** |
|:---------------:|:---------------:|--------------------|:------------------------:|:-------:|:-------------------------------------------------------:|
| a | `\uFF41`| IgnoreKanaType | 0 | -1 | applies to all wide-narrow chars |
| `\u30DC`| `\uFF8E`| IgnoreCase | 1 | -1 | 1 is returned in icu when we additionally ignore width |
| `\u30BF`| `\uFF80`| IgnoreCase | 0 | -1 | |
78+
79+
80+
For `IgnoreCase` alone, both strings are first converted to the same case and then compared with the default option, `sensitivity: "variant"`.
81+
82+
``` JS
83+
let hiraganaBig = `${String.fromCharCode(12353)} A` // %u3041 = ぁ
84+
let katakanaSmall = `${String.fromCharCode(12449)} a` // %u30A1 = ァ
85+
let unchangedLocale = "en-US"
86+
let unchangedStr1 = hiraganaBig.toLocaleLowerCase(unchangedLocale);
87+
let unchangedStr2 = katakanaSmall.toLocaleLowerCase(unchangedLocale);
88+
unchangedStr1.localeCompare(unchangedStr2, unchangedLocale) // -1;
89+
let changedLocale = "ja-JP"
90+
let changedStr1 = hiraganaBig.toLocaleLowerCase(changedLocale);
91+
let changedStr2 = katakanaSmall.toLocaleLowerCase(changedLocale);
92+
changedStr1.localeCompare(changedStr2, changedLocale) // 0;
93+
```
94+
95+
For this reason, when comparing with locale `ja-JP`, the `CompareOptions` value `IgnoreCase` and the `StringComparison` values `CurrentCultureIgnoreCase` and `InvariantCultureIgnoreCase` behave like the combination `IgnoreCase | IgnoreKanaType` (see the previous point about `IgnoreKanaType`). For other locales the behavior is unchanged, with the following known exceptions:
96+
97+
| **character 1** | **character 2** | **CompareOptions** | **hybrid globalization** | **icu** |
|:------------------------------------------------:|:----------------------------------------------------------:|-----------------------------------|:------------------------:|:-------:|
| `\uFF9E` (HALFWIDTH KATAKANA VOICED SOUND MARK) | `\u3099` (COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK) | None / IgnoreCase / IgnoreSymbols | 1 | 0 |
100+
101+
- `IgnoreNonSpace`
102+
103+
`IgnoreNonSpace` cannot be used without `IgnoreKanaType`. Argument `sensitivity: "case"` is used for comparison, and it ignores both kinds of difference (kana type and non-spacing marks). Option `IgnoreNonSpace` alone throws `PlatformNotSupportedException`.
104+
105+
``` JS
106+
let hiraganaAccent = `${String.fromCharCode(12353)} á` // %u3041 = ぁ
107+
let katakanaNoAccent = `${String.fromCharCode(12449)} a` // %u30A1 = ァ
108+
hiraganaAccent.localeCompare(katakanaNoAccent, "en-US", { sensitivity: "case" }) // 0; case: a ≠ b, a = á, a ≠ A
109+
```
110+
111+
- `IgnoreNonSpace | IgnoreCase`
112+
The combination of `IgnoreNonSpace` and `IgnoreCase` cannot be used without `IgnoreKanaType`. Argument `sensitivity: "base"` is used for comparison, and it ignores all three kinds of difference (kana type, non-spacing marks, and case). The combination `IgnoreNonSpace | IgnoreCase` alone throws `PlatformNotSupportedException`.
113+
114+
``` JS
115+
let hiraganaBigAccent = `${String.fromCharCode(12353)} A á` // %u3041 = ぁ
116+
let katakanaSmallNoAccent = `${String.fromCharCode(12449)} a a` // %u30A1 = ァ
117+
hiraganaBigAccent.localeCompare(katakanaSmallNoAccent, "en-US", { sensitivity: "base" }) // 0; base: a ≠ b, a = á, a = A
118+
```
119+
120+
- `IgnoreSymbols`
121+
122+
The subset of ignored symbols is limited to the symbols ignored by `string1.localeCompare(string2, locale, { ignorePunctuation: true })`. For example, currency symbols and `&` are not ignored.
123+
124+
``` JS
125+
let hiraganaAccent = `${String.fromCharCode(12353)} á` // %u3041 = ぁ
126+
let katakanaNoAccent = `${String.fromCharCode(12449)} a` // %u30A1 = ァ
127+
hiraganaAccent.localeCompare(katakanaNoAccent, "en-US", { sensitivity: "base" }) // 0; base: a ≠ b, a = á, a = A
128+
```
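
A minimal sketch of the `ignorePunctuation` behavior described for `IgnoreSymbols`, assuming an engine with full ICU data. The sample strings and variable names are illustrative only and are not taken from this change.

```js
const opts = { ignorePunctuation: true };

// A comma is punctuation, so it is skipped during comparison.
const commaIgnored = "a,b".localeCompare("ab", "en-US", opts) === 0;

// A currency symbol is not treated as punctuation, so it still counts.
const dollarIgnored = "a$b".localeCompare("ab", "en-US", opts) === 0;

console.log({ commaIgnored, dollarIgnored });
```

This is why .NET's `IgnoreSymbols`, which covers a wider set of symbols, can only be approximated by this option.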
129+
130+
- List of all `CompareOptions` combinations always throwing `PlatformNotSupportedException`:
131+
132+
`IgnoreCase`,
133+
134+
`IgnoreNonSpace`,
135+
136+
`IgnoreNonSpace | IgnoreCase`,
137+
138+
`IgnoreSymbols | IgnoreCase`,
139+
140+
`IgnoreSymbols | IgnoreNonSpace`,
141+
142+
`IgnoreSymbols | IgnoreNonSpace | IgnoreCase`,
143+
144+
`IgnoreWidth`,
145+
146+
`IgnoreWidth | IgnoreCase`,
147+
148+
`IgnoreWidth | IgnoreNonSpace`,
149+
150+
`IgnoreWidth | IgnoreNonSpace | IgnoreCase`,
151+
152+
`IgnoreWidth | IgnoreSymbols`
153+
154+
`IgnoreWidth | IgnoreSymbols | IgnoreCase`
155+
156+
`IgnoreWidth | IgnoreSymbols | IgnoreNonSpace`
157+
158+
`IgnoreWidth | IgnoreSymbols | IgnoreNonSpace | IgnoreCase`
159+
160+
`IgnoreKanaType | IgnoreWidth`
161+
162+
`IgnoreKanaType | IgnoreWidth | IgnoreCase`
163+
164+
`IgnoreKanaType | IgnoreWidth | IgnoreNonSpace`
165+
166+
`IgnoreKanaType | IgnoreWidth | IgnoreNonSpace | IgnoreCase`
167+
168+
`IgnoreKanaType | IgnoreWidth | IgnoreSymbols`
169+
170+
`IgnoreKanaType | IgnoreWidth | IgnoreSymbols | IgnoreCase`
171+
172+
`IgnoreKanaType | IgnoreWidth | IgnoreSymbols | IgnoreNonSpace`
173+
174+
`IgnoreKanaType | IgnoreWidth | IgnoreSymbols | IgnoreNonSpace | IgnoreCase`
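
The option-to-`sensitivity` mappings described in the points above can be summarized in one sketch. This is a hypothetical helper (`collatorOptionsFor`), not the runtime's implementation. It uses the public `CompareOptions` flag values, throws a plain `Error` as a stand-in for `PlatformNotSupportedException`, and deliberately does not model the locale-specific "ja-JP" rules or the complete list of throwing combinations.

```js
// Flag values as defined by System.Globalization.CompareOptions.
const CompareOptions = Object.freeze({
    None: 0, IgnoreCase: 1, IgnoreNonSpace: 2,
    IgnoreSymbols: 4, IgnoreKanaType: 8, IgnoreWidth: 16,
});

// Hypothetical mapping from CompareOptions flags to localeCompare options.
function collatorOptionsFor(flags) {
    const { IgnoreCase, IgnoreNonSpace, IgnoreSymbols, IgnoreKanaType, IgnoreWidth } = CompareOptions;
    if (flags & IgnoreWidth) {
        throw new Error("PlatformNotSupportedException: IgnoreWidth");
    }
    // IgnoreSymbols maps to ignorePunctuation; the rest picks the sensitivity.
    const ignorePunctuation = (flags & IgnoreSymbols) !== 0;
    const rest = flags & ~IgnoreSymbols;
    switch (rest) {
        case IgnoreCase | IgnoreKanaType:
            return { sensitivity: "accent", ignorePunctuation };
        case IgnoreNonSpace | IgnoreKanaType:
            return { sensitivity: "case", ignorePunctuation };
        case IgnoreNonSpace | IgnoreCase | IgnoreKanaType:
            return { sensitivity: "base", ignorePunctuation };
        case IgnoreNonSpace:
        case IgnoreNonSpace | IgnoreCase:
            throw new Error("PlatformNotSupportedException: IgnoreNonSpace requires IgnoreKanaType");
        default:
            // None, IgnoreCase alone, IgnoreKanaType alone: locale rules not modeled here.
            return { sensitivity: "variant", ignorePunctuation };
    }
}
```

The returned object could be passed directly as the third argument of `localeCompare`, for example `str1.localeCompare(str2, "en-US", collatorOptionsFor(flags))`.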
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
// Licensed to the .NET Foundation under one or more agreements.
2+
// The .NET Foundation licenses this file to you under the MIT license.
3+
4+
using System.Runtime.CompilerServices;
5+
6+
internal static partial class Interop
7+
{
8+
internal static unsafe partial class JsGlobalization
9+
{
10+
[MethodImplAttribute(MethodImplOptions.InternalCall)]
11+
internal static extern unsafe int CompareString(out string exceptionMessage, in string culture, char* str1, int str1Len, char* str2, int str2Len, global::System.Globalization.CompareOptions options);
12+
}
13+
}
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
// Licensed to the .NET Foundation under one or more agreements.
2+
// The .NET Foundation licenses this file to you under the MIT license.
3+
4+
using System.Runtime.CompilerServices;
5+
6+
internal static partial class Interop
7+
{
8+
internal static unsafe partial class JsGlobalization
9+
{
10+
[MethodImplAttribute(MethodImplOptions.InternalCall)]
11+
internal static extern unsafe void ChangeCaseInvariant(out string exceptionMessage, char* src, int srcLen, char* dstBuffer, int dstBufferCapacity, bool bToUpper);
12+
[MethodImplAttribute(MethodImplOptions.InternalCall)]
13+
internal static extern unsafe void ChangeCase(out string exceptionMessage, in string culture, char* src, int srcLen, char* dstBuffer, int dstBufferCapacity, bool bToUpper);
14+
}
15+
}

src/libraries/Common/tests/TestUtilities/System/PlatformDetection.cs

Lines changed: 4 additions & 0 deletions
@@ -348,10 +348,14 @@ public static string GetDistroVersionString()
348348
private static readonly Lazy<bool> m_isInvariant = new Lazy<bool>(()
349349
=> (bool?)Type.GetType("System.Globalization.GlobalizationMode")?.GetProperty("Invariant", BindingFlags.NonPublic | BindingFlags.Static)?.GetValue(null) == true);
350350

351+
private static readonly Lazy<bool> m_isHybrid = new Lazy<bool>(()
352+
=> (bool?)Type.GetType("System.Globalization.GlobalizationMode")?.GetProperty("Hybrid", BindingFlags.NonPublic | BindingFlags.Static)?.GetValue(null) == true);
353+
351354
private static readonly Lazy<Version> m_icuVersion = new Lazy<Version>(GetICUVersion);
352355
public static Version ICUVersion => m_icuVersion.Value;
353356

354357
public static bool IsInvariantGlobalization => m_isInvariant.Value;
358+
public static bool IsHybridGlobalizationOnWasm => m_isHybrid.Value && (IsBrowser || IsWasi);
355359
public static bool IsNotInvariantGlobalization => !IsInvariantGlobalization;
356360
public static bool IsIcuGlobalization => ICUVersion > new Version(0, 0, 0, 0);
357361
public static bool IsNlsGlobalization => IsNotInvariantGlobalization && !IsIcuGlobalization;
