Open
Labels
Area-Compilers, Bug, Language-C#, Tenet-Localization, help wanted
Description
The C# specification states that an identifier can start with or contain anything matching letter-character, which is defined as:
letter-character::
A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
A unicode-escape-sequence representing a character of classes Lu, Ll, Lt, Lm, Lo, or Nl
However, the compiler does not appear to correctly interpret some characters that match the above categories when they lie outside the Basic Multilingual Plane and are therefore encoded as a surrogate pair in UTF-16.
For example, the Sumerian character 𒅴 is categorized as 'OtherLetter' (matching 'Lo' above) when processed through char.GetUnicodeCategory("𒅴", 0).
However, the compiler is interpreting this character as two separate characters (and reporting CS1056, "Unexpected character", for both). It is likely checking each UTF-16 code unit individually, rather than checking whether the first one begins a surrogate pair and, if so, classifying the combined code point.
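To illustrate the difference, here is a minimal sketch of a surrogate-aware check. It is not the compiler's code: `LetterCharacterSketch` and `IsLetterCharacter` are hypothetical names made up for this example, and the sketch assumes the source file is saved in an encoding that preserves the literal character (for example UTF-8). It relies on `CharUnicodeInfo.GetUnicodeCategory(string, int)`, which classifies a surrogate pair as a single code point.

```csharp
using System;
using System.Globalization;

static class LetterCharacterSketch
{
    // Hypothetical helper (not Roslyn's implementation): classifies the character
    // that starts at text[index], treating a surrogate pair as one code point.
    // 'length' receives the number of UTF-16 code units consumed (1 or 2).
    public static bool IsLetterCharacter(string text, int index, out int length)
    {
        length = char.IsSurrogatePair(text, index) ? 2 : 1;

        // The (string, int) overload is surrogate-aware, unlike the (char) overload.
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(text, index);
        switch (category)
        {
            case UnicodeCategory.UppercaseLetter: // Lu
            case UnicodeCategory.LowercaseLetter: // Ll
            case UnicodeCategory.TitlecaseLetter: // Lt
            case UnicodeCategory.ModifierLetter:  // Lm
            case UnicodeCategory.OtherLetter:     // Lo
            case UnicodeCategory.LetterNumber:    // Nl
                return true;
            default:
                return false;
        }
    }

    public static void Main()
    {
        string text = "𒅴";

        // Per-code-unit view: each half of the pair is only a surrogate.
        Console.WriteLine(char.GetUnicodeCategory(text[0])); // Surrogate
        Console.WriteLine(char.GetUnicodeCategory(text[1])); // Surrogate

        // Pair-aware view: the combined code point is a letter (Lo).
        bool isLetter = IsLetterCharacter(text, 0, out int length);
        Console.WriteLine($"{isLetter}, {length}"); // True, 2
    }
}
```

A lexer built this way would advance by `length` code units after each check, so the low surrogate is never examined on its own.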