Compiler does not correctly interpret surrogate pairs when used in an identifier

The C# specification states that an `identifier` can start with or contain anything matching `letter-character`, which is defined as:

```
letter-character::
A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
A unicode-escape-sequence representing a character of classes Lu, Ll, Lt, Lm, Lo, or Nl
```

However, the compiler does not appear to correctly interpret some characters which match the above categories if they are part of a surrogate pair.

For example, the Sumerian character `𒅴` is categorized as `OtherLetter` (matching `Lo` above) by `char.GetUnicodeCategory("𒅴", 0)`.
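The check above can be reproduced with a small program. This is only a sketch: the class and method names (`SurrogateCategoryDemo`, `CategoryOfCharacterAt`, `CategoryOfCodeUnitAt`) are made up for illustration, and it uses only the `char` overloads from the base class library.

```csharp
using System;
using System.Globalization;

public static class SurrogateCategoryDemo
{
    // Category of the whole character starting at 'index'.
    // The (string, int) overload combines a surrogate pair into one code point.
    public static UnicodeCategory CategoryOfCharacterAt(string s, int index) =>
        char.GetUnicodeCategory(s, index);

    // Category of a single UTF-16 code unit, ignoring any pairing.
    public static UnicodeCategory CategoryOfCodeUnitAt(string s, int index) =>
        char.GetUnicodeCategory(s[index]);

    public static void Main()
    {
        string s = "𒅴";

        // The character lies outside the Basic Multilingual Plane,
        // so it occupies two UTF-16 code units.
        Console.WriteLine(s.Length);                    // 2
        Console.WriteLine(char.IsSurrogatePair(s, 0));  // True

        Console.WriteLine(CategoryOfCharacterAt(s, 0)); // OtherLetter
        Console.WriteLine(CategoryOfCodeUnitAt(s, 0));  // Surrogate
        Console.WriteLine(CategoryOfCodeUnitAt(s, 1));  // Surrogate
    }
}
```

Taken as a pair, the character is a letter; taken one code unit at a time, neither half is.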

However, the compiler interprets this character as two separate characters and reports CS1056 for each. It is likely checking each UTF-16 code unit individually, rather than checking whether the first one begins a surrogate pair and, if so, classifying the pair as a single character.
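To illustrate the suspected difference, here is a minimal sketch of the two approaches. It is not the compiler's actual code: `IdentifierScanSketch` and its methods are hypothetical helpers, and they only test the `letter-character` classes quoted above (not the full identifier grammar).

```csharp
using System;
using System.Globalization;

public static class IdentifierScanSketch
{
    // The letter-character classes from the spec: Lu, Ll, Lt, Lm, Lo, Nl.
    private static bool IsLetterCategory(UnicodeCategory category) =>
        category == UnicodeCategory.UppercaseLetter ||
        category == UnicodeCategory.LowercaseLetter ||
        category == UnicodeCategory.TitlecaseLetter ||
        category == UnicodeCategory.ModifierLetter ||
        category == UnicodeCategory.OtherLetter ||
        category == UnicodeCategory.LetterNumber;

    // Suspected behavior: classify every UTF-16 code unit on its own.
    // Each half of a surrogate pair is category Surrogate, so this rejects the pair.
    public static bool AllLettersPerCodeUnit(string text)
    {
        foreach (char c in text)
        {
            if (!IsLetterCategory(char.GetUnicodeCategory(c)))
                return false;
        }
        return true;
    }

    // Surrogate-aware variant: classify the full character at each position,
    // then skip two code units when that position starts a surrogate pair.
    public static bool AllLettersPerCodePoint(string text)
    {
        for (int i = 0; i < text.Length; )
        {
            if (!IsLetterCategory(char.GetUnicodeCategory(text, i)))
                return false;
            i += char.IsSurrogatePair(text, i) ? 2 : 1;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(AllLettersPerCodeUnit("𒅴"));  // False
        Console.WriteLine(AllLettersPerCodePoint("𒅴")); // True
    }
}
```

Both methods agree on identifiers made of BMP letters such as `abc`; they only diverge once a surrogate pair appears.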


Compiler does not correctly interpret surrogate pairs when used in an identifier #9731
