
@krikera
Contributor

@krikera commented May 30, 2025

This fix strips the UTF-8 byte order mark (BOM, U+FEFF) from the first line before syntax detection, using strip_prefix('\u{feff}'), so that files saved as UTF-8 with a BOM are detected correctly.
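A minimal sketch of the stripping step follows. The helper name `strip_bom` is hypothetical and not taken from bat's code; it only shows how `str::strip_prefix` with the char `'\u{feff}'` removes a leading BOM and leaves other lines untouched.

```rust
/// Hypothetical helper (not bat's actual code): remove a leading UTF-8 BOM
/// (U+FEFF, encoded as the bytes EF BB BF) from a line, if present.
fn strip_bom(first_line: &str) -> &str {
    // strip_prefix returns None when the prefix is absent, so fall back to the input.
    first_line.strip_prefix('\u{feff}').unwrap_or(first_line)
}

fn main() {
    // U+FEFF is three bytes in UTF-8.
    assert_eq!("\u{feff}".as_bytes(), &[0xEF_u8, 0xBB, 0xBF][..]);

    let with_bom = "\u{feff}<?xml version=\"1.0\"?>";
    let without_bom = "<?xml version=\"1.0\"?>";

    // Without stripping, a prefix check fails because the BOM comes first.
    assert!(!with_bom.starts_with("<?xml"));
    // After stripping, the line matches the BOM-less version exactly.
    assert_eq!(strip_bom(with_bom), without_bom);
    // Lines without a BOM are returned unchanged.
    assert_eq!(strip_bom(without_bom), without_bom);

    println!("{}", strip_bom(with_bom));
}
```

Because `strip_prefix` borrows from the input, no allocation is needed for either case.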

Changes

  • Modified get_first_line_syntax in src/assets.rs to strip the BOM before pattern matching
  • Added tests for XML, shell script, and PHP files saved with a BOM
  • Updated CHANGELOG.md
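The kind of test coverage described above can be sketched as follows. `detect_first_line` is a hypothetical stand-in, not bat's `get_first_line_syntax`: its simple prefix checks replace whatever patterns the real detector uses, and the returned names ("XML", "Shell", "PHP") are illustrative labels. The point is that each first line must be detected identically with and without a leading BOM.

```rust
/// Hypothetical first-line detector for illustration only; the prefix
/// checks below stand in for the real syntax-definition patterns.
fn detect_first_line(line: &str) -> Option<&'static str> {
    // Strip a leading BOM before any pattern matching, as the PR does.
    let line = line.strip_prefix('\u{feff}').unwrap_or(line);
    if line.starts_with("<?xml") {
        Some("XML")
    } else if line.starts_with("<?php") {
        Some("PHP")
    } else if line.starts_with("#!") && line.trim_end().ends_with("sh") {
        // Shebangs such as #!/bin/sh, #!/bin/bash, #!/usr/bin/env bash.
        Some("Shell")
    } else {
        None
    }
}

fn main() {
    // Each case: a first line without a BOM and the label it should map to.
    let cases = [
        ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>", "XML"),
        ("#!/bin/sh", "Shell"),
        ("<?php", "PHP"),
    ];
    for (line, expected) in cases {
        let with_bom = format!("\u{feff}{line}");
        // Detection must agree with and without the BOM.
        assert_eq!(detect_first_line(line), Some(expected));
        assert_eq!(detect_first_line(&with_bom), Some(expected));
    }
    println!("all first-line cases detected with and without BOM");
}
```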

Fixes #3314

Collaborator

@keith-hall left a comment

Nice, thanks

@keith-hall merged commit 6886cda into sharkdp:master May 31, 2025
23 of 24 checks passed
@krikera deleted the fix-utf8-bom-syntax-detection branch June 1, 2025 14:28

Development

Successfully merging this pull request may close these issues.

Detection fails when file is saved as UTF-8 with BOM