Description
Summary:
The sanitizer escapes special characters such as <, >, &, and " inside HTML attribute values by converting them to entities like &lt;, &gt;, &amp;, and &quot;. While this is generally correct for HTML, it can cause problems in specific contexts like EPUB alt attributes or other environments expecting plain text.
Expected behavior:
In some use cases (e.g., EPUB alt attributes), special characters inside attribute values should remain unescaped to maintain compatibility with downstream tools and readers.
Actual behavior:
Special characters are always escaped inside attribute values. For example, an alt attribute with:
<img alt="diagram action->dispatcher->store->view" />
becomes
<img alt="diagram action-&gt;dispatcher-&gt;store-&gt;view" />
which causes Kindle EPUB readers to reject or mishandle the file.
Why this matters:
Certain platforms or file formats require literal characters in attributes, and the current escaping breaks these use cases, leading to rendering errors or outright rejection of the content.
Minimal reproducible example:
Input:
<img alt="Use > and < characters in alt" />
Output after sanitization:
<img alt="Use &gt; and &lt; characters in alt" />
Version:
Workarounds:
The current workarounds are to post-process the sanitized HTML manually to unescape special characters in attributes, or to remove special characters before sanitization.
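A minimal sketch of the post-processing workaround, assuming Node.js and double-quoted attribute values. The helper name unescapeAttribute is hypothetical and not part of any sanitizer API. It restores only &gt; and &lt;, because restoring &quot; or &amp; could end the attribute value early or produce ambiguous entity references. It also assumes the attribute name contains no regex metacharacters, and a regex over HTML is fragile, so treat this as a stopgap rather than a robust solution.

```javascript
// Hypothetical helper for illustration; not part of any sanitizer API.
// Only these two entities are restored. &quot; and &amp; are left escaped on purpose.
const RESTORE = { "&gt;": ">", "&lt;": "<" };

function unescapeAttribute(html, attrName) {
  // Match attrName="value" where the value is double-quoted.
  // Assumption: attrName is a plain name such as "alt" (no regex metacharacters).
  const pattern = new RegExp(`(\\s${attrName}=")([^"]*)(")`, "g");
  return html.replace(pattern, (match, open, value, close) => {
    const restored = value.replace(/&(?:gt|lt);/g, (entity) => RESTORE[entity]);
    return open + restored + close;
  });
}

// Example input: the escaped output shown in this report.
const sanitized = '<img alt="diagram action-&gt;dispatcher-&gt;store-&gt;view" />';
console.log(unescapeAttribute(sanitized, "alt"));
```

Attributes other than the one named are left untouched, so escaping stays in place everywhere it is still wanted.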
Request:
Consider adding:
An option to disable or customize escaping of special characters inside attributes, such as escapeAttributes: false.
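To make the request concrete, here is a toy serializer showing how such an option might behave. This is only a sketch under stated assumptions: serializeTag and escapeHtml are hypothetical names, the code is not taken from the sanitizer, and escapeAttributes is simply the option name proposed above, assumed to default to true.

```javascript
// Toy code for illustration only; it does not reflect the sanitizer's internals.
function escapeHtml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// `escapeAttributes` is the option proposed in this issue (assumed default: true).
function serializeTag(name, attribs, options = {}) {
  const escapeAttributes = options.escapeAttributes !== false;
  const parts = Object.entries(attribs).map(([key, value]) => {
    // With escaping disabled, the value is written out exactly as given.
    const text = escapeAttributes ? escapeHtml(value) : value;
    return ` ${key}="${text}"`;
  });
  return `<${name}${parts.join("")} />`;
}

const attribs = { alt: "Use > and < characters in alt" };
console.log(serializeTag("img", attribs));
console.log(serializeTag("img", attribs, { escapeAttributes: false }));
```

With escaping disabled, a value containing a double quote would break the attribute, so a real option would likely need to keep escaping quotes, or allow per-attribute or per-character control.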