Escaping of special characters inside attributes causes EPUB compatibility issues #708

@hi-slava

Summary:

The sanitizer escapes special characters such as <, >, &, and " inside HTML attribute values by converting them to entities like &lt;, &gt;, &amp;, and &quot;. While this is generally correct for HTML, it can cause problems in specific contexts like EPUB alt attributes or other environments expecting plain text.


Expected behavior:

In some use cases (e.g., EPUB alt attributes), special characters inside attribute values should remain unescaped to maintain compatibility with downstream tools and readers.


Actual behavior:

Special characters are always escaped inside attribute values. For example, an alt attribute with:
<img alt="diagram action->dispatcher->store->view" />
becomes
<img alt="diagram action-&gt;dispatcher-&gt;store-&gt;view" />
which causes Kindle EPUB readers to reject or mishandle the file.


Why this matters:

Certain platforms or file formats require literal characters in attributes, and the current escaping breaks these use cases, leading to rendering errors or outright rejection of the content.


Minimal reproducible example:

Input:
<img alt="Use > and < characters in alt" />
Output after sanitization:
<img alt="Use &gt; and &lt; characters in alt" />


Version:

[email protected]


Workarounds:

Currently, I either post-process the sanitized HTML manually to unescape special characters in attributes, or strip the special characters before sanitization.
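For anyone hitting the same problem, here is a minimal sketch of the post-processing workaround in TypeScript. The helper name `unescapeInAttribute` and its parameters are hypothetical and not part of the sanitizer. It assumes double-quoted attribute values and a plain attribute name with no regex metacharacters. By default it only restores `>`, because XML-based formats such as EPUB XHTML do not allow a literal `<` or a bare `&` inside attribute values.

```typescript
// Hypothetical helper, not part of any library: post-processes sanitized HTML.
// By default only "&gt;" is unescaped; a literal "<" or bare "&" inside an
// attribute value would make XML-based output (e.g. EPUB XHTML) ill-formed.
const DEFAULT_ENTITIES: Record<string, string> = { "&gt;": ">" };

function unescapeInAttribute(
  html: string,
  attrName: string = "alt",
  entities: Record<string, string> = DEFAULT_ENTITIES
): string {
  // Matches attrName="..." (double-quoted values only).
  // Assumes attrName contains no regex metacharacters.
  const attrPattern = new RegExp(`(\\s${attrName}=")([^"]*)(")`, "g");

  return html.replace(
    attrPattern,
    (_match: string, open: string, value: string, close: string): string => {
      let result = value;
      for (const entity of Object.keys(entities)) {
        // split/join replaces every occurrence without regex escaping.
        result = result.split(entity).join(entities[entity]);
      }
      return open + result + close;
    }
  );
}

// Usage: run on the sanitizer's output before packaging the EPUB.
const sanitized = '<img alt="diagram action-&gt;dispatcher-&gt;store-&gt;view" />';
console.log(unescapeInAttribute(sanitized));
```

Entities outside the chosen attribute, and any entity not listed in the map, are left untouched. This is a regex-based stopgap rather than a real HTML parser, so it will miss single-quoted or unquoted attribute values.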


Request:

Consider adding an option to disable or customize escaping of special characters inside attributes, such as escapeAttributes: false.
