Open
Description
I've noticed a pretty annoying problem on some websites (I think there are at least a thousand of them in Alexa 1M).
An unclosed iframe tag breaks parsing of all the HTML below it.
Here is an example:
<noscript>
<iframe
height="0" width="0" data-src="https://www.googletagmanager.com/ns.html?id=GTM-M5RK4MW" class="lazyload"
src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==">
<noscript>
<iframe src="https://www.googletagmanager.com/ns.html?id=GTM-M5RK4MW"
height="0" width="0">
</noscript>
</iframe>
</noscript>
It's missing one closing iframe tag, but it still works when parsed with Modest.
But for some reason, if you open it in Chrome (to render the JavaScript parts) and dump the HTML, you get this:
<noscript>
<iframe
height="0" width="0" data-src="https://www.googletagmanager.com/ns.html?id=" class="lazyload"
src="data:image/gif;base64,R0lGODlhAQ">
<noscript>
<iframe src="https://www.googletagmanager.com/ns.html?id=" height="0" width="0">
</noscript>
Now neither iframe has a closing tag.
The problem with this is that Modest will ignore everything after such a tag:
<noscript>
<iframe data-src="https://www.googletagmanager.com/ns.html?id=">
</noscript>
<script></script>
<script></script>
<script></script>
Searching for script nodes using myhtml_get_nodes_by_name or using CSS selectors returns no results.
@lexborisov Are there any ways to improve this? Other parsers can still handle this.
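In case it helps with triage, here is a rough pre-check sketch for flagging affected pages before they reach the parser. It assumes the parser treats everything after an iframe start tag as raw text until the next `</iframe`, which is how the HTML specification describes iframe parsing; I have not confirmed that this is exactly what Modest does internally. The names `find_ci` and `has_unclosed_iframe` are hypothetical helpers, not part of the MyHTML API.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive search for `needle` (must be lowercase) in `hay`,
   starting at offset `from`. Returns the match offset, or (size_t)-1. */
static size_t find_ci(const char *hay, size_t len, size_t from, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0 || nlen > len)
        return (size_t)-1;
    for (size_t i = from; i + nlen <= len; i++) {
        size_t j = 0;
        while (j < nlen && tolower((unsigned char)hay[i + j]) == needle[j])
            j++;
        if (j == nlen)
            return i;
    }
    return (size_t)-1;
}

/* Hypothetical helper: returns 1 if some "<iframe" is never followed by
   "</iframe", 0 otherwise. Text between an opening iframe tag and its
   closing tag is skipped, mimicking raw-text handling (an assumption). */
int has_unclosed_iframe(const char *html, size_t len)
{
    size_t pos = 0;
    for (;;) {
        size_t open = find_ci(html, len, pos, "<iframe");
        if (open == (size_t)-1)
            return 0;               /* no further iframe start tags */
        size_t close = find_ci(html, len, open + 7, "</iframe");
        if (close == (size_t)-1)
            return 1;               /* start tag with no closing tag after it */
        pos = close + 8;            /* resume scanning after "</iframe" */
    }
}
```

Under that assumption, the first snippet in this issue is not flagged (the single closing tag terminates the outer iframe), while the Chrome dump and the reduced example are both flagged, which matches the behaviour described above. This is only a string-level heuristic: it does not look inside comments, scripts, or attribute values.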