Fix regressions from 3.1.4 #530

kiilerix · 2020-04-05T17:16:04Z

No description provided.

g-k

Thanks for the PR!

This matches: background-image: 'url(javascript:alert("XSS"))' which we probably don't want. https://regex101.com/r/vVInfw/1

Also, there's overlap in the match groups so it might be vulnerable to ReDoS too.

kiilerix · 2020-04-06T18:16:19Z

Thanks for the feedback.
Everything with url() is sanitized a few lines above. I can thus not reproduce the url problem you mention. I can add a test with your test case.
But ok, it was a bit aggressive to also allow the slightly tricky () while adding the non-tricky characters. I can leave that out.
I don't see any overlap in match groups / matching. Can you say more about the problem you see?

g-k · 2020-04-06T19:06:58Z

Everything with url() is sanitized a few lines above. I can thus not reproduce the url problem you mention. I can add a test with your test case.

Good call. That'd be great!

We should also test ur/**/l(javascript:alert("XSS")) since that might get through.

I'd like to have a lot more "bad CSS" testcases before making the regex more permissive in a stable release (we could push a dev release in the meantime), since I don't have context around the original CSS filtering.

But ok, it was a bit aggressive to also allow the slightly tricky () while adding the non-tricky characters. I can leave that out.

Yes, please limit this PR to fixing the dashed attributes regression / #529.

I don't see any overlap in match groups / matching. Can you say more about the problem you see?

The ReDoS in GHSA-vqhp-cxgc-6wmm (with https://bugzilla.mozilla.org/show_bug.cgi?id=1623633#c0 for context) exploited the overlap in -, ', " alternation patterns and it looks like this PR adds - back. I couldn't find a string that triggers DoS though, so it might be fine.

kiilerix · 2020-04-06T19:52:03Z

The pre-3.1.4 code had the problem of matching -'"() both as ordinary characters and more constrained patterns. The 3.1.4 fix removed all handling as ordinary characters and left everything to other patterns. This PR is just moving - handling back to being an ordinary character and removing its special handling. Handling of the other characters doesn't change in this PR. I don't see any match ambiguity.

Note that all discussion seems to be about the the last changeset. The 3 first changesets are already limited to fixing the dashed attributes regression. Could we agree on the 3 first changesets and land them?

The discussion of the 4th changeset seems to be about not undoing some 3.1.4 changes that wasn't mentioned in commit message or Changelog. Since there hasn't been any claims of fixing any sanitize bugs, I would expect the ideal would be to get behaviour back to the pre-3.1.4 state but with the ReDoS fixed. I don't think this PR is making things more permissive. It is just fixing some things that 3.1.4 broke, while still leaving it much less permissive than 3.1.3 was.

For the 4th changeset:
I am not sure what new css security properties 3.1.4 established and which attacks it protects against. From what I can see (for example googling https://stackoverflow.com/questions/476276/using-javascript-in-css and https://dougrathbone.com/blog/2013/10/30/executing-javascript-inside-css-another-reason-to-whitelist-and-encode-user-input ) it is not actually possible to execute javascript from css. While "security in depth" sometimes makes sense, I don't think it is feasible to predict and protect against all kinds of browser non-compliance. Protecting against "strings that look like a css javascript url" would be a mis-feature; such strings are strings, and their content shouldn't be evaluated as css. Contrary to that: it is a bug if strings in css for no good reason can't contain for example : or ( or or ' or ; .

tony · 2020-05-06T20:44:07Z

Hi there, will this PR also include values with hyphens? e.g. top: -2rem? (This also are removed for us in 3.1.4)

KaTeX generates HTML like this to represent super/subscript.

If so, should the test include an example of a negative CSS value as well?

kiilerix · 2020-05-06T20:55:16Z

There was already a test of style="cursor: -moz-grab;"> - I think that gave sufficient coverage for values with leading hyphens. The test was failing and disabled for the release of 3.1.4.

tony · 2020-05-06T20:58:04Z

@kiilerix You are correct! I see it now. Thank you

kiilerix · 2020-05-06T21:45:17Z

Looking at this again, I still think my full PR is safe and an improvement.

As a minimal first step, the 3 first changesets do what g-k agreed on for fixing the - regressions introduced by 3.1.4 .

For fixing other 3.1.4 regressions, I still wonder what security issues it is trying to block.
I do not think it should block background-image: 'ur/**/l(javascript:alert("XSS"))' in style attributes - just like it shouldn't block that string if it occur as data in any other place in html.

It seems like it would have been better if 3.1.4 just fixed the DoS by removing superfluous regexp options:

         gauntlet = re.compile(
-            r"""^([-/:,#%.'"\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$""",
+            r"""^([-/:,#%.'"\s!\w]|\([\d,%\.\s]+\))*$""",
             flags=re.U
         )

That will not make it more permissive than 3.1.3 was but will just fix the 3.1.4 regression. Going back to this will thus be very suitable for a stable 3.1.6 release.

But as a next step I will now propose something more drastic:

I don't see how CSS values for whitelisted CSS attributes (in HTML that already has been parsed) ever can be dangerous in reasonably modern and wellbehaving browsers. It seems to me like the whole gauntlet check safely could be dropped.

(I can see how values with unquoted url() could trigger unfortunate network access ... but that is already handled so aggresively that it also not is valid when quoted. But if it is a concern that it slip through if written as ur/**/l() then it would be better to strip such comments before checking for url().)

rmihael · 2020-08-12T15:44:57Z

We also were hit by this regression. Is there any traction on this PR?

kiilerix · 2020-08-12T17:27:00Z

We also were hit by this regression. Is there any traction on this PR?

I don't know. It is unclear if the project owners had any undeclared intents with the 3.1.4 regressions. It seems like it is a big concern to avoid emitting url(javascript:...) even though it seems like it would be a browser bug if that meant anything - and especially if it had potential security implications.

tony · 2021-12-02T23:53:46Z

@kiilerix can you rebase?

Rework d6018f2 and choose an alternative approach to fixing bug 1623633. Before that change, '-' could be matched either by itself, or by '\w-\w'. The latter seemed to have the odd intent of avoiding leading or trailing '-' ... but also meant it allowed 'a-aa-a' but not 'a-a-a'. The primary intent with the change seemed to be to avoid a backtracking explosion exploiding the ambiguous matching of '-'. The chosen solution of allowing '\w-\w' but not '-' broke some tests and some real world use cases. There doesn't seem to be any risk from allowing leading or trailing or consecutive '-' in styles. Thus, instead, this change will choose the path of allowing '-' as a normal character and drop the special handling of '\w-\w'.

Rework d6018f2 and choose an alternative approach to handling ' and " while fixing bug 1623633. Before that change, ' and " could be matched either by themselves, or balanced around [\s\w]+ . The intent with the change seemed to be to avoid a potential backtracking explosion exploiting the ambiguous matching. The chosen solution of for example allowing 'foo' and "foo bar" but not 'foo-bar' broke some tests and some real world use cases. There doesn't seem to be any risk from allowing any of the "usual" characters inside balanced quotation marks. This change will expand the set of valid strings inside quotation marks without adding any ambiguity that can cause backtracking. We will accept all the characters that already are allowed outside quotation marks, but also accept the other kind of quotation marks, allow ( and ) , and empty strings.

g-k · 2022-01-05T15:06:49Z

I don't see how CSS values for whitelisted CSS attributes (in HTML that already has been parsed) ever can be dangerous in reasonably modern and wellbehaving browsers. It seems to me like the whole gauntlet check safely could be dropped.

Agreed. The gauntlet regex was added to bleach when it was a concern, but we can drop support for legacy browsers instead of continuing to tweak the regex.

willkg · 2022-02-07T17:22:23Z

I'd like to close this out and move forward with @g-k's idea in #530 (comment) . I think that'll be a lot better for everyone.

kiilerix · 2022-02-08T00:03:08Z

Nice to see some progress, one way or another ;-)

I subscribed to #627 . Is that the way to follow how it works out?

kiilerix mentioned this pull request Apr 6, 2020

support dashes in style attributes #529

Closed

g-k requested changes Apr 6, 2020

View reviewed changes

kiilerix added 2 commits December 3, 2021 19:56

sanitizer: Drop superfluous regexp escaping of . inside []

72bf259

sanitizer: Drop ambiguous regexp matching of space after '...'

1408c6a

kiilerix force-pushed the master branch from 9aa23ac to b62982e Compare December 3, 2021 19:11

kiilerix added 2 commits December 3, 2021 20:44

kiilerix force-pushed the master branch from b62982e to 3f4726f Compare December 3, 2021 20:00

kiilerix requested a review from g-k December 3, 2021 20:37

g-k mentioned this pull request Jan 5, 2022

update goals / non-goals #627

Closed

6 tasks

willkg closed this Feb 7, 2022

Fix regressions from 3.1.4 #530

Fix regressions from 3.1.4 #530

Uh oh!

Conversation

kiilerix commented Apr 5, 2020

Uh oh!

g-k left a comment

Choose a reason for hiding this comment

Uh oh!

kiilerix commented Apr 6, 2020

Uh oh!

g-k commented Apr 6, 2020

Uh oh!

kiilerix commented Apr 6, 2020

Uh oh!

tony commented May 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kiilerix commented May 6, 2020

Uh oh!

tony commented May 6, 2020

Uh oh!

kiilerix commented May 6, 2020

Uh oh!

rmihael commented Aug 12, 2020

Uh oh!

kiilerix commented Aug 12, 2020

Uh oh!

tony commented Dec 2, 2021

Uh oh!

g-k commented Jan 5, 2022

Uh oh!

willkg commented Feb 7, 2022

Uh oh!

kiilerix commented Feb 8, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

tony commented May 6, 2020 •

edited

Loading