Skip to content

Conversation

kiilerix
Copy link

@kiilerix kiilerix commented Apr 5, 2020

No description provided.

Copy link
Collaborator

@g-k g-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

This matches: background-image: 'url(javascript:alert("XSS"))' which we probably don't want. https://regex101.com/r/vVInfw/1

Also, there's overlap in the match groups so it might be vulnerable to ReDoS too.

@kiilerix
Copy link
Author

kiilerix commented Apr 6, 2020

Thanks for the feedback.
Everything with url() is sanitized a few lines above. I can thus not reproduce the url problem you mention. I can add a test with your test case.
But ok, it was a bit aggressive to also allow the slightly tricky () while adding the non-tricky characters. I can leave that out.
I don't see any overlap in match groups / matching. Can you say more about the problem you see?

@g-k
Copy link
Collaborator

g-k commented Apr 6, 2020

Everything with url() is sanitized a few lines above. I can thus not reproduce the url problem you mention. I can add a test with your test case.

Good call. That'd be great!

We should also test ur/**/l(javascript:alert("XSS")) since that might get through.

I'd like to have a lot more "bad CSS" testcases before making the regex more permissive in a stable release (we could push a dev release in the meantime), since I don't have context around the original CSS filtering.

But ok, it was a bit aggressive to also allow the slightly tricky () while adding the non-tricky characters. I can leave that out.

Yes, please limit this PR to fixing the dashed attributes regression / #529.

I don't see any overlap in match groups / matching. Can you say more about the problem you see?

The ReDoS in GHSA-vqhp-cxgc-6wmm (with https://bugzilla.mozilla.org/show_bug.cgi?id=1623633#c0 for context) exploited the overlap in -, ', " alternation patterns and it looks like this PR adds - back. I couldn't find a string that triggers DoS though, so it might be fine.

@kiilerix
Copy link
Author

kiilerix commented Apr 6, 2020

The pre-3.1.4 code had the problem of matching -'"() both as ordinary characters and more constrained patterns. The 3.1.4 fix removed all handling as ordinary characters and left everything to other patterns. This PR is just moving - handling back to being an ordinary character and removing its special handling. Handling of the other characters doesn't change in this PR. I don't see any match ambiguity.

Note that all discussion seems to be about the the last changeset. The 3 first changesets are already limited to fixing the dashed attributes regression. Could we agree on the 3 first changesets and land them?

The discussion of the 4th changeset seems to be about not undoing some 3.1.4 changes that wasn't mentioned in commit message or Changelog. Since there hasn't been any claims of fixing any sanitize bugs, I would expect the ideal would be to get behaviour back to the pre-3.1.4 state but with the ReDoS fixed. I don't think this PR is making things more permissive. It is just fixing some things that 3.1.4 broke, while still leaving it much less permissive than 3.1.3 was.

For the 4th changeset:
I am not sure what new css security properties 3.1.4 established and which attacks it protects against. From what I can see (for example googling https://stackoverflow.com/questions/476276/using-javascript-in-css and https://dougrathbone.com/blog/2013/10/30/executing-javascript-inside-css-another-reason-to-whitelist-and-encode-user-input ) it is not actually possible to execute javascript from css. While "security in depth" sometimes makes sense, I don't think it is feasible to predict and protect against all kinds of browser non-compliance. Protecting against "strings that look like a css javascript url" would be a mis-feature; such strings are strings, and their content shouldn't be evaluated as css. Contrary to that: it is a bug if strings in css for no good reason can't contain for example : or ( or or ' or ; .

@tony
Copy link

tony commented May 6, 2020

Hi there, will this PR also include values with hyphens? e.g. top: -2rem? (This also are removed for us in 3.1.4)

KaTeX generates HTML like this to represent super/subscript.

If so, should the test include an example of a negative CSS value as well?

@kiilerix
Copy link
Author

kiilerix commented May 6, 2020

There was already a test of style="cursor: -moz-grab;"> - I think that gave sufficient coverage for values with leading hyphens. The test was failing and disabled for the release of 3.1.4.

@tony
Copy link

tony commented May 6, 2020

@kiilerix You are correct! I see it now. Thank you

image

@kiilerix
Copy link
Author

kiilerix commented May 6, 2020

Looking at this again, I still think my full PR is safe and an improvement.

As a minimal first step, the 3 first changesets do what g-k agreed on for fixing the - regressions introduced by 3.1.4 .

For fixing other 3.1.4 regressions, I still wonder what security issues it is trying to block.
I do not think it should block background-image: 'ur/**/l(javascript:alert("XSS"))' in style attributes - just like it shouldn't block that string if it occur as data in any other place in html.

It seems like it would have been better if 3.1.4 just fixed the DoS by removing superfluous regexp options:

         gauntlet = re.compile(
-            r"""^([-/:,#%.'"\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$""",
+            r"""^([-/:,#%.'"\s!\w]|\([\d,%\.\s]+\))*$""",
             flags=re.U
         )

That will not make it more permissive than 3.1.3 was but will just fix the 3.1.4 regression. Going back to this will thus be very suitable for a stable 3.1.6 release.

But as a next step I will now propose something more drastic:

I don't see how CSS values for whitelisted CSS attributes (in HTML that already has been parsed) ever can be dangerous in reasonably modern and wellbehaving browsers. It seems to me like the whole gauntlet check safely could be dropped.

(I can see how values with unquoted url() could trigger unfortunate network access ... but that is already handled so aggresively that it also not is valid when quoted. But if it is a concern that it slip through if written as ur/**/l() then it would be better to strip such comments before checking for url().)

@rmihael
Copy link

rmihael commented Aug 12, 2020

We also were hit by this regression. Is there any traction on this PR?

@kiilerix
Copy link
Author

We also were hit by this regression. Is there any traction on this PR?

I don't know. It is unclear if the project owners had any undeclared intents with the 3.1.4 regressions. It seems like it is a big concern to avoid emitting url(javascript:...) even though it seems like it would be a browser bug if that meant anything - and especially if it had potential security implications.

@tony
Copy link

tony commented Dec 2, 2021

@kiilerix can you rebase?

Rework d6018f2 and choose an alternative
approach to fixing bug 1623633.

Before that change, '-' could be matched either by itself, or by '\w-\w'. The
latter seemed to have the odd intent of avoiding leading or trailing '-' ...
but also meant it allowed 'a-aa-a' but not 'a-a-a'.

The primary intent with the change seemed to be to avoid a backtracking
explosion exploiding the ambiguous matching of '-'.

The chosen solution of allowing '\w-\w' but not '-' broke some tests and some
real world use cases.

There doesn't seem to be any risk from allowing leading or trailing or
consecutive '-' in styles. Thus, instead, this change will choose the path of
allowing '-' as a normal character and drop the special handling of '\w-\w'.
Rework d6018f2 and choose an alternative
approach to handling ' and " while fixing bug 1623633.

Before that change, ' and " could be matched either by themselves, or balanced
around [\s\w]+ .

The intent with the change seemed to be to avoid a potential backtracking
explosion exploiting the ambiguous matching.

The chosen solution of for example allowing 'foo' and "foo bar" but not
'foo-bar' broke some tests and some real world use cases.

There doesn't seem to be any risk from allowing any of the "usual" characters
inside balanced quotation marks.

This change will expand the set of valid strings inside quotation marks without
adding any ambiguity that can cause backtracking. We will accept all the
characters that already are allowed outside quotation marks, but also accept
the other kind of quotation marks, allow ( and ) , and empty strings.
@g-k
Copy link
Collaborator

g-k commented Jan 5, 2022

I don't see how CSS values for whitelisted CSS attributes (in HTML that already has been parsed) ever can be dangerous in reasonably modern and wellbehaving browsers. It seems to me like the whole gauntlet check safely could be dropped.

Agreed. The gauntlet regex was added to bleach when it was a concern, but we can drop support for legacy browsers instead of continuing to tweak the regex.

@g-k g-k mentioned this pull request Jan 5, 2022
6 tasks
@willkg
Copy link
Member

willkg commented Feb 7, 2022

I'd like to close this out and move forward with @g-k's idea in #530 (comment) . I think that'll be a lot better for everyone.

@willkg willkg closed this Feb 7, 2022
@kiilerix
Copy link
Author

kiilerix commented Feb 8, 2022

Nice to see some progress, one way or another ;-)

I subscribed to #627 . Is that the way to follow how it works out?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants