Percent encoding `|` in paths
#3479
Unanswered
nathaniel-daniel asked this question in Potential Issue
Replies: 1 comment
Should I open an issue/PR?
OS: Windows 11
`python --version`: Python 3.12.8
`httpx` version: 0.28.1
I believe the `|` should be percent-encoded in paths, which is not currently the case. If I'm understanding RFC 3986 correctly, path characters are `pchar`, which can be `unreserved`, `pct-encoded`, `sub-delims`, `":"`, or `"@"`. `unreserved` can be composed of `ALPHA`, `DIGIT`, `"-"`, `"."`, `"_"`, or `"~"`. `pct-encoded` is a percent-encoding sequence. `sub-delims` can be `"!"`, `"$"`, `"&"`, `"'"`, `"("`, `")"`, `"*"`, `"+"`, `","`, `";"`, or `"="`. Nowhere in this set is the `|` character present, meaning it has to be percent-encoded.

Simplifying my problem, `httpx` seems to call its internal `urlparse` function to process URLs. So, here's an example using that function. This function normally percent-encodes characters as needed, like spaces:

will return

However, this does not happen for `|`:

will return
In Firefox and Google Chrome, `|` is percent-encoded:

will return

In the `requests` library, `|` is also percent-encoded:

will return

The `rfc3986` library also percent-encodes `|`:

will return

Using `urllib` itself, `|` also seems to be percent-encoded for path components:

will return `'/%7C'`
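The `urllib` behaviour described above can be reproduced with the standard library alone. This is a minimal sketch that assumes the call in question is `urllib.parse.quote`, whose default `safe="/"` leaves the slash intact; the original snippet may have used a different entry point.

```python
from urllib.parse import quote

# Assumption: `quote` is the urllib function under discussion.
# With its default safe="/", the slash is kept and "|" is percent-encoded.
encoded_path = quote("/|")
print(encoded_path)  # prints /%7C
```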
I'm fairly certain that I've interpreted this RFC right, and I think that `|` should be excluded from the `PATH_SAFE` set here. Here is its current value: `"!$%&'()*+,-./0123456789:;=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_abcdefghijklmnopqrstuvwxyz|~"`.

Potential Fix: nathaniel-daniel@a2f327f
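The reading of RFC 3986 above can be checked mechanically by building the `pchar` set and comparing it with the quoted `PATH_SAFE` value. This is only a sketch: `UNRESERVED`, `SUB_DELIMS`, `PCHAR`, `PATH_ALLOWED`, and `not_in_pchar` are names made up for illustration. `PATH_SAFE` is copied from the string quoted above rather than imported from `httpx`. `"/"` and `"%"` are allowed on the assumption that they act as the segment separator and the `pct-encoded` introducer.

```python
import string

# RFC 3986: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
# RFC 3986: sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
SUB_DELIMS = set("!$&'()*+,;=")
# RFC 3986: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = UNRESERVED | SUB_DELIMS | set(":@")
# Assumption: "/" separates path segments and "%" introduces pct-encoded,
# so both are allowed to appear literally in an already-encoded path.
PATH_ALLOWED = PCHAR | set("/%")

# Value copied from the post above (the "\\" is a single backslash).
PATH_SAFE = "!$%&'()*+,-./0123456789:;=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_abcdefghijklmnopqrstuvwxyz|~"

# Characters that PATH_SAFE leaves unencoded but that fall outside PATH_ALLOWED.
not_in_pchar = [ch for ch in PATH_SAFE if ch not in PATH_ALLOWED]
print(not_in_pchar)  # prints ['[', '\\', ']', '^', '|']
```

Under these assumptions the check flags `|`, and it also lists `[`, `\`, `]`, and `^`, which the post does not discuss.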