Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion url/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -25,11 +25,14 @@ bencher = "0.1"

[dependencies]
form_urlencoded = { version = "1.0.0", path = "../form_urlencoded" }
idna = { version = "0.2.0", path = "../idna" }
idna = { version = "0.2.0", path = "../idna", optional = true }
matches = "0.1"
percent-encoding = { version = "2.1.0", path = "../percent_encoding" }
serde = {version = "1.0", optional = true, features = ["derive"]}

[features]
default = ["idna"]

[[bench]]
name = "parse_url"
path = "benches/parse_url.rs"
Expand Down
5 changes: 5 additions & 0 deletions url/src/host.rs
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,12 @@ impl Host<String> {
return parse_ipv6addr(&input[1..input.len() - 1]).map(Host::Ipv6);
}
let domain = percent_decode(input.as_bytes()).decode_utf8_lossy();

#[cfg(feature = "idna")]
let domain = idna::domain_to_ascii(&domain)?;
#[cfg(not(feature = "idna"))]
let domain = domain.to_string();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should panic if domain is not ASCII? I think the rest of the code expects Host::Domain to hold the normalized domain name, so this way we don't introduce weird incompatibilities.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not panic, but return an error?
Not only if the domain is not ASCII, but also if the domain contains any xn-- labels - which we can't know if they are correct or no.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue panicking is correct since it's an "unrecoverable programmer error"; disabling the idna feature should only be done when the programmer is absolutely sure the program won't need to handle IDNs; if that assumption turns out to be false, there is no way to handle this gracefully.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I happen to be a backend engineer also, panic is the last thing I expect a base lib to do, especially when it may handle external input. I disable the idna doesn't mean I want my service panic for idna input...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

panic should used when the program enter some bad state and not easy to recover. For this case, it has no problem to recover.


if domain.is_empty() {
return Err(ParseError::EmptyHost);
}
Expand Down
5 changes: 5 additions & 0 deletions url/src/origin.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
use crate::host::Host;
use crate::parser::default_port;
use crate::Url;
#[cfg(feature = "idna")]
use idna::domain_to_unicode;
use std::sync::atomic::{AtomicUsize, Ordering};

Expand Down Expand Up @@ -93,7 +94,11 @@ impl Origin {
Origin::Tuple(ref scheme, ref host, port) => {
let host = match *host {
Host::Domain(ref domain) => {
#[cfg(feature = "idna")]
let (domain, _errors) = domain_to_unicode(domain);
#[cfg(not(feature = "idna"))]
let domain = domain.clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like with the above comment, this should maybe panic if domain would change after idna::domain_to_unicode (if we can determine that easily?); otherwise, this function would return two different things based on whether the feature is enabled or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If idna feature disabled, then the whole fn unicode_serialization can be opt out? I searched spec about unicode_serialization, it is strictly related with idna.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://html.spec.whatwg.org/multipage/origin.html#unicode-serialisation-of-an-origin now the spec even deletes the section of unicode serialization, just saying "There used to also be a Unicode serialization of an origin. However, it was never widely adopted."


Host::Domain(domain)
}
_ => host.clone(),
Expand Down
1 change: 1 addition & 0 deletions url/src/parser.rs
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,7 @@ simple_enum_error! {
Overflow => "URLs more than 4 GB are not supported",
}

#[cfg(feature = "idna")]
impl From<::idna::Errors> for ParseError {
fn from(_: ::idna::Errors) -> ParseError {
ParseError::IdnaError
Expand Down
1 change: 1 addition & 0 deletions url/src/quirks.rs
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ pub fn domain_to_ascii(domain: &str) -> String {
}

/// https://url.spec.whatwg.org/#dom-url-domaintounicode
#[cfg(feature = "idna")]
pub fn domain_to_unicode(domain: &str) -> String {
match Host::parse(domain) {
Ok(Host::Domain(ref domain)) => {
Expand Down
1 change: 1 addition & 0 deletions url/tests/unit.rs
Original file line number Diff line number Diff line change
Expand Up @@ -703,6 +703,7 @@ fn test_set_href() {
);
}

#[cfg(feature = "idna")]
#[test]
fn test_domain_encoding_quirks() {
use url::quirks::{domain_to_ascii, domain_to_unicode};
Expand Down