Reimplement idna on top of ICU4X #923
Merged

Changes from 14 commits are shown. The pull request contains 40 commits, all by hsivonen:

* `a8977a8` Reimplement idna on top of ICU4X
* `09765af` Add an even faster lower-case ASCII letter path to avoid regressing p…
* `7e929ce` Comments and verify_dns_length tweak
* `f413387` Parametrize internal vs. external Punycode caller; restore external A…
* `71c03b9` Add bench for to_ascii on an already-Punycode name
* `9af00cb` Avoid re-encoding Punycode when possible
* `dc8f301` Pass through the input slice in many more cases
* `41e0192` Add testing for the simultaneous mode
* `41f2107` Omit the invalid domain character check on the url side
* `4d7d41a` Document that Punycode labels must result in non-ASCII
* `98ca752` Rename files called uts46.rs to deprecated.rs
* `4bbabe9` Rename uts46bis to uts46
* `7dc0082` Tweak docs
* `f8eb96e` Avoid useless copying and useless UTF-8 decode
* `eb6e3d5` Use inline(never) to optimize binary size
* `ce3d4d1` Split CheckHyphens into a separate concern form the ASCII deny list
* `6672161` Make the ASCII deny list customizable
* `90fe4b3` Better docs and top-level functions
* `50381ff` Parameter for VerifyDNSLength
* `8268c5a` Restore support for transitional processing to minimize breakage
* `999bef4` In the deprecated API, use empty deny list with use_std3_ascii_rules=…
* `b277c85` Tweak docs
* `980348c` Docs, rename AsciiDenyList::WHATWG to ::URL, tweak top-level functions
* `4efd589` Use idna crate top-level function in the url crate to dogfood the top…
* `da6cf50` Add an Usage section to the README
* `d938024` Add an early return to map_transitional for readability
* `679edb9` Document internal vs. external Punycode caller differences
* `4f605c9` Per discussion with Valentin, revert deprecated API to the old behavi…
* `bbf4308` Add comments about not fixing deprecated API
* `e842dae` Merge branch 'main' into icu4x
* `6690c49` Add a comment explaining FailFast in deprecated.rs
* `38cedad` For future-proofing, add compiled_data cargo feature (currently alway…
* `52137e7` Remove remark about spec violation by making root dot permissibility …
* `081f44b` Clarify README about IDNA 2003/2008
* `aaa7a40` Add a historical remark to the README
* `8b03034` Fix typo
* `c8a4bd3` Depend on crates.io versions of icu_normalizer and icu_properties
* `be3db8e` Address clippy lints
* `6020673` Update versions
* `245c514` Increment dependency versions

@@ -0,0 +1,34 @@

# `idna`

IDNA library for Rust implementing [UTS 46: Unicode IDNA Compatibility Processing](https://www.unicode.org/reports/tr46/) as parametrized by the [WHATWG URL Standard](https://url.spec.whatwg.org/#idna).

## What it does

* An implementation of the non-transitional mode of UTS 46 is provided, with both the STD3 rules and the WHATWG rules.
* A callback mechanism is provided for pluggable logic that decides whether a label is potentially too misleading to render as Unicode in a user interface.
* Errors are marked with U+FFFD REPLACEMENT CHARACTER in Unicode output so that the locations of errors can be shown to the user.

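The README does not show what the display-policy callback looks like. The sketch below assumes a simplified, hypothetical policy type, `Fn(&[char]) -> bool`. It receives one label's Unicode characters and returns whether that label may be shown as Unicode. The crate's real callback signature may differ.

The mixed-letters heuristic is only an illustration. It is not a recommended policy, and the crate itself ships none. `display_domain` and `mixed_letters_policy` are hypothetical names, and the (Unicode, ASCII) label pairs are supplied by hand rather than computed.

```rust
/// Hypothetical display policy: `true` means the label may be shown as Unicode.
/// Crude illustrative heuristic: refuse labels that mix ASCII letters with
/// non-ASCII letters.
fn mixed_letters_policy(label: &[char]) -> bool {
    let has_ascii_letter = label.iter().any(|c| c.is_ascii_alphabetic());
    let has_other_letter = label.iter().any(|c| !c.is_ascii() && c.is_alphabetic());
    !(has_ascii_letter && has_other_letter)
}

/// Joins labels for display, asking `policy` per label whether to use the
/// Unicode form or fall back to the ASCII (Punycode) form. The pairs are
/// supplied by the caller here; a real IDNA implementation would compute them.
fn display_domain(labels: &[(&str, &str)], policy: impl Fn(&[char]) -> bool) -> String {
    labels
        .iter()
        .map(|&(unicode, ascii)| {
            let chars: Vec<char> = unicode.chars().collect();
            if policy(chars.as_slice()) {
                unicode
            } else {
                ascii
            }
        })
        .collect::<Vec<&str>>()
        .join(".")
}

fn main() {
    // "münchen" mixes ASCII and non-ASCII letters, so this policy falls back to Punycode.
    let shown = display_domain(&[("münchen", "xn--mnchen-3ya"), ("de", "de")], mixed_letters_policy);
    println!("{shown}");
}
```

Because the policy is an ordinary function value, a caller can swap in a stricter or looser rule without touching the rest of the code. That separation is what makes the callback pluggable.
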
## What it does not do

* There is no default or sample policy for the callback mechanism mentioned above.
* Earlier variants of IDNA (2003, 2008) are not implemented; only UTS 46 is.
* The transitional mode is not supported. The transition is considered to be over: the transitional mode is deprecated in the UTS 46 specification, and the three major browser engines use non-transitional processing.
* There is no API for categorizing errors beyond reporting that an error occurred.
* Checks that are configurable in UTS 46 but that the WHATWG URL Standard always sets a particular way (regardless of the _beStrict_ flag in the URL Standard) cannot be configured.
* The _UseSTD3ASCIIRules_ and _CheckHyphens_ flags cannot be set individually: they are bundled into one setting.
* There is no support for a caller-provided ASCII deny list (there is only the choice between the STD3 and WHATWG deny lists).

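To make the difference between the two deny lists concrete, here is a minimal sketch. The `DenyList` enum and the functions are hypothetical names, not the crate's API.

The WHATWG set follows the URL Standard's definition of forbidden domain code points: forbidden host code points, C0 controls, U+0025 (%) and U+007F DELETE. STD3 is modeled as allowing only ASCII letters, digits, and hyphen in a label. The label `_dmarc` is just an example of an underscore-prefixed, TXT-record-style pseudo-host.

```rust
/// Hypothetical names for the two ASCII deny lists.
#[derive(Clone, Copy)]
enum DenyList {
    /// STD3 rules: in ASCII, only letters, digits, and U+002D HYPHEN-MINUS are allowed.
    Std3,
    /// WHATWG URL Standard forbidden domain code points.
    Whatwg,
}

/// Returns `true` if `c` is rejected by `list`. Non-ASCII characters are
/// outside the scope of either ASCII deny list.
fn is_denied(c: char, list: DenyList) -> bool {
    if !c.is_ascii() {
        return false;
    }
    match list {
        DenyList::Std3 => !(c.is_ascii_alphanumeric() || c == '-'),
        // C0 controls and U+007F DELETE, plus the listed ASCII punctuation.
        DenyList::Whatwg => {
            c.is_ascii_control()
                || matches!(
                    c,
                    ' ' | '#' | '%' | '/' | ':' | '<' | '>' | '?' | '@' | '[' | '\\' | ']' | '^' | '|'
                )
        }
    }
}

/// Checks one label that has already been split on '.', so the dot itself
/// is never examined.
fn label_allowed(label: &str, list: DenyList) -> bool {
    !label.chars().any(|c| is_denied(c, list))
}

fn main() {
    // An underscore-prefixed pseudo-host label passes WHATWG but not STD3.
    println!("{}", label_allowed("_dmarc", DenyList::Whatwg));
    println!("{}", label_allowed("_dmarc", DenyList::Std3));
}
```
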
## Known spec violations

* The `verify_dns_length` behavior that this crate implements allows a trailing dot in the input, as required by the UTS 46 test suite, even though the UTS 46 specification says that this isn't allowed.

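Here is a rough sketch of a _VerifyDNSLength_-style check that tolerates one trailing dot, as described above. It assumes the input is already ASCII (the output of ToASCII), so byte lengths equal character counts. The limits (1 to 253 bytes for the whole name, 1 to 63 per label) are the ones UTS 46 states. This is a hypothetical standalone function, not the crate's internal `verify_dns_length`.

```rust
/// Hypothetical standalone VerifyDNSLength check for an ASCII domain name.
/// One trailing dot (an empty root label) is tolerated.
fn verify_dns_length(domain: &str) -> bool {
    // Ignore a single trailing dot before measuring.
    let domain = domain.strip_suffix('.').unwrap_or(domain);
    if domain.is_empty() || domain.len() > 253 {
        return false;
    }
    // Every remaining label must be 1 to 63 bytes long.
    domain.split('.').all(|label| !label.is_empty() && label.len() <= 63)
}

fn main() {
    for name in ["example.com", "example.com.", "example..com", "."] {
        println!("{name:?}: {}", verify_dns_length(name));
    }
}
```
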
## Breaking changes since 0.5.0

* Transitional processing is no longer supported. Attempting to enable it panics immediately.
* IDNA 2008 rules are no longer supported. Attempting to enable them panics immediately.
* Setting `check_hyphens` and `use_std3_ascii_rules` to different values is no longer supported. Attempting conversion with such a configuration panics.
* `check_hyphens` now performs the full _CheckHyphens_ check, including rejecting labels that have hyphens in both the third and fourth positions.
* `domain_to_ascii_strict` now performs the _CheckHyphens_ check (matching previous documentation).
* When `use_std3_ascii_rules` is `false`, the [forbidden domain code point](https://url.spec.whatwg.org/#forbidden-domain-code-point) ASCII deny list from the WHATWG URL Standard is now enforced.
* The `Idna::to_ascii_inner` method has been removed. It didn't make sense as a public method, since callers could not tell whether there were errors. (A GitHub search found no callers for this method.)
* Punycode labels whose decoding does not yield any non-ASCII characters are now treated as being in error.

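The last change above relies on decoding the Punycode part of an `xn--` label and checking whether any non-ASCII character results. The sketch below transcribes the RFC 3492 decoding procedure and adds that check.

The function names are hypothetical, and this is not the crate's implementation. It omits the other validity checks UTS 46 applies to decoded labels and assumes a lowercase `xn--` prefix.

```rust
// Punycode parameters from RFC 3492, section 5.
const BASE: u32 = 36;
const T_MIN: u32 = 1;
const T_MAX: u32 = 26;
const SKEW: u32 = 38;
const DAMP: u32 = 700;
const INITIAL_BIAS: u32 = 72;
const INITIAL_N: u32 = 128;

/// Bias adaptation function (RFC 3492, section 6.1).
fn adapt(mut delta: u32, num_points: u32, first_time: bool) -> u32 {
    delta /= if first_time { DAMP } else { 2 };
    delta += delta / num_points;
    let mut k = 0;
    while delta > ((BASE - T_MIN) * T_MAX) / 2 {
        delta /= BASE - T_MIN;
        k += BASE;
    }
    k + ((BASE - T_MIN + 1) * delta) / (delta + SKEW)
}

/// Maps a Punycode digit to its value: a-z and A-Z are 0-25, 0-9 are 26-35.
fn digit_value(b: u8) -> Option<u32> {
    match b {
        b'a'..=b'z' => Some(u32::from(b - b'a')),
        b'A'..=b'Z' => Some(u32::from(b - b'A')),
        b'0'..=b'9' => Some(u32::from(b - b'0') + 26),
        _ => None,
    }
}

/// Decodes Punycode (without the "xn--" prefix); `None` on malformed input.
fn punycode_decode(input: &str) -> Option<String> {
    if !input.is_ascii() {
        return None;
    }
    // Basic code points precede the last delimiter, unless it is the first character.
    let (mut output, rest): (Vec<char>, &[u8]) = match input.rfind('-') {
        Some(pos) if pos > 0 => (input[..pos].chars().collect(), &input.as_bytes()[pos + 1..]),
        _ => (Vec::new(), input.as_bytes()),
    };
    let (mut n, mut i, mut bias) = (INITIAL_N, 0u32, INITIAL_BIAS);
    let mut digits = rest.iter();
    while !digits.as_slice().is_empty() {
        let old_i = i;
        let mut w = 1u32;
        let mut k = BASE;
        // Read one generalized variable-length integer.
        loop {
            let digit = digit_value(*digits.next()?)?;
            i = i.checked_add(digit.checked_mul(w)?)?;
            let t = if k <= bias { T_MIN } else if k >= bias + T_MAX { T_MAX } else { k - bias };
            if digit < t {
                break;
            }
            w = w.checked_mul(BASE - t)?;
            k += BASE;
        }
        let len = output.len() as u32 + 1;
        bias = adapt(i - old_i, len, old_i == 0);
        n = n.checked_add(i / len)?;
        i %= len;
        if n < 0x80 {
            return None; // decoded code points must not be basic (ASCII)
        }
        output.insert(i as usize, char::from_u32(n)?);
        i += 1;
    }
    Some(output.into_iter().collect())
}

/// The rule from the list above: an "xn--" label is in error unless its
/// decoding contains at least one non-ASCII character. Labels without the
/// prefix are not ACE labels and simply return `false` here.
fn ace_label_ok(label: &str) -> bool {
    match label.strip_prefix("xn--").and_then(punycode_decode) {
        Some(decoded) => decoded.chars().any(|c| !c.is_ascii()),
        None => false,
    }
}

fn main() {
    println!("{:?}", punycode_decode("mnchen-3ya"));
    println!("{}", ace_label_ok("xn--abc-"));
}
```

Here `xn--abc-` decodes to plain `abc`, so under the new rule it is rejected. `xn--mnchen-3ya` decodes to `münchen` and is accepted.
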
This file was deleted.

@@ -0,0 +1,189 @@

```rust
// Copyright 2013-2014 The rust-url developers.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! [*Unicode IDNA Compatibility Processing*
//! (Unicode Technical Standard #46)](http://www.unicode.org/reports/tr46/)

#![allow(deprecated)]

use alloc::string::String;

use crate::uts46::*;
use crate::Errors;

/// Deprecated. Use the crate-top-level functions or [`Uts46`].
#[derive(Default)]
#[deprecated]
pub struct Idna {
    config: Config,
}

impl Idna {
    pub fn new(config: Config) -> Self {
        Self { config }
    }

    /// [UTS 46 ToASCII](http://www.unicode.org/reports/tr46/#ToASCII)
    #[allow(clippy::wrong_self_convention)]
    pub fn to_ascii(&mut self, domain: &str, out: &mut String) -> Result<(), Errors> {
        match Uts46::new().process(
            domain.as_bytes(),
            self.config.strictness(),
            ErrorPolicy::FailFast,
            // Policy callback: `false` keeps every label in ASCII form.
            |_, _, _| false,
            out,
            None,
        ) {
            // Nothing was written to `out`; the input itself is the result.
            Ok(ProcessingSuccess::Passthrough) => {
                if self.config.verify_dns_length && !verify_dns_length(domain) {
                    return Err(crate::Errors::default());
                }
                out.push_str(domain);
                Ok(())
            }
            // The result was written to `out`.
            Ok(ProcessingSuccess::WroteToSink) => {
                if self.config.verify_dns_length && !verify_dns_length(out) {
                    return Err(crate::Errors::default());
                }
                Ok(())
            }
            Err(ProcessingError::ValidityError) => Err(crate::Errors::default()),
            // Writing to a `String` does not fail.
            Err(ProcessingError::SinkError) => unreachable!(),
        }
    }

    /// [UTS 46 ToUnicode](http://www.unicode.org/reports/tr46/#ToUnicode)
    #[allow(clippy::wrong_self_convention)]
    pub fn to_unicode(&mut self, domain: &str, out: &mut String) -> Result<(), Errors> {
        match Uts46::new().process(
            domain.as_bytes(),
            self.config.strictness(),
            ErrorPolicy::MarkErrors,
            // Policy callback: `true` outputs every label as Unicode.
            |_, _, _| true,
            out,
            None,
        ) {
            Ok(ProcessingSuccess::Passthrough) => {
                out.push_str(domain);
                Ok(())
            }
            Ok(ProcessingSuccess::WroteToSink) => Ok(()),
            Err(ProcessingError::ValidityError) => Err(crate::Errors::default()),
            Err(ProcessingError::SinkError) => unreachable!(),
        }
    }
}

/// Deprecated configuration API.
#[derive(Clone, Copy)]
#[must_use]
#[deprecated]
pub struct Config {
    use_std3_ascii_rules: bool,
    verify_dns_length: bool,
    check_hyphens: bool,
}

/// The defaults are those of _beStrict=false_ in the [WHATWG URL Standard](https://url.spec.whatwg.org/#idna)
impl Default for Config {
    fn default() -> Self {
        Config {
            use_std3_ascii_rules: false,
            check_hyphens: false,
            // Only used by to_ascii, not by to_unicode
            verify_dns_length: false,
        }
    }
}

impl Config {
    /// Whether to enforce the STD3 or the WHATWG URL Standard ASCII deny list.
    ///
    /// `true` for STD3, `false` for WHATWG.
    ///
    /// Note that `true` rejects pseudo-hosts used by various TXT record-based protocols.
    ///
    /// Must be set to the same value as [`Config::check_hyphens`].
    #[inline]
    pub fn use_std3_ascii_rules(mut self, value: bool) -> Self {
        self.use_std3_ascii_rules = value;
        self
    }

    /// Obsolete method retained to ease migration. The argument must be `false`.
    ///
    /// # Panics
    ///
    /// If the argument is `true`.
    #[inline]
    #[allow(unused_mut)]
    pub fn transitional_processing(mut self, value: bool) -> Self {
        assert!(!value, "Transitional processing is no longer supported");
        self
    }

    /// Whether the _VerifyDNSLength_ operation should be performed
    /// by `to_ascii`.
    #[inline]
    pub fn verify_dns_length(mut self, value: bool) -> Self {
        self.verify_dns_length = value;
        self
    }

    /// Whether to enforce IETF rules for hyphen placement.
    ///
    /// `true` to deny hyphens in the first and last positions of a label and
    /// hyphens in both the third and fourth positions. `false` to not enforce.
    ///
    /// Note that `true` rejects real-world names, including YouTube CDN nodes
    /// and some GitHub user pages.
    ///
    /// Must be set to the same value as [`Config::use_std3_ascii_rules`].
    #[inline]
    pub fn check_hyphens(mut self, value: bool) -> Self {
        self.check_hyphens = value;
        self
    }

    /// Obsolete method retained to ease migration. The argument must be `false`.
    ///
    /// # Panics
    ///
    /// If the argument is `true`.
    #[inline]
    #[allow(unused_mut)]
    pub fn use_idna_2008_rules(mut self, value: bool) -> Self {
        assert!(!value, "IDNA 2008 rules are no longer supported");
        self
    }

    /// Computes the [`Strictness`] corresponding to this configuration.
    ///
    /// # Panics
    ///
    /// If `check_hyphens` and `use_std3_ascii_rules` differ.
    fn strictness(&self) -> Strictness {
        assert_eq!(
            self.check_hyphens, self.use_std3_ascii_rules,
            "Setting check_hyphens and use_std3_ascii_rules to different values is no longer supported"
        );
        if self.use_std3_ascii_rules {
            Strictness::Std3ConformanceChecker
        } else {
            Strictness::WhatwgUserAgent
        }
    }

    /// [UTS 46 ToASCII](http://www.unicode.org/reports/tr46/#ToASCII)
    pub fn to_ascii(self, domain: &str) -> Result<String, Errors> {
        let mut result = String::with_capacity(domain.len());
        let mut codec = Idna::new(self);
        codec.to_ascii(domain, &mut result).map(|()| result)
    }

    /// [UTS 46 ToUnicode](http://www.unicode.org/reports/tr46/#ToUnicode)
    pub fn to_unicode(self, domain: &str) -> (String, Result<(), Errors>) {
        let mut codec = Idna::new(self);
        let mut out = String::with_capacity(domain.len());
        let result = codec.to_unicode(domain, &mut out);
        (out, result)
    }
}
```
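
`Config::check_hyphens` above turns on the UTS 46 _CheckHyphens_ rule. Below is a minimal standalone sketch of that rule for a single label. The function name is hypothetical, and this is not the crate's code.

Positions are counted in code points. A label is rejected when it begins or ends with U+002D HYPHEN-MINUS, or when it has hyphens in both the third and fourth positions.

```rust
/// Hypothetical standalone version of the UTS 46 CheckHyphens rule for one label.
fn check_hyphens(label: &str) -> bool {
    // No hyphen in the first or last position.
    if label.starts_with('-') || label.ends_with('-') {
        return false;
    }
    // Not hyphens in both the third and fourth positions (e.g. "ab--cd").
    let mut rest = label.chars().skip(2);
    !(rest.next() == Some('-') && rest.next() == Some('-'))
}

fn main() {
    for label in ["example", "-lead", "trail-", "ab--cd", "a-b-c"] {
        println!("{label}: {}", check_hyphens(label));
    }
}
```

A label such as `a-b-c` passes because its hyphens sit in the second and fourth positions, not the third and fourth.
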