[`ruff`] `re` and `regex` calls with unraw string as first argument (`RUF039`) #14446
Merged
Changes from 4 commits
11 commits
769e771 [`ruff`] `re` and `regex` calls with unraw string as first argument (… (InSyncWithFoo)
3bd4cba Schema (InSyncWithFoo)
1ef167d Clippy (InSyncWithFoo)
7528216 Formatting (InSyncWithFoo)
5ad591c Per review (InSyncWithFoo)
989eae4 Also check implicitly concatenated strings/bytes (InSyncWithFoo)
2655578 Clippy (InSyncWithFoo)
c3195fb Remove `.is_regex()` (InSyncWithFoo)
34caa92 Per review (InSyncWithFoo)
a1e9e1c This too (InSyncWithFoo)
ff3a061 Minor fix (InSyncWithFoo)
@@ -0,0 +1,55 @@
import re
import regex

# Errors
re.compile('single free-spacing', flags=re.X)
re.findall('si\ngle')
re.finditer("dou\ble")
re.fullmatch('''t\riple single''')
re.match("""\triple double""")
re.search('two', 'args')
re.split("raw", r'second')
re.sub(u'''nicode''', u"f(?i)rst")
re.subn(b"""ytes are""", f"\u006e")

regex.compile('single free-spacing', flags=regex.X)
regex.findall('si\ngle')
regex.finditer("dou\ble")
regex.fullmatch('''t\riple single''')
regex.match("""\triple double""")
regex.search('two', 'args')
regex.split("raw", r'second')
regex.sub(u'''nicode''', u"f(?i)rst")
regex.subn(b"""ytes are""", f"\u006e")

regex.template("""(?m)
(?:ulti)?
(?=(?<!(?<=(?!l)))
l(?i:ne)
""", flags = regex.X)


# No errors
re.compile(R'uppercase')
re.findall(not_literal)
re.finditer(0, literal_but_not_string)
re.fullmatch() # no first argument
re.match('string' f'''concatenation''')
re.search(R"raw" r'concatenation')
re.split(rf"multiple", f"""lags""")
re.sub(FR'ee', '''as in free speech''')
re.subn(br"""eak your machine with rm -""", rf"""/""")

regex.compile(R'uppercase')
regex.findall(not_literal)
regex.finditer(0, literal_but_not_string)
regex.fullmatch() # no first argument
regex.match('string' f'''concatenation''')
regex.search(R"raw" r'concatenation')
regex.split(rf"multiple", f"""lags""")
regex.sub(FR'ee', '''as in free speech''')
regex.subn(br"""eak your machine with rm -""", rf"""/""")

regex.splititer(both, non_literal)
regex.subf(f, lambda _: r'means', '"format"')
regex.subfn(fn, f'''a$1n't''', lambda: "'function'")
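
Several of the "Errors" literals above contain escapes such as `\n` and `\b` that Python resolves before the pattern ever reaches the regex engine. The following is a small illustrative sketch (not part of the PR; the variable names are made up for the example) of how a non-raw and a raw literal with the same source text end up as different patterns:

```python
import re

# Non-raw literal: Python turns "\b" into a backspace character (U+0008).
non_raw = "dou\ble"
# Raw literal: the backslash and the "b" both reach the regex engine.
raw = r"dou\ble"

print(len(non_raw), len(raw))

# In a raw pattern, \b is a word boundary, so this matches "ble" after the space.
boundary_match = re.search(r"\bble", "dou ble")

# In a non-raw pattern, "\b" is a literal backspace, so the same text does not match,
# but a subject that really contains a backspace does.
backspace_miss = re.search("\bble", "dou ble")
backspace_hit = re.search("\bble", "dou\x08ble")

# Without a raw string, the backslash has to be doubled to get the same pattern.
same_pattern = "foo\\bar" == r"foo\bar"
```

This is the "double escaping" the rule documentation refers to: the non-raw spelling only stays correct if every regex backslash is doubled.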
176 changes: 176 additions & 0 deletions
crates/ruff_linter/src/rules/ruff/rules/unraw_re_pattern.rs
@@ -0,0 +1,176 @@
use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::{Expr, ExprBytesLiteral, ExprCall, ExprStringLiteral};
use ruff_python_semantic::{Modules, SemanticModel};
use ruff_text_size::{Ranged, TextRange};
use std::fmt::{Display, Formatter};

use crate::checkers::ast::Checker;

/// ## What it does
/// Reports the following `re` and `regex` calls when
/// their first arguments are not raw strings:
///
/// - Both modules: `compile`, `findall`, `finditer`,
///   `fullmatch`, `match`, `search`, `split`, `sub`, `subn`.
/// - `regex`-specific: `splititer`, `subf`, `subfn`, `template`.
///
/// ## Why is this bad?
/// Regular expressions should be written
/// using raw strings to avoid double escaping.
///
/// ## Example
///
/// ```python
/// re.compile("foo\\bar")
/// ```
///
/// Use instead:
///
/// ```python
/// re.compile(r"foo\bar")
/// ```
#[violation]
pub struct UnrawRePattern {
    module: RegexModule,
    func: String,
    kind: PatternKind,
}

impl Violation for UnrawRePattern {
    #[derive_message_formats]
    fn message(&self) -> String {
        let Self { module, func, kind } = &self;
        let call = format!("`{module}.{func}()`");

        match kind {
            PatternKind::String => format!("First argument to {call} is not raw string"),
            PatternKind::Bytes => format!("First argument to {call} is not raw bytes literal"),
        }
    }

    fn fix_title(&self) -> Option<String> {
        match self.kind {
            PatternKind::String => Some("Replace with raw string".to_string()),
            PatternKind::Bytes => Some("Replace with raw bytes literal".to_string()),
        }
    }
}

#[derive(Debug, Eq, PartialEq)]
enum RegexModule {
    Re,
    Regex,
}

impl RegexModule {
    fn is_regex(&self) -> bool {
        matches!(self, RegexModule::Regex)
    }
}

impl Display for RegexModule {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        write!(
            f,
            "{}",
            match self {
                RegexModule::Re => "re",
                RegexModule::Regex => "regex",
            }
        )
    }
}

#[derive(Debug, Eq, PartialEq)]
enum PatternKind {
    String,
    Bytes,
}

/// RUF039
pub(crate) fn unraw_re_pattern(checker: &mut Checker, call: &ExprCall) {
    let semantic = checker.semantic();

    if !semantic.seen_module(Modules::RE) && !semantic.seen_module(Modules::REGEX) {
        return;
    }

    let Some((module, func)) = regex_module_and_func(semantic, call.func.as_ref()) else {
        return;
    };
    let Some((kind, range)) = pattern_kind_and_range(call.arguments.args.as_ref()) else {
        return;
    };

    let diagnostic = Diagnostic::new(UnrawRePattern { module, func, kind }, range);

    checker.diagnostics.push(diagnostic);
}

fn regex_module_and_func(semantic: &SemanticModel, expr: &Expr) -> Option<(RegexModule, String)> {
    let qualified_name = semantic.resolve_qualified_name(expr)?;

    let (module, func) = match qualified_name.segments() {
        [module, func] => match *module {
            "re" => (RegexModule::Re, *func),
            "regex" => (RegexModule::Regex, *func),
            _ => return None,
        },
        _ => return None,
    };

    if is_shared(func) || module.is_regex() && is_regex_specific(func) {
        return Some((module, func.to_string()));
    }

    None
}

fn pattern_kind_and_range(arguments: &[Expr]) -> Option<(PatternKind, TextRange)> {
    let first = arguments.first()?;
    let range = first.range();

    let pattern_kind = match first {
        Expr::StringLiteral(ExprStringLiteral { value, .. }) => {
            if value.is_implicit_concatenated() || value.is_raw() {
                return None;
            }

            PatternKind::String
        }

        Expr::BytesLiteral(ExprBytesLiteral { value, .. }) => {
            if value.is_implicit_concatenated() || value.is_raw() {
                return None;
            }

            PatternKind::Bytes
        }

        _ => return None,
    };

    Some((pattern_kind, range))
}

/// Whether `func` is an attribute of both `re` and `regex`.
fn is_shared(func: &str) -> bool {
    matches!(
        func,
        "compile"
            | "findall"
            | "finditer"
            | "fullmatch"
            | "match"
            | "search"
            | "split"
            | "sub"
            | "subn"
    )
}

/// Whether `func` is an extension specific to `regex`.
fn is_regex_specific(func: &str) -> bool {
    matches!(func, "splititer" | "subf" | "subfn" | "template")
}
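
For readers who want to try the rule's logic without building Ruff, here is a rough Python approximation of the check as it stands in this diff. It is only a sketch under several assumptions: it matches calls syntactically on the names `re` and `regex` instead of resolving qualified names through a semantic model, and `SHARED`, `REGEX_ONLY`, `string_tokens`, `unraw_patterns`, and `SAMPLE` are hypothetical names invented for this example. Like the Rust code above, it skips raw literals, implicitly concatenated literals, f-strings, and non-literal arguments.

```python
import ast
import io
import tokenize

# Function names mirrored from `is_shared` and `is_regex_specific` in the diff.
SHARED = {"compile", "findall", "finditer", "fullmatch", "match", "search", "split", "sub", "subn"}
REGEX_ONLY = {"splititer", "subf", "subfn", "template"}


def string_tokens(segment: str) -> list[str]:
    """Return the string-literal tokens that make up a source segment."""
    readline = io.StringIO(segment).readline
    return [tok.string for tok in tokenize.generate_tokens(readline) if tok.type == tokenize.STRING]


def unraw_patterns(source: str) -> list[tuple[int, str, str]]:
    """Return (line, call, kind) for calls whose first argument is a non-raw literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        owner = node.func.value
        # Assumption: only the plain names `re` and `regex` are recognised (no aliases).
        if not (isinstance(owner, ast.Name) and owner.id in ("re", "regex")):
            continue
        func = node.func.attr
        if not (func in SHARED or (owner.id == "regex" and func in REGEX_ONLY)):
            continue
        if not node.args:
            continue
        first = node.args[0]
        # f-strings parse as JoinedStr, so they never reach this point.
        if not (isinstance(first, ast.Constant) and isinstance(first.value, (str, bytes))):
            continue
        tokens = string_tokens(ast.get_source_segment(source, first))
        if len(tokens) != 1:
            continue  # implicitly concatenated literal: skipped, as in this diff
        literal = tokens[0]
        # The prefix is everything before the first quote character.
        prefix = literal[: min(literal.index(q) for q in "'\"" if q in literal)]
        if "r" in prefix.lower():
            continue
        kind = "bytes" if isinstance(first.value, bytes) else "string"
        findings.append((node.lineno, f"{owner.id}.{func}", kind))
    return sorted(findings)


SAMPLE = r'''
import re
import regex

re.findall('si\ngle')
re.compile(R'uppercase')
re.search(R"raw" r'concatenation')
re.subn(b"ytes", "x")
regex.template("plain")
re.template("plain")
re.findall(not_literal)
'''

for line, call, kind in unraw_patterns(SAMPLE):
    print(f"line {line}: first argument to `{call}()` is not a raw {kind} literal")
```

The sample reports the `re.findall`, `re.subn`, and `regex.template` calls and leaves the rest alone; `re.template` is ignored because `template` is only treated as a `regex`-specific function here.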