@MichaReiser
Member

@MichaReiser MichaReiser commented Oct 13, 2025

Summary

This PR improves our lexer to preserve String tokens when a string literal is missing its closing quote, instead of converting them to Unknown tokens.
The lexer now sets a flag on the string token so that tools consuming the tokens can check whether the literal is unclosed.
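
To illustrate the idea, here is a minimal sketch of a lexer that keeps the String kind for an unclosed literal and records the problem in a flag. It is not the code from this PR: `Token`, `lex_string`, `is_unclosed_string`, and the plain `u16` flag constant are hypothetical, simplified stand-ins, and the sketch ignores prefixes, escapes, and triple-quoted strings.

```rust
/// Hypothetical, heavily simplified token kinds; a real lexer has many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    String,
    Unknown,
}

/// Hypothetical flag bit marking a string literal without a closing quote.
const UNCLOSED_STRING: u16 = 1 << 0;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token {
    kind: TokenKind,
    flags: u16,
    /// Byte length of the token in the source.
    len: usize,
}

impl Token {
    /// Consumers check the flag instead of relying on an Unknown token.
    fn is_unclosed_string(&self) -> bool {
        self.kind == TokenKind::String && self.flags & UNCLOSED_STRING != 0
    }
}

/// Lexes a single-line string literal at the start of `source`.
/// Assumption: no prefixes, no escapes, no triple quotes.
fn lex_string(source: &str) -> Token {
    let mut chars = source.char_indices();
    let quote = match chars.next() {
        Some((_, c @ ('"' | '\''))) => c,
        _ => {
            // Not a string at all: this is still an Unknown token.
            let len = source.chars().next().map_or(0, char::len_utf8);
            return Token { kind: TokenKind::Unknown, flags: 0, len };
        }
    };
    for (offset, c) in chars {
        if c == quote {
            // Properly closed literal, including the closing quote.
            return Token { kind: TokenKind::String, flags: 0, len: offset + c.len_utf8() };
        }
        if c == '\n' {
            // Line ended before the closing quote: keep the String kind,
            // but flag the token as unclosed.
            return Token { kind: TokenKind::String, flags: UNCLOSED_STRING, len: offset };
        }
    }
    // End of file before the closing quote.
    Token { kind: TokenKind::String, flags: UNCLOSED_STRING, len: source.len() }
}
```

With this shape, `lex_string("\"unclosed")` still yields a String token that a parser can turn into a string literal node, while `is_unclosed_string()` lets any consumer that cares about the missing quote detect it.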

The benefit of preserving string literals is much better error recovery, because the parser now recognizes those literals.
For example, ty now correctly infers the type of a = "unclosed as Literal["unclosed"].

Unfortunately, preserving the kind for unclosed string literals regressed the recovery mechanism for f-strings and t-strings, so I went ahead and improved that too.

There are a few improvements:

  • Preserve the f-string middle token even if the f-string is unclosed (e.g. f"unclosed) instead of parsing it as f""
  • Better recovery for a missing }. E.g., the parser now matches the quotes for f"{ab" instead of assuming that the closing quote starts a new string
  • Better recovery for r format specifiers if the } is missing: f"{ab:r" now parses the r as the raw conversion flag rather than treating r" as the start of a raw string literal

Fixes #19751
Fixes #20849

Review

You probably want to skip the first commit :) It updates all snapshots to now include the unclosed: <UNCLOSED> flag.

Test Plan

Reviewed and updated the snapshot tests. I also reviewed all usages of TokenKind::String to find cases where the missing closing quote could now cause issues.

This change should have no impact on AST-based lint rules or the formatter because they both only run when there are no parse errors.

@MichaReiser MichaReiser added the parser Related to the parser label Oct 13, 2025
Comment on lines +7 to 8
# This is also true for
# unterminated f-strings.
Member Author

I know this looks silly, but keeping the comment over two lines reduces the snapshot changes.

bitflags! {
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
-   pub(crate) struct TokenFlags: u8 {
+   pub(crate) struct TokenFlags: u16 {
Member Author

@MichaReiser MichaReiser Oct 13, 2025

While this increases the size of TokenFlags, it doesn't increase the size of Token, which is why I didn't bother with any fancy encoding (e.g. treating the token as unclosed if both RAW_STRING_UPPERCASE and RAW_STRING_LOWERCASE are set).
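
The reason a wider flags field can be free is alignment padding. A minimal sketch, assuming a token made of a one-byte kind, the flags, and two `u32` offsets; the struct names and field layout below are hypothetical and are not the actual `Token` definition, and `#[repr(C)]` is used only to make the layout deterministic for the demonstration:

```rust
use std::mem::size_of;

/// Hypothetical token layout with one byte of flags.
#[repr(C)]
struct TokenWithU8Flags {
    kind: u8,
    flags: u8,
    // Two bytes of padding follow so that `start` is 4-byte aligned.
    start: u32,
    end: u32,
}

/// The same hypothetical layout with two bytes of flags.
#[repr(C)]
struct TokenWithU16Flags {
    kind: u8,
    // One byte of padding precedes `flags` so that it is 2-byte aligned.
    flags: u16,
    start: u32,
    end: u32,
}

/// Returns the sizes of both layouts in bytes.
fn token_sizes() -> (usize, usize) {
    (size_of::<TokenWithU8Flags>(), size_of::<TokenWithU16Flags>())
}
```

Under these assumptions both structs occupy 12 bytes: the wider flags field only uses bytes that were padding before.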

@github-actions
Contributor

github-actions bot commented Oct 13, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser MichaReiser force-pushed the micha/unclosed-string branch from 698f74c to a012e2b Compare October 14, 2025 09:06
self.current_flags |= TokenFlags::UNCLOSED_STRING;

self.push_error(LexicalError::new(
LexicalErrorType::StringError,
Member Author

This error was just wrong: a = "string<EOF> reported an unexpected string error rather than an unclosed string literal error.

@MichaReiser MichaReiser requested a review from dylwil3 October 14, 2025 09:31
@MichaReiser MichaReiser marked this pull request as ready for review October 14, 2025 09:31
@MichaReiser MichaReiser changed the title Better error recovery for unclosed strings (including f- and t-strings) Improved error recovery for unclosed strings (including f- and t-strings) Oct 14, 2025
@MichaReiser MichaReiser requested a review from ntBre October 14, 2025 16:54
Contributor

@ntBre ntBre left a comment

Nice! This looks great to me!

@MichaReiser MichaReiser merged commit 4fc7dd3 into main Oct 15, 2025
38 checks passed
@MichaReiser MichaReiser deleted the micha/unclosed-string branch October 15, 2025 07:50
dcreager added a commit that referenced this pull request Oct 15, 2025
…rable

* origin/main:
  [ty] Add (unused) `inferable` parameter to type property methods (#20865)
  Run macos tests on macos (#20889)
  Remove `release` CI job (#20887)
  [ty] CI: Faster ecosystem analysis (#20886)
  Remove `strip` from release profile (#20885)
  [ty] Sync vendored typeshed stubs (#20876)
  [ty] Add some completion ranking improvements (#20807)
  Improved error recovery for unclosed strings (including f- and t-strings) (#20848)
  Enable lto=fat (#20863)
  [`pyupgrade`] Extend `UP019` to detect `typing_extensions.Text` (`UP019`) (#20825)
  [`flake8-bugbear`] Omit annotation in preview fix for `B006` (#20877)
  fix(docs): Fix typo in `RUF015` description (#20873)
  [ty] Improve and extend tests for instance attributes redeclared in subclasses (#20866)
  [ty] Ignore slow seeds as a temporary measure (#20870)
  Remove parentheses around multiple exception types on Python 3.14+ (#20768)
  Update Black tests (#20794)
dcreager added a commit that referenced this pull request Oct 15, 2025
…nt-sets

* dcreager/non-non-inferable: (174 commits)
  [ty] Add (unused) `inferable` parameter to type property methods (#20865)
  Run macos tests on macos (#20889)
  Remove `release` CI job (#20887)
  [ty] CI: Faster ecosystem analysis (#20886)
  Remove `strip` from release profile (#20885)
  [ty] Sync vendored typeshed stubs (#20876)
  [ty] Add some completion ranking improvements (#20807)
  Improved error recovery for unclosed strings (including f- and t-strings) (#20848)
  Enable lto=fat (#20863)
  [`pyupgrade`] Extend `UP019` to detect `typing_extensions.Text` (`UP019`) (#20825)
  [`flake8-bugbear`] Omit annotation in preview fix for `B006` (#20877)
  fix(docs): Fix typo in `RUF015` description (#20873)
  [ty] Improve and extend tests for instance attributes redeclared in subclasses (#20866)
  [ty] Ignore slow seeds as a temporary measure (#20870)
  use existing method
  Remove parentheses around multiple exception types on Python 3.14+ (#20768)
  Update Black tests (#20794)
  just the api parts
  [ty] Fix further issues in `super()` inference logic (#20843)
  [ty] Document when a rule was added (#20859)
  ...
Labels

parser Related to the parser

Development

Successfully merging this pull request may close these issues.

  • Preserve unterminated string tokens
  • Panic f-string: unexpected token TStringMiddle

3 participants