Ananya2 (Contributor) commented Oct 8, 2025

Overview
This PR fixes bulk insert parsing of isolated quotes in tab-delimited data by removing problematic global quote state tracking from the parseString method in SQLServerBulkCSVFileRecord. The fix ensures that isolated quote characters are treated as literal data rather than field boundary markers, resolving IndexOutOfBoundsException errors during bulk copy operations.

Problem Description
The current implementation keeps a single quoted boolean in the parseString method and toggles it every time a quote character is encountered, whether or not that quote actually opens or closes a field. This causes issues when tab-delimited data contains isolated quotes within fields:
    if (buffer.charAt(i) == doubleQuoteChar) {
        quoted = !quoted;
    } else if (!quoted && /* delimiter found */) {
        // Process delimiter
    }

When parsing data like "Do you wish to remove the product "\t22451\t1", the isolated quote incorrectly toggles the quoted state, causing subsequent tab delimiters to be ignored. This results in:
Expected: 5 fields parsed correctly
Actual: 3 fields parsed, causing IndexOutOfBoundsException
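
To make the failure mode concrete, here is a minimal standalone sketch of the toggling behaviour. It is not the driver's actual parseString; the class name, method name, and the sample row are all hypothetical, and the row was chosen only so that the field counts mirror the ones reported above.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteToggleDemo {

    // Hypothetical, simplified stand-in for the toggling logic shown above.
    static List<String> parseWithQuoteToggle(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // Every quote flips the state, including an isolated one.
                quoted = !quoted;
                current.append(c);
            } else if (!quoted && c == delimiter) {
                // Delimiters are only honoured while the state is "unquoted".
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Invented five-column row with one isolated quote in the third column.
        String line = "101\tsku\tDo you wish to remove the product \"\t22451\t1";
        List<String> fields = parseWithQuoteToggle(line, '\t');
        // The two tabs after the quote are swallowed into the third field.
        System.out.println(fields.size()); // prints 3
    }
}
```

Any caller that then indexes the fourth or fifth column of such a row would read past the end of the parsed list, which is consistent with the IndexOutOfBoundsException described in this PR.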

Root Cause
PR #2434 introduced quote handling logic to fix stack overflow issues in CSV parsing. While the fix successfully resolved the stack overflow problem for CSV files, it created a new issue where isolated quotes in tab-delimited data are treated as field boundary markers instead of literal characters.

Solution
Reverted to using currentLine.split(delimiter, -1) instead of parseString(currentLine, delimiter) for simple delimiter-based parsing.
Maintained stack overflow fix from PR #2434 while fixing the quote parsing regression.
Added test coverage using the exact problematic data patterns from issue #2792.
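
As a rough illustration of why split(delimiter, -1) suits simple delimiter-based parsing, the sketch below shows two properties of String.split: quotes are ordinary characters, and a negative limit keeps trailing empty fields. The class and method names are hypothetical. Wrapping the delimiter in Pattern.quote is a precaution added for this sketch, since String.split interprets its first argument as a regular expression; whether the driver does the same is not stated in this PR.

```java
import java.util.regex.Pattern;

class SplitLimitDemo {

    // Hypothetical helper: split on a literal delimiter and keep every field.
    static String[] splitFields(String line, String delimiter) {
        // Pattern.quote makes delimiters such as "|" match literally
        // (an assumption of this sketch, not taken from the driver).
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // Invented row: an isolated quote in the first field, then two empty columns.
        String line = "remove the product \"\t\t";

        // A negative limit preserves the trailing empty fields.
        System.out.println(splitFields(line, "\t").length); // prints 3

        // The default limit (0) silently drops trailing empty fields.
        System.out.println(line.split("\t").length); // prints 1
    }
}
```

Keeping trailing empty fields matters for bulk copy because the number of parsed fields has to line up with the number of destination columns even when the last columns of a row are empty.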

Testing

Closes #2792

Ananya2 self-assigned this Oct 8, 2025
codecov bot commented Oct 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.20%. Comparing base (e783ae4) to head (beaa4e6).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #2795   +/-   ##
=========================================
  Coverage     52.20%   52.20%           
- Complexity     4142     4144    +2     
=========================================
  Files           149      149           
  Lines         34306    34306           
  Branches       5723     5723           
=========================================
+ Hits          17908    17909    +1     
+ Misses        13906    13905    -1     
  Partials       2492     2492           

Ananya2 requested a review from machavan October 13, 2025 10:34
Ananya2 added this to the 13.3.0 milestone Oct 13, 2025
Ananya2 merged commit 7f4a3a3 into main Oct 14, 2025
19 checks passed
Development

Successfully merging this pull request may close these issues.

Bulk insert does not handle " properly