Fix bulk insert parsing of isolated quotes in tab-delimited data (#2792) #2795

Ananya2 · 2025-10-08T09:58:09Z

Overview
This PR fixes bulk insert parsing of isolated quotes in tab-delimited data by removing problematic global quote state tracking from the parseString method in SQLServerBulkCSVFileRecord. The fix ensures that isolated quote characters are treated as literal data rather than field boundary markers, resolving IndexOutOfBoundsException errors during bulk copy operations.

Problem Description
The current implementation keeps a single quoted boolean in the parseString method and toggles it on every quote character it encounters. This breaks parsing when tab-delimited data contains isolated (unpaired) quotes within fields:
    if (buffer.charAt(i) == doubleQuoteChar) {
        quoted = !quoted;
    } else if (!quoted && /* delimiter found */) {
        // Process delimiter
    }

When parsing data like "Do you wish to remove the product "\t22451\t1", the isolated quote incorrectly toggles the quoted state, causing subsequent tab delimiters to be ignored. This results in:
Expected: 5 fields parsed correctly
Actual: 3 fields parsed, causing IndexOutOfBoundsException
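
The field-count mismatch can be reproduced outside the driver with a small sketch. The toggleParse method below is a simplified, hypothetical stand-in for the quote-toggling loop quoted above, not the driver's actual parseString, and the two leading fields of the sample row are invented so that the row has five fields:

```java
import java.util.ArrayList;
import java.util.List;

class QuoteToggleDemo {
    // Hypothetical stand-in for the quote-toggling loop; not the driver's real code.
    static List<String> toggleParse(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                quoted = !quoted;          // every quote flips the state
                current.append(c);
            } else if (!quoted && c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);         // delimiters are swallowed while quoted
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Assumed sample row: five tab-separated fields, one isolated quote in the third.
        String row = "1\tabc\tDo you wish to remove the product \"\t22451\t1";
        System.out.println(toggleParse(row, '\t').size()); // prints 3
        System.out.println(row.split("\t", -1).length);    // prints 5
    }
}
```

After the isolated quote flips the state, the last two tabs are absorbed into the third field, so a caller that indexes five columns runs past the end of the parsed list.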

Root Cause
PR #2434 introduced quote handling logic to fix stack overflow issues in CSV parsing. While the fix successfully resolved the stack overflow problem for CSV files, it created a new issue where isolated quotes in tab-delimited data are treated as field boundary markers instead of literal characters.

Solution
Reverted to using currentLine.split(delimiter, -1) instead of parseString(currentLine, delimiter) for simple delimiter-based parsing.
Maintained stack overflow fix from PR #2434 while fixing the quote parsing regression.
Added comprehensive test coverage with the exact problematic data patterns from issue #2792.
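
As a rough illustration of why the -1 limit matters in split(delimiter, -1): String.split drops trailing empty strings when the limit is 0 (the default) and keeps them when the limit is negative. The helper name splitKeepingEmpties is hypothetical, and wrapping the delimiter in Pattern.quote is an assumption made here because split interprets its argument as a regular expression; the PR does not say whether the driver does this.

```java
import java.util.regex.Pattern;

class SplitLimitDemo {
    // Hypothetical helper: split on a literal delimiter, keeping trailing empty fields.
    static String[] splitKeepingEmpties(String line, String delimiter) {
        // Pattern.quote treats the delimiter as plain text rather than a regex (assumption).
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // Four fields: "a", a field starting with an isolated quote, and two empty fields.
        String row = "a\t\"b\t\t";
        System.out.println(row.split("\t").length);                // prints 2
        System.out.println(splitKeepingEmpties(row, "\t").length); // prints 4
    }
}
```

With the default limit, trailing empty columns disappear and the field count no longer matches the column count. Quote characters play no role in either call, so an isolated quote stays literal data.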

Testing

Added testBulkCopyTabDelimitedWithQuotes() test case using the problematic data from issue #2792 ("Bulk insert does not handle " properly").

Closes #2792

codecov · 2025-10-08T10:25:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.20%. Comparing base (e783ae4) to head (beaa4e6).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##               main    #2795   +/-   ##
=========================================
  Coverage     52.20%   52.20%           
- Complexity     4142     4144    +2     
=========================================
  Files           149      149           
  Lines         34306    34306           
  Branches       5723     5723           
=========================================
+ Hits          17908    17909    +1     
+ Misses        13906    13905    -1     
  Partials       2492     2492

src/main/java/com/microsoft/sqlserver/jdbc/SQLServerBulkCSVFileRecord.java

Ananya2 added 2 commits October 8, 2025 15:12

Fix bulk insert parsing of isolated quotes in tab-delimited data (#2792)

6861de4

removed comments

1b76bf9

Ananya2 self-assigned this Oct 8, 2025

Ananya2 requested review from David-Engel, divang, machavan and muskan124947 October 8, 2025 09:58

machavan reviewed Oct 8, 2025

src/main/java/com/microsoft/sqlserver/jdbc/SQLServerBulkCSVFileRecord.java (review thread resolved)

fixed CI check failure

beaa4e6

Ananya2 requested a review from machavan October 13, 2025 10:34

Ananya2 added this to the 13.3.0 milestone Oct 13, 2025

machavan approved these changes Oct 13, 2025

muskan124947 approved these changes Oct 13, 2025

Ananya2 merged commit 7f4a3a3 into main Oct 14, 2025
19 checks passed
