Fix bulk insert parsing of isolated quotes in tab-delimited data (#2792) #2795
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Overview
This PR fixes bulk insert parsing of isolated quotes in tab-delimited data by removing problematic global quote state tracking from the
parseStringmethod inSQLServerBulkCSVFileRecord. The fix ensures that isolated quote characters are treated as literal data rather than field boundary markers, resolving IndexOutOfBoundsException errors during bulk copy operations.Problem Description
The current implementation uses a global
quotedboolean state in theparseStringmethod that toggles on every quote character encounter. This causes issues when tab-delimited data contains isolated quotes within fields:if (buffer.charAt(i) == doubleQuoteChar){quoted = !quoted; } else if (!quoted && /* delimiter found */) { // Process delimiter }When parsing data like "Do you wish to remove the product "\t22451\t1", the isolated quote incorrectly toggles the quoted state, causing subsequent tab delimiters to be ignored. This results in:
Expected: 5 fields parsed correctly
Actual: 3 fields parsed, causing
IndexOutOfBoundsExceptionRoot Cause
PR #2434 introduced quote handling logic to fix stack overflow issues in CSV parsing. While the fix successfully resolved the stack overflow problem for CSV files, it created a new issue where isolated quotes in tab-delimited data are treated as field boundary markers instead of literal characters.
Solution
Reverted to using
currentLine.split(delimiter, -1)instead ofparseString(currentLine, delimiter)for simple delimiter-based parsing.Maintained stack overflow fix from PR #2434 while fixing the quote parsing regression.
Added comprehensive test coverage with the exact problematic data patterns from issue #2792
Testing
Closes #2792