Conversation

@hsiang-c (Contributor) commented Oct 16, 2025

Which issue does this PR close?

Closes #1890
Partially closes #2314

Rationale for this change

What changes are included in this PR?

  • Implemented Spark's ANSI mode for abs: throw org.apache.spark.SparkArithmeticException when the input is the MIN_VALUE of a Spark IntegralType (see the Spark ANSI compliance doc).
  • In CometTestBase, changed the types of columns _9, _10, _11 and _12 from UINT_8/16/32/64 to INT_8/16/32/64 because the test data actually contains negative values.
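The overflow the ANSI check guards against comes from two's-complement arithmetic: math.abs(Int.MinValue) wraps back to Int.MinValue, because +2147483648 is not representable as an Int. A minimal sketch of the check (the object and method names here are illustrative, not Comet's actual implementation, and a plain ArithmeticException stands in for SparkArithmeticException):

```scala
// Sketch only: illustrates the ANSI-mode abs behavior described above.
// AnsiAbsSketch / checkedAbs are hypothetical names, not Comet's API.
object AnsiAbsSketch {
  def checkedAbs(x: Int, ansiEnabled: Boolean): Int = {
    if (x == Int.MinValue && ansiEnabled) {
      // Spark raises org.apache.spark.SparkArithmeticException in this case.
      throw new ArithmeticException("integer overflow: abs(Int.MinValue)")
    }
    // Legacy (non-ANSI) mode silently wraps: abs(Int.MinValue) == Int.MinValue.
    math.abs(x)
  }
}
```

The same boundary exists for Byte, Short and Long, which is why the PR covers all of Spark's IntegralTypes.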

How are these changes tested?

  • Unit tests with MIN_VALUE and decimal values of different precision and scale.
  • Spark SQL tests.

@codecov-commenter commented Oct 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.33%. Comparing base (f09f8af) to head (1ddff98).
⚠️ Report is 621 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2595      +/-   ##
============================================
+ Coverage     56.12%   59.33%   +3.20%     
- Complexity      976     1444     +468     
============================================
  Files           119      146      +27     
  Lines         11743    13758    +2015     
  Branches       2251     2353     +102     
============================================
+ Hits           6591     8163    +1572     
- Misses         4012     4373     +361     
- Partials       1140     1222      +82     


test("abs") {
  Seq(true, false).foreach { ansi_enabled =>
    Seq(2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 15, 16, 17).foreach { col =>
      checkSparkAnswerAndOperator(s"SELECT abs(_${col}) FROM tbl")

@hsiang-c (Contributor, Author) commented:

This is the diff; tested with ANSI mode on and off.

| optional int32 _9(INT_8);
| optional int32 _10(UINT_16);
| optional int32 _11(UINT_32);
| optional int64 _12(UINT_64);

@hsiang-c (Contributor, Author) commented:

We store negative values in these columns, so I think the schema should not use unsigned integer types.

// CometTestBase.scala
          record.add(8, (-i).toByte)
          record.add(9, (-i).toShort)
          record.add(10, -i)
          record.add(11, (-i).toLong)

A reviewer (Contributor) replied:

We have UINT here to make sure we cover all the types that Parquet has. The data files created here are specifically designed to test whether Parquet readers handle every type correctly. Negative values stored in a UINT Parquet type exercise the boundary of the allowed value range.
To illustrate with an example: when you store the value -1 in a UINT_8 field, what gets stored is the bit pattern 0xff. On reading, this comes back as the value 255, which is the maximum value for a UINT_8.
This is both correct and desirable.
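The reinterpretation described above can be shown in plain Scala, using the JDK's standard unsigned-conversion helper to read a byte's bit pattern as an unsigned value:

```scala
// -1 as a signed byte is stored as the bit pattern 0xff.
val b: Byte = -1

// Reading that same bit pattern as unsigned yields 255, UINT_8's maximum.
val asUnsigned: Int = java.lang.Byte.toUnsignedInt(b)

println(asUnsigned) // 255
```

So a column of negative signed values doubles as a test of the top of the unsigned range when the schema annotates it as UINT.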

@hsiang-c hsiang-c marked this pull request as ready for review October 17, 2025 20:31
@comphead (Contributor) commented:
Thanks @hsiang-c. WDYT of implementing abs with Spark flavor in DataFusion, like I did recently for concat in apache/datafusion#18128?

Successfully merging this pull request may close these issues: "Unsupported expressions found in spark sql unit test"; "Add support for abs".

4 participants