Skip to content

[SPARK-53235][CORE][TESTS] Use Java Set.of instead of ImmutableSet.of #51961

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

LuciferYang
Copy link
Contributor

What changes were proposed in this pull request?

This PR aims to use Java 9+ Set.of API instead of ImmutableSet.of API.

Why are the changes needed?

Java's native API is shorter and faster.

scala> spark.version
val res0: String = 4.1.0-preview1

scala> spark.time((1 until 100_000_000).foreach(_ => java.util.Set.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)))
Time taken: 4287 ms

scala> spark.time((1 until 100_000_000).foreach(_ => com.google.common.collect.ImmutableSet.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)))
Time taken: 15024 ms

Does this PR introduce any user-facing change?

No behavior change.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

dongjoon-hyun
dongjoon-hyun previously approved these changes Aug 10, 2025
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @LuciferYang .

assertEquals(7, store.count(ArrayKeyIndexType.class));

assertTrue(store.removeAllByIndexValues(
ArrayKeyIndexType.class,
"id",
ImmutableSet.of(new String [] { "things" })));
Set.of(new String [] { "things" })));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks correct but it seems to fails. Could you double-check this, @LuciferYang ?

$ build/sbt "kvstore/test"
....
[error] Test org.apache.spark.util.kvstore.InMemoryStoreSuite.testRemoveAll failed: org.opentest4j.AssertionFailedError: expected: <true> but was: <false>, took 0.004s
[error]     at org.apache.spark.util.kvstore.InMemoryStoreSuite.testRemoveAll(InMemoryStoreSuite.java:159)
[error]     at java.lang.reflect.Method.invoke(Method.java:580)
[error]     at java.util.ArrayList.forEach(ArrayList.java:1596)
[error]     at java.util.ArrayList.forEach(ArrayList.java:1596)
[info] Test run finished: 1 failed, 0 ignored, 8 total, 0.009s

assertEquals(7, store.count(ArrayKeyIndexType.class));

assertTrue(store.removeAllByIndexValues(
ArrayKeyIndexType.class,
"id",
ImmutableSet.of(new String [] { "things" })));
Set.<String[]>of(new String [] { "things" })));
Copy link
Contributor Author

@LuciferYang LuciferYang Aug 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set<String> s0 = Set.of(new String[]{"1"});
Set<String[]> s1 = Set.<String[]>of(new String[]{"1"});
ImmutableSet<String[]> s2 = ImmutableSet.of(new String[]{"1"});

Due to method overloading, there are some differences here:

  1. For s0, the following method will be called:
   static <E> Set<E> of(E... elements) {
        switch (elements.length) { // Implicit null check of elements
            case 0:
                @SuppressWarnings("unchecked")
                var set = (Set<E>) ImmutableCollections.EMPTY_SET;
                return set;
            case 1:
                return new ImmutableCollections.Set12<>(elements[0]);
            case 2:
                return new ImmutableCollections.Set12<>(elements[0], elements[1]);
            default:
                return new ImmutableCollections.SetN<>(elements);
        }
    }

So, the result of s0 becomes a set containing the string element "1".

  1. For s1, the following method will be called:
static <E> Set<E> of(E e1) {
  return new ImmutableCollections.Set12<>(e1);
}

Thus, the result of s1 becomes a set containing the array {"1"}.

  1. For s2, the following method will be called:
  public static <E> ImmutableSet<E> of(E e1) {
    return new SingletonImmutableSet<>(e1);
  }

So, the result of s2 also becomes a set containing the array {"1"}.

Therefore, s1 and s2 are equivalent, while s0 and s2 are not equivalent.

assertEquals(7, db.count(ArrayKeyIndexType.class));

db.removeAllByIndexValues(
ArrayKeyIndexType.class,
"id",
ImmutableSet.of(new String[] { "things" }));
Set.<String[]>of(new String[] { "things" }));
Copy link
Contributor Author

@LuciferYang LuciferYang Aug 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simply to align the testing approach for InMemoryStore, due to the issue mentioned in https://github.com/apache/spark/pull/51961/files#r2265383429, which appears to affect only InMemoryStore.

For RocksDBStore, even if Set.of(new String[] { "things" })); is used here, the RocksDBSuite test will still succeed. This seems like something worth investigating further—for example, are there any differences in the implementations of InMemoryStore and RocksDBStore?

It's getting late in my time zone today, so I'll look into why this discrepancy exists at a later time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting symptom. Thank you for taking a look at this. Have a good night!

@dongjoon-hyun dongjoon-hyun dismissed their stale review August 10, 2025 18:15

According to the discussion, we need to take another round of reviews.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

I verified that in the same way. For a single Array argument, Set.of thinks it's a collection to make a set. So, we need a type information to distinguish Array argument is an element.

jshell> Set.of(new String[]{"a", "a"})
|  Exception java.lang.IllegalArgumentException: duplicate element: a
|        at ImmutableCollections$Set12.<init> (ImmutableCollections.java:798)
|        at Set.of (Set.java:704)
|        at (#4:1)

jshell> Set.<String[]>of(new String[]{"a", "a"})
$5 ==> [[Ljava.lang.String;@3a883ce7]

@dongjoon-hyun
Copy link
Member

Merged to master for Apache Spark 4.1.0.

@LuciferYang
Copy link
Contributor Author

Thank you @dongjoon-hyun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants