Contributor

@henrymai henrymai commented Apr 20, 2023

Rationale for this change

Enables GCS when building the Arrow Dataset for Java and also fixes various Java build failures.

Currently we are using our own custom Arrow Dataset build with GCS turned on, but we would rather this be enabled in the official releases from Arrow.

GCS support is already enabled for C++, Python, Ruby, and R, so there should be no reason not to enable it for Java as well.

What changes are included in this PR?

  • Changes to enable GCS for Java Arrow Dataset on just Linux for now.

  • Fixes to flight-sql-jdbc-driver/pom.xml. Without these fixes, the flight-sql-jdbc-driver build fails with the following errors:

    [WARNING] Used undeclared dependencies found:
    [WARNING]    org.bouncycastle:bcpkix-jdk15on:jar:1.61:runtime
    [WARNING]    org.apache.arrow:arrow-memory-core:jar:12.0.0-SNAPSHOT:runtime
    [WARNING]    org.hamcrest:hamcrest:jar:2.2:runtime
    [WARNING]    org.apache.arrow:flight-sql:jar:12.0.0-SNAPSHOT:runtime
    [WARNING]    org.mockito:mockito-core:jar:2.25.1:test
    [WARNING]    org.apache.arrow:flight-core:jar:12.0.0-SNAPSHOT:runtime
    [WARNING]    org.slf4j:slf4j-api:jar:1.7.25:runtime
    [WARNING]    io.netty:netty-common:jar:4.1.82.Final:runtime
    [WARNING]    joda-time:joda-time:jar:2.10.14:runtime
    [WARNING]    org.apache.calcite.avatica:avatica:jar:1.18.0:runtime
    [WARNING]    com.google.protobuf:protobuf-java:jar:3.21.6:runtime
    [WARNING]    org.apache.arrow:arrow-vector:jar:12.0.0-SNAPSHOT:runtime
    [WARNING]    com.google.guava:guava:jar:31.1-jre:runtime

    [...]

    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.0.1:analyze-only (analyze) on project flight-sql-jdbc-driver: Dependency problems found -> [Help 1]

    Caused by: java.lang.NullPointerException: Could not find test data path. Set the environment variable ARROW_TEST_DATA or the JVM property arrow.test.dataRoot.
      at java.util.Objects.requireNonNull(Objects.java:228)
      at org.apache.arrow.driver.jdbc.utils.FlightSqlTestCertificates.getTestDataRoot(FlightSqlTestCertificates.java:40)
      at org.apache.arrow.driver.jdbc.utils.FlightSqlTestCertificates.getFlightTestDataRoot(FlightSqlTestCertificates.java:51)
      at org.apache.arrow.driver.jdbc.utils.FlightSqlTestCertificates.exampleTlsCerts(FlightSqlTestCertificates.java:60)
      at org.apache.arrow.driver.jdbc.ConnectionTlsTest.<clinit>(ConnectionTlsTest.java:59)
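
The `NullPointerException` above is raised when the tests cannot locate the Arrow test data; the message itself names the `ARROW_TEST_DATA` environment variable as one way to supply the path. A minimal sketch of a pre-flight check that could be run before the tests is shown here. The helper name `require_arrow_test_data` and the example path are hypothetical, and this is not necessarily how the PR fixes the build.

```shell
#!/bin/sh
# Sketch only: require_arrow_test_data is a hypothetical helper, not part of Arrow.
require_arrow_test_data() {
  # Fail early if the variable named in the error message is missing.
  if [ -z "${ARROW_TEST_DATA:-}" ]; then
    echo "ARROW_TEST_DATA is not set" >&2
    return 1
  fi
  # Fail if it is set but does not point at a directory.
  if [ ! -d "${ARROW_TEST_DATA}" ]; then
    echo "ARROW_TEST_DATA is not a directory: ${ARROW_TEST_DATA}" >&2
    return 2
  fi
  return 0
}

# Example usage (placeholder path, adjust to the local checkout):
# export ARROW_TEST_DATA="/path/to/test/data"
# require_arrow_test_data && mvn test
```

Checking up front gives a readable message instead of a stack trace from a static initializer.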

Are these changes tested?

I've tested the build by running:

$HOME/.local/bin/archery docker run java-jni-manylinux-2014

I've also tested the ./java/dataset/target/arrow-dataset-12.0.0-SNAPSHOT.jar produced by that command and verified that GCS support is enabled.

Are there any user-facing changes?

Yes, Java Arrow Dataset will now work with GCS.

@github-actions

⚠️ GitHub issue #35245 has been automatically assigned in GitHub to PR creator.

Member

Could you keep this list in alphabetical order?

Contributor Author

Done

Contributor Author

Also, I just noticed that these two lines were already out of order before my change:

: ${ARROW_RPATH_ORIGIN:=ON}
: ${ARROW_ORC:=ON}
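
For readers unfamiliar with the idiom in those two lines: `: ${VAR:=value}` assigns the default only when the variable is unset or empty, so the order of the lines matters for readability rather than behavior. A minimal sketch follows. `ARROW_GCS` is assumed here as the name of the new switch (the thread does not show it), and the `sort -c` helper is just one hypothetical way to catch ordering slips.

```shell
#!/bin/sh
# Sketch only: ARROW_GCS is an assumed variable name, not quoted from the PR.
unset ARROW_GCS ARROW_ORC ARROW_RPATH_ORIGIN
ARROW_ORC=OFF                      # simulate a caller overriding one default

: "${ARROW_GCS:=ON}"               # unset, so the default ON is assigned
: "${ARROW_ORC:=ON}"               # already OFF, so the default is ignored
: "${ARROW_RPATH_ORIGIN:=ON}"

echo "ARROW_GCS=${ARROW_GCS} ARROW_ORC=${ARROW_ORC}"
# prints: ARROW_GCS=ON ARROW_ORC=OFF

# sort -c exits non-zero when its input is not already sorted.
is_sorted() { printf '%s\n' "$@" | LC_ALL=C sort -c 2>/dev/null; }

# The pair as quoted above (RPATH_ORIGIN before ORC) is not in order.
if is_sorted ARROW_RPATH_ORIGIN ARROW_ORC; then
  ORDER_AS_QUOTED=sorted
else
  ORDER_AS_QUOTED=unsorted
fi
```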

@kou
Member

kou commented Apr 20, 2023

@github-actions crossbow submit java-jars

@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting review" label Apr 20, 2023
@github-actions

Revision: 9f4d80c8b0239aeee6a85234832b0b2d29a5dc46

Submitted crossbow builds: ursacomputing/crossbow @ actions-f4323dd042

| Task | Status |
|------|--------|
| java-jars | Github Actions |

@github-actions github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label Apr 20, 2023
@henrymai
Contributor Author

Looks like the macOS builds are failing (which I kind of expected).
I'll remove the macOS changes from my patch and let someone who has a Mac take that on in a follow-up patch.

@henrymai henrymai changed the title from "GH-35245: [Java][Arrow Dataset] Enable GCS" to "GH-35245: [Java][Arrow Dataset][Linux] Enable GCS" Apr 20, 2023
Enables GCS when building the Arrow Dataset for Java and also fixes various
Java build failures.

Without the changes to flight-sql-jdbc-driver/pom.xml, the flight-sql-jdbc-driver
build will fail with the following errors:

    [WARNING] Used undeclared dependencies found:
    [WARNING]    org.bouncycastle:bcpkix-jdk15on:jar:1.61:runtime
    [WARNING]    org.apache.arrow:arrow-memory-core:jar:12.0.0-SNAPSHOT:runtime
    [WARNING]    org.hamcrest:hamcrest:jar:2.2:runtime
    [WARNING]    org.apache.arrow:flight-sql:jar:12.0.0-SNAPSHOT:runtime
    [WARNING]    org.mockito:mockito-core:jar:2.25.1:test
    [WARNING]    org.apache.arrow:flight-core:jar:12.0.0-SNAPSHOT:runtime
    [WARNING]    org.slf4j:slf4j-api:jar:1.7.25:runtime
    [WARNING]    io.netty:netty-common:jar:4.1.82.Final:runtime
    [WARNING]    joda-time:joda-time:jar:2.10.14:runtime
    [WARNING]    org.apache.calcite.avatica:avatica:jar:1.18.0:runtime
    [WARNING]    com.google.protobuf:protobuf-java:jar:3.21.6:runtime
    [WARNING]    org.apache.arrow:arrow-vector:jar:12.0.0-SNAPSHOT:runtime
    [WARNING]    com.google.guava:guava:jar:31.1-jre:runtime

    [...]

    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.0.1:analyze-only (analyze) on project flight-sql-jdbc-driver: Dependency problems found -> [Help 1]

It will also fail with:

    Caused by: java.lang.NullPointerException: Could not find test data path. Set the environment variable ARROW_TEST_DATA or the JVM property arrow.test.dataRoot.
      at java.util.Objects.requireNonNull(Objects.java:228)
      at org.apache.arrow.driver.jdbc.utils.FlightSqlTestCertificates.getTestDataRoot(FlightSqlTestCertificates.java:40)
      at org.apache.arrow.driver.jdbc.utils.FlightSqlTestCertificates.getFlightTestDataRoot(FlightSqlTestCertificates.java:51)
      at org.apache.arrow.driver.jdbc.utils.FlightSqlTestCertificates.exampleTlsCerts(FlightSqlTestCertificates.java:60)
      at org.apache.arrow.driver.jdbc.ConnectionTlsTest.<clinit>(ConnectionTlsTest.java:59)
Member

@lidavidm lidavidm left a comment

Thanks. The Java changes look fine. We can punt on macOS for now. @davisusanibar might you be able to follow up there?

@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label Apr 20, 2023
@kou
Member

kou commented Apr 20, 2023

@github-actions crossbow submit java-jars

@github-actions

Revision: 98d47c5

Submitted crossbow builds: ursacomputing/crossbow @ actions-b74b79c10c

| Task | Status |
|------|--------|
| java-jars | Github Actions |

@kou kou changed the title from "GH-35245: [Java][Arrow Dataset][Linux] Enable GCS" to "GH-35245: [Java][Dataset][Linux] Enable GCS" Apr 20, 2023
@henrymai
Contributor Author

It looks like there are two unrelated test failures:

@kou
Member

kou commented Apr 20, 2023

Yes. They are unrelated.

Could you check the built artifacts at https://github.com/ursacomputing/crossbow/releases/tag/actions-b74b79c10c-github-java-jars ?

@davisusanibar
Contributor

> Thanks. The Java changes look fine. We can punt on macOS for now. @davisusanibar might you be able to follow up there?

Sure, let me also consider the changes needed for Windows.

@henrymai
Contributor Author

henrymai commented Apr 20, 2023

> Yes. They are unrelated.
>
> Could you check the built artifacts at https://github.com/ursacomputing/crossbow/releases/tag/actions-b74b79c10c-github-java-jars ?

Works fine. See this screenshot for proof:
[Screenshot: works_fine]

For context, I just replaced our existing custom 10.0.1 build with the 12.0.1-SNAPSHOT from the built artifacts link that you sent and verified it with a simple test against a public GCS bucket.

Member

@kou kou left a comment

+1

Thanks.

@kou kou merged commit 7ea2d98 into apache:main Apr 20, 2023
@github-actions github-actions bot added the "awaiting merge" label and removed the "awaiting changes" label Apr 20, 2023
@henrymai
Contributor Author

Thanks @kou, @lidavidm, and @davisusanibar for the very fast turnaround.

@henrymai henrymai deleted the enable_gcs branch April 20, 2023 14:38
@davisusanibar
Contributor

Enables GCS when building the Arrow Dataset for Java and also fixes various java build failures.

liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this pull request May 11, 2023
* Closes: apache#35245

Authored-by: Henry Mai <[email protected]>
Signed-off-by: Sutou Kouhei <[email protected]>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
Development

Successfully merging this pull request may close these issues.

[Java][Linux] Enable GCS for Arrow Dataset