NIFI-15062 Add PutIcebergRecord Processor and Services #10400
+4,633
−0
Summary
NIFI-15062 Adds a PutIcebergRecord Processor and several Controller Services that provide initial integration for storing records in Apache Iceberg tables.

The Apache Iceberg ecosystem supports a wide variety of catalogs, storage providers, and file formats. The purpose of this pull request is to provide several Controller Service abstractions that enable extensible integration, with a specific implementation of each Controller Service. With the number of potential integration options, Apache NiFi should not necessarily implement support for every possible solution, but should provide extension points that enable focused types of integration. The iceberg-api library is the foundation for this approach.

The nifi-iceberg-bundle includes multiple modules that have the following dependency hierarchy:

nifi-iceberg-shared-nar
  nifi-iceberg-services-api-nar
    nifi-iceberg-processors-nar
    nifi-iceberg-rest-catalog-nar
    nifi-iceberg-aws-nar
    nifi-iceberg-parquet-writer-nar
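
The hierarchy above can be read as a chain of parent NARs. The following is a minimal sketch of that reading, not code from this pull request. It assumes that the processors, REST catalog, AWS, and Parquet writer NARs each depend on nifi-iceberg-services-api-nar, which in turn depends on nifi-iceberg-shared-nar. The class name NarHierarchySketch and the chain method are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NarHierarchySketch {
    // Child NAR -> parent NAR. ASSUMPTION: inferred from the module descriptions,
    // not copied from the POM files in the pull request.
    static final Map<String, String> PARENT = new LinkedHashMap<>();

    static {
        PARENT.put("nifi-iceberg-services-api-nar", "nifi-iceberg-shared-nar");
        PARENT.put("nifi-iceberg-processors-nar", "nifi-iceberg-services-api-nar");
        PARENT.put("nifi-iceberg-rest-catalog-nar", "nifi-iceberg-services-api-nar");
        PARENT.put("nifi-iceberg-aws-nar", "nifi-iceberg-services-api-nar");
        PARENT.put("nifi-iceberg-parquet-writer-nar", "nifi-iceberg-services-api-nar");
    }

    // Walk from a module up to the root of the hierarchy.
    static List<String> chain(String nar) {
        List<String> result = new ArrayList<>();
        for (String current = nar; current != null; current = PARENT.get(current)) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: nifi-iceberg-aws-nar -> nifi-iceberg-services-api-nar -> nifi-iceberg-shared-nar
        System.out.println(String.join(" -> ", chain("nifi-iceberg-aws-nar")));
    }
}
```

Under this reading, every implementation NAR reaches iceberg-api and iceberg-core through the same two ancestors, so only libraries specific to one integration need separate packaging.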
The nifi-iceberg-shared-nar contains the iceberg-api and iceberg-core libraries along with transitive dependencies.

The nifi-iceberg-services-api-nar depends on iceberg-api and incorporates the Apache NiFi Controller Service interfaces that align with iceberg-api interfaces.

The nifi-iceberg-processors-nar contains the PutIcebergRecord Processor, which references properties for the following Controller Services:

IcebergCatalog
IcebergWriter
IcebergFileIOProvider

These three interfaces define the primary extension points for external integration.
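
To illustrate why interface-based extension points keep the Processor independent of any particular catalog, storage provider, or file format, here is a minimal sketch. The interfaces below are hypothetical stand-ins and do not reflect the real method signatures in nifi-iceberg-services-api-nar, which this description does not show. The wiring between the catalog and the file IO provider is also an assumption made for the example.

```java
import java.util.List;

public class ExtensionPointSketch {
    // HYPOTHETICAL stand-ins for IcebergFileIOProvider, IcebergCatalog, and IcebergWriter.
    // Method names and signatures are invented for illustration only.
    interface FileIOProvider {
        String storageName();
    }

    interface Catalog {
        String loadTable(String tableName);
    }

    interface Writer {
        String write(String table, List<String> records);
    }

    // The caller depends only on the interfaces, so implementations can be swapped freely.
    static String putRecords(Catalog catalog, Writer writer, String tableName, List<String> records) {
        String table = catalog.loadTable(tableName);
        return writer.write(table, records);
    }

    // Wires trivial in-memory implementations together and returns the result.
    static String demo() {
        FileIOProvider fileIo = () -> "memory";
        Catalog catalog = name -> "table[" + name + "]@" + fileIo.storageName();
        Writer writer = (table, records) -> records.size() + " records -> " + table;
        return putRecords(catalog, writer, "events", List.of("a", "b"));
    }

    public static void main(String[] args) {
        // Prints: 2 records -> table[events]@memory
        System.out.println(demo());
    }
}
```

Replacing any one lambda with a different implementation leaves putRecords unchanged, which is the property the three Controller Service interfaces are meant to provide.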
The nifi-iceberg-rest-catalog-nar contains the RESTIcebergCatalog implementation of the IcebergCatalog Controller Service. This implementation configures the RESTSessionCatalog from the iceberg-core library and supports Catalog Authentication using OAuth2 with Client Credentials or Bearer Tokens. Building on the iceberg-core library provided in nifi-iceberg-shared-nar, the nifi-iceberg-rest-catalog-nar does not have any additional dependencies. The RESTIcebergCatalog defines a FileIO Provider property that supports configurable Controller Services for Iceberg FileIO implementations.

The nifi-iceberg-aws-nar contains the S3IcebergFileIOProvider, which configures and returns the Iceberg S3FileIO class. Support for S3 requires a number of AWS SDK 2 libraries, which is one of the primary reasons for separate packaging of FileIOProvider implementations. The S3 implementation supports configurable authentication using Basic or Session Credentials, as well as Vended Credentials, where the REST Catalog is expected to provide the required credentials.

The nifi-iceberg-parquet-writer-nar contains the ParquetIcebergWriter Controller Service, supporting Apache Parquet serialization. Apache Parquet has a number of transitive dependencies, including a dependency on the hadoop-common library. The NAR packaging excludes many unnecessary transitive dependencies and has an explicit list of dependencies required at runtime for Parquet serialization.

This implementation structure and Controller Service design strategy should serve as the basis for additional storage provider implementations. With Apache Parquet being the predominant format for Apache Iceberg, direct support for other file formats may not be necessary. The variety of Iceberg REST Catalog implementations may require additional configuration options in the future, but the core IcebergCatalog Controller Service abstraction provides a decoupled strategy for future implementation.

Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Pull Request Formatting
main branch

Verification
Please indicate the verification steps performed prior to pull request creation.
Build
./mvnw clean install -P contrib-check
Licensing
LICENSE and NOTICE files
Documentation