pixels-flink-source is a custom Apache Flink Source Connector designed to ingest data in real-time from a Pixels-sink server using gRPC polling. It enables seamless data synchronization from Pixels to modern data lakes, specifically supporting Apache Iceberg (with AWS Glue Catalog) and Apache Paimon, backed by S3 storage.
- RPC Source: High-performance data fetching via gRPC polling mechanism.
- Schema Mapping: Configurable source schema definition to map Pixels data types to Flink RowData.
- Multi-Sink Support: Integrated support for writing to:
- Apache Iceberg: Supports AWS Glue Catalog and S3 storage.
- Apache Paimon: Supports S3 filesystem catalog.
- Cloud Native: Built with AWS SDK v2, ready for cloud deployments with S3 and Glue integration.
- Configurable: Fully driven by
pixels-client.propertiesfor easy management of connections, schemas, and catalogs.
- Java 11+
- Maven 3.6+
- AWS Credentials (for S3 and Glue access)
Clone the repository and build the project using Maven:
mvn clean packageThe build artifact will be located at target/pixels-flink-source-1.0-SNAPSHOT.jar.
The application is configured via src/main/resources/pixels-client.properties. You can modify this file before building, or provide a modified version at runtime on the classpath.
# Pixels Server Connection
pixels.server.host=localhost
pixels.server.port=50051
schema.name=public
table.name=orders
# Source Schema Definition
# Format: col1:TYPE,col2:TYPE
# Supported Types: INT, BIGINT, STRING, FLOAT, DOUBLE, BOOLEAN
source.schema=id:INT,data:STRING,price:DOUBLE
# Flink Checkpoint Interval (Required for streaming sinks)
checkpoint.interval.ms=10000Select the target sink type:
# Options: iceberg, paimon
sink.type=iceberg# Catalog Type: glue (Recommended for AWS), hadoop, rest
iceberg.catalog.type=glue
# S3 Warehouse Path
iceberg.catalog.warehouse=s3://your-bucket/iceberg/
iceberg.table.name=iceberg_orders
# Optional: AWS Region (if not set in ~/.aws/config)
# iceberg.catalog.glue.region=us-east-1# Catalog Type
paimon.catalog.type=paimon
# S3 Warehouse Path
paimon.catalog.warehouse=s3://your-bucket/paimon/
paimon.table.name=paimon_ordersTo use S3 and AWS Glue, you must configure AWS credentials. The application uses the default AWS credential provider chain, looking for credentials in the following order:
- Environment Variables:
AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,AWS_REGION - System Properties:
aws.accessKeyId,aws.secretKey - Credentials File:
~/.aws/credentials - Config File:
~/.aws/config - Instance Profile: EC2/EKS IAM Role
Ensure your IAM identity has permissions for:
- S3:
GetObject,PutObject,ListBucket,DeleteObjecton your warehouse bucket. - Glue:
CreateDatabase,CreateTable,GetTable,UpdateTable, etc., if using Iceberg with Glue Catalog.