Add Pinot Proxy, unified secureMode, and comprehensive gRPC support to Spark-3 connector #16666
Codecov Report
@@ Coverage Diff @@
## master #16666 +/- ##
============================================
+ Coverage 63.40% 63.42% +0.01%
- Complexity 1381 1384 +3
============================================
Files 3050 3053 +3
Lines 178254 178581 +327
Branches 27306 27367 +61
============================================
+ Hits 113021 113262 +241
- Misses 56530 56614 +84
- Partials 8703 8705 +2
…nnector

- Add HTTPS/TLS support with configurable SSL settings
  - useHttps: Enable HTTPS connections
  - keystorePath/keystorePassword: Client certificate configuration
  - truststorePath/truststorePassword: Trust store configuration
  - Automatic HTTP/HTTPS client selection based on URI scheme
- Add authentication header support for secure API access
  - authToken: Authentication token with auto Bearer prefix
  - authHeader: Custom authentication header name
  - Support for Bearer tokens, API keys, and custom auth headers
  - Smart defaults: authToken alone uses 'Authorization: Bearer <token>'
- Update HttpUtils with SSL/TLS context configuration
  - Separate HTTP and HTTPS clients with connection pooling
  - Trust-all mode when no truststore provided (with warning)
  - Comprehensive error handling and validation
- Update PinotClusterClient API methods to support HTTPS and auth
  - getTableSchema, getBrokerInstances, getTimeBoundaryInfo
  - getRoutingTable, getInstanceInfo with auth header passthrough
  - Backward compatible with optional parameters
- Add server-side TLS configuration for PinotServerDataFetcher
  - Configure QueryRouter with TLS settings
  - Support TLS port configuration for server instances
- Comprehensive test coverage
  - HTTPS configuration parsing and validation tests
  - Authentication header configuration tests
  - SSL/TLS client configuration tests
  - Error handling and edge case tests
- Update documentation with usage examples
  - HTTPS configuration examples with certificates
  - Authentication examples for Bearer tokens, API keys
  - Security best practices and production recommendations

All tests passing (30/30) with backward compatibility maintained.
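The authToken/authHeader defaults described in this commit message can be sketched as follows. This is a language-neutral illustration in Python, not the connector's Scala code: the function name `build_auth_headers` is hypothetical, and two behaviors are assumptions rather than facts from the PR (a token sent under a custom header is passed through verbatim, and an existing `Bearer ` prefix is not doubled).

```python
from typing import Dict, Optional


def build_auth_headers(auth_token: Optional[str] = None,
                       auth_header: Optional[str] = None) -> Dict[str, str]:
    """Resolve authToken/authHeader options into HTTP headers (illustrative)."""
    if not auth_token:
        # No token configured: send no authentication header at all.
        return {}
    if auth_header:
        # Assumption: with a custom header name (e.g. an API key header),
        # the token is sent exactly as given.
        return {auth_header: auth_token}
    # Token alone: default to 'Authorization: Bearer <token>'.
    # Assumption: a token that already carries the prefix is left unchanged.
    if auth_token.startswith("Bearer "):
        return {"Authorization": auth_token}
    return {"Authorization": f"Bearer {auth_token}"}
```

Under these assumptions, `build_auth_headers("abc")` yields an `Authorization` header with a `Bearer` prefix, while `build_auth_headers("k1", "X-API-Key")` sends the key under the custom header name.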
0759493 to 431aa66
431aa66 to 51606a0
51606a0 to dcf07d2
dcf07d2 to c1aab66
Pull Request Overview
This PR adds comprehensive Pinot Proxy and gRPC support to the pinot-spark-3-connector, enabling secure production deployments where proxy is the only exposed endpoint. The implementation provides feature parity with Trino's Pinot connector and includes extensive configuration options for HTTPS, authentication, and gRPC TLS.
Key Changes
- Proxy Support: Added `proxy.enabled` configuration and proxy forwarding headers for all Pinot API requests
- gRPC Configuration: Complete gRPC setup with TLS support, proxy forwarding, and configurable connection parameters
- Security Features: HTTPS support with SSL/TLS configuration, authentication headers, and unified `secureMode` option
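As a rough illustration of the proxy forwarding idea, the sketch below builds the extra headers a client might attach when `proxy.enabled` is set. The header names FORWARD_HOST and FORWARD_PORT come from the PR's commit message; the function name `proxy_forward_headers` and its signature are hypothetical, and the sketch is written in Python rather than the connector's Scala.

```python
from typing import Dict


def proxy_forward_headers(proxy_enabled: bool,
                          target_host: str,
                          target_port: int) -> Dict[str, str]:
    """Headers telling the proxy which Pinot instance should receive the request."""
    if not proxy_enabled:
        # Direct connection: no forwarding headers are needed.
        return {}
    return {
        "FORWARD_HOST": target_host,
        # Header values are strings, so the port is serialized explicitly.
        "FORWARD_PORT": str(target_port),
    }
```

The request itself would then be sent to the proxy endpoint, with these headers identifying the controller, broker, or server behind it.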
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
PinotDataSourceReadOptions.scala | Added 19 new configuration fields for proxy, HTTPS, authentication, and gRPC settings |
PinotClusterClient.scala | Updated all API methods to support HTTPS, authentication, and proxy parameters |
GrpcUtils.scala | New utility class for gRPC channel creation with TLS and proxy metadata support |
HttpUtils.scala | Enhanced with HTTPS client configuration and proxy header support |
AuthUtils.scala | New utility for consistent authentication header handling |
NetUtils.scala | New helper for host:port parsing with secure defaults |
PinotGrpcServerDataFetcher.scala | Updated to support gRPC proxy forwarding and authentication |
PinotServerDataFetcher.scala | Enhanced with comprehensive TLS and proxy configuration |
DataExtractor.scala | Added support for BIG_DECIMAL and JSON data types |
Test files | Comprehensive test coverage for new proxy and gRPC functionality |
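The table mentions host:port parsing with secure defaults but does not say what those defaults are. The following Python sketch shows one plausible reading, purely as an assumption: when no port is given, fall back to 443 in secure mode and 80 otherwise. The function name `parse_host_port` is hypothetical and this is not the NetUtils implementation; IPv6 literals are not handled.

```python
from typing import Tuple


def parse_host_port(address: str, secure: bool) -> Tuple[str, int]:
    """Split 'host[:port]' into (host, port), filling in a default port."""
    # Assumed defaults, not taken from the PR: 443 for HTTPS, 80 for HTTP.
    default_port = 443 if secure else 80
    host, sep, port = address.rpartition(":")
    if not sep:
        # No colon present: the whole string is the host name.
        return address, default_port
    return host, int(port)
```

With these assumed defaults, `parse_host_port("controller", secure=True)` resolves to port 443, while an explicit port such as `controller:9000` is always honored.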
…-connector

🔧 Pinot Proxy Support:
- Add proxy.enabled configuration option (default: false)
- Implement HTTP proxy forwarding with FORWARD_HOST and FORWARD_PORT headers
- Support proxy routing for all controller and broker API requests
- Enable proxy-based secure cluster access where proxy is the only exposed endpoint

🚀 Comprehensive gRPC Configuration:
- Add grpc.port configuration (default: 8090)
- Add grpc.max-inbound-message-size configuration (default: 128MB)
- Add grpc.use-plain-text configuration (default: true)
- Support grpc.tls.keystore-type, grpc.tls.keystore-path, grpc.tls.keystore-password
- Support grpc.tls.truststore-type, grpc.tls.truststore-path, grpc.tls.truststore-password
- Add grpc.tls.ssl-provider configuration (default: JDK)
- Add grpc.proxy-uri for gRPC proxy endpoint configuration

🔒 gRPC Proxy Integration:
- Implement gRPC proxy support with FORWARD_HOST and FORWARD_PORT metadata
- Create comprehensive GrpcUtils for channel management and proxy headers
- Support secure gRPC communication through proxy infrastructure
- Enable TLS/SSL configuration for gRPC connections

🏗️ Architecture Updates:
- Update PinotDataSourceReadOptions with all new proxy and gRPC fields
- Enhance PinotClusterClient with proxy-aware API methods
- Add HttpUtils.sendGetRequestWithProxyHeaders() for proxy HTTP requests
- Update PinotServerDataFetcher with gRPC proxy configuration support
- Modify all Spark DataSource V2 components to pass proxy parameters

🧪 Comprehensive Testing:
- Add 8 new test cases for proxy and gRPC configuration parsing
- Create GrpcUtilsTest for gRPC channel creation and proxy metadata
- Update existing tests to include new configuration parameters
- Achieve 39/39 passing tests with full backward compatibility

📚 Enhanced Documentation:
- Add comprehensive Pinot Proxy Support section with examples
- Add detailed gRPC Configuration section with TLS examples
- Include Security Best Practices for production deployments
- Provide proxy + gRPC + HTTPS + authentication integration examples

🎯 Production Features:
- Full backward compatibility - existing code works unchanged
- Based on Trino PR #13015 reference implementation
- Supports secure production deployments with proxy-only access
- Comprehensive error handling and validation
- Performance optimizations for gRPC connections

All tests passing (39/39) with complete feature parity to Trino's implementation.
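The gRPC defaults listed in this commit message can be pictured as a defaults map that user-supplied options override. The sketch below is a Python illustration, not the connector's option parser: the option keys and default values come from the commit message, while the function name `resolve_grpc_options` and the reading of 128MB as 128 * 1024 * 1024 bytes are assumptions.

```python
from typing import Any, Dict

# Defaults as stated in the commit message.
# Assumption: "128MB" is interpreted here as 128 * 1024 * 1024 bytes.
GRPC_DEFAULTS: Dict[str, Any] = {
    "grpc.port": 8090,
    "grpc.max-inbound-message-size": 128 * 1024 * 1024,
    "grpc.use-plain-text": True,
    "grpc.tls.ssl-provider": "JDK",
}


def resolve_grpc_options(user_options: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay user-supplied options on the defaults without mutating either."""
    resolved = dict(GRPC_DEFAULTS)
    resolved.update(user_options)
    return resolved
```

A caller enabling TLS would override only what differs, for example `resolve_grpc_options({"grpc.use-plain-text": False})`, and keep the remaining defaults.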
c1aab66 to 3ea2ccc
Overview
Key highlights
New/updated configuration
Usage examples
Architecture changes
Documentation
Backward compatibility
Security best practices