feat: Add comprehensive Pinot Proxy and gRPC support to pinot-spark-3-connector
🔧 Pinot Proxy Support:
- Add proxy.enabled configuration option (default: false)
- Implement HTTP proxy forwarding with FORWARD_HOST and FORWARD_PORT headers
- Support proxy routing for all controller and broker API requests
- Enable proxy-based secure cluster access where proxy is the only exposed endpoint
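The forwarding described above can be pictured roughly as follows. This is a minimal sketch, not the connector's `HttpUtils` code: `ProxyForwarding`, `ProxiedGet`, and `proxiedGet` are hypothetical names, and the only details taken from the change description are the `FORWARD_HOST` and `FORWARD_PORT` header names. The request is addressed to the proxy, while the headers name the controller or broker that should actually receive it.

```scala
// Hypothetical helper for illustration; not part of the connector.
object ProxyForwarding {
  // Header names as listed in the change description.
  val ForwardHost = "FORWARD_HOST"
  val ForwardPort = "FORWARD_PORT"

  // A GET request described as plain data: where to send it and which headers to attach.
  final case class ProxiedGet(url: String, headers: Map[String, String])

  /** Addresses the request to the proxy and names the real target in the headers. */
  def proxiedGet(
      proxyHostPort: String,
      targetHost: String,
      targetPort: Int,
      path: String,
      useHttps: Boolean = false): ProxiedGet = {
    val scheme = if (useHttps) "https" else "http"
    // Accept paths with or without a leading slash.
    val normalizedPath = if (path.startsWith("/")) path else s"/$path"
    ProxiedGet(
      url = s"$scheme://$proxyHostPort$normalizedPath",
      headers = Map(ForwardHost -> targetHost, ForwardPort -> targetPort.toString)
    )
  }
}
```

For example, `ProxyForwarding.proxiedGet("pinot-proxy:8080", "pinot-broker-0", 8099, "tables")` describes a GET to the proxy address carrying the broker's host and port in the two headers; the host names and ports are placeholders.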
🚀 Comprehensive gRPC Configuration:
- Add grpc.port configuration (default: 8090)
- Add grpc.max-inbound-message-size configuration (default: 128MB)
- Add grpc.use-plain-text configuration (default: true)
- Support grpc.tls.keystore-type, grpc.tls.keystore-path, grpc.tls.keystore-password
- Support grpc.tls.truststore-type, grpc.tls.truststore-path, grpc.tls.truststore-password
- Add grpc.tls.ssl-provider configuration (default: JDK)
- Add grpc.proxy-uri for gRPC proxy endpoint configuration
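As a rough illustration of how these options and their defaults could be read from a Spark options map, here is a small sketch. `GrpcOptions`, `GrpcConfig`, and `parse` are hypothetical names rather than the connector's `PinotDataSourceReadOptions`. It also assumes that `grpc.max-inbound-message-size` is given as a plain byte count and that "128MB" means 128 × 1024 × 1024 bytes; the keystore and truststore options are left out for brevity.

```scala
// Hypothetical parser for illustration; not the connector's own option handling.
object GrpcOptions {
  final case class GrpcConfig(
      port: Int,
      maxInboundMessageSizeBytes: Long,
      usePlainText: Boolean,
      sslProvider: String,
      proxyUri: Option[String])

  // Defaults as listed in the change description.
  val DefaultPort: Int = 8090
  // Assumption: 128MB is interpreted as 128 MiB expressed in bytes.
  val DefaultMaxInboundBytes: Long = 128L * 1024 * 1024

  /** Reads gRPC settings from an options map, falling back to the defaults. */
  def parse(options: Map[String, String]): GrpcConfig =
    GrpcConfig(
      port = options.get("grpc.port").map(_.toInt).getOrElse(DefaultPort),
      maxInboundMessageSizeBytes =
        options.get("grpc.max-inbound-message-size").map(_.toLong).getOrElse(DefaultMaxInboundBytes),
      usePlainText = options.get("grpc.use-plain-text").map(_.toBoolean).getOrElse(true),
      sslProvider = options.getOrElse("grpc.tls.ssl-provider", "JDK"),
      // An empty string is treated the same as "no proxy configured".
      proxyUri = options.get("grpc.proxy-uri").filter(_.nonEmpty)
    )
}
```

Calling `GrpcOptions.parse(Map.empty)` yields the defaults listed above, so existing jobs that set none of these options would be unaffected under this reading.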
🔒 gRPC Proxy Integration:
- Implement gRPC proxy support with FORWARD_HOST and FORWARD_PORT metadata
- Create comprehensive GrpcUtils for channel management and proxy headers
- Support secure gRPC communication through proxy infrastructure
- Enable TLS/SSL configuration for gRPC connections
🏗️ Architecture Updates:
- Update PinotDataSourceReadOptions with all new proxy and gRPC fields
- Enhance PinotClusterClient with proxy-aware API methods
- Add HttpUtils.sendGetRequestWithProxyHeaders() for proxy HTTP requests
- Update PinotServerDataFetcher with gRPC proxy configuration support
- Modify all Spark DataSource V2 components to pass proxy parameters
🧪 Comprehensive Testing:
- Add 8 new test cases for proxy and gRPC configuration parsing
- Create GrpcUtilsTest for gRPC channel creation and proxy metadata
- Update existing tests to include new configuration parameters
- Achieve 39/39 passing tests with full backward compatibility
📚 Enhanced Documentation:
- Add comprehensive Pinot Proxy Support section with examples
- Add detailed gRPC Configuration section with TLS examples
- Include Security Best Practices for production deployments
- Provide proxy + gRPC + HTTPS + authentication integration examples
🎯 Production Features:
- Full backward compatibility - existing code works unchanged
- Based on Trino PR #13015 reference implementation
- Supports secure production deployments with proxy-only access
- Comprehensive error handling and validation
- Performance optimizations for gRPC connections
All tests passing (39/39) with complete feature parity to Trino's implementation.
// Explicit HTTPS only (gRPC remains plaintext by default)
val data = spark.read
  .format("pinot")
  .option("table", "airlineStats")
  .option("tableType", "offline")
  .option("useHttps", "true")
  .load()

// Explicit gRPC TLS only (REST remains HTTP by default)
val data = spark.read
  .format("pinot")
  .option("table", "airlineStats")
  .option("tableType", "offline")
  .option("grpc.use-plain-text", "false")
  .load()
```

### HTTPS Configuration

When HTTPS is enabled (either via `secureMode=true` or `useHttps=true`), you can configure keystore/truststore as needed:
```scala
val data = spark.read
@@ -81,7 +116,8 @@ val data = spark.read
 | Option | Description | Required | Default |
 |--------|-------------|----------|---------|
-| `useHttps` | Enable HTTPS connections | No | `false` |
+| `secureMode` | Unified switch to enable HTTPS and gRPC TLS | No | `false` |
+| `useHttps` | Enable HTTPS connections (overrides `secureMode` for REST) | No | `false` |
 | `keystorePath` | Path to client keystore file (JKS format) | No | None |
 | `keystorePassword` | Password for the keystore | No | None |
 | `truststorePath` | Path to truststore file (JKS format) | No | None |
@@ -130,6 +166,107 @@ val data = spark.read
**Note:** If only `authToken` is provided without `authHeader`, the connector will automatically use `Authorization: Bearer <token>`.

## Pinot Proxy Support

The connector supports Pinot Proxy for secure cluster access where the proxy is the only exposed endpoint. When proxy is enabled, all HTTP requests to controllers/brokers and gRPC requests to servers are routed through the proxy.
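A read through the proxy might be configured along these lines. This is a sketch under stated assumptions: the option names `proxy.enabled`, `useGrpcServer`, and `grpc.proxy-uri` come from this change, but the host names, ports, the `host:port` form of `grpc.proxy-uri`, and the idea that `controller` should point at the proxy address are placeholders and assumptions, not confirmed behavior. `ProxyReadOptions` is a hypothetical name.

```scala
// Hypothetical option set for illustration; all hosts and ports are placeholders.
object ProxyReadOptions {
  val options: Map[String, String] = Map(
    "table" -> "airlineStats",
    "tableType" -> "offline",
    // Assumption: with the proxy as the only exposed endpoint, the controller
    // address given to the connector is the proxy's address.
    "controller" -> "pinot-proxy:8080",
    "proxy.enabled" -> "true",
    // Route server reads over gRPC, also via the proxy.
    "useGrpcServer" -> "true",
    "grpc.proxy-uri" -> "pinot-proxy:8094"
  )

  // With a SparkSession named `spark` in scope, the map can be passed in one call:
  //   val data = spark.read.format("pinot").options(ProxyReadOptions.options).load()
}
```

Keeping the settings in a single map makes it easy to switch between direct and proxied access by changing one set of values.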

-| controller | Pinot controller url and port. Input should be `url:port` format without schema. Connector does not support `https` schema for now.| No | localhost:9000 |
-| broker | Pinot broker url and port. Input should be `url:port` format without schema. If not specified, connector will find broker instances of table automatically. Connector does not support `https` schema for now| No | Fetch broker instances of table from Pinot Controller |
+| controller | Pinot controller `host:port` (schema inferred from `useHttps`/`secureMode`) | No | localhost:9000 |
+| broker | Pinot broker `host:port` (schema inferred from `useHttps`/`secureMode`) | No | Fetch broker instances of table from Pinot Controller |
 | usePushDownFilters | Push filters to pinot servers or not. If true, data exchange between pinot server and spark will be minimized. | No | true |
 | segmentsPerSplit | Represents the maximum segment count that will be scanned by pinot server in one connection | No | 3 |
 | pinotServerTimeoutMs | The maximum timeout(ms) to get data from pinot server | No | 10 mins |
 | useGrpcServer | Boolean value to enable reads via gRPC. This option is more memory efficient both on Pinot server and Spark executor side because it utilizes streaming. Requires gRPC to be enabled on Pinot server. | No | false |
 | queryOptions | Comma separated list of Pinot query options (e.g. "enableNullHandling=true,skipUpsert=true") | No | "" |
 | failOnInvalidSegments | Fail the read operation if response metadata indicates invalid segments | No | false |
+| secureMode | Unified switch to enable HTTPS and gRPC TLS (explicit `useHttps`/`grpc.use-plain-text` take precedence) | No | false |
+| useHttps | Enable HTTPS for REST calls (overrides `secureMode` for REST) | No | false |
+| grpc.use-plain-text | Use plaintext for gRPC (overrides `secureMode` for gRPC) | No | true |
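The precedence rules in the last three rows, together with the scheme inference mentioned for `controller` and `broker`, could be resolved roughly as follows. This is a sketch of the documented behavior only; `SecurityResolution`, `Resolved`, `resolve`, and `controllerUrl` are hypothetical names and the connector's real logic may differ.

```scala
// Hypothetical resolver for illustration; mirrors the documented precedence only.
object SecurityResolution {
  final case class Resolved(useHttps: Boolean, grpcPlainText: Boolean)

  /** An explicit `useHttps` or `grpc.use-plain-text` wins; otherwise `secureMode` decides. */
  def resolve(options: Map[String, String]): Resolved = {
    val secureMode = options.get("secureMode").exists(_.toBoolean)
    val useHttps = options.get("useHttps").map(_.toBoolean).getOrElse(secureMode)
    // Plaintext gRPC is the default unless secureMode turns TLS on.
    val grpcPlainText = options.get("grpc.use-plain-text").map(_.toBoolean).getOrElse(!secureMode)
    Resolved(useHttps, grpcPlainText)
  }

  /** Prefixes a bare `host:port` with the scheme implied by the resolved settings. */
  def controllerUrl(hostPort: String, options: Map[String, String]): String = {
    val scheme = if (resolve(options).useHttps) "https" else "http"
    s"$scheme://$hostPort"
  }
}
```

Under this reading, `secureMode=true` alone enables both HTTPS and gRPC TLS, while `secureMode=true` combined with `useHttps=false` keeps REST on HTTP and still uses TLS for gRPC.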

pinot-connectors/pinot-spark-3-connector/src/main/scala/org/apache/pinot/connector/spark/v3/datasource/DataExtractor.scala
pinot-connectors/pinot-spark-3-connector/src/main/scala/org/apache/pinot/connector/spark/v3/datasource/PinotDataSource.scala (1 addition, 1 deletion)
@@ -40,7 +40,7 @@ class PinotDataSource extends TableProvider with DataSourceRegister {
pinot-connectors/pinot-spark-3-connector/src/main/scala/org/apache/pinot/connector/spark/v3/datasource/PinotScan.scala
pinot-connectors/pinot-spark-3-connector/src/main/scala/org/apache/pinot/connector/spark/v3/datasource/PinotScanBuilder.scala (1 addition, 1 deletion)
@@ -45,7 +45,7 @@ class PinotScanBuilder(readParameters: PinotDataSourceReadOptions)
pinot-connectors/pinot-spark-3-connector/src/main/scala/org/apache/pinot/connector/spark/v3/datasource/SparkToPinotTypeTranslator.scala