Skip to content

Conversation

xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Aug 22, 2025

Overview

  • Adds full Pinot Proxy routing for controller/broker HTTP and server gRPC requests.
  • Introduces unified secureMode to enable HTTPS and gRPC TLS together, with explicit overrides.
  • Centralizes auth header handling and proxy URI parsing for consistency.
  • Aligns BIG_DECIMAL value conversion with declared Decimal(38,18) schema.
  • Updates docs and examples.

Key highlights

  • Pinot Proxy
    • HTTP requests include FORWARD_HOST/FORWARD_PORT headers.
    • gRPC requests include forward_host/forward_port metadata.
    • Works with HTTPS and authentication.
  • Unified security
    • New option: secureMode=true ⇒ HTTPS on, gRPC TLS on by default.
    • Explicit overrides still win: useHttps for REST; grpc.use-plain-text for gRPC.
  • Reliability and consistency
    • AuthUtils ensures idempotent “Authorization: Bearer …” header handling across HTTP and gRPC.
    • NetUtils.parseHostPort(host[:port], secure) used by HTTP and gRPC paths to parse proxy URIs consistently.
    • Fixed redundant targetUrl logic in PinotClusterClient.getInstanceInfo.

New/updated configuration

  • Proxy
    • proxy.enabled: route controller/broker/gRPC via Pinot Proxy (default: false)
    • grpc.proxy-uri: Pinot Proxy host[:port] for gRPC
  • Security
    • secureMode: unified switch for HTTPS + gRPC TLS (default: false)
    • useHttps: REST HTTPS override (default: false)
    • grpc.use-plain-text: gRPC plaintext override (default: true)
  • gRPC TLS
    • grpc.port
    • grpc.max-inbound-message-size
    • grpc.tls.keystore-type|path|password
    • grpc.tls.truststore-type|path|password
    • grpc.tls.ssl-provider

Usage examples

  • Proxy-only
val df = spark.read
  .format("pinot")
  .option("table", "airlineStats")
  .option("tableType", "offline")
  .option("controller", "pinot-proxy:8080")
  .option("proxy.enabled", "true")
  .load()
  • Unified secure mode (HTTPS + gRPC TLS)
val df = spark.read
  .format("pinot")
  .option("table", "airlineStats")
  .option("tableType", "offline")
  .option("secureMode", "true")
  .load()
  • Explicit gRPC TLS + proxy
val df = spark.read
  .format("pinot")
  .option("table", "airlineStats")
  .option("tableType", "offline")
  .option("proxy.enabled", "true")
  .option("grpc.proxy-uri", "pinot-proxy:8094")
  .option("grpc.use-plain-text", "false")
  .option("grpc.tls.keystore-path", "/path/to/grpc-keystore.jks")
  .option("grpc.tls.keystore-password", "keystore-password")
  .load()

Architecture changes

  • New
    • AuthUtils: normalized Bearer auth header construction for HTTP and gRPC.
    • NetUtils: parseHostPort(host[:port], secure) with secure defaults (443/80).
  • Enhanced
    • HttpUtils: HTTPS client configuration + proxy forwarding headers.
    • PinotServerDataFetcher: gRPC proxy URI support via NetUtils; TLS properties propagated.
    • PinotGrpcServerDataFetcher: uses NetUtils for proxy, forwards auth + proxy metadata.
    • PinotClusterClient: simplified targetUrl logic; proxy-aware HTTP calls.
    • PinotDataSourceReadOptions: adds secureMode; derives gRPC plaintext from secureMode unless explicitly overridden.
    • DataExtractor (Spark 3): BIG_DECIMAL values mapped to Decimal(38,18) with HALF_UP rounding to match schema.

Documentation

  • Spark 3 README
    • New “Security Configuration” with secureMode and explicit override guidance.
    • gRPC/Proxy examples and options tables updated.
    • Data Types note: BIG_DECIMAL → Decimal(38,18) with rounding.
  • Read model doc updated for controller/broker schema inference and new options.
  • Spark 2 README: security notes aligned with Spark 3 (where applicable).

Backward compatibility

  • Existing configurations continue to work unchanged.
  • secureMode is additive and optional.
  • Explicit useHttps/grpc.use-plain-text continue to override defaults.

Security best practices

  • Prefer secureMode in production to enable HTTPS + gRPC TLS.
  • Provide keystore/truststore and avoid trust-all in production.
  • Use least-privilege tokens and rotate regularly.

@codecov-commenter
Copy link

codecov-commenter commented Aug 22, 2025

Codecov Report

❌ Patch coverage is 38.75740% with 207 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.42%. Comparing base (48de81a) to head (3ea2ccc).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...ot/connector/spark/common/PinotClusterClient.scala 0.00% 58 Missing ⚠️
...r/spark/common/reader/PinotServerDataFetcher.scala 0.00% 41 Missing ⚠️
...pache/pinot/connector/spark/common/HttpUtils.scala 29.82% 34 Missing and 6 partials ⚠️
...ark/common/reader/PinotGrpcServerDataFetcher.scala 0.00% 34 Missing ⚠️
...pache/pinot/connector/spark/common/GrpcUtils.scala 67.64% 5 Missing and 6 partials ⚠️
...pache/pinot/connector/spark/common/AuthUtils.scala 0.00% 10 Missing ⚠️
...apache/pinot/connector/spark/common/NetUtils.scala 0.00% 9 Missing ⚠️
...ctor/spark/common/PinotDataSourceReadOptions.scala 96.80% 1 Missing and 2 partials ⚠️
...k/common/reader/PinotAbstractPartitionReader.scala 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #16666      +/-   ##
============================================
+ Coverage     63.40%   63.42%   +0.01%     
- Complexity     1381     1384       +3     
============================================
  Files          3050     3053       +3     
  Lines        178254   178581     +327     
  Branches      27306    27367      +61     
============================================
+ Hits         113021   113262     +241     
- Misses        56530    56614      +84     
- Partials       8703     8705       +2     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.36% <38.75%> (-0.03%) ⬇️
java-21 63.39% <38.75%> (+30.07%) ⬆️
temurin 63.42% <38.75%> (+0.01%) ⬆️
unittests 63.41% <38.75%> (+0.01%) ⬆️
unittests1 56.50% <ø> (+0.05%) ⬆️
unittests2 33.35% <38.75%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…nnector

- Add HTTPS/TLS support with configurable SSL settings
  - useHttps: Enable HTTPS connections
  - keystorePath/keystorePassword: Client certificate configuration
  - truststorePath/truststorePassword: Trust store configuration
  - Automatic HTTP/HTTPS client selection based on URI scheme

- Add authentication header support for secure API access
  - authToken: Authentication token with auto Bearer prefix
  - authHeader: Custom authentication header name
  - Support for Bearer tokens, API keys, and custom auth headers
  - Smart defaults: authToken alone uses 'Authorization: Bearer <token>'

- Update HttpUtils with SSL/TLS context configuration
  - Separate HTTP and HTTPS clients with connection pooling
  - Trust-all mode when no truststore provided (with warning)
  - Comprehensive error handling and validation

- Update PinotClusterClient API methods to support HTTPS and auth
  - getTableSchema, getBrokerInstances, getTimeBoundaryInfo
  - getRoutingTable, getInstanceInfo with auth header passthrough
  - Backward compatible with optional parameters

- Add server-side TLS configuration for PinotServerDataFetcher
  - Configure QueryRouter with TLS settings
  - Support TLS port configuration for server instances

- Comprehensive test coverage
  - HTTPS configuration parsing and validation tests
  - Authentication header configuration tests
  - SSL/TLS client configuration tests
  - Error handling and edge case tests

- Update documentation with usage examples
  - HTTPS configuration examples with certificates
  - Authentication examples for Bearer tokens, API keys
  - Security best practices and production recommendations

All tests passing (30/30) with backward compatibility maintained.
@xiangfu0 xiangfu0 force-pushed the http-support-spark-connector branch 5 times, most recently from 0759493 to 431aa66 Compare September 4, 2025 14:54
@xiangfu0 xiangfu0 requested a review from Copilot September 4, 2025 15:08
@xiangfu0 xiangfu0 force-pushed the http-support-spark-connector branch from 431aa66 to 51606a0 Compare September 4, 2025 15:15
Copilot

This comment was marked as outdated.

@xiangfu0 xiangfu0 force-pushed the http-support-spark-connector branch from 51606a0 to dcf07d2 Compare September 4, 2025 16:40
@xiangfu0 xiangfu0 requested a review from Copilot September 4, 2025 16:44
Copilot

This comment was marked as outdated.

@xiangfu0 xiangfu0 force-pushed the http-support-spark-connector branch from dcf07d2 to c1aab66 Compare September 4, 2025 17:06
@xiangfu0 xiangfu0 requested a review from Copilot September 4, 2025 17:10
@xiangfu0 xiangfu0 changed the title Add comprehensive Pinot Proxy and gRPC support to pinot-spark-3-connector Add Pinot Proxy, unified secureMode, and comprehensive gRPC support to Spark-3 connector Sep 4, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds comprehensive Pinot Proxy and gRPC support to the pinot-spark-3-connector, enabling secure production deployments where proxy is the only exposed endpoint. The implementation provides feature parity with Trino's Pinot connector and includes extensive configuration options for HTTPS, authentication, and gRPC TLS.

Key Changes

  • Proxy Support: Added proxy.enabled configuration and proxy forwarding headers for all Pinot API requests
  • gRPC Configuration: Complete gRPC setup with TLS support, proxy forwarding, and configurable connection parameters
  • Security Features: HTTPS support with SSL/TLS configuration, authentication headers, and unified secureMode option

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
PinotDataSourceReadOptions.scala Added 19 new configuration fields for proxy, HTTPS, authentication, and gRPC settings
PinotClusterClient.scala Updated all API methods to support HTTPS, authentication, and proxy parameters
GrpcUtils.scala New utility class for gRPC channel creation with TLS and proxy metadata support
HttpUtils.scala Enhanced with HTTPS client configuration and proxy header support
AuthUtils.scala New utility for consistent authentication header handling
NetUtils.scala New helper for host:port parsing with secure defaults
PinotGrpcServerDataFetcher.scala Updated to support gRPC proxy forwarding and authentication
PinotServerDataFetcher.scala Enhanced with comprehensive TLS and proxy configuration
DataExtractor.scala Added support for BIG_DECIMAL and JSON data types
Test files Comprehensive test coverage for new proxy and gRPC functionality

…-connector

🔧 Pinot Proxy Support:
- Add proxy.enabled configuration option (default: false)
- Implement HTTP proxy forwarding with FORWARD_HOST and FORWARD_PORT headers
- Support proxy routing for all controller and broker API requests
- Enable proxy-based secure cluster access where proxy is the only exposed endpoint

🚀 Comprehensive gRPC Configuration:
- Add grpc.port configuration (default: 8090)
- Add grpc.max-inbound-message-size configuration (default: 128MB)
- Add grpc.use-plain-text configuration (default: true)
- Support grpc.tls.keystore-type, grpc.tls.keystore-path, grpc.tls.keystore-password
- Support grpc.tls.truststore-type, grpc.tls.truststore-path, grpc.tls.truststore-password
- Add grpc.tls.ssl-provider configuration (default: JDK)
- Add grpc.proxy-uri for gRPC proxy endpoint configuration

🔒 gRPC Proxy Integration:
- Implement gRPC proxy support with FORWARD_HOST and FORWARD_PORT metadata
- Create comprehensive GrpcUtils for channel management and proxy headers
- Support secure gRPC communication through proxy infrastructure
- Enable TLS/SSL configuration for gRPC connections

🏗️ Architecture Updates:
- Update PinotDataSourceReadOptions with all new proxy and gRPC fields
- Enhance PinotClusterClient with proxy-aware API methods
- Add HttpUtils.sendGetRequestWithProxyHeaders() for proxy HTTP requests
- Update PinotServerDataFetcher with gRPC proxy configuration support
- Modify all Spark DataSource V2 components to pass proxy parameters

🧪 Comprehensive Testing:
- Add 8 new test cases for proxy and gRPC configuration parsing
- Create GrpcUtilsTest for gRPC channel creation and proxy metadata
- Update existing tests to include new configuration parameters
- Achieve 39/39 passing tests with full backward compatibility

📚 Enhanced Documentation:
- Add comprehensive Pinot Proxy Support section with examples
- Add detailed gRPC Configuration section with TLS examples
- Include Security Best Practices for production deployments
- Provide proxy + gRPC + HTTPS + authentication integration examples

🎯 Production Features:
- Full backward compatibility - existing code works unchanged
- Based on Trino PR #13015 reference implementation
- Supports secure production deployments with proxy-only access
- Comprehensive error handling and validation
- Performance optimizations for gRPC connections

All tests passing (39/39) with complete feature parity to Trino's implementation.
@xiangfu0 xiangfu0 force-pushed the http-support-spark-connector branch from c1aab66 to 3ea2ccc Compare September 4, 2025 17:24
@xiangfu0 xiangfu0 merged commit 9ead8a9 into master Sep 4, 2025
18 checks passed
@xiangfu0 xiangfu0 deleted the http-support-spark-connector branch September 4, 2025 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants