Skip to content

Conversation

Aaronontheweb
Copy link
Member

@Aaronontheweb Aaronontheweb commented Oct 2, 2025

Problem

Akka.Remote server starts successfully even when the application lacks permissions to access the SSL certificate's private key. The server appears healthy but fails when clients attempt to connect, causing:

  • Hard-to-diagnose TLS handshake failures during runtime
  • Silent failures that only appear when connections arrive
  • Poor operational experience for administrators

Customer Impact

A production customer reported this issue where:

  • Node with certificate but no private key permissions can join an existing cluster
  • Server appears to start successfully
  • TLS errors only appear when other nodes try to connect
  • Misleading error messages make diagnosis difficult

Root Cause

Certificate loading in DotNettyTransportSettings only validates that the certificate EXISTS in the Windows certificate store, not whether the application can ACCESS the private key.

// Current code - vulnerable
private SslSettings(string certificateThumbprint, ...)
{
    var find = store.Certificates.Find(X509FindType.FindByThumbprint, ...);
    Certificate = find[0];  // ← NO PRIVATE KEY VALIDATION!
}

Key insight: Windows separates certificate storage from private key storage with individual ACL permissions. Certificate.HasPrivateKey may return true even when the application lacks ACL permissions to use the private key.

Solution

1. Add ValidateCertificate() method to SslSettings

  • Checks Certificate.HasPrivateKey
  • Actually tests private key access with GetRSAPrivateKey() (not just presence)
  • Throws ConfigurationException with clear error message on failure

2. Call validation in Listen() before server bind

  • Ensures fail-fast behavior at startup
  • Prevents server from running in broken state
  • Provides clear error messages for administrators

3. Comprehensive test coverage

  • Server fails at startup with inaccessible private key ✅
  • Server starts successfully with valid certificate ✅
  • Server starts successfully without SSL ✅
  • Updated existing tests to validate fail-fast (removed incorrect runtime failure tests)

Changes

Files Modified

  1. Akka.Remote/Transport/DotNetty/DotNettyTransportSettings.cs

    • Added ValidateCertificate() method to SslSettings class
  2. Akka.Remote/Transport/DotNetty/DotNettyTransport.cs

    • Added validation call in Listen() before server socket binds
  3. Akka.Remote.Tests/Transport/DotNettyCertificateValidationSpec.cs

    • New test file with comprehensive validation tests
  4. Akka.Remote.Tests/Transport/DotNettyTlsHandshakeFailureSpec.cs

    • Updated to test fail-fast behavior
    • Removed tests that validated incorrect behavior

Impact

Breaking Change (Expected)

Existing misconfigured deployments will now fail at startup instead of silently starting with broken TLS. This is correct behavior - fail-fast is better than silent failure.

Migration Guide

If ActorSystem fails to start with:

ConfigurationException: SSL certificate private key exists but cannot be accessed.
Verify application user has permissions to the private key in certificate store.

Fix: Grant the application user read permissions to the certificate's private key:

# Find the private key file
$cert = Get-ChildItem Cert:\LocalMachine\My\<thumbprint>
$keyPath = $cert.PrivateKey.CspKeyContainerInfo.UniqueKeyContainerName
$keyFile = Get-ChildItem "$env:ProgramData\Microsoft\Crypto\RSA\MachineKeys\$keyPath"

# Grant permissions (replace DOMAIN\AppUser with your app identity)
icacls $keyFile.FullName /grant "DOMAIN\AppUser:R"

Test Results

DotNettyCertificateValidationSpec:
✅ Server_should_fail_at_startup_with_certificate_without_private_key
✅ Server_should_start_successfully_with_valid_certificate
✅ Server_should_start_successfully_without_ssl

DotNettyTlsHandshakeFailureSpec:
✅ Server_should_fail_at_startup_with_certificate_without_private_key (updated)
✅ Client_side_tls_handshake_failure_should_shutdown_client

Related

Checklist

  • Code changes implemented
  • Tests added/updated
  • All tests passing
  • Clear error messages for administrators
  • Documentation in commit messages
  • Backward compatible (fail-fast only affects misconfigured systems)

Next Steps

Optional enhancement (separate PR):

  • Mutual TLS enforcement to require client certificates with private keys
  • Defense-in-depth security configuration

**Problem**: Akka.Remote server starts successfully even when the application
lacks permissions to access the SSL certificate's private key. The server appears
healthy but fails when clients attempt to connect, making issues hard to diagnose.

**Root Cause**: Certificate loading in DotNettyTransportSettings only validates
that the certificate EXISTS in the Windows certificate store, not whether the
application can ACCESS the private key. Private key access is checked separately
by Windows ACL, which can fail even when Certificate.HasPrivateKey returns true.

**Solution**:
1. Add ValidateCertificate() method to SslSettings class that:
   - Checks Certificate.HasPrivateKey
   - Actually tests private key access with GetRSAPrivateKey() (not just presence)
   - Throws ConfigurationException with clear error message on failure

2. Call validation in Listen() method before server socket binds:
   - Ensures fail-fast behavior at startup
   - Prevents server from running in broken state
   - Provides clear error message for administrators

3. Add comprehensive tests:
   - Server should fail at startup with inaccessible private key
   - Server should start successfully with valid certificate
   - Server should start successfully without SSL

**Impact**:
- Existing misconfigured deployments will now fail at startup (correct behavior)
- Clear error messages guide administrators to fix permissions
- No breaking changes for correctly configured systems
- Related to Freshdesk akkadotnet#538 (BNSF Railway)

Fixes akkadotnet#538
**Changes**:
1. Renamed first test to `Server_should_fail_at_startup_with_certificate_without_private_key`
   - Now validates that server FAILS AT STARTUP with bad certificate
   - Tests fail-fast behavior instead of runtime TLS handshake failure

2. Removed redundant `Server_side_tls_handshake_failure_should_shutdown_server` test
   - This test validated the OLD (incorrect) behavior where server starts successfully
   - Now impossible with fail-fast validation in place
   - Scenario already covered by the updated first test

3. Kept `Client_side_tls_handshake_failure_should_shutdown_client` unchanged
   - Still valid - tests client-side validation failure
   - Not affected by server startup validation

**Result**: Tests now validate correct fail-fast behavior at server startup
Copy link
Member Author

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Detailed my changes

// This ensures fail-fast behavior if private key is inaccessible
if (Settings.EnableSsl)
{
Settings.Ssl.ValidateCertificate();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate the private key during server binding, which should trigger shutdown of the ActorSystem if it fails

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec is either somewhat redundant or unnecessary given the fail fast implementation - so it's been simplified.

Copy link
Contributor

@Arkatufus Arkatufus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be tightened.

Addresses review feedback from @Arkatufus:

**Changes**:
1. Check both RSA and ECDSA private keys
   - SslStream supports both RSA and ECDSA certificates
   - GetRSAPrivateKey() returns null for ECDSA certs (and vice versa)
   - Validation now checks both key types to match TLS handler behavior

2. Use `using` statements for proper disposal
   - Prevents resource leaks if exception is thrown
   - Both rsaKey and ecdsaKey are properly disposed
   - Exception-safe resource management

**TLS Handler Relationship**:
The TLS handler uses `TlsHandler.Server(Settings.Ssl.Certificate)` which
internally extracts either RSA or ECDSA private keys via SslStream. Our
validation now matches this behavior by checking both key types.

**Behavior**:
- RSA certificate: GetRSAPrivateKey() succeeds, GetECDsaPrivateKey() returns null ✅
- ECDSA certificate: GetECDsaPrivateKey() succeeds, GetRSAPrivateKey() returns null ✅
- Neither accessible: Both return null, validation fails with clear error ✅
- Permission denied: CryptographicException caught, clear error message ✅
@Aaronontheweb
Copy link
Member Author

@Arkatufus Good catch on the ECDSA keys! I've updated the validation to check both RSA and ECDSA private keys.

Changes Made

1. Added ECDSA Key Validation

Now checking both key types that SslStream supports:

using (var rsaKey = Certificate.GetRSAPrivateKey())
using (var ecdsaKey = Certificate.GetECDsaPrivateKey())
{
    if (rsaKey == null && ecdsaKey == null)
        throw new ConfigurationException(...);
}

Behavior:

  • RSA cert: GetRSAPrivateKey() succeeds, GetECDsaPrivateKey() returns null ✅
  • ECDSA cert: GetECDsaPrivateKey() succeeds, GetRSAPrivateKey() returns null ✅
  • Neither accessible: Both return null → validation fails with clear error ✅
  • Permission error: CryptographicException caught → clear error message ✅

2. Improved Disposal Pattern

Changed from manual Dispose() calls to using statements to prevent resource leaks if an exception is thrown.

Relationship to TLS Handler

The TLS handler uses TlsHandler.Server(Settings.Ssl.Certificate) (line 378 of DotNettyTransport.cs), which internally uses SslStream. SslStream will extract either RSA or ECDSA private keys depending on the certificate type. Our validation now matches this behavior exactly.

Note on Future Key Types

Per .NET documentation, GetRSAPrivateKey() and GetECDsaPrivateKey() return null (not throw) when the key type doesn't match. If a certificate uses a different key type (like deprecated DSA), both methods return null and our validation correctly fails. If .NET adds new key types in the future, we'll need to update this validation.

Tests still passing ✅

Copy link
Contributor

@Arkatufus Arkatufus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can call this "best effort" for a fail fast detection.

@Aaronontheweb Aaronontheweb removed this from the 1.5.52 milestone Oct 2, 2025
@Aaronontheweb
Copy link
Member Author

This is going into the dev branch but we'll need to back port it to v1.5

@Aaronontheweb Aaronontheweb merged commit 6115b67 into akkadotnet:dev Oct 2, 2025
7 of 11 checks passed
@Aaronontheweb Aaronontheweb deleted the fix-tls-certificate-validation branch October 2, 2025 19:57
Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this pull request Oct 2, 2025
…kkadotnet#7847)

* Fix: Validate SSL certificate private key access at server startup

**Problem**: Akka.Remote server starts successfully even when the application
lacks permissions to access the SSL certificate's private key. The server appears
healthy but fails when clients attempt to connect, making issues hard to diagnose.

**Root Cause**: Certificate loading in DotNettyTransportSettings only validates
that the certificate EXISTS in the Windows certificate store, not whether the
application can ACCESS the private key. Private key access is checked separately
by Windows ACL, which can fail even when Certificate.HasPrivateKey returns true.

**Solution**:
1. Add ValidateCertificate() method to SslSettings class that:
   - Checks Certificate.HasPrivateKey
   - Actually tests private key access with GetRSAPrivateKey() (not just presence)
   - Throws ConfigurationException with clear error message on failure

2. Call validation in Listen() method before server socket binds:
   - Ensures fail-fast behavior at startup
   - Prevents server from running in broken state
   - Provides clear error message for administrators

3. Add comprehensive tests:
   - Server should fail at startup with inaccessible private key
   - Server should start successfully with valid certificate
   - Server should start successfully without SSL

**Impact**:
- Existing misconfigured deployments will now fail at startup (correct behavior)
- Clear error messages guide administrators to fix permissions
- No breaking changes for correctly configured systems
- Related to Freshdesk akkadotnet#538 (BNSF Railway)

Fixes akkadotnet#538

* Update DotNettyTlsHandshakeFailureSpec to validate fail-fast behavior

**Changes**:
1. Renamed first test to `Server_should_fail_at_startup_with_certificate_without_private_key`
   - Now validates that server FAILS AT STARTUP with bad certificate
   - Tests fail-fast behavior instead of runtime TLS handshake failure

2. Removed redundant `Server_side_tls_handshake_failure_should_shutdown_server` test
   - This test validated the OLD (incorrect) behavior where server starts successfully
   - Now impossible with fail-fast validation in place
   - Scenario already covered by the updated first test

3. Kept `Client_side_tls_handshake_failure_should_shutdown_client` unchanged
   - Still valid - tests client-side validation failure
   - Not affected by server startup validation

**Result**: Tests now validate correct fail-fast behavior at server startup

* Add ECDSA private key validation and improve disposal pattern

Addresses review feedback from @Arkatufus:

**Changes**:
1. Check both RSA and ECDSA private keys
   - SslStream supports both RSA and ECDSA certificates
   - GetRSAPrivateKey() returns null for ECDSA certs (and vice versa)
   - Validation now checks both key types to match TLS handler behavior

2. Use `using` statements for proper disposal
   - Prevents resource leaks if exception is thrown
   - Both rsaKey and ecdsaKey are properly disposed
   - Exception-safe resource management

**TLS Handler Relationship**:
The TLS handler uses `TlsHandler.Server(Settings.Ssl.Certificate)` which
internally extracts either RSA or ECDSA private keys via SslStream. Our
validation now matches this behavior by checking both key types.

**Behavior**:
- RSA certificate: GetRSAPrivateKey() succeeds, GetECDsaPrivateKey() returns null ✅
- ECDSA certificate: GetECDsaPrivateKey() succeeds, GetRSAPrivateKey() returns null ✅
- Neither accessible: Both return null, validation fails with clear error ✅
- Permission denied: CryptographicException caught, clear error message ✅
Aaronontheweb added a commit that referenced this pull request Oct 2, 2025
…7847) (#7848)

* Fix: Validate SSL certificate private key access at server startup

**Problem**: Akka.Remote server starts successfully even when the application
lacks permissions to access the SSL certificate's private key. The server appears
healthy but fails when clients attempt to connect, making issues hard to diagnose.

**Root Cause**: Certificate loading in DotNettyTransportSettings only validates
that the certificate EXISTS in the Windows certificate store, not whether the
application can ACCESS the private key. Private key access is checked separately
by Windows ACL, which can fail even when Certificate.HasPrivateKey returns true.

**Solution**:
1. Add ValidateCertificate() method to SslSettings class that:
   - Checks Certificate.HasPrivateKey
   - Actually tests private key access with GetRSAPrivateKey() (not just presence)
   - Throws ConfigurationException with clear error message on failure

2. Call validation in Listen() method before server socket binds:
   - Ensures fail-fast behavior at startup
   - Prevents server from running in broken state
   - Provides clear error message for administrators

3. Add comprehensive tests:
   - Server should fail at startup with inaccessible private key
   - Server should start successfully with valid certificate
   - Server should start successfully without SSL

**Impact**:
- Existing misconfigured deployments will now fail at startup (correct behavior)
- Clear error messages guide administrators to fix permissions
- No breaking changes for correctly configured systems
- Related to Freshdesk #538 (BNSF Railway)

Fixes #538

* Update DotNettyTlsHandshakeFailureSpec to validate fail-fast behavior

**Changes**:
1. Renamed first test to `Server_should_fail_at_startup_with_certificate_without_private_key`
   - Now validates that server FAILS AT STARTUP with bad certificate
   - Tests fail-fast behavior instead of runtime TLS handshake failure

2. Removed redundant `Server_side_tls_handshake_failure_should_shutdown_server` test
   - This test validated the OLD (incorrect) behavior where server starts successfully
   - Now impossible with fail-fast validation in place
   - Scenario already covered by the updated first test

3. Kept `Client_side_tls_handshake_failure_should_shutdown_client` unchanged
   - Still valid - tests client-side validation failure
   - Not affected by server startup validation

**Result**: Tests now validate correct fail-fast behavior at server startup

* Add ECDSA private key validation and improve disposal pattern

Addresses review feedback from @Arkatufus:

**Changes**:
1. Check both RSA and ECDSA private keys
   - SslStream supports both RSA and ECDSA certificates
   - GetRSAPrivateKey() returns null for ECDSA certs (and vice versa)
   - Validation now checks both key types to match TLS handler behavior

2. Use `using` statements for proper disposal
   - Prevents resource leaks if exception is thrown
   - Both rsaKey and ecdsaKey are properly disposed
   - Exception-safe resource management

**TLS Handler Relationship**:
The TLS handler uses `TlsHandler.Server(Settings.Ssl.Certificate)` which
internally extracts either RSA or ECDSA private keys via SslStream. Our
validation now matches this behavior by checking both key types.

**Behavior**:
- RSA certificate: GetRSAPrivateKey() succeeds, GetECDsaPrivateKey() returns null ✅
- ECDSA certificate: GetECDsaPrivateKey() succeeds, GetRSAPrivateKey() returns null ✅
- Neither accessible: Both return null, validation fails with clear error ✅
- Permission denied: CryptographicException caught, clear error message ✅
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants