A simulator for TigerBeetle's Viewstamped Replication (VSR) consensus protocol with comprehensive fault injection, event logging, and automated testing capabilities.
This project implements a discrete-event simulator for the VSR consensus protocol in Zig. It provides a testing and analysis environment for studying consensus behavior under different network conditions, fault scenarios, and system configurations.
- Core VSR Protocol: Full prepare/commit consensus with quorum tracking
- Fault Tolerance: Primary crash detection, view changes, automatic recovery
- Network Simulation: Configurable delays, drops, and message reordering
- Safety Verification: Validates consensus safety properties
- Event Logging: Timeline visualization and JSON export
- Scenario Automation: Batch testing with shell scripts
- Protocol Comparison: VSR vs Raft comparison with performance metrics
- Network Partitions: Split-brain, isolation, and healing scenarios
- Performance Benchmarking: Comprehensive throughput and efficiency analysis
- Zig 0.14.1 (stable)
# Run with default configuration
zig build run
# Run tests
zig build test
# Run automated scenarios
./run_scenarios.sh
# Run with different modes
zig build run # Normal VSR simulation
zig build run -- help # Show help and available modes
zig build run -- benchmark # Performance benchmarks
zig build run -- comparison # VSR vs Raft comparison
zig build run -- partition # Network partition testing
zig build run -- all # Complete test suite
# Run specific examples
zig run examples/basic_scenario.zig
zig run examples/crash_recovery_test.zig
zig run examples/network_faults_scenario.zig

Edit src/main.zig to customize scenarios:
const config = Config{
.replica_count = 3, // Number of replicas
.operations_per_client = 10, // Operations to submit
.drop_probability = 0.0, // Network drop rate (0.0-1.0)
.reorder_probability = 0.0, // Message reordering (0.0-1.0)
.crash_primary_at_op = null, // Inject crash at op N
.primary_timeout_us = 10000, // Failure detection timeout (μs)
.restart_delay_us = 15000, // Crashed replica restart delay (μs)
};

- Replica: VSR replica with state machine, log, and protocol logic
- Network: Simulated network with configurable delays, drops, and reordering
- StateMachine: Simple ledger supporting credit/debit/transfer operations
- Simulator: Event-driven orchestrator coordinating replicas and network
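
To make the StateMachine component above concrete, here is a minimal sketch of a ledger that supports credit, debit, and transfer. It is not the project's actual code: the names `Operation`, `Ledger`, `LedgerError`, and the fixed four-account layout are assumptions chosen for illustration.

```zig
const std = @import("std");

// Hypothetical operation set for a toy ledger; the project's real
// StateMachine may use different names and fields.
const Operation = union(enum) {
    credit: struct { account: usize, amount: u64 },
    debit: struct { account: usize, amount: u64 },
    transfer: struct { from: usize, to: usize, amount: u64 },
};

const LedgerError = error{ UnknownAccount, InsufficientFunds };

const Ledger = struct {
    // Assumption: a small fixed set of accounts, all starting at zero.
    balances: [4]u64 = [_]u64{0} ** 4,

    fn withdraw(self: *Ledger, account: usize, amount: u64) LedgerError!void {
        if (account >= self.balances.len) return error.UnknownAccount;
        if (self.balances[account] < amount) return error.InsufficientFunds;
        self.balances[account] -= amount;
    }

    fn deposit(self: *Ledger, account: usize, amount: u64) LedgerError!void {
        if (account >= self.balances.len) return error.UnknownAccount;
        self.balances[account] += amount;
    }

    // Applies one operation deterministically, so replicas that apply the
    // same committed log in the same order end in the same state.
    fn apply(self: *Ledger, op: Operation) LedgerError!void {
        switch (op) {
            .credit => |c| try self.deposit(c.account, c.amount),
            .debit => |d| try self.withdraw(d.account, d.amount),
            .transfer => |t| {
                // Check the destination first so a failed transfer changes nothing.
                if (t.to >= self.balances.len) return error.UnknownAccount;
                try self.withdraw(t.from, t.amount);
                try self.deposit(t.to, t.amount);
            },
        }
    }
};

pub fn main() !void {
    var ledger = Ledger{};
    try ledger.apply(.{ .credit = .{ .account = 0, .amount = 100 } });
    try ledger.apply(.{ .transfer = .{ .from = 0, .to = 1, .amount = 40 } });
    std.debug.print("balances: {any}\n", .{ledger.balances});
}
```

The property that matters for replication is determinism: `apply` depends only on the current balances and the operation, never on wall-clock time or randomness.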
The simulator implements key VSR messages:
- Prepare: Primary → Backups (propose operation)
- PrepareOK: Backup → Primary (acknowledge operation)
- Commit: Primary → Backups (finalize operation)
- StartViewChange: Backup → All (suspect primary failure)
- DoViewChange: Replica → New Primary (transfer log state)
- StartView: New Primary → All (announce new view)
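
The Prepare/PrepareOK exchange above depends on the primary knowing when a majority of replicas holds an operation. The following is a minimal sketch of that bookkeeping, assuming a majority quorum of `replica_count / 2 + 1` and counting the primary's own copy as one acknowledgement. `QuorumTracker` and `max_replicas` are hypothetical names, not taken from the project's source.

```zig
const std = @import("std");

// Assumption: an upper bound on cluster size for this sketch.
const max_replicas = 8;

// Tracks which replicas have acknowledged a single operation.
const QuorumTracker = struct {
    replica_count: usize,
    acked: [max_replicas]bool = [_]bool{false} ** max_replicas,

    // Majority quorum: 2 of 3, 3 of 5, and so on.
    fn quorum(self: QuorumTracker) usize {
        return self.replica_count / 2 + 1;
    }

    // Records an acknowledgement from `replica`. Duplicate acks are
    // idempotent. Returns true once a majority has acknowledged.
    fn ack(self: *QuorumTracker, replica: usize) bool {
        std.debug.assert(replica < self.replica_count);
        self.acked[replica] = true;
        var count: usize = 0;
        for (self.acked[0..self.replica_count]) |a| {
            if (a) count += 1;
        }
        return count >= self.quorum();
    }
};

pub fn main() void {
    var tracker = QuorumTracker{ .replica_count = 3 };
    // The primary (replica 0) counts its own log entry.
    std.debug.print("after primary: {}\n", .{tracker.ack(0)});
    // One PrepareOK from a backup completes the majority in a 3-replica cluster.
    std.debug.print("after backup 1: {}\n", .{tracker.ack(1)});
}
```

Tracking per-replica flags rather than a bare counter means a retransmitted PrepareOK cannot be counted twice.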
VSR Simulator v0.1.0
====================
Processed 60 messages (10 commits)
Final Replica States:
Replica 0: commit_number=10, op_number=10, status=PRIMARY, view=0
Replica 1: commit_number=10, op_number=10, status=BACKUP, view=0
Replica 2: commit_number=8, op_number=10, status=BACKUP, view=0
Safety check: PASSED
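
In the sample output above, Replica 2 reports commit_number=8 while the others report 10, yet the safety check passes. One plausible reading is that the check compares only the committed prefix that replicas share, so a lagging backup is not a violation. The sketch below illustrates that idea. `ReplicaView`, its per-operation `log` of ids, and `committedPrefixesAgree` are assumptions for illustration, not the project's actual verifier.

```zig
const std = @import("std");

// Hypothetical snapshot of a replica: log[i] is an id for operation i + 1.
const ReplicaView = struct {
    commit_number: usize,
    log: []const u64,
};

// Returns true if every pair of replicas agrees on all operations that
// both of them have committed. Entries beyond the shared committed prefix
// are allowed to differ or be missing.
fn committedPrefixesAgree(replicas: []const ReplicaView) bool {
    for (replicas, 0..) |a, i| {
        for (replicas[i + 1 ..]) |b| {
            const shared = @min(a.commit_number, b.commit_number);
            // A replica cannot have committed more than it has logged.
            if (shared > a.log.len or shared > b.log.len) return false;
            if (!std.mem.eql(u64, a.log[0..shared], b.log[0..shared])) return false;
        }
    }
    return true;
}

pub fn main() void {
    const log = [_]u64{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    // Mirrors the sample output: two replicas at commit 10, one lagging at 8.
    const replicas = [_]ReplicaView{
        .{ .commit_number = 10, .log = &log },
        .{ .commit_number = 10, .log = &log },
        .{ .commit_number = 8, .log = &log },
    };
    std.debug.print("safety ok: {}\n", .{committedPrefixesAgree(&replicas)});
}
```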
The project includes several example scenarios:
- examples/basic_scenario.zig - Simple consensus with no faults
- examples/crash_recovery_test.zig - Primary failure and recovery
- examples/network_faults_scenario.zig - Network delays and message drops
# Run all predefined scenarios
./run_scenarios.sh
# Run unit tests
zig build test
# Test specific scenarios
./test_scenarios.sh

The simulator accepts various configuration options (see src/config.zig):
const config = Config{
.replica_count = 3,
.client_count = 1,
.operations_per_client = 10,
.enable_faults = false,
.random_seed = 0, // For deterministic runs
.min_delay_us = 1000, // Min network delay (μs)
.max_delay_us = 5000, // Max network delay (μs)
.drop_probability = 0.0, // 0.0 - 1.0
.reorder_probability = 0.0, // 0.0 - 1.0
};

The simulator includes comprehensive event logging:
// Enable event logging in configuration
const config = Config{
.enable_event_logging = true,
.export_timeline_json = true,
.event_log_file = "simulation_events.json",
// ... other options
};

- Client operations (submit, commit)
- Replica state changes (view changes, role transitions)
- Message passing (prepare, commit, view change messages)
- Network events (delays, drops, reordering)
- Safety violations (if any)
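
As an illustration of how events like those listed above might be serialized for a JSON timeline, here is a minimal sketch. The schema is an assumption: the `Event` struct, the `EventKind` variants, and the `time_us`, `replica`, and `kind` field names are hypothetical and may not match the file the simulator actually exports.

```zig
const std = @import("std");

// Hypothetical event categories, loosely following the list above.
const EventKind = enum {
    client_submit,
    client_commit,
    view_change,
    message_sent,
    message_dropped,
    safety_violation,
};

const Event = struct {
    time_us: u64, // simulated time in microseconds
    replica: u8, // replica the event is attributed to
    kind: EventKind,

    // Formats the event as one JSON object into `buf` and returns the
    // written slice. Fails if `buf` is too small.
    fn toJsonLine(self: Event, buf: []u8) ![]u8 {
        return std.fmt.bufPrint(
            buf,
            "{{\"time_us\":{d},\"replica\":{d},\"kind\":\"{s}\"}}",
            .{ self.time_us, self.replica, @tagName(self.kind) },
        );
    }
};

pub fn main() !void {
    var buf: [128]u8 = undefined;
    const event = Event{ .time_us = 1500, .replica = 1, .kind = .message_dropped };
    const line = try event.toJsonLine(&buf);
    std.debug.print("{s}\n", .{line});
}
```

Emitting one object per event keeps the log append-only and easy to sort by `time_us` when building a timeline.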
The simulator supports the following network partition scenarios:
- Split-Brain Partitions: Network splits into two isolated groups
- Replica Isolation: Single replica isolated from the cluster
- Asymmetric Partitions: Partial connectivity with high packet loss
- Healing Partitions: Automatic recovery after partition resolution
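
The split-brain, isolation, and healing scenarios above can be modeled by assigning each replica to a partition group and delivering messages only within a group. The sketch below assumes a five-replica cluster and a majority quorum. `PartitionMap` and its methods are hypothetical names, not the project's API, and asymmetric (lossy) partitions are not modeled here.

```zig
const std = @import("std");

// Assumption: a fixed five-replica cluster for this sketch.
const replica_count = 5;

const PartitionMap = struct {
    // group[i] is the partition group of replica i. Replicas can exchange
    // messages only when they share a group id.
    group: [replica_count]u8 = [_]u8{0} ** replica_count,

    fn canDeliver(self: PartitionMap, from: usize, to: usize) bool {
        return self.group[from] == self.group[to];
    }

    // True if the given group contains a majority of the cluster, which is
    // what it needs in order to keep committing operations.
    fn groupHasQuorum(self: PartitionMap, group_id: u8) bool {
        var members: usize = 0;
        for (self.group) |g| {
            if (g == group_id) members += 1;
        }
        return members >= replica_count / 2 + 1;
    }

    // Healing puts every replica back into a single group.
    fn heal(self: *PartitionMap) void {
        self.group = [_]u8{0} ** replica_count;
    }
};

pub fn main() void {
    // Split-brain: replicas {0, 1} are cut off from replicas {2, 3, 4}.
    var net = PartitionMap{ .group = .{ 1, 1, 0, 0, 0 } };
    std.debug.print("0 -> 2 deliverable: {}\n", .{net.canDeliver(0, 2)});
    std.debug.print("minority has quorum: {}\n", .{net.groupHasQuorum(1)});
    std.debug.print("majority has quorum: {}\n", .{net.groupHasQuorum(0)});
    net.heal();
    std.debug.print("0 -> 2 after heal: {}\n", .{net.canDeliver(0, 2)});
}
```

Because at most one group can hold a majority, only one side of a split can make progress, which is why a split-brain partition should not produce conflicting commits.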
The benchmarking suite includes:
- Throughput Scaling: Performance across different cluster sizes
- Network Conditions: Impact of delays, drops, and reordering
- Fault Tolerance: Performance degradation during failures
- Protocol Comparison: VSR vs Raft head-to-head analysis
The built-in comparison between the VSR and Raft protocols covers:
- Message efficiency analysis (commits per message)
- Execution time comparison
- Throughput measurement under various conditions
- Network fault tolerance comparison
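
The message-efficiency and throughput metrics above reduce to simple ratios. Here is a minimal sketch, assuming a `RunStats` record per protocol run. The struct and its field names are hypothetical, and the `elapsed_us` value in the usage example is made up for illustration. Only the 60 messages and 10 commits come from the sample output earlier in this README.

```zig
const std = @import("std");

// Hypothetical per-run counters; not the project's actual metrics type.
const RunStats = struct {
    messages: u64,
    commits: u64,
    elapsed_us: u64, // simulated run time in microseconds

    // Message efficiency: committed operations per message sent.
    fn commitsPerMessage(self: RunStats) f64 {
        if (self.messages == 0) return 0;
        const commits: f64 = @floatFromInt(self.commits);
        const messages: f64 = @floatFromInt(self.messages);
        return commits / messages;
    }

    // Throughput: committed operations per simulated second.
    fn commitsPerSecond(self: RunStats) f64 {
        if (self.elapsed_us == 0) return 0;
        const commits: f64 = @floatFromInt(self.commits);
        const elapsed: f64 = @floatFromInt(self.elapsed_us);
        return commits * 1_000_000.0 / elapsed;
    }
};

pub fn main() void {
    // 60 messages and 10 commits match the sample run; elapsed_us is illustrative.
    const run = RunStats{ .messages = 60, .commits = 10, .elapsed_us = 50_000 };
    std.debug.print("commits/message: {d:.3}\n", .{run.commitsPerMessage()});
    std.debug.print("commits/second:  {d:.1}\n", .{run.commitsPerSecond()});
}
```

Computing the same ratios for a VSR run and a Raft run under an identical seed and network configuration gives a like-for-like comparison.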
See LICENSE file.
- Viewstamped Replication Revisited - Liskov & Cowling, 2012
- TigerBeetle Documentation