⚠️ Alpha Software: This project is currently in alpha stage (v0.x.x). The API is unstable and may change significantly before v1.0 release. See PROJECT_STATUS.md for current implementation status.
Arbor is an ambitious distributed AI agent orchestration system built on the rock-solid foundation of Elixir/OTP. Designed from the ground up for fault-tolerance, scalability, and security, Arbor aims to enable sophisticated multi-agent AI workflows with enterprise-grade reliability.
Current State: Core infrastructure is implemented including distributed agent supervision, capability-based security, and event sourcing. The CLI and AI integrations are still under development.
- BEAM VM Foundation: Built on Erlang/OTP for legendary fault-tolerance and concurrency
- Umbrella Project: Modular design with clear separation of concerns
- Horde Integration: Dynamic process distribution across cluster nodes
- Defensive Programming: "Let it crash" philosophy with comprehensive supervision trees
- Multi-Agent Coordination: Orchestrate diverse AI agents with different capabilities
- Dynamic Agent Spawning: Create and manage agents based on workload demands
- Inter-Agent Communication: Robust message passing with trace correlation
- Task Delegation: Intelligent work distribution across agent types
- Fine-Grained Permissions: Resource-specific access controls with time-based expiration
- Zero-Trust Architecture: Every operation requires explicit capability grants
- Audit Trail: Complete security event logging for compliance
- Principle of Least Privilege: Minimal permission grants with automatic revocation
- Event Sourcing: Immutable event streams for complete state reconstruction
- CQRS Pattern: Optimized read/write models for performance
- Automatic Recovery: Self-healing systems with state restoration
- Distributed State: Consistent state management across cluster nodes
- Three Pillars: Comprehensive metrics, structured logs, and distributed traces
- OpenTelemetry Integration: Industry-standard telemetry and tracing
- Real-Time Monitoring: Prometheus metrics with Grafana dashboards
- Performance Analytics: Detailed insights into agent behavior and system health
- Elixir 1.15.7+ and OTP 26.1+
- Git for version control
- Docker & Docker Compose (optional, for observability stack)
# Clone the repository
git clone https://github.com/azmaveth/arbor.git
cd arbor
# Run one-time setup (installs dependencies, builds PLT files)
./scripts/setup.sh
# Start development server with distributed node capabilities
./scripts/dev.sh
# Run comprehensive test suite
./scripts/test.sh
# Quick feedback loop (skip slow checks)
./scripts/test.sh --fast
# Generate coverage report
./scripts/test.sh --coverage
# Connect to running development node (in another terminal)
./scripts/console.sh
# Performance benchmarks
./scripts/benchmark.sh
# Run distributed tests (multi-node cluster tests)
./scripts/test-distributed.sh
Arbor uses a hybrid testing approach that balances fast feedback with comprehensive distributed system verification:
- Single-node tests (default): Fast unit and integration tests that run in isolation
- Distributed tests (
@tag :distributed
): Multi-node tests that verify cluster behavior
The distributed test suite validates critical distributed system behaviors:
- CRDT Synchronization: Ensures distributed data structures converge correctly across nodes
- Failover Scenarios: Verifies agents migrate properly when nodes crash
- Race Condition Handling: Tests concurrent operations maintain consistency
- Split-Brain Recovery: Validates cluster healing after network partitions
Test helpers in test/support/
provide utilities for:
- Multi-node cluster orchestration
- Network partition simulation
- Cascading failure scenarios
- CRDT convergence verification
- Race condition detection
# Development environment with full observability stack
docker-compose up -d
# Access services:
# - Arbor: http://localhost:4000
# - Grafana: http://localhost:3000 (admin/admin)
# - Prometheus: http://localhost:9090
# - Jaeger: http://localhost:16686
# Build production image
docker build -t arbor:latest .
Arbor follows a contracts-first, defensive architecture with clear dependency boundaries:
┌─────────────────────────────────────────────────────────────┐
│ Arbor System │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Arbor Core │ │ Arbor │ │ Arbor │ │
│ │ │◄─┤ Security │◄─┤ Persistence │ │
│ │ • Agents │ │ │ │ │ │
│ │ • Tasks │ │ • Capabilities│ │ • Event Store │ │
│ │ • Sessions │ │ • Audit │ │ • State Mgmt │ │
│ └──────────────┘ └──────────────┘ └──────────────────┘ │
│ │ │ │ │
│ └─────────────────┼─────────────────────┘ │
│ │ │
│ ┌──────────────┐ │
│ │ Arbor │ │
│ │ Contracts │ │
│ │ │ │
│ │ • Schemas │ │
│ │ • Types │ │
│ │ • Protocols │ │
│ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
arbor/
├── apps/ # Umbrella applications
│ ├── arbor_contracts/ # 🔗 Zero-dependency contracts
│ │ ├── lib/arbor/contracts/ # Schema definitions and types
│ │ └── test/ # Contract validation tests
│ ├── arbor_security/ # 🛡️ Capability-based security
│ │ ├── lib/arbor/security/ # Authentication & authorization
│ │ └── test/ # Security validation tests
│ ├── arbor_persistence/ # 💾 State management
│ │ ├── lib/arbor/persistence/ # Event sourcing & CQRS
│ │ └── test/ # Persistence tests
│ └── arbor_core/ # 🧠 Core business logic
│ ├── lib/arbor/core/ # Agent orchestration
│ └── test/ # Integration tests
├── config/ # Configuration files
│ ├── config.exs # Base configuration
│ ├── dev.exs # Development settings
│ ├── test.exs # Test environment
│ └── prod.exs # Production settings
├── scripts/ # 🛠️ Development automation
│ ├── setup.sh # One-time project setup
│ ├── dev.sh # Development server
│ ├── test.sh # Test suite runner
│ ├── console.sh # Remote console connection
│ ├── release.sh # Production builds
│ └── benchmark.sh # Performance testing
├── .github/ # 🔄 CI/CD workflows
│ ├── workflows/ # GitHub Actions
│ │ ├── ci.yml # Continuous integration
│ │ ├── nightly.yml # Comprehensive testing
│ │ └── release.yml # Automated releases
│ └── README.md # CI/CD documentation
├── observability/ # 📊 Monitoring configuration
│ ├── prometheus.yml # Metrics collection
│ ├── grafana/ # Dashboard definitions
│ └── postgres/ # Database initialization
├── docs/ # 📚 Documentation
│ ├── development.md # Development guide
│ └── architecture.md # System design
├── Dockerfile # 🐳 Container definition
├── docker-compose.yml # Development environment
└── README.md # This file
Autonomous AI entities with specific capabilities and responsibilities. Each agent:
- Runs in its own supervised process
- Has a unique identity and capability set
- Can communicate with other agents via message passing
- Maintains its own state and execution context
Granular permissions that agents must acquire to access resources:
- Resource-Specific: Access to files, APIs, databases, etc.
- Time-Limited: Automatic expiration for security
- Auditable: Complete grant/revoke/usage logging
- Hierarchical: Capabilities can delegate sub-capabilities
Multi-agent coordination contexts that manage:
- Agent lifecycle and task distribution
- Shared context and memory
- Resource allocation and cleanup
- Performance monitoring and optimization
# Development
export MIX_ENV=dev
export ARBOR_NODE_NAME=arbor@localhost
export ARBOR_COOKIE=arbor_dev_cookie
# Observability
export PROMETHEUS_ENDPOINT=http://localhost:9090
export JAEGER_ENDPOINT=http://localhost:14250
export GRAFANA_ENDPOINT=http://localhost:3000
# Security
export ARBOR_SECRET_KEY_BASE="your-secret-key-base"
export ARBOR_CAPABILITY_ENCRYPTION_KEY="your-encryption-key"
config/config.exs
- Base configurationconfig/dev.exs
- Development overridesconfig/prod.exs
- Production settingscoveralls.json
- Test coverage thresholds
- Unit Tests: Individual module testing with mocks
- Integration Tests: Component interaction testing
- Property-Based Tests: Comprehensive input validation
- Performance Tests: Benchmarking and load testing
- Coverage: ≥80% test coverage across all apps
- Static Analysis: Credo compliance with strict checks
- Type Safety: Dialyzer verification with success typing
- Security: Dependency vulnerability scanning
# Full test suite with coverage
./scripts/test.sh --coverage
# Quick feedback loop
./scripts/test.sh --fast
# Specific test files
mix test test/arbor/core/agent_test.exs
# Property-based tests only
mix test --only property
# Integration tests
mix test --only integration
# Build optimized release
./scripts/release.sh
# Build with specific version
./scripts/release.sh --version 1.0.0
# Skip tests for faster builds
./scripts/release.sh --skip-tests
# Build production image
docker build -t arbor:v1.0.0 .
# Run with clustering
docker run -d \
--name arbor-node-1 \
-p 4000:4000 \
-e [email protected] \
-e ERLANG_COOKIE=secure_cluster_cookie \
arbor:v1.0.0
See docs/deployment/kubernetes.md for detailed Kubernetes configuration.
- Agent Lifecycle: Spawn/termination rates, lifetime distributions
- Performance: Operation latency, throughput, error rates
- Security: Capability grants/revokes, security violations
- System Health: Memory usage, process counts, cluster status
Pre-configured Grafana dashboards for:
- System Overview: Cluster health, resource utilization
- Agent Performance: Operation metrics, communication patterns
- Security Monitoring: Capability usage, audit events
- Distributed Tracing: Request flows across services
Production-ready alerts for:
- Critical: Service outages, security breaches
- High: Performance degradation, high error rates
- Medium: Resource constraints, operational issues
We welcome contributions! Please see our Contributing Guidelines for:
- Code of Conduct: Community standards and expectations
- Development Setup: Detailed environment configuration
- Pull Request Process: Code review and merge requirements
- Issue Templates: Bug reports and feature requests
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature
) - Make your changes following our conventions
- Test thoroughly (
./scripts/test.sh
) - Commit using conventional commits
- Push to your branch (
git push origin feature/amazing-feature
) - Open a Pull Request
- System Architecture - Comprehensive system design
- Development Guide - Detailed development setup
- CI/CD Pipeline - Automated workflows
- Scripts Reference - Development automation
Generate comprehensive API docs:
mix docs
open doc/index.html
This project is licensed under the MIT License - see the LICENSE file for details.
- Elixir/OTP Community for the incredible platform
- BEAM Ecosystem for fault-tolerant distributed systems
- OpenTelemetry Project for observability standards
- All Contributors who make this project possible
- Documentation: Complete docs
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [email protected]
Built with ❤️ using Elixir/OTP - The best platform for distributed, fault-tolerant systems