Commit cad53ea

improving simulator
1 parent 276267a commit cad53ea

File tree

24 files changed: +2899 −140 lines


tools/simulator/AGENTS.md

Lines changed: 33 additions & 0 deletions
# Repository Guidelines

## Project Structure & Module Organization

- `core/` holds the scheduling engines and memory planners; start with `core/global_engine.py` or `core/node_global_engine.py` when altering execution flow.
- `cli/` provides runnable entry points such as `run_simulator.py` (the simulation driver) and `plot_roofline.py` (a visualization helper).
- `api/` and `utils/` supply service adapters and shared helpers (trace loading, hardware math, serializers); prefer adding reusable logic there instead of duplicating it.
- `internal/` packages the analyzer toolkit plus canonical hardware configs; treat these files as reference data.
- `examples/` stores sample traces and environment JSON/JSONL fixtures for smoke tests; generated outputs belong in `.local/` and stay untracked.

## Build, Test, and Development Commands

- `python -m venv .venv && source .venv/bin/activate` to isolate dependencies.
- `python -m pip install -r requirements.txt` installs runtime packages (`humanize`, `transformers`).
- `python cli/run_simulator.py --input examples/trace.jsonl --n-engines 2 --arrival-rate 1.5 --trace-output .local/trace.json --stats-output .local/stats.json` runs the canonical workload and surfaces performance statistics.
- `python cli/plot_roofline.py --input .local/stats.json --out plots/roofline.png` turns stats into an image; create the destination directory first.
## Coding Style & Naming Conventions

- Target Python 3.10+, four-space indentation, and PEP 8 naming (snake_case for modules and functions, upper-case names for enums such as `REQ_STATUS`).
- Use type hints and concise docstrings, as in `core/request.py:GenerationRequest`, to clarify intent.
- Group imports stdlib → third-party → local, and expose public symbols explicitly in `__init__.py` files when it improves discoverability.

## Testing Guidelines

- No automated suite exists yet; replay `cli/run_simulator.py` with `examples/` fixtures and inspect `.local/stats.json` for regressions after each change.
- New tests should rely on `pytest` under `tests/`, mirroring the module structure (e.g., `tests/core/test_memory_planner.py`) with descriptive names like `test_allocates_kv_cache`.
- Capture before/after throughput or latency figures when altering performance-sensitive code, and share them in the review thread.
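A test following these conventions could look like the sketch below. The `MemoryPlanner` class here is a self-contained stand-in used only to make the example runnable; the repository's actual planner (`NodeMemoryPlanner` in `core/node.py`) has its own API.

```python
# Hypothetical layout for tests/core/test_memory_planner.py.


class MemoryPlanner:
    """Stand-in for the simulator's memory planner (not the real class)."""

    def __init__(self, capacity_gb: float) -> None:
        self.capacity_gb = capacity_gb
        self.allocated_gb = 0.0

    def allocate_kv_cache(self, size_gb: float) -> bool:
        # Reject allocations that would exceed the GPU's capacity.
        if self.allocated_gb + size_gb > self.capacity_gb:
            return False
        self.allocated_gb += size_gb
        return True


def test_allocates_kv_cache():
    planner = MemoryPlanner(capacity_gb=80.0)
    assert planner.allocate_kv_cache(16.0)
    assert planner.allocated_gb == 16.0


def test_rejects_oversized_allocation():
    planner = MemoryPlanner(capacity_gb=80.0)
    assert not planner.allocate_kv_cache(100.0)
```

The test functions use bare `assert` statements, so `pytest tests/` discovers and runs them without extra plumbing.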
## Commit & Pull Request Guidelines

- Follow the history style: imperative subjects (`Add README for LLM Simulator`) and optional issue references in parentheses (e.g., `(#60)`).
- Keep commits focused and avoid checking in artifacts from `.local/` or large trace files.
- Pull requests need a short motivation, the commands you ran (build/test), and links or screenshots for visualization changes.

## Security & Configuration Tips

- Review `internal/configs/hardware_params.py` and `examples/env.json` before adding hardware profiles; never commit production-specific credentials.
- Treat environment-change JSONL fixtures as append-only: add new files for new scenarios instead of rewriting shared samples.

tools/simulator/CLAUDE.md

Lines changed: 66 additions & 15 deletions
````diff
@@ -9,18 +9,36 @@ This is an LLM inference simulator that models the performance of Large Language
 ## Key Commands
 
 ### Running Simulations
+
+#### Node-based Architecture (Recommended)
 ```bash
-# Run the main simulator with default parameters
-python cli/start_simulator.py --input <trace_file> --n-engines <num_engines> --arrival-rate <rate>
+# Run with node-based environment configuration
+python cli/run_simulator.py --input <trace_file> --environment examples/env.json --arrival-rate <rate>
+
+# Example with node-based configuration
+python cli/run_simulator.py --input examples/trace.jsonl --environment examples/env.json --arrival-rate 1.0
 
-# Example with typical parameters
-python cli/start_simulator.py --input trace.json --n-engines 4 --arrival-rate 1.0
+# Example with environment changes (dynamic GPU provisioning)
+python cli/run_simulator.py --input examples/trace.jsonl --environment examples/env.json --environment-change-file examples/env_changes.jsonl --arrival-rate 1.0
 
-# Output files are generated in .local/replay_results/ by default:
-# - trace.json: Chrome trace format events for visualization
-# - stats.json: Request statistics and performance metrics
+# Limit number of requests for testing
+python cli/run_simulator.py --input examples/trace.jsonl --environment examples/env.json --arrival-rate 1.0 --limit 100
 ```
 
+#### Legacy Engine-based Architecture
+```bash
+# Run with legacy engine configuration (backward compatibility)
+python cli/run_simulator.py --input <trace_file> --n-engines <num_engines> --arrival-rate <rate>
+
+# Example with legacy configuration
+python cli/run_simulator.py --input trace.json --n-engines 4 --arrival-rate 1.0
+```
+
+#### Output Files
+Output files are generated in `.local/replay_results/` by default:
+- `trace.json`: Chrome trace format events for visualization
+- `stats.json`: Request statistics and performance metrics
+
 ### Roofline Analysis
 ```bash
 # Generate roofline plots for different hardware
````
````diff
@@ -40,15 +58,28 @@ pip install -r requirements.txt
 
 ### Core Components
 
-1. **LLMGlobalEngine** (`core/global_engine.py`): Central orchestrator that manages multiple LLM engines, handles request scheduling, and tracks global simulation state.
+#### Node-based Architecture (New)
+1. **NodeGlobalEngine** (`core/node_global_engine.py`): Central orchestrator that manages multiple compute nodes, handles request scheduling, and supports dynamic node re-provisioning.
+
+2. **ComputeNode** (`core/node.py`): Represents a physical server with multiple GPUs, managing resource allocation, model loading, and request scheduling across the GPUs in the node.
+
+3. **NodeMemoryPlanner** (`core/node.py`): Manages memory allocation across multiple GPUs in a node, considering both GPU memory and node-level constraints.
 
-2. **LLMEngine** (`core/engine.py`): Individual inference engine that processes requests through prefill and decode phases, manages memory allocation, and generates trace events.
+4. **Node Routing Policies** (`core/policies/routing/node_based.py`): Determines how requests are assigned to nodes. Includes random, least-loaded, round-robin, and best-fit policies.
 
-3. **GenerationRequest** (`core/request.py`): Represents a single inference request with metadata like input/output lengths, arrival time, and current status.
+5. **Node Re-provisioning Policies** (`core/policies/node_reprovisioning/`): Handles dynamic re-provisioning of nodes between different models when no suitable nodes are available.
 
-4. **ModelAnalyzer** (`internal/analyzer/model_analyzer.py`): Performs roofline analysis to estimate inference times based on hardware parameters and model configurations.
+#### Legacy Engine-based Architecture
+6. **LLMGlobalEngine** (`core/global_engine.py`): Central orchestrator that manages multiple LLM engines, handles request scheduling, and tracks global simulation state.
 
-5. **Routing Policies** (`core/policies/`): Determines how requests are assigned to engines. Currently implements random routing, with extensible base class for other policies.
+7. **LLMEngine** (`core/engine.py`): Individual inference engine that processes requests through prefill and decode phases, manages memory allocation, and generates trace events.
+
+#### Shared Components
+8. **GenerationRequest** (`core/request.py`): Represents a single inference request with metadata like input/output lengths, arrival time, and current status.
+
+9. **ModelAnalyzer** (`internal/analyzer/model_analyzer.py`): Performs roofline analysis to estimate inference times based on hardware parameters and model configurations.
+
+10. **Environment Configuration** (`core/env.py`): Supports both node-based and legacy GPU-based environment configurations with infrastructure constraints.
 
 ### Key Data Flow
````
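The node routing policies added in this commit (random, least-loaded, round-robin, best-fit) share a single selection interface. The sketch below illustrates the general pattern only; the class and method names are assumptions, not the actual definitions in `core/policies/routing/node_based.py`.

```python
import random
from abc import ABC, abstractmethod


class Node:
    """Minimal stand-in for ComputeNode: tracks queued work only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.queued_requests = 0


class RoutingPolicy(ABC):
    """Base class: pick the node that should receive the next request."""

    @abstractmethod
    def select(self, nodes: list[Node]) -> Node: ...


class RandomRouting(RoutingPolicy):
    def select(self, nodes: list[Node]) -> Node:
        return random.choice(nodes)


class LeastLoadedRouting(RoutingPolicy):
    def select(self, nodes: list[Node]) -> Node:
        # Route to the node with the fewest queued requests.
        return min(nodes, key=lambda n: n.queued_requests)


class RoundRobinRouting(RoutingPolicy):
    def __init__(self) -> None:
        self._next = 0

    def select(self, nodes: list[Node]) -> Node:
        node = nodes[self._next % len(nodes)]
        self._next += 1
        return node
```

A best-fit policy would follow the same shape, scoring nodes by how tightly a request's memory demand fits their free capacity.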

````diff
@@ -107,9 +138,29 @@ The roofline analysis calculates:
 
 ## Testing and Validation
 
+### Example Files
+The simulator includes example configuration files in the `examples/` directory:
+- `env.json`: Node-based environment configuration with A100 and H100 clusters
+- `trace.jsonl`: Sample request trace for testing
+- `env_changes.jsonl`: Example dynamic environment changes for testing re-provisioning
+
+### Example Commands
+```bash
+# Test with node-based configuration (10 requests)
+python cli/run_simulator.py --input examples/trace.jsonl --environment examples/env.json --arrival-rate 1.0 --limit 10
+
+# Test with dynamic environment changes
+python cli/run_simulator.py --input examples/trace.jsonl --environment examples/env.json --environment-change-file examples/env_changes.jsonl --arrival-rate 0.5
+
+# Test legacy engine mode
+python cli/run_simulator.py --input examples/trace.jsonl --n-engines 2 --arrival-rate 1.0 --limit 10
+```
+
+### Output Analysis
 The simulator outputs:
-- Chrome trace format files for performance visualization
+- Chrome trace format files for performance visualization (load into `chrome://tracing`)
 - JSON statistics with latency, throughput, and queue metrics
-- SLO pass rates for multi-stage request processing
+- Node-level utilization and re-provisioning events
+- Model loading and unloading trace events
 
 Typical validation involves comparing simulated latencies against real hardware measurements for known models and hardware configurations.
````
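The environment fixtures referenced above are not reproduced in this diff. Judging from the CLI help strings (`--environment`: nodes, GPUs, bandwidth; `--environment-change-file`: timestamp, gpu_name, amount), they might look roughly like the sketch below; the field names are assumptions, not the schema defined in `core/env.py`.

```json
{
  "nodes": [
    {"name": "a100-node-0", "gpu_name": "nvidia_A100", "gpu_count": 8},
    {"name": "h100-node-0", "gpu_name": "nvidia_H100", "gpu_count": 8}
  ]
}
```

Each line of a hypothetical `env_changes.jsonl` would then add or remove GPU capacity at a given simulation time:

```json
{"timestamp": 60.0, "gpu_name": "nvidia_H100", "amount": 4}
{"timestamp": 300.0, "gpu_name": "nvidia_A100", "amount": -2}
```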

tools/simulator/README.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -2,6 +2,10 @@ The LLM Simulator is a comprehensive performance modeling and simulation tool de
 
 The simulator consists of several key components that work together to model the complete lifecycle of LLM inference requests:
 
+## Contributor Guide
+
+See [`AGENTS.md`](AGENTS.md) for repository guidelines covering project layout, development workflows, testing expectations, and pull request conventions.
+
 - **Request Modeling**: Simulates incoming generation requests with configurable arrival patterns
 - **Engine Simulation**: Models LLM inference engines with prefill and decode phases
 - **Performance Analysis**: Provides roofline analysis and hardware-specific performance metrics
```

tools/simulator/__init__.py

Whitespace-only changes.
Lines changed: 137 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,137 @@
1+
import os
2+
import json
3+
from dataclasses import asdict
4+
from rich.console import Console
5+
from core.global_engine import LLMGlobalEngine
6+
from core.node_global_engine import NodeGlobalEngine
7+
from utils.loader import load_trace
8+
from core.env import load_environment_config, load_environment_changes
9+
10+
console = Console()
11+
12+
13+
def run_simulation(args):
14+
print(args)
15+
workload = load_trace(
16+
args.input,
17+
float(args.arrival_rate),
18+
)
19+
if args.limit > 0:
20+
workload = workload[: args.limit]
21+
22+
# Load environment configuration
23+
environment_config = None
24+
if hasattr(args, "environment") and args.environment:
25+
environment_config = load_environment_config(args.environment)
26+
27+
# Load environment changes if provided
28+
environment_changes = None
29+
if hasattr(args, "environment_change_file") and args.environment_change_file:
30+
environment_changes = load_environment_changes(args.environment_change_file)
31+
32+
# Choose engine type based on whether environment config is provided
33+
if environment_config:
34+
# Use NodeGlobalEngine when environment config is provided
35+
print("Using Node-based Global Engine")
36+
server = NodeGlobalEngine(
37+
environment_config=environment_config,
38+
environment_changes=environment_changes,
39+
print_interval=args.print_interval,
40+
)
41+
else:
42+
# Fallback to legacy LLMGlobalEngine for backward compatibility
43+
print("Using Legacy Engine-based Global Engine")
44+
server = LLMGlobalEngine(
45+
environment_config=environment_config,
46+
environment_changes=environment_changes,
47+
print_interval=args.print_interval,
48+
)
49+
50+
# If no environment config is provided, use the old method
51+
for _ in range(args.n_engines):
52+
server.add_engine(
53+
"meta-llama/Meta-Llama-3-70B-Instruct", "nvidia_A100", 4, 4, 4
54+
)
55+
56+
server.load_requests(workload)
57+
print(f"--" * 10 + " Simulation Started " + "--" * 10)
58+
server.start()
59+
60+
# Collect stats (works for both legacy and node-based engines)
61+
if hasattr(server, "requests_stats"):
62+
summary = server.requests_stats
63+
else:
64+
summary = []
65+
66+
if hasattr(server, "failed_requests"):
67+
failed = server.failed_requests
68+
else:
69+
failed = []
70+
71+
if hasattr(server, "config"):
72+
config = server.config
73+
else:
74+
config = {"engine_type": "node_based" if environment_config else "legacy"}
75+
76+
stats = {
77+
"summary": summary,
78+
"failed": failed,
79+
"config": config,
80+
}
81+
os.makedirs(os.path.dirname(args.trace_output), exist_ok=True)
82+
os.makedirs(os.path.dirname(args.stats_output), exist_ok=True)
83+
with open(args.trace_output, "w") as f:
84+
data = {"traceEvents": [asdict(x) for x in server.trace]}
85+
f.write(json.dumps(data, indent=4))
86+
with open(args.stats_output, "w") as f:
87+
f.write(json.dumps(stats, indent=4))
88+
89+
print(end="\n")
90+
print(f"--" * 10 + " Simulation Done " + "--" * 10)
91+
92+
93+
if __name__ == "__main__":
94+
import argparse
95+
96+
parser = argparse.ArgumentParser()
97+
parser.add_argument("--input", type=str, help="Input file")
98+
parser.add_argument("--n-engines", type=int, help="Number of engines")
99+
parser.add_argument("--arrival-rate", help="Arrival rate", default=None)
100+
parser.add_argument(
101+
"--trace-output",
102+
type=str,
103+
help="Trace file",
104+
default=".local/replay_results/trace.json",
105+
)
106+
parser.add_argument(
107+
"--stats-output",
108+
type=str,
109+
help="Stats file",
110+
default=".local/replay_results/stats.json",
111+
)
112+
parser.add_argument(
113+
"--limit",
114+
type=int,
115+
help="Limit the number of requests",
116+
default=-1,
117+
)
118+
parser.add_argument(
119+
"--environment",
120+
type=str,
121+
help="JSON file containing initial environment configuration (nodes, GPUs, bandwidth, etc.)",
122+
default=None,
123+
)
124+
parser.add_argument(
125+
"--environment-change-file",
126+
type=str,
127+
help="JSONL file containing dynamic environment changes (timestamp, gpu_name, amount)",
128+
default=None,
129+
)
130+
parser.add_argument(
131+
"--print-interval",
132+
type=float,
133+
help="Print interval for progress updates in seconds (default: 0.1)",
134+
default=0.1,
135+
)
136+
args = parser.parse_args()
137+
run_simulation(args)
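The trace file written above uses the Chrome trace event format, viewable at `chrome://tracing`. A minimal, self-contained sketch of producing a compatible file follows; the `TraceEvent` dataclass here lists the standard Chrome keys and is not the simulator's own trace dataclass.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TraceEvent:
    """Standard Chrome trace-event keys (the simulator's dataclass may differ)."""

    name: str   # label shown on the timeline slice
    cat: str    # category used for filtering
    ph: str     # phase: "X" = complete event with a duration
    ts: float   # start time in microseconds
    dur: float  # duration in microseconds
    pid: int    # process row group, e.g. one per node or engine
    tid: int    # thread row, e.g. one per GPU or request lane


events = [
    TraceEvent("prefill", "request", "X", ts=0.0, dur=1_500.0, pid=0, tid=0),
    TraceEvent("decode", "request", "X", ts=1_500.0, dur=8_000.0, pid=0, tid=0),
]
# Same envelope the simulator writes: {"traceEvents": [...]}
payload = {"traceEvents": [asdict(e) for e in events]}
print(json.dumps(payload, indent=4)[:80])
```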

tools/simulator/cli/start_simulator.py

Lines changed: 0 additions & 70 deletions
This file was deleted.
