- `core/` holds the scheduling engines and memory planners; start with `core/global_engine.py` or `core/node_global_engine.py` when altering execution flow.
- `cli/` provides runnable entry points such as `run_simulator.py` (simulation driver) and `plot_roofline.py` (visualization helper).
- `api/` and `utils/` supply service adapters and shared helpers—trace loading, hardware math, serializers—so prefer adding reusable logic there instead of duplicating it.
- `internal/` packages the analyzer toolkit plus canonical hardware configs; treat these files as reference data.
- `examples/` stores sample traces and environment JSON/JSONL fixtures for smoke tests; generated outputs belong in `.local/` and stay untracked.
## Build, Test, and Development Commands

- `python -m venv .venv && source .venv/bin/activate` to isolate dependencies.
- `python cli/run_simulator.py --input examples/trace.jsonl --n-engines 2 --arrival-rate 1.5 --trace-output .local/trace.json --stats-output .local/stats.json` runs the canonical workload and surfaces performance statistics.
- `python cli/plot_roofline.py --input .local/stats.json --out plots/roofline.png` turns stats into an image; create the destination directory first.
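Taken together, a fresh checkout can be exercised end to end with the sketch below (it assumes the `examples/` fixtures shipped in the repo and creates the output directories up front):

```shell
# Create and activate an isolated environment
python -m venv .venv && source .venv/bin/activate

# Create the untracked output directory and the plot destination first
mkdir -p .local plots

# Run the canonical workload; stats and trace land in .local/
python cli/run_simulator.py --input examples/trace.jsonl --n-engines 2 \
  --arrival-rate 1.5 --trace-output .local/trace.json --stats-output .local/stats.json

# Render the roofline plot from the collected stats
python cli/plot_roofline.py --input .local/stats.json --out plots/roofline.png
```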
## Coding Style & Naming Conventions

- Target Python 3.10+, four-space indentation, and PEP 8 naming (snake_case for modules/functions, UPPER_CASE for enums like `REQ_STATUS`).
- Use type hints and concise docstrings similar to `core/request.py:GenerationRequest` to clarify intent.
- Group imports stdlib → third-party → local, and expose public symbols explicitly in `__init__.py` files when it improves discoverability.
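As a quick illustration of these conventions, here is a hypothetical module sketch (the names `NODE_STATUS`, `NodeSnapshot`, and `is_idle` are invented for the example and do not exist in the codebase):

```python
"""Hypothetical example following the conventions above (not a real module)."""

# Imports grouped stdlib -> third-party -> local.
from dataclasses import dataclass
from enum import Enum, auto


class NODE_STATUS(Enum):
    """Upper-case enum naming, mirroring the style of `REQ_STATUS`."""

    IDLE = auto()
    BUSY = auto()


@dataclass
class NodeSnapshot:
    """Concise docstring plus type hints, as in `core/request.py:GenerationRequest`."""

    node_id: str
    status: NODE_STATUS = NODE_STATUS.IDLE


def is_idle(snapshot: NodeSnapshot) -> bool:
    """Return True when the node has no work scheduled."""
    return snapshot.status is NODE_STATUS.IDLE
```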
## Testing Guidelines

- No automated suite exists yet; replay `cli/run_simulator.py` with `examples` fixtures and inspect `.local/stats.json` for regressions after each change.
- New tests should rely on `pytest` under `tests/` mirroring the module structure (e.g., `tests/core/test_memory_planner.py`) with descriptive names like `test_allocates_kv_cache`.
- Capture before/after throughput or latency figures when altering performance-sensitive code and share them in the review thread.
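A first test under this layout might look like the following sketch; `plan_kv_cache` is a hypothetical stand-in, since the real planner's API should be imported instead:

```python
# tests/core/test_memory_planner.py — illustrative sketch only; `plan_kv_cache`
# is a hypothetical stand-in for the real memory planner API.


def plan_kv_cache(total_gpu_mem_gb: float, per_request_gb: float) -> int:
    """Toy stand-in: how many requests' KV caches fit in one GPU's memory."""
    return int(total_gpu_mem_gb // per_request_gb)


def test_allocates_kv_cache() -> None:
    # Descriptive name per the guideline above; pytest discovers it by prefix.
    assert plan_kv_cache(80.0, 2.5) == 32
```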
## Commit & Pull Request Guidelines

- Follow the history style: imperative subjects (`Add README for LLM Simulator`) and optional issue references in parentheses (e.g., `(#60)`).
- Keep commits focused and avoid checking in artifacts from `.local/` or large trace files.
- Pull requests need a short motivation, the commands you ran (build/test), and links or screenshots for visualization changes.
## Security & Configuration Tips

- Review `internal/configs/hardware_params.py` and `examples/env.json` before adding hardware profiles; never commit production-specific credentials.
- Treat environment-change JSONL fixtures as append-only—add new files for new scenarios instead of rewriting shared samples.
#### Node-based Architecture (New)

1. **NodeGlobalEngine** (`core/node_global_engine.py`): Central orchestrator that manages multiple compute nodes, handles request scheduling, and supports dynamic node re-provisioning.
2. **ComputeNode** (`core/node.py`): Represents a physical server with multiple GPUs, managing resource allocation, model loading, and request scheduling across the GPUs in the node.
3. **NodeMemoryPlanner** (`core/node.py`): Manages memory allocation across multiple GPUs in a node, considering both GPU memory and node-level constraints.
4. **Node Routing Policies** (`core/policies/routing/node_based.py`): Determines how requests are assigned to nodes. Includes random, least-loaded, round-robin, and best-fit policies.
5. **Node Re-provisioning Policies** (`core/policies/node_reprovisioning/`): Handles dynamic re-provisioning of nodes between different models when no suitable nodes are available.

#### Legacy Engine-based Architecture

6. **LLMGlobalEngine** (`core/global_engine.py`): Central orchestrator that manages multiple LLM engines, handles request scheduling, and tracks global simulation state.
7. **LLMEngine** (`core/engine.py`): Individual inference engine that processes requests through prefill and decode phases, manages memory allocation, and generates trace events.

#### Shared Components

8. **GenerationRequest** (`core/request.py`): Represents a single inference request with metadata like input/output lengths, arrival time, and current status.
9. **ModelAnalyzer** (`internal/analyzer/model_analyzer.py`): Performs roofline analysis to estimate inference times based on hardware parameters and model configurations.
10. **Environment Configuration** (`core/env.py`): Supports both node-based and legacy GPU-based environment configurations with infrastructure constraints.
### Key Data Flow
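For reference, the roofline estimate that `ModelAnalyzer` performs can be sketched generically: an operation's time is bounded below by whichever of compute or memory traffic dominates (a textbook roofline formula, not the analyzer's exact code):

```python
def roofline_time_s(flops: float, bytes_moved: float,
                    peak_flops: float, mem_bw_bytes_s: float) -> float:
    """Lower-bound execution time under the roofline model."""
    compute_time = flops / peak_flops            # time if compute-bound
    memory_time = bytes_moved / mem_bw_bytes_s   # time if bandwidth-bound
    return max(compute_time, memory_time)
```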
## Testing and Validation
### Example Files

The simulator includes example configuration files in the `examples/` directory:

- `env.json`: Node-based environment configuration with A100 and H100 clusters
- `trace.jsonl`: Sample request trace for testing
- `env_changes.jsonl`: Example dynamic environment changes for testing re-provisioning
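To make the fixtures concrete, a single `trace.jsonl` line plausibly carries the per-request fields that `GenerationRequest` tracks (this shape is a guess for illustration; check the actual file before relying on field names):

```json
{"request_id": "req-0", "arrival_time": 0.0, "input_length": 512, "output_length": 128, "model": "llama-2-7b"}
```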
### Example Commands

```bash
# Test with node-based configuration (10 requests)
python cli/run_simulator.py --input examples/trace.jsonl --n-engines 2 --arrival-rate 1.5 --trace-output .local/trace.json --stats-output .local/stats.json
```
The simulator consists of several key components that work together to model the complete lifecycle of LLM inference requests:

- **Request Modeling**: Simulates incoming generation requests with configurable arrival patterns
- **Engine Simulation**: Models LLM inference engines with prefill and decode phases
- **Performance Analysis**: Provides roofline analysis and hardware-specific performance metrics

## Contributor Guide

See [`AGENTS.md`](AGENTS.md) for repository guidelines covering project layout, development workflows, testing expectations, and pull request conventions.