Commit 9ad771c

thomasywang authored and facebook-github-bot committed
Distance based latency (#858)
Summary:
Pull Request resolved: #858

Now that the simnet is aware of which compute resource each ProcId maps to, we can look at the sender and destination ProcIds whenever a message is sent and compute the distance it travels in order to determine its latency. Latency is randomly sampled from a beta distribution, where the min and max for each distance are configured.

Implementation details (follow along with the numbers in the comments):
1. In the previous diff, when Procs were allocated, their coordinates (region, dc, zone, rack, host, gpu) were registered with the Simnet.
2. When SimTx posts a message, we can safely assume that it is a MessageEnvelope. MessageEnvelopes contain information about the sender and receiver, so we can determine which ProcIds the message is being sent between, which in turn means we can identify which coordinates it is being sent between.
3. We determine the distance between two coordinates by identifying the most major dimension in which they differ.
4. We create a struct called LatencyConfig which holds a distribution for sampling, as well as minimum and maximum values for each distance.
5. We use the identified distance to get a sample of what the latency should be for that send.
6. We pass that latency to the MessageDeliveryEvent to use as its duration.
7. The old network configuration, which was an all-to-all map of edges with latencies between nodes, has been removed along with all related structs.
8. Unit tests have been refactored so that when we need a particular message to be sent with a particular latency, we register the ProcIds with the appropriate coordinates and configure the inter-distance latency.

test_allocator_registers_resources in alloc/sim.rs demonstrates that when we allocate a ProcMesh using the sim allocator, our Procs are registered as compute resources and the latencies are computed based on distance.

Differential Revision: D80141665
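Steps 3 to 5 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the names `Coordinates`, `Distance`, `LatencyBounds`, `distance`, and `scale_sample` are hypothetical, and the sample `u` in [0, 1] is assumed to have been drawn elsewhere (for example from a beta distribution) so that the sketch needs only the standard library.

```rust
use std::time::Duration;

/// Hypothetical coordinates of a proc; fields are listed from the most
/// major dimension (region) to the most minor one (gpu).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Coordinates {
    region: u32,
    dc: u32,
    zone: u32,
    rack: u32,
    host: u32,
    gpu: u32,
}

/// Hypothetical distance classes, named after the most major dimension
/// in which two coordinates differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Distance {
    Region,
    DataCenter,
    Zone,
    Rack,
    Host,
    Gpu,
    Same,
}

/// Walk the dimensions from most major to most minor and report the
/// first one in which `a` and `b` differ.
fn distance(a: &Coordinates, b: &Coordinates) -> Distance {
    if a.region != b.region {
        Distance::Region
    } else if a.dc != b.dc {
        Distance::DataCenter
    } else if a.zone != b.zone {
        Distance::Zone
    } else if a.rack != b.rack {
        Distance::Rack
    } else if a.host != b.host {
        Distance::Host
    } else if a.gpu != b.gpu {
        Distance::Gpu
    } else {
        Distance::Same
    }
}

/// Hypothetical per-distance latency bounds (assumes `max >= min`).
struct LatencyBounds {
    min: Duration,
    max: Duration,
}

/// Map a sample `u` in [0, 1] linearly onto [min, max].
/// Values outside [0, 1] are clamped.
fn scale_sample(bounds: &LatencyBounds, u: f64) -> Duration {
    let u = u.clamp(0.0, 1.0);
    bounds.min + bounds.max.saturating_sub(bounds.min).mul_f64(u)
}
```

With this sketch, two procs that share a region, dc, and zone but sit on different racks are at `Distance::Rack`, regardless of how their host and gpu indices compare, and a sample of `u = 0.0` yields exactly the configured minimum latency for that distance.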
1 parent 59217f0 commit 9ad771c

5 files changed, +390 -366 lines changed

hyperactor/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ nix = { version = "0.30.1", features = ["dir", "event", "hostname", "inotify", "
 opentelemetry = "0.29"
 paste = "1.0.14"
 rand = { version = "0.8", features = ["small_rng"] }
+rand_distr = "0.4"
 regex = "1.11.1"
 rustls-pemfile = "1.0.0"
 serde = { version = "1.0.219", features = ["derive", "rc"] }
