Skip to content

Profile conversion with large thread numbers #31

@vsarunas

Description

@vsarunas

Noticed that symbolification is taking a long time when using larger duration samples:

time curl -s -d '{"timeInterval":"10 ms","numberOfSamples":3000}' http://127.0.0.1:12345/sample > samples.perf
curl -s -d '{"timeInterval":"10 ms","numberOfSamples":3000}'  >   0.02s user 0.20s system 0% cpu 47.922 total

Default request of pprof duration, sampling takes 38.8 seconds and symbolication takes 8.3 seconds:

2025-10-07T09:11:49.423593 [...] ProfileRecorderSampleConversion/Sampler+Conversion.swift:61 : requesting raw samples ["raw-samples-path": /var/folders/zx/1np75sd15ygcssjxtfnq30xc0000gn/T/WYlcor8c/samples.raw, "symbolicated-samples-path": /var/folders/zx/1np75sd15ygcssjxtfnq30xc0000gn/T/WYlcor8c/samples.perf, "symbolizer": ProfileRecorderSampleConversion.CoreSymbolicationSymboliser, "sample-count": 3000, "time-between-samples": 10ms, "peer": [IPv4]127.0.0.1/127.0.0.1:64470]

2025-10-07T09:12:28.238699 [...] ProfileRecorderSampleConversion/Sampler+Conversion.swift:70 : raw samples complete ["raw-samples-path": /var/folders/zx/1np75sd15ygcssjxtfnq30xc0000gn/T/WYlcor8c/samples.raw, "symbolizer": ProfileRecorderSampleConversion.CoreSymbolicationSymboliser, "sample-count": 3000, "peer": [IPv4]127.0.0.1/127.0.0.1:64470, "duration": 38.814310791 seconds, "symbolicated-samples-path": /var/folders/zx/1np75sd15ygcssjxtfnq30xc0000gn/T/WYlcor8c/samples.perf, "time-between-samples": 10ms]

2025-10-07T09:12:36.587133 [...] ProfileRecorderSampleConversion/Sampler+Conversion.swift:97 : samples symbolicated duration ["raw-samples-path": /var/folders/zx/1np75sd15ygcssjxtfnq30xc0000gn/T/WYlcor8c/samples.raw, "symbolizer": ProfileRecorderSampleConversion.CoreSymbolicationSymboliser, "sample-count": 3000, "peer": [IPv4]127.0.0.1/127.0.0.1:64470, "duration": 8.348323542 seconds, "symbolicated-samples-path": /var/folders/zx/1np75sd15ygcssjxtfnq30xc0000gn/T/WYlcor8c/samples.perf, "time-between-samples": 10ms]

Repeated request to sample will take roughly the same amount of time.

Conversion specifically:

$ time ./.build/release/swipr-sample-conv --format pprof < swipr_3000_10ms > sample.pprof
[...]
./.build/release/swipr-sample-conv --format pprof < swipr_3000_10ms >   6.41s user 0.13s system 95% cpu 6.845 total

$ time ./.build/release/swipr-sample-conv < swipr_3000_10ms > /dev/null
[...]
./.build/release/swipr-sample-conv < swipr_3000_10ms > /dev/null  7.64s user 0.13s system 98% cpu 7.865 total

The sample is quite long at 2.7 million lines; mostly caused by 84 threads:

$ wc -l swipr_3000_10ms
 2711686 swipr_3000_10ms

A ~5 second processing time can be reproduced by:

$ ./.build/release/swipr-mini-demo --blocking --no-burn-cpu --sample-count 3000 --ms-between-samples 10 --threads 80 --iterations 80 --output large_sample.swipr

$ wc -l large_sample.swipr
 2022505 large_sample.swipr

$ ./.build/release/swipr-sample-conv --format pprof < large_sample.swipr > large_sample.pprof

I understand the 10ms sampling rate generates excessive samples, and are reducing the parameters to reduce the overhead.

Overhead is mostly from decoding JSON line.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions