@VorpalBlade VorpalBlade commented Aug 24, 2025

  • Add support for tracepoints on Linux by upgrading to usdt 0.6.0 (semver bump of usdt).
  • Add bpftrace examples that mirror the dtrace examples. bpftrace is clearly inspired by dtrace, but sufficiently different that separate examples are needed.
  • Improve upon the unsafe union code converting tokio Ids to u64. There are two aspects to this:
    • Use static asserts rather than runtime checks. As you already use edition 2024, there is no reason not to use static asserts (they were stabilised before the 2024 edition). This is a breaking change in that it removes an error variant.
    • Use u64 instead of NonZeroU64: I believe the old code was unsound, as it would cause UB if tokio ever changed to allow the 0 value for tasks (unlikely but theoretically possible).
      The new code is, however, still unsound if tokio changed to something like (u32, u16), which keeps the same size but introduces padding. I'm looking for a solution to this (if one exists). See https://users.rust-lang.org/t/unsafe-unions-bit-pattern-validity/133366
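
A minimal sketch of the compile-time check idea, not the PR's actual code. `OpaqueId` is a hypothetical stand-in for an opaque ID type such as tokio's, and `IdBits` and `id_to_u64` are illustrative names. The point is that a `const` assertion turns a layout assumption into a build failure instead of a runtime error variant:

```rust
use std::mem::{align_of, size_of};
use std::num::NonZeroU64;

/// Hypothetical stand-in for an opaque ID type (assumption: the real type
/// is a thin wrapper around a 64-bit integer).
#[derive(Clone, Copy)]
pub struct OpaqueId(NonZeroU64);

// Static asserts: compilation fails if the layout assumption stops holding,
// so no runtime check (and no error variant for it) is needed.
const _: () = assert!(size_of::<OpaqueId>() == size_of::<u64>());
const _: () = assert!(align_of::<OpaqueId>() == align_of::<u64>());

#[repr(C)]
union IdBits {
    id: OpaqueId,
    bits: u64,
}

/// Reinterprets the ID as a plain `u64`.
pub fn id_to_u64(id: OpaqueId) -> u64 {
    // SAFETY: the sizes match (checked above) and every initialised 8-byte
    // pattern is a valid `u64`, including 0. This would NOT hold if the ID
    // type ever contained padding bytes, which the size check cannot detect.
    unsafe { IdBits { id }.bits }
}
```

Reading the `u64` field rather than a `NonZeroU64` field is what removes the dependence on the ID never being zero; the padding caveat in the safety comment is the remaining gap described above.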

This PR closes issue #3.

@VorpalBlade
Author

VorpalBlade commented Aug 24, 2025

Note: it would be nice to set up basic CI: build on Linux and presumably on Illumos too. While I could do the Linux CI, I have no clue how you at Oxide have your Illumos CI set up (as that isn't a thing that GitHub normally supports; custom runners, I presume?).

I did test RUSTFLAGS="--cfg tokio_unstable" cross build --target x86_64-unknown-freebsd but that is about as far as I can easily go with non-Linux testing.

@VorpalBlade
Author

Looks like oxidecomputer/usdt#340, which should provide Linux support in usdt itself, was recently merged. Some of the other improvements are still relevant. I'll split those out into a separate PR in the coming days.

Before, if tokio started allowing for Ids that were 0, this code would be unsound. Now it should be fully sound. Also the check happens at compile time, which is preferable to doing it at runtime.

@VorpalBlade
Author

VorpalBlade commented Sep 8, 2025

@hawkw I have now changed approach entirely, since Linux support was recently added to the usdt crate. I'm guessing you don't watch this repo, since you didn't give any feedback. Sorry if I'm pinging you when that wasn't needed.

Once this is merged it would be good to have a new release as well, which will be semver major due to removing one of the error cases:

cargo semver-checks
    Checking tokio-dtrace v0.1.1 -> v0.1.1 (no change; assume minor)
     Checked [   0.007s] 140 checks: 137 pass, 3 fail, 0 warn, 38 skip

--- failure enum_variant_missing: pub enum variant removed or renamed ---

Description:
A publicly-visible enum has at least one variant that is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.43.0/src/lints/enum_variant_missing.ron

Failed in:
  variant RegistrationError::InvalidCasts, previously in file /home/arvid/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-dtrace-0.1.1/src/lib.rs:228

--- failure function_missing: pub fn removed or renamed ---

Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.43.0/src/lints/function_missing.ron

Failed in:
  function tokio_dtrace::check_casts, previously in file /home/arvid/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-dtrace-0.1.1/src/lib.rs:266

--- failure struct_missing: pub struct removed or renamed ---

Description:
A publicly-visible struct cannot be imported by its prior path. A `pub use` may have been removed, or the struct itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.43.0/src/lints/struct_missing.ron

Failed in:
  struct tokio_dtrace::InvalidCasts, previously in file /home/arvid/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-dtrace-0.1.1/src/lib.rs:246

     Summary semver requires new major version: 3 major and 0 minor checks failed
    Finished [  26.089s] tokio-dtrace
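
To illustrate why the enum_variant_missing failure above forces a major version, here is a small sketch of downstream code that would stop compiling. The enum below is a hypothetical mirror of the 0.1.1 API: only the names `RegistrationError` and `InvalidCasts` come from the output above, while the variant's payload, the `Other` variant, and `describe` are assumptions made for illustration.

```rust
/// Hypothetical mirror of the removed public struct.
#[derive(Debug)]
pub struct InvalidCasts;

/// Hypothetical mirror of the public error enum before the change.
#[derive(Debug)]
pub enum RegistrationError {
    /// The variant that the PR removes (payload shape is an assumption).
    InvalidCasts(InvalidCasts),
    /// Placeholder for whatever other variants exist.
    Other(String),
}

/// Downstream code written against the old API. Once the `InvalidCasts`
/// variant is removed upstream, the first match arm names a variant that
/// no longer exists and this function fails to compile.
pub fn describe(err: &RegistrationError) -> &'static str {
    match err {
        RegistrationError::InvalidCasts(_) => "layout check failed at runtime",
        RegistrationError::Other(_) => "registration failed",
    }
}
```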

@VorpalBlade
Author

@hawkw (or perhaps some other Oxide employee like @bnaecker who has done work on the underlying usdt crate): I would love to see some progress on this so we can get Linux support in the tokio-dtrace crate.

@conradludgate

Might be a skill issue, or maybe a problem with this/usdt but I'm observing:

> bpftrace -l 'usdt:mybin:tokio:*'
WARNING: invalid address 0x11 for probe (tokio,worker-thread-start) in binary mybin
WARNING: invalid address 0x11 for probe (tokio,worker-thread-stop) in binary mybin
WARNING: invalid address 0x11 for probe (tokio,worker-thread-park) in binary mybin
WARNING: invalid address 0x11 for probe (tokio,worker-thread-unpark) in binary mybin
usdt:mybin:tokio:task-poll-end
usdt:mybin:tokio:task-poll-start
usdt:mybin:tokio:task-spawn
usdt:mybin:tokio:task-terminate
usdt:mybin:tokio:worker-thread-park
usdt:mybin:tokio:worker-thread-start
usdt:mybin:tokio:worker-thread-stop
usdt:mybin:tokio:worker-thread-unpark

and when I try to attach the following program

usdt:mybin:tokio:task-poll-start { 
    @polling[tid] = nsecs; 
} 

usdt:mybin:tokio:task-poll-end 
/@polling[tid]/ { 
    @ns = hist(nsecs - @polling[tid]);
    delete(@polling[tid]); 
}

end {
    clear(@polling);
}

I see the error

FATAL: Invalid probe type made it to attachpoint parser
Aborted (core dumped)

It's most likely an issue with my setup/kernel version, but posting just in case. Will update if I find out more - for now I am going to fork and remove the worker probes for the sake of testing, as that seems to be the main source of issues atm (why?)

`bpftrace --info`
System
  OS: Linux 6.1.119-129.201.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Dec  3 21:07:35 UTC 2024
  Arch: x86_64

Build
  version: v0.17.0
  LLVM: 14.0.6
  unsafe uprobe: no
  bfd: no
  libdw (DWARF support): yes

Kernel helpers
  probe_read: yes
  probe_read_str: yes
  probe_read_user: yes
  probe_read_user_str: yes
  probe_read_kernel: yes
  probe_read_kernel_str: yes
  get_current_cgroup_id: yes
  send_signal: yes
  override_return: yes
  get_boot_ns: yes
  dpath: yes
  skboutput: no

Kernel features
  Instruction limit: 1000000
  Loop support: yes
  btf: yes
  map batch: yes
  uprobe refcount (depends on Build:bcc bpf_attach_uprobe refcount): yes

Map types
  hash: yes
  percpu hash: yes
  array: yes
  percpu array: yes
  stack_trace: yes
  perf_event_array: yes

Probe types
  kprobe: yes
  tracepoint: yes
  perf_event: yes
  kfunc: yes
  iter:task: yes
  iter:task_file: yes
  kprobe_multi: no
  raw_tp_special: yes

I am probably fighting a losing battle by running bpftrace inside EKS :D

@VorpalBlade
Author

VorpalBlade commented Oct 29, 2025

@conradludgate Huh, when I tested this previously it worked. The only gotcha I could think of would be forgetting RUSTFLAGS="--cfg tokio_unstable", but I don't think (and certainly hope) that would result in such an error.

Here are some scattered thoughts and ideas:

  1. If you try with one of the examples from this repo, does it work then?
  2. You could also try a newer bpftrace; the version you have is very old (from my point of view anyway, I guess you are on some LTS distro). Maybe the old version is buggy (it is also built with a very outdated version of LLVM)? I'm using version 0.24, which I believe is the latest release.
  3. One interesting thing to look at would be what eu-readelf -n thinks about the trace points. E.g., looking at one of the examples from this repo:
eu-readelf -n target/debug/examples/basic             

Note section [ 2] '.note.ABI-tag' of 32 bytes at offset 0x2fc:
  Owner          Data size  Type
  GNU                   16  GNU_ABI_TAG
    OS: Linux, ABI: 4.4.0

Note section [ 3] '.note.gnu.build-id' of 36 bytes at offset 0x31c:
  Owner          Data size  Type
  GNU                   20  GNU_BUILD_ID
    Build ID: 4a893a3ee89360920be79c56d76c0a11c3fbe437

Note section [43] '.note.stapsdt' of 664 bytes at offset 0x1c1c634:
  Owner          Data size  Type
  stapsdt               69  Version: 3
    PC: 0xbcfab, Base: 0x6236a, Semaphore: 0x1ba7d0
    Provider: tokio, Name: task-spawn, Args: '8@%rdi 8@%rsi 4@%edx 4@%ecx'
  stapsdt               74  Version: 3
    PC: 0xbd23b, Base: 0x6236a, Semaphore: 0x1ba7d2
    Provider: tokio, Name: task-poll-start, Args: '8@%rdi 8@%rsi 4@%edx 4@%ecx'
  stapsdt               72  Version: 3
    PC: 0xbd4cb, Base: 0x6236a, Semaphore: 0x1ba7d4
    Provider: tokio, Name: task-poll-end, Args: '8@%rdi 8@%rsi 4@%edx 4@%ecx'
  stapsdt               73  Version: 3
    PC: 0xbd75b, Base: 0x6236a, Semaphore: 0x1ba7d6
    Provider: tokio, Name: task-terminate, Args: '8@%rdi 8@%rsi 4@%edx 4@%ecx'
  stapsdt               51  Version: 3
    PC: 0xbd7cc, Base: 0x6236a, Semaphore: 0x1ba7d8
    Provider: tokio, Name: worker-thread-start, Args: ''
  stapsdt               50  Version: 3
    PC: 0xbd7fc, Base: 0x6236a, Semaphore: 0x1ba7da
    Provider: tokio, Name: worker-thread-stop, Args: ''
  stapsdt               50  Version: 3
    PC: 0xbd82c, Base: 0x6236a, Semaphore: 0x1ba7dc
    Provider: tokio, Name: worker-thread-park, Args: ''
  stapsdt               52  Version: 3
    PC: 0xbd85c, Base: 0x6236a, Semaphore: 0x1ba7de
    Provider: tokio, Name: worker-thread-unpark, Args: ''

Here we see reasonable semaphore addresses listed. If you see those, but bpftrace sees weird addresses like 0x11, I would suspect the old bpftrace version. However, if you do see 0x11 here too, something is up with your toolchain. Relevant questions then become: a) which Rust version (and how was it installed), b) which system compiler is used as the linker driver (gcc/clang, and which version?), and c) which linker and version?
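
For binaries with many probes, this comparison can be scripted. The following is a rough sketch that scans text in the `eu-readelf -n` layout shown above and flags entries whose PC falls below a caller-chosen threshold. The names `Probe`, `parse_probes`, and `suspicious`, and the threshold itself, are assumptions made for illustration and not part of any existing tool; "small PC" is only a heuristic for the 0x11 symptom.

```rust
/// One probe entry from the `.note.stapsdt` part of `eu-readelf -n` output.
#[derive(Debug, PartialEq)]
pub struct Probe {
    pub name: String,
    pub pc: u64,
}

/// Extracts the hex value that follows `key` on a line such as
/// "PC: 0xbcfab, Base: 0x6236a, Semaphore: 0x1ba7d0".
fn parse_hex_field(line: &str, key: &str) -> Option<u64> {
    let rest = &line[line.find(key)? + key.len()..];
    let rest = rest.trim_start().strip_prefix("0x")?;
    let end = rest
        .find(|c: char| !c.is_ascii_hexdigit())
        .unwrap_or(rest.len());
    u64::from_str_radix(&rest[..end], 16).ok()
}

/// Pairs each "PC:" line with the "Provider: ..., Name: ..." line after it.
pub fn parse_probes(output: &str) -> Vec<Probe> {
    let mut probes = Vec::new();
    let mut pending_pc = None;
    for line in output.lines() {
        let line = line.trim();
        if line.starts_with("PC:") {
            pending_pc = parse_hex_field(line, "PC:");
        } else if let Some(rest) = line.strip_prefix("Provider:") {
            let name = rest
                .split(',')
                .find_map(|field| field.trim().strip_prefix("Name:"))
                .map(|n| n.trim().to_string());
            if let (Some(pc), Some(name)) = (pending_pc.take(), name) {
                probes.push(Probe { name, pc });
            }
        }
    }
    probes
}

/// Heuristic: a PC below `min_pc` is unlikely to point into real code.
pub fn suspicious(probes: &[Probe], min_pc: u64) -> Vec<&Probe> {
    probes.iter().filter(|p| p.pc < min_pc).collect()
}
```

Feeding it the captured output and a threshold such as `0x1000` would list only the entries worth a closer look.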


For reference (I use Arch Linux, which is relatively bleeding edge):

sudo bpftrace --info
System
  OS: Linux 6.17.1-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Mon, 06 Oct 2025 18:48:15 +0000
  Arch: x86_64

Build
  version: v0.24.0
  LLVM: 21.1.4
  bfd: yes
  libdw (DWARF support): yes
  libsystemd (systemd notify support): yes
  blazesym (advanced symbolization): yes

Kernel helpers
  dpath: yes                       get_tai_ns: yes                    
  get_func_ip: yes                 lookup_percpu_elem: yes            

Kernel features
  Instruction limit: 1000000       btf: yes                           
  module btf: yes                  map batch: yes                     

Probe types
  kprobe_multi: yes                uprobe_multi: yes                  
  kprobe_session: yes              iter: yes                  

@VorpalBlade
Author

As for differences between worker-* and other tracepoints: The worker tracepoints don't take arguments. I have no idea why that would be relevant though.

(I don't really see the point of having a semaphore for a tracepoint without arguments, but that is up to the underlying usdt crate. And it should work, just with slightly less optimal code.)
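
A rough sketch of the reasoning behind that remark, under stated assumptions: a stapsdt semaphore is a 16-bit counter that a tracer increments while attached (the semaphore addresses in the eu-readelf output above are two bytes apart, which fits), and its main benefit is letting the program skip building probe arguments when nobody is listening. `ProbeSemaphore` and `fire` are hypothetical names; this is not how the usdt crate is implemented.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

/// Hypothetical stand-in for a probe semaphore. In a real binary the tracer
/// modifies this memory from outside the process; here `attach`/`detach`
/// simulate that by hand.
pub struct ProbeSemaphore(AtomicU16);

impl ProbeSemaphore {
    pub const fn new() -> Self {
        Self(AtomicU16::new(0))
    }
    pub fn attach(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }
    pub fn detach(&self) {
        self.0.fetch_sub(1, Ordering::Relaxed);
    }
    pub fn is_enabled(&self) -> bool {
        self.0.load(Ordering::Relaxed) != 0
    }
}

/// Evaluates the (possibly expensive) argument closure only while a tracer
/// is attached. With no arguments there is nothing to skip, so the check
/// costs a load and a branch and saves no work.
pub fn fire<A>(sem: &ProbeSemaphore, make_args: impl FnOnce() -> A) -> Option<A> {
    if sem.is_enabled() {
        Some(make_args())
    } else {
        None
    }
}
```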

@conradludgate

conradludgate commented Oct 30, 2025

Thanks for the tip. Updating bpftools did indeed do the trick and now the probes can attach. Although there's still the 0x11 issue.

`eu-readelf -n ...`
Note section [ 2] '.note.gnu.property' of 32 bytes at offset 0x300:
  Owner          Data size  Type
  GNU                   16  GNU_PROPERTY_TYPE_0
    X86 0xc0008002 data: 01 00 00 00

Note section [ 3] '.note.ABI-tag' of 32 bytes at offset 0x320:
  Owner          Data size  Type
  GNU                   16  GNU_ABI_TAG
    OS: Linux, ABI: 3.2.0

Note section [ 4] '.note.gnu.build-id' of 36 bytes at offset 0x340:
  Owner          Data size  Type
  GNU                   20  GNU_BUILD_ID
    Build ID: 149d74e0f8ebc4dff7e5f162a3018ba5d503fafb

Note section [47] '.note.stapsdt' of 1528 bytes at offset 0x249ece8:
  Owner          Data size  Type
  stapsdt               52  Version: 3
    PC: 0x11a3f31, Base: 0x91beac, Semaphore: 0x24a0cd8
    Provider: tokio, Name: worker-thread-unpark, Args: ''
  stapsdt               50  Version: 3
    PC: 0x11a3f81, Base: 0x91beac, Semaphore: 0x24a0cda
    Provider: tokio, Name: worker-thread-stop, Args: ''
  stapsdt               51  Version: 3
    PC: 0x11a3fb1, Base: 0x91beac, Semaphore: 0x24a0cdc
    Provider: tokio, Name: worker-thread-start, Args: ''
  stapsdt               50  Version: 3
    PC: 0x11a3fd1, Base: 0x91beac, Semaphore: 0x24a0cde
    Provider: tokio, Name: worker-thread-park, Args: ''
  stapsdt               51  Version: 3
    PC: 0x11a3ff1, Base: 0x91beac, Semaphore: 0x24a0cdc
    Provider: tokio, Name: worker-thread-start, Args: ''
  stapsdt               52  Version: 3
    PC: 0x11a4011, Base: 0x91beac, Semaphore: 0x24a0cd8
    Provider: tokio, Name: worker-thread-unpark, Args: ''
  stapsdt               50  Version: 3
    PC: 0x11a4031, Base: 0x91beac, Semaphore: 0x24a0cde
    Provider: tokio, Name: worker-thread-park, Args: ''
  stapsdt               50  Version: 3
    PC: 0x11a4051, Base: 0x91beac, Semaphore: 0x24a0cda
    Provider: tokio, Name: worker-thread-stop, Args: ''
  stapsdt               52  Version: 3
    PC: 0x11a40b1, Base: 0x91beac, Semaphore: 0x24a0cd8
    Provider: tokio, Name: worker-thread-unpark, Args: ''
  stapsdt               50  Version: 3
    PC: 0x11a40e1, Base: 0x91beac, Semaphore: 0x24a0cde
    Provider: tokio, Name: worker-thread-park, Args: ''
  stapsdt               51  Version: 3
    PC: 0x11a4111, Base: 0x91beac, Semaphore: 0x24a0cdc
    Provider: tokio, Name: worker-thread-start, Args: ''
  stapsdt               50  Version: 3
    PC: 0x11a4151, Base: 0x91beac, Semaphore: 0x24a0cda
    Provider: tokio, Name: worker-thread-stop, Args: ''
  stapsdt               69  Version: 3
    PC: 0x11a46a3, Base: 0x91beac, Semaphore: 0x24a0ce0
    Provider: tokio, Name: task-spawn, Args: '8@%rdi 8@%rsi 4@%edx 4@%ecx'
  stapsdt               74  Version: 3
    PC: 0x11a47d3, Base: 0x91beac, Semaphore: 0x24a0ce2
    Provider: tokio, Name: task-poll-start, Args: '8@%rdi 8@%rsi 4@%edx 4@%ecx'
  stapsdt               72  Version: 3
    PC: 0x11a4903, Base: 0x91beac, Semaphore: 0x24a0ce4
    Provider: tokio, Name: task-poll-end, Args: '8@%rdi 8@%rsi 4@%edx 4@%ecx'
  stapsdt               73  Version: 3
    PC: 0x11a4a33, Base: 0x91beac, Semaphore: 0x24a0ce6
    Provider: tokio, Name: task-terminate, Args: '8@%rdi 8@%rsi 4@%edx 4@%ecx'
  stapsdt               51  Version: 3
    PC: 0x11, Base: 0x91beac, Semaphore: 0x24a0cdc
    Provider: tokio, Name: worker-thread-start, Args: ''
  stapsdt               50  Version: 3
    PC: 0x11, Base: 0x91beac, Semaphore: 0x24a0cda
    Provider: tokio, Name: worker-thread-stop, Args: ''
  stapsdt               50  Version: 3
    PC: 0x11, Base: 0x91beac, Semaphore: 0x24a0cde
    Provider: tokio, Name: worker-thread-park, Args: ''
  stapsdt               52  Version: 3
    PC: 0x11, Base: 0x91beac, Semaphore: 0x24a0cd8
    Provider: tokio, Name: worker-thread-unpark, Args: ''

@VorpalBlade
Author

That's odd: in your output (please use proper markdown syntax to make it easier to read next time) I see PC: 0x11 (not the semaphore, though). Still, that looks really wrong.

  1. Does this happen on the examples in this repo as well?
  2. What is the version of tokio, rustc, gcc, and your linker (ld or lld for example)?

@conradludgate

> (please use proper markdown syntax to make it easier to read next time)

Not sure what github is doing. The syntax looks correct to me

@VorpalBlade
Author

VorpalBlade commented Nov 2, 2025

Seems you figured out the formatting. Still I would like to know the other things I mentioned: 1. Does it happen on the examples? 2. What about the versions?

These would help narrow down the cause of the issue you are seeing. I could start to try to reproduce it with that info.

@conradludgate

I was able to find the root cause of the issue I was facing. It has nothing to do with this crate in particular; it's a combination of the asm that the usdt crate produces on Linux and a weird interaction with monomorphisation, codegen-units, and the linker's --gc-sections. I'll open an issue on the usdt crate.
