Does anyone have B200 tok/s numbers they can share? I'm fairly sure I had this running well at some point, but now I'm getting numbers far lower than I would expect.
For olmOCR I'm seeing something like:
Avg prompt throughput: 3431.0 tokens/s
Avg generation throughput: 1062.0 tokens/s
but with a text-only model like Qwen3-30B-A3B-Instruct-2507, I'm getting closer to 45,000 tok/s on prefill and 5,000 tok/s on generation.
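To put the gap in numbers, here is a quick sketch that pulls the two throughput values out of the log lines quoted above and divides the rough text-only figures by them. It assumes the metric lines look exactly as quoted ("Avg prompt throughput: N tokens/s"); the helper names `parse_throughput` and `slowdown` are made up for this example, and the Qwen figures are just the approximate values mentioned above.

```python
import re

# The two metric lines exactly as quoted above for olmOCR.
OLMOCR_LOG = """\
Avg prompt throughput: 3431.0 tokens/s
Avg generation throughput: 1062.0 tokens/s
"""

# Approximate text-only figures reported above for Qwen3-30B-A3B-Instruct-2507.
QWEN_TEXT_ONLY = {"prompt": 45_000.0, "generation": 5_000.0}

# Assumed line format: "Avg <prompt|generation> throughput: <number> tokens/s".
PATTERN = re.compile(r"Avg (prompt|generation) throughput: ([\d.]+) tokens/s")


def parse_throughput(log_text: str) -> dict[str, float]:
    """Return {"prompt": tok/s, "generation": tok/s} found in the log text."""
    return {kind: float(value) for kind, value in PATTERN.findall(log_text)}


def slowdown(baseline: dict[str, float], observed: dict[str, float]) -> dict[str, float]:
    """How many times slower `observed` is than `baseline`, per metric."""
    return {k: baseline[k] / observed[k] for k in observed}


if __name__ == "__main__":
    olmocr = parse_throughput(OLMOCR_LOG)
    for kind, factor in slowdown(QWEN_TEXT_ONLY, olmocr).items():
        print(f"{kind}: {olmocr[kind]:,.0f} vs {QWEN_TEXT_ONLY[kind]:,.0f} tok/s -> {factor:.1f}x slower")
```

With the numbers above this works out to roughly 13.1x slower on prefill and 4.7x slower on generation. It only quantifies the gap; it doesn't say how much of it is expected from running a vision model versus a text-only one.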