fix(inputs.chrony): Prevent race condition in concurrent gather calls #17765
Hi, @skartikey. I have a nearly identical change for the CSV parser in #17573 that still hasn't been addressed after more than a month. A review would be much appreciated. Thanks.
Thanks @skartikey! Just one small suggestion...
Thanks @skartikey!
Summary
The chrony plugin was experiencing intermittent panics with the error
runtime error: index out of range [256] with length 256
when multiple Gather() calls executed concurrently.
Root Cause:
The plugin creates a single fbchrony.Client instance that is shared across all Gather() calls. When Telegraf triggers a new Gather() before the previous one completes (e.g., when collecting sourcestats with multiple sources), both goroutines access the shared client concurrently:
1. Both goroutines call client.Communicate() simultaneously.
2. The underlying binary.Write() calls run concurrently, and the concurrent writes corrupt memory.
3. The corruption surfaces as the index out of range [256] panic.
This race condition is intermittent because it only manifests when gather operations overlap, typically when sourcestats collection takes longer than the configured interval.
Checklist
Related issues
resolves #17757