- 
                Notifications
    
You must be signed in to change notification settings  - Fork 470
 
fix(engine): feedback deadlock issues #4803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
The PR disables the feedback functionality within the engine to prevent a deadlock as explained in the commit message.
- The feedback-related code has been commented out to avoid potential deadlock scenarios.
 - The overall engine behavior is modified to stop feeding back events.
 
9cfd2e5    to
    77d211e      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
0905327    to
    d867f28      
    Compare
  
    d867f28    to
    048db32      
    Compare
  
    Feedback from findings back into the rules engine could cause a deadlock. This is because the engine would eventually block on trying to to send a new event to the feedbacking signature. This would cause a deadlock there - propagating back to the engine and pipeline in general. This does not occur in analyze mode - likely due to less stress in that mode. Introduce a mode field to the engine config to allow distinction between tracee-rules, single binary and analyze modes. The feedback logic which is implemented in the engine is only relevant for analyze mode. In single binary mode, we rely on the pipeline to handle the feedback.
Avoid deadlocks by writing first into a from file channel buffer, then into the engine.
048db32    to
    d71aff0      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Let's port this fix into v0.23.1 as well
| 
           /fast-forward  | 
    
Feedback from findings back into the rules engine could cause a deadlock. This is because the engine would eventually block on trying to to send a new event to the feedbacking signature. This would cause a deadlock there - propagating back to the engine and pipeline in general. This does not occur in analyze mode - likely due to less stress in that mode. Introduce a mode field to the engine config to allow distinction between tracee-rules, single binary and analyze modes. The feedback logic which is implemented in the engine is only relevant for analyze mode. In single binary mode, we rely on the pipeline to handle the feedback. commit a86e656 (main), cherry-pick
Avoid deadlocks by writing first into a from file channel buffer, then into the engine. commit d71aff0 (main), cherry-pick
Feedback from findings back into the rules engine could cause a deadlock. This is because the engine would eventually block on trying to to send a new event to the feedbacking signature. This would cause a deadlock there - propagating back to the engine and pipeline in general. This does not occur in analyze mode - likely due to less stress in that mode. Introduce a mode field to the engine config to allow distinction between tracee-rules, single binary and analyze modes. The feedback logic which is implemented in the engine is only relevant for analyze mode. In single binary mode, we rely on the pipeline to handle the feedback. commit a86e656 (main), cherry-pick
Avoid deadlocks by writing first into a from file channel buffer, then into the engine. commit d71aff0 (main), cherry-pick
1. Explain what the PR does
048db32 fix(analyze): split input channel
316ddfe fix(engine): restrict internal feedback to analyze
048db32 fix(analyze): split input channel
316ddfe fix(engine): restrict internal feedback to analyze
2. Explain how to test it
Confirmed in internal e2e - submit failures with mass verification failure stopped.
3. Other comments