-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
feat(aci milestone 3): delete data in Seer when AD detector is deleted #101843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
❌ 8 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
src/sentry/workflow_engine/endpoints/organization_detector_details.py
Outdated
Show resolved
Hide resolved
saponifi3d
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like there's a bit of code we can cleanup before merging, generally the setup for the hooks etc all looks good though!
| return True | ||
|
|
||
|
|
||
| def delete_rule_in_seer_legacy(alert_rule: "AlertRule") -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this method just invoke return delete_rule_in_seer(alert_rule.id)? (that way we can reduce code replication)
and
| def delete_rule_in_seer_legacy(alert_rule: "AlertRule") -> bool: | |
| def delete_rule_in_seer_legacy(alert_rule: AlertRule) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two methods are slightly different in what they pass to Seer. I think we've done this code duplication for all anomaly detection methods so we can easily delete the legacy code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we need to maintain this code in the future, it might be a cool time to decompose the method and reuse what we can. That would also mean we don't need to maintain two code paths while we're waiting to remove the legacy code. Instead we cold take the opportunity to make this a bit easier to manage, and a lot easier for us to debug any issues in the mean time.
The way i tend to decompose things with two examples is to just look at those differences and try to determine where it would make the most sense to expose new methods. For example, maybe we could wrap the send to seer and handle errors as a method, then delete_rule_in_seer_legacy could compose those shared methods and make any tweaks it might need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think we can delete the legacy code outright if every alert rule has a detector. The call only needs to happen once.
I see your point about decomp, will look into it.
src/sentry/workflow_engine/endpoints/organization_detector_details.py
Outdated
Show resolved
Hide resolved
src/sentry/workflow_engine/endpoints/organization_detector_details.py
Outdated
Show resolved
Hide resolved
| ) | ||
| return False | ||
|
|
||
| if response.status >= 400: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Inconsistent HTTP Error Handling Between Functions
There's an inconsistency in HTTP error handling between delete_rule_in_seer and delete_rule_in_seer_legacy. The new function treats response.status >= 400 as an error, but the legacy one uses > 400. This means the legacy function won't flag HTTP 400 responses from Seer as errors, leading to different behavior for new detectors and legacy alert rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine, I don't think Seer returns any 400s
saponifi3d
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think biggest feedback would be to make delete_rule_in_seer a little more debug friendly; we can either decompose the method and share it between alert rule / detector deletion or we can update the logs so we can easily differentiate (i think that'd just be changing rule to detector).
src/sentry/workflow_engine/endpoints/organization_detector_details.py
Outdated
Show resolved
Hide resolved
| return True | ||
|
|
||
|
|
||
| def delete_rule_in_seer_legacy(alert_rule: "AlertRule") -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we need to maintain this code in the future, it might be a cool time to decompose the method and reuse what we can. That would also mean we don't need to maintain two code paths while we're waiting to remove the legacy code. Instead we cold take the opportunity to make this a bit easier to manage, and a lot easier for us to debug any issues in the mean time.
The way i tend to decompose things with two examples is to just look at those differences and try to determine where it would make the most sense to expose new methods. For example, maybe we could wrap the send to seer and handle errors as a method, then delete_rule_in_seer_legacy could compose those shared methods and make any tweaks it might need.
| results = json.loads(decoded_data) | ||
| except JSONDecodeError: | ||
| logger.exception( | ||
| "Failed to parse Seer delete rule data response", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want to keep these methods separate, let's update the exception text so we can determine if it was triggered by delete_rule_in_seer_legacy or this method.
src/sentry/incidents/logic.py
Outdated
| success = delete_rule_in_seer( | ||
| alert_rule=alert_rule, | ||
| organization=alert_rule.organization, | ||
| source_id=source_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Unhandled exception leads to undefined variable usage crash
If QuerySubscription.DoesNotExist exception is raised, the variable source_id is never assigned, but the code continues execution and tries to use source_id in the delete_rule_in_seer() call on line 933, which will raise a NameError. The code should either return early after logging the exception or assign a default value to source_id before the try block.
|
@saponifi3d removed the legacy code. Let me know if you think we should still break down the error handling into a separate method. |
| extra={ | ||
| "rule_id": alert_rule.id, | ||
| }, | ||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When are we still hitting this code path? is it only for people using the legacy API? If so, don't we need to call this on delete too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the delete code in the detector lifecycle hook, so any time a detector is deleted the code will be called. If all alert rules are dual written, then we don't need the call in multiple places.
I didn't hook deletion into updates, however, so this is for users of the legacy API. Good callout that I should create an update hook as well.
When a detector is deleted, we need to delete the rule data in Seer. Similar implementation to the legacy alert rule code, but uses detector ID instead of alert rule ID.