AnalysisRun succeeds even when Job fails #4393

@marzlarz

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

We have set up an AnalysisTemplate that starts a Job to determine rollout health. The Job exits with 0 on success or 1 on failure.

  • The Job status is reported correctly as "Failed".
  • The parent AnalysisRun status is not: it shows "Successful" instead of "Failed".
  • As a result, the rollout continues to progress and becomes "stable".

To Reproduce

Example configuration:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: container-exit-code-check
spec:
  metrics:
  - name: container-check
    # The 'count' field ensures the metric is evaluated only once.
    count: 1
    # The 'timeout' ensures the job does not run indefinitely.
    timeout: 5m
    failureLimit: 1
    provider:
      job:
        # Define the Job specification to run your container
        spec:
          # Set backoffLimit to 0 to prevent retries for failed jobs.
          # This ensures the Job is marked as failed immediately.
          backoffLimit: 0
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: check-container
                image: busybox
                imagePullPolicy: IfNotPresent
                # This command is configured to fail for testing purposes.
                command: ["sh", "-c", "echo 'Running analysis container...'; exit 1;"]
    # The failure condition now checks the aggregated metric results.
    failureCondition: "metricResults[0].failed > 0"
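
One thing that may be worth ruling out. This is an assumption based on the documented meaning of failureLimit, not a confirmed diagnosis. The Argo Rollouts docs describe failureLimit as the maximum number of failed measurements a metric may have before it is considered Failed, with a default of 0. With count: 1 and failureLimit: 1, a single failed Job would stay within that limit. A comparison variant that changes only failureLimit (the template name is hypothetical) might look like this:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  # Hypothetical name, chosen only to distinguish it from the template above.
  name: container-exit-code-check-strict
spec:
  metrics:
  - name: container-check
    count: 1
    timeout: 5m
    # Only change from the reproduction: tolerate zero failed measurements.
    failureLimit: 0
    provider:
      job:
        spec:
          backoffLimit: 0
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: check-container
                image: busybox
                imagePullPolicy: IfNotPresent
                # Same deliberately failing command as in the reproduction.
                command: ["sh", "-c", "echo 'Running analysis container...'; exit 1;"]
    failureCondition: "metricResults[0].failed > 0"
```

If the AnalysisRun fails with this variant, the behavior above would be consistent with failureLimit semantics rather than with the Job result being ignored.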

Expected behavior

We expect the AnalysisRun to be marked as "Failed", since the child Job is marked as "Failed".
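
To make the expectation concrete, here is a rough sketch of the AnalysisRun status we had in mind. It is illustrative only and was not captured from a cluster. The field layout is assumed from the AnalysisRun status schema, and real output would carry more fields, such as timestamps and Job metadata.

```yaml
# Illustrative only: the status we expected, not actual output.
status:
  phase: Failed
  metricResults:
  - name: container-check
    phase: Failed
    count: 1
    failed: 1
    measurements:
    - phase: Failed
```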

Version

v1.8.1+1ad2c6a

Logs

Logs from the rollout controller:

time="2025-08-05T16:37:06Z" level=info msg="rollout enqueue due to update event" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:06Z" level=info msg="Patched: {\"status\":{\"canary\":{\"currentStepAnalysisRunStatus\":{\"status\":\"Running\"}}}}" generation=71 namespace=test resourceVersion=126396616 rollout=testapp-rollout
time="2025-08-05T16:37:06Z" level=info msg="rollout enqueue due to update event" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:06Z" level=info msg="persisted to informer" generation=71 namespace=test resourceVersion=126396619 rollout=testapp-rollout
time="2025-08-05T16:37:06Z" level=info msg="Reconciliation completed" generation=71 namespace=test resourceVersion=126396616 rollout=testapp-rollout time_ms=24.470204
time="2025-08-05T16:37:06Z" level=info msg="Started syncing rollout" generation=71 namespace=test resourceVersion=126396619 rollout=testapp-rollout
time="2025-08-05T16:37:06Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:06Z" level=info msg="Reconciling TrafficRouting with type 'Nginx'" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:06Z" level=info msg="No changes to canary ingress - skipping patch" ingress=testapp-rollout-app-qa-canary namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:06Z" level=info msg="No changes to canary ingress - skipping patch" ingress=testapp-rollout-app-qa-gslb-canary namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:06Z" level=info msg="Reconciling analysis step (stepIndex: 2)" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:06Z" level=info msg="No status changes. Skipping patch" generation=71 namespace=test resourceVersion=126396619 rollout=testapp-rollout
time="2025-08-05T16:37:06Z" level=info msg="Reconciliation completed" generation=71 namespace=test resourceVersion=126396619 rollout=testapp-rollout time_ms=3.630396
time="2025-08-05T16:37:25Z" level=info msg="Started syncing rollout" generation=71 namespace=test resourceVersion=126396619 rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="Reconciling TrafficRouting with type 'Nginx'" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="No changes to canary ingress - skipping patch" ingress=testapp-rollout-app-qa-canary namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="No changes to canary ingress - skipping patch" ingress=testapp-rollout-app-qa-gslb-canary namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="Reconciling analysis step (stepIndex: 2)" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="Step Analysis Run 'testapp-rollout-954566f4d-34-2' Status New: 'Successful' Previous: 'Running'" event_reason=AnalysisRunSuccessful namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="Rollout step 3/5 completed (analysis)" event_reason=RolloutStepCompleted namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="Patched: {\"status\":{\"canary\":{\"currentStepAnalysisRunStatus\":null},\"conditions\":[{\"lastTransitionTime\":\"2025-07-21T10:52:44Z\",\"lastUpdateTime\":\"2025-07-21T10:52:44Z\",\"message\":\"Rollout has minimum availability\",\"reason\":\"AvailableReason\",\"status\":\"True\",\"type\":\"Available\"},{\"lastTransitionTime\":\"2025-08-05T16:36:40Z\",\"lastUpdateTime\":\"2025-08-05T16:36:40Z\",\"message\":\"Rollout is not healthy\",\"reason\":\"RolloutHealthy\",\"status\":\"False\",\"type\":\"Healthy\"},{\"lastTransitionTime\":\"2025-08-05T16:36:40Z\",\"lastUpdateTime\":\"2025-08-05T16:36:40Z\",\"message\":\"RolloutCompleted\",\"reason\":\"RolloutCompleted\",\"status\":\"False\",\"type\":\"Completed\"},{\"lastTransitionTime\":\"2025-08-05T16:37:06Z\",\"lastUpdateTime\":\"2025-08-05T16:37:06Z\",\"message\":\"Rollout is paused\",\"reason\":\"RolloutPaused\",\"status\":\"False\",\"type\":\"Paused\"},{\"lastTransitionTime\":\"2025-08-05T16:37:06Z\",\"lastUpdateTime\":\"2025-08-05T16:37:25Z\",\"message\":\"ReplicaSet \\\"testapp-rollout-954566f4d\\\" is progressing.\",\"reason\":\"ReplicaSetUpdated\",\"status\":\"True\",\"type\":\"Progressing\"}],\"currentStepIndex\":3}}" generation=71 namespace=test resourceVersion=126396619 rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="rollout enqueue due to update event" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="persisted to informer" generation=71 namespace=test resourceVersion=126396852 rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="Reconciliation completed" generation=71 namespace=test resourceVersion=126396619 rollout=testapp-rollout time_ms=25.30153
time="2025-08-05T16:37:25Z" level=info msg="Started syncing rollout" generation=71 namespace=test resourceVersion=126396852 rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="Reconciling TrafficRouting with type 'Nginx'" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="updating canary Ingress" desiredWeight=50 ingress=testapp-rollout-app-qa-canary namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="Updating Ingress `testapp-rollout-app-qa-canary` to desiredWeight '50'" event_reason=PatchingCanaryIngress namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="updating canary Ingress" desiredWeight=50 ingress=testapp-rollout-app-qa-gslb-canary namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:25Z" level=info msg="Updating Ingress `testapp-rollout-app-qa-gslb-canary` to desiredWeight '50'" event_reason=PatchingCanaryIngress namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Previous weights: &TrafficWeights{Canary:WeightDestination{Weight:0,ServiceName:app-qa-canary,PodTemplateHash:954566f4d,},Stable:WeightDestination{Weight:100,ServiceName:app-qa-stable,PodTemplateHash:58875b6bf5,},Additional:[]WeightDestination{},Verified:nil,}" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="New weights: &TrafficWeights{Canary:WeightDestination{Weight:50,ServiceName:app-qa-canary,PodTemplateHash:954566f4d,},Stable:WeightDestination{Weight:50,ServiceName:app-qa-stable,PodTemplateHash:58875b6bf5,},Additional:[]WeightDestination{},Verified:nil,}" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Traffic weight updated from 0 to 50" event_reason=TrafficWeightUpdated namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Rollout step 4/5 completed (setWeight: 50)" event_reason=RolloutStepCompleted namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Patched: {\"status\":{\"canary\":{\"weights\":{\"canary\":{\"weight\":50},\"stable\":{\"weight\":50}}},\"conditions\":[{\"lastTransitionTime\":\"2025-07-21T10:52:44Z\",\"lastUpdateTime\":\"2025-07-21T10:52:44Z\",\"message\":\"Rollout has minimum availability\",\"reason\":\"AvailableReason\",\"status\":\"True\",\"type\":\"Available\"},{\"lastTransitionTime\":\"2025-08-05T16:36:40Z\",\"lastUpdateTime\":\"2025-08-05T16:36:40Z\",\"message\":\"Rollout is not healthy\",\"reason\":\"RolloutHealthy\",\"status\":\"False\",\"type\":\"Healthy\"},{\"lastTransitionTime\":\"2025-08-05T16:36:40Z\",\"lastUpdateTime\":\"2025-08-05T16:36:40Z\",\"message\":\"RolloutCompleted\",\"reason\":\"RolloutCompleted\",\"status\":\"False\",\"type\":\"Completed\"},{\"lastTransitionTime\":\"2025-08-05T16:37:06Z\",\"lastUpdateTime\":\"2025-08-05T16:37:06Z\",\"message\":\"Rollout is paused\",\"reason\":\"RolloutPaused\",\"status\":\"False\",\"type\":\"Paused\"},{\"lastTransitionTime\":\"2025-08-05T16:37:06Z\",\"lastUpdateTime\":\"2025-08-05T16:37:26Z\",\"message\":\"ReplicaSet \\\"testapp-rollout-954566f4d\\\" is progressing.\",\"reason\":\"ReplicaSetUpdated\",\"status\":\"True\",\"type\":\"Progressing\"}],\"currentStepIndex\":4}}" generation=71 namespace=test resourceVersion=126396852 rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="persisted to informer" generation=71 namespace=test resourceVersion=126396889 rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Reconciliation completed" generation=71 namespace=test resourceVersion=126396852 rollout=testapp-rollout time_ms=208.38198699999998
time="2025-08-05T16:37:26Z" level=info msg="Started syncing rollout" generation=71 namespace=test resourceVersion=126396889 rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="rollout enqueue due to update event" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Reconciling TrafficRouting with type 'Nginx'" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="No changes to canary ingress - skipping patch" ingress=testapp-rollout-app-qa-canary namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="No changes to canary ingress - skipping patch" ingress=testapp-rollout-app-qa-gslb-canary namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Reconciling canary pause step (stepIndex: 4/5)" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Not finished reconciling Canary Pause" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Adding pause reason CanaryPauseStep with start time 2025-08-05T16:37:26Z" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Rollout is paused (CanaryPauseStep)" event_reason=RolloutPaused namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Patched: {\"status\":{\"controllerPause\":true,\"message\":\"CanaryPauseStep\",\"pauseConditions\":[{\"reason\":\"CanaryPauseStep\",\"startTime\":\"2025-08-05T16:37:26Z\"}],\"phase\":\"Paused\"}}" generation=71 namespace=test resourceVersion=126396889 rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="rollout enqueue due to update event" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="persisted to informer" generation=71 namespace=test resourceVersion=126396890 rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Reconciliation completed" generation=71 namespace=test resourceVersion=126396889 rollout=testapp-rollout time_ms=55.364008
time="2025-08-05T16:37:26Z" level=info msg="Started syncing rollout" generation=71 namespace=test resourceVersion=126396890 rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Patched conditions: {\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2025-07-21T10:52:44Z\",\"lastUpdateTime\":\"2025-07-21T10:52:44Z\",\"message\":\"Rollout has minimum availability\",\"reason\":\"AvailableReason\",\"status\":\"True\",\"type\":\"Available\"},{\"lastTransitionTime\":\"2025-08-05T16:36:40Z\",\"lastUpdateTime\":\"2025-08-05T16:36:40Z\",\"message\":\"Rollout is not healthy\",\"reason\":\"RolloutHealthy\",\"status\":\"False\",\"type\":\"Healthy\"},{\"lastTransitionTime\":\"2025-08-05T16:36:40Z\",\"lastUpdateTime\":\"2025-08-05T16:36:40Z\",\"message\":\"RolloutCompleted\",\"reason\":\"RolloutCompleted\",\"status\":\"False\",\"type\":\"Completed\"},{\"lastTransitionTime\":\"2025-08-05T16:37:26Z\",\"lastUpdateTime\":\"2025-08-05T16:37:26Z\",\"message\":\"Rollout is paused\",\"reason\":\"RolloutPaused\",\"status\":\"Unknown\",\"type\":\"Progressing\"},{\"lastTransitionTime\":\"2025-08-05T16:37:26Z\",\"lastUpdateTime\":\"2025-08-05T16:37:26Z\",\"message\":\"Rollout is paused\",\"reason\":\"RolloutPaused\",\"status\":\"True\",\"type\":\"Paused\"}]}}" generation=71 namespace=test resourceVersion=126396890 rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Reconciling TrafficRouting with type 'Nginx'" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="No changes to canary ingress - skipping patch" ingress=testapp-rollout-app-qa-canary namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="No changes to canary ingress - skipping patch" ingress=testapp-rollout-app-qa-gslb-canary namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Reconciling canary pause step (stepIndex: 4/5)" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Enqueueing Rollout in 14.913946381s seconds" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="rollout enqueue during wait" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Not finished reconciling Canary Pause" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="No status changes. Skipping patch" generation=71 namespace=test resourceVersion=126396890 rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="rollout enqueue due to update event" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="persisted to informer" generation=71 namespace=test resourceVersion=126396892 rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Reconciliation completed" generation=71 namespace=test resourceVersion=126396890 rollout=testapp-rollout time_ms=21.430397
time="2025-08-05T16:37:26Z" level=info msg="Started syncing rollout" generation=71 namespace=test resourceVersion=126396892 rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Reconciling TrafficRouting with type 'Nginx'" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="No changes to canary ingress - skipping patch" ingress=testapp-rollout-app-qa-canary namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="No changes to canary ingress - skipping patch" ingress=testapp-rollout-app-qa-gslb-canary namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Reconciling canary pause step (stepIndex: 4/5)" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Enqueueing Rollout in 14.904179881s seconds" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="rollout enqueue during wait" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Not finished reconciling Canary Pause" namespace=test rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="No status changes. Skipping patch" generation=71 namespace=test resourceVersion=126396892 rollout=testapp-rollout
time="2025-08-05T16:37:26Z" level=info msg="Reconciliation completed" generation=71 namespace=test resourceVersion=126396892 rollout=testapp-rollout time_ms=4.622091


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

Labels: bug (Something isn't working)
