
Airflow Datafusion Hook: Bug in CDAP Program Start Status Validation & API Usage #50387

@bhardwaj-priyanshu

Description


Apache Airflow version

3.0.0

If "Other Airflow 2 version" selected, which one?

No response

What happened?

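There is an issue in airflow/providers/google/cloud/hooks/datafusion.py in how the hook interacts with CDAP's Lifecycle Microservices to start programs. When a program fails to start, start_pipeline raises a bare KeyError: 'runId' instead of reporting why the start failed.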
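For context, here is a minimal sketch of the batch start request the hook appears to build for a batch pipeline. The field names (appId, programType, programId, runtimeargs) follow CDAP's "start multiple programs" endpoint. The helper name build_batch_start_request is hypothetical, and the "workflow" / "DataPipelineWorkflow" values are my reading of the provider source, not something verified against every provider version.

```python
import json
from typing import Any, Dict, Optional, Tuple
from urllib.parse import quote


def build_batch_start_request(
    instance_url: str,
    pipeline_name: str,
    namespace: str = "default",
    runtime_args: Optional[Dict[str, Any]] = None,
) -> Tuple[str, str]:
    """Return (url, json_body) for CDAP's batch start endpoint.

    Hypothetical helper: mirrors what the hook appears to send for a
    batch pipeline (program type "workflow", id "DataPipelineWorkflow").
    """
    # Batch endpoint: POST /v3/namespaces/<namespace>/start
    url = f"{instance_url.rstrip('/')}/v3/namespaces/{quote(namespace)}/start"
    # The body is a JSON array, one object per program to start.
    body = [
        {
            "appId": pipeline_name,
            "programType": "workflow",
            "programId": "DataPipelineWorkflow",
            "runtimeargs": runtime_args or {},
        }
    ]
    return url, json.dumps(body)
```

Even though only one program is started, the request and the response are both arrays, which is why the hook indexes response_json[0].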

Full Error

[2025-05-05T12:19:53.950+0000] {taskinstance.py:2907} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/python3.11/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 465, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 432, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/airflow/models/baseoperator.py", line 401, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/airflow/providers/google/cloud/operators/datafusion.py", line 825, in execute
    pipeline_id = hook.start_pipeline(
                  ^^^^^^^^^^^^^^^^^^^^
  File "/opt/python3.11/lib/python3.11/site-packages/airflow/providers/google/cloud/hooks/datafusion.py", line 500, in start_pipeline
    return response_json[0]["runId"]
           ~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'runId'


What you think should happen instead?

There are two issues in airflow/providers/google/cloud/hooks/datafusion.py in how the hook interacts with CDAP's Lifecycle Microservices to start programs:

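1. Insufficient status code checking: start_pipeline checks only the HTTP status of the batch start request, then reads response_json[0]["runId"] without checking the per-program statusCode in the response body. When the program fails to start, that entry has no runId, so the task fails with a KeyError instead of surfacing CDAP's error message.

2. [Preferred approach] Suboptimal API usage for single program starts: the hook starts a single pipeline through the batch endpoint (POST /v3/namespaces/<namespace>/start) instead of the single-program endpoint (POST /v3/namespaces/<namespace>/apps/<app-id>/<program-type>/<program-id>/start).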
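A minimal sketch of both proposed fixes, assuming the batch endpoint returns one object per requested program containing an integer "statusCode" plus either "runId" (on success) or "error" (on failure). PipelineStartError, extract_run_id, and single_program_start_url are hypothetical names, not part of the provider. Whether the single-program endpoint returns a run ID in its response depends on the CDAP version, so the sketch only builds its URL.

```python
from typing import Any, Dict, List
from urllib.parse import quote


class PipelineStartError(Exception):
    """Hypothetical exception raised when CDAP reports a failed start."""


def extract_run_id(response_json: List[Dict[str, Any]]) -> str:
    """Fix 1 (sketch): validate the per-program status before reading runId."""
    if not response_json:
        raise PipelineStartError("Empty response from the CDAP batch start endpoint")
    entry = response_json[0]
    status = entry.get("statusCode")
    # A failed entry carries "error" instead of "runId"; surface that message.
    if status != 200 or "runId" not in entry:
        raise PipelineStartError(
            f"Starting program {entry.get('programId')} of app {entry.get('appId')} "
            f"failed with status {status}: {entry.get('error', 'no error message')}"
        )
    return entry["runId"]


def single_program_start_url(
    instance_url: str,
    pipeline_name: str,
    namespace: str = "default",
    program_type: str = "workflows",
    program_id: str = "DataPipelineWorkflow",
) -> str:
    """Fix 2 (sketch): target the single-program start endpoint.

    POST /v3/namespaces/<namespace>/apps/<app-id>/<program-type>/<program-id>/start
    """
    return "/".join(
        [
            instance_url.rstrip("/"),
            "v3",
            "namespaces",
            quote(namespace),
            "apps",
            quote(pipeline_name),
            program_type,
            program_id,
            "start",
        ]
    )
```

With fix 1 alone, the task would fail with CDAP's own error text rather than a KeyError. Fix 2 avoids the array-shaped request and response entirely, which is why it is the preferred approach.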

How to reproduce

Run the Cloud Data Fusion (CDF) start pipeline task with invalid runtime args or an incorrect program.
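The KeyError can also be reproduced offline by applying the hook's final lookup to a failed batch-start entry. The response below is hypothetical, shaped after the CDAP batch start docs: a failed entry has "statusCode" and "error" but no "runId". The status code and error text are illustrative, not captured from a real instance.

```python
from typing import Any, Dict, List

# Hypothetical failed entry: no "runId" key is present.
failed_response: List[Dict[str, Any]] = [
    {
        "appId": "my_pipeline",
        "programType": "workflow",
        "programId": "DataPipelineWorkflow",
        "statusCode": 404,
        "error": "hypothetical error message",
    }
]


def naive_extract_run_id(response_json: List[Dict[str, Any]]) -> str:
    # Same lookup as hooks/datafusion.py line 500 in the traceback above.
    return response_json[0]["runId"]


try:
    naive_extract_run_id(failed_response)
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints: KeyError: 'runId'
```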

Operating System

NA (I don't think the operating system is relevant to this issue.)

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
