
Retrying in case of a broken pipe #12359

@potiuk

Description


Summary

Recently, our CI has started failing fairly often with broken pipe errors when pulling large artifacts from PyPI. In particular, this happens with pyspark==3.5.5, which is a 320 MB download: https://pypi.org/project/pyspark/#files

While this is, of course, an infrastructure problem (either PyPI or GitHub Actions), it should (I guess) be relatively easy to add a retry mechanism for broken pipe errors, which are very likely transient.
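Until something like this exists in uv itself, the same idea can be approximated on the caller's side. Below is a minimal sketch, assuming the CI script invokes uv through subprocess (as the traceback in the example shows) and that uv's error chain ends up on stderr. The names run_with_retries, is_transient and TRANSIENT_MARKERS are hypothetical and not part of uv or of the Airflow scripts. The function retries with exponential backoff only when the captured stderr looks like a transient network failure. Any other error is raised immediately.

```python
import subprocess
import time
from typing import Sequence

# Substrings treated as transient network failures. This is an assumption based on
# the uv error chain quoted in this issue; adjust it to match what your CI sees.
TRANSIENT_MARKERS = (
    "broken pipe",
    "error reading a body from connection",
)


def is_transient(stderr: str) -> bool:
    """Return True if the captured stderr looks like a transient network failure."""
    lowered = stderr.lower()
    return any(marker in lowered for marker in TRANSIENT_MARKERS)


def run_with_retries(
    cmd: Sequence[str], attempts: int = 3, base_delay: float = 2.0
) -> subprocess.CompletedProcess:
    """Run cmd, retrying with exponential backoff only on transient failures."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt == attempts or not is_transient(result.stderr):
            # Out of attempts, or a non-transient error: fail the same way
            # subprocess.run(check=True) would, keeping the captured output.
            raise subprocess.CalledProcessError(
                result.returncode, cmd, output=result.stdout, stderr=result.stderr
            )
        # Wait base_delay, then 2x, 4x, ... before the next attempt.
        time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

For the command in this issue, this would mean calling run_with_retries(["uv", "pip", "install", ...]) instead of a plain subprocess.run. One trade-off: capturing output means uv's progress is no longer streamed live. Another is that matching on message text is brittle. Both are reasons why a retry inside uv itself, as requested here, would be preferable.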

Example

https://github.com/apache/airflow/actions/runs/13987630354/job/39165182511?pr=47798#step:11:5191

Running command: uv pip install --no-sources -e '.[all-core]' ./airflow-core ./task-sdk --reinstall apache-airflow-providers-airbyte==5.0.1 apache-airflow-providers-alibaba==3.0.1 apache-airflow-providers-amazon==9.4.0 apache-airflow-providers-apache-beam==6.0.3 apache-airflow-providers-apache-cassandra==3.7.1 apache-airflow-providers-apache-drill==3.0.1 
  apache-airflow-providers-apache-druid==4.1.0 apache-airflow-providers-apache-flink==1.6.1 apache-airflow-providers-apache-hdfs==4.7.1 apache-airflow-providers-apache-hive==9.0.3 apache-airflow-providers-apache-iceberg==1.2.1 apache-airflow-providers-apache-impala==1.6.1 apache-airflow-providers-apache-kafka==1.7.0 apache-airflow-providers-apache-kylin==3.8.1 
  apache-airflow-providers-apache-livy==4.2.1 apache-airflow-providers-apache-pig==4.6.1 apache-airflow-providers-apache-pinot==4.7.0 apache-airflow-providers-apache-spark==5.0.1 apache-airflow-providers-apprise==2.0.1 apache-airflow-providers-arangodb==2.7.3 apache-airflow-providers-asana==2.9.1 apache-airflow-providers-atlassian-jira==3.0.1 apache-airflow-providers-celery==3.10.3 
  apache-airflow-providers-cloudant==4.1.1 apache-airflow-providers-cncf-kubernetes==10.3.1 apache-airflow-providers-cohere==1.4.3 apache-airflow-providers-common-compat==1.5.1 apache-airflow-providers-common-io==1.5.1 apache-airflow-providers-common-sql==1.24.0 apache-airflow-providers-databricks==7.2.1 apache-airflow-providers-datadog==3.8.3 apache-airflow-providers-dbt-cloud==4.2.1 
  apache-airflow-providers-dingding==3.7.3 apache-airflow-providers-discord==3.9.3 apache-airflow-providers-docker==4.2.1 apache-airflow-providers-elasticsearch==6.2.1 apache-airflow-providers-exasol==4.7.3 apache-airflow-providers-facebook==3.7.1 apache-airflow-providers-ftp==3.12.3 apache-airflow-providers-github==2.8.3 apache-airflow-providers-google==14.0.0 apache-airflow-providers-grpc==3.7.3 
  apache-airflow-providers-hashicorp==4.1.0 apache-airflow-providers-http==5.2.1 apache-airflow-providers-imap==3.8.3 apache-airflow-providers-influxdb==2.8.3 apache-airflow-providers-jdbc==5.0.1 apache-airflow-providers-jenkins==4.0.3 apache-airflow-providers-microsoft-azure==12.2.1 apache-airflow-providers-microsoft-mssql==4.2.1 apache-airflow-providers-microsoft-psrp==3.0.1 
  apache-airflow-providers-microsoft-winrm==3.9.1 apache-airflow-providers-mongo==5.0.2 apache-airflow-providers-mysql==6.2.0 apache-airflow-providers-neo4j==3.8.2 apache-airflow-providers-odbc==4.9.1 apache-airflow-providers-openai==1.5.2 apache-airflow-providers-openfaas==3.7.1 apache-airflow-providers-openlineage==2.1.1 apache-airflow-providers-opensearch==1.6.2 
  apache-airflow-providers-opsgenie==5.8.2 apache-airflow-providers-oracle==4.0.2 apache-airflow-providers-pagerduty==4.0.2 apache-airflow-providers-papermill==3.9.2 apache-airflow-providers-pgvector==1.4.1 apache-airflow-providers-pinecone==2.2.2 apache-airflow-providers-postgres==6.1.1 apache-airflow-providers-presto==5.8.2 apache-airflow-providers-qdrant==1.3.2 
  apache-airflow-providers-redis==4.0.2 apache-airflow-providers-salesforce==5.10.1 apache-airflow-providers-samba==4.9.2 apache-airflow-providers-segment==3.7.2 apache-airflow-providers-sendgrid==4.0.1 apache-airflow-providers-sftp==5.1.1 apache-airflow-providers-singularity==3.7.1 apache-airflow-providers-slack==9.0.2 apache-airflow-providers-smtp==2.0.1 apache-airflow-providers-snowflake==6.1.1 
  apache-airflow-providers-sqlite==4.0.1 apache-airflow-providers-ssh==4.0.1 apache-airflow-providers-standard==0.1.1 apache-airflow-providers-tableau==5.0.2 apache-airflow-providers-telegram==4.7.2 apache-airflow-providers-teradata==3.0.2 apache-airflow-providers-trino==6.1.0 apache-airflow-providers-vertica==4.0.1 apache-airflow-providers-weaviate==3.0.2 apache-airflow-providers-yandex==4.0.2 
  apache-airflow-providers-ydb==2.1.1 apache-airflow-providers-zendesk==4.9.1 --resolution highest
  Using Python 3.12.9 environment at: /usr/local
     Building apache-airflow @ file:///opt/airflow
        Built apache-airflow @ file:///opt/airflow
    × Failed to download and build `pyspark==3.5.5`
    ├─▶ Failed to extract archive
    ├─▶ failed to unpack
    │   `/root/.cache/uv/sdists-v9/.tmpK4m8P8/pyspark-3.5.5/deps/jars/hadoop-client-runtime-3.3.4.jar`
    ├─▶ failed to unpack
    │   `pyspark-3.5.5/deps/jars/hadoop-client-runtime-3.3.4.jar` into
    │   `/root/.cache/uv/sdists-v9/.tmpK4m8P8/pyspark-3.5.5/deps/jars/hadoop-client-runtime-3.3.4.jar`
    ├─▶ error decoding response body
    ├─▶ request or response body error
    ├─▶ error reading a body from connection
    ╰─▶ stream closed because of a broken pipe
    help: `pyspark` (v3.5.5) was included because
          `apache-airflow-providers-apache-spark` (v5.0.1) depends on
          `pyspark>=3.1.3`
  Traceback (most recent call last):
  File "/opt/airflow/scripts/in_container/run_generate_constraints.py", line 531, in <module>
    generate_constraints()
  File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/rich_click/rich_command.py", line 166, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/scripts/in_container/run_generate_constraints.py", line 517, in generate_constraints
    generate_constraints_pypi_providers(config_params)
  File "/opt/airflow/scripts/in_container/run_generate_constraints.py", line 396, in generate_constraints_pypi_providers
    run_command(
  File "/opt/airflow/scripts/in_container/in_container_utils.py", line 54, in run_command
    result = subprocess.run(cmd, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/subprocess.py", line 573, in run
    raise CalledProcessError(retcode, process.args,

Labels: bug
