Closed
Labels: bug (Something isn't working)
Description
Summary
Our CI has recently started to fail fairly often with broken pipe errors
when pulling large artifacts from PyPI, in particular pyspark==3.5.5,
which is a 320 MB download: https://pypi.org/project/pyspark/#files
While this is of course an infrastructure problem (either PyPI or GH Actions), it should be relatively easy (I guess) to introduce a retry mechanism for broken pipe errors, which are very likely transient.
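Until something like that exists in uv itself, a caller-side workaround along these lines might help. This is only a minimal sketch, not how uv does or would implement retries: `run_with_retry`, `is_transient`, and `TRANSIENT_MARKERS` are hypothetical names, and matching on the "broken pipe" text in stderr is an assumption based on the error output in this report.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical marker list; "broken pipe" is taken from the error output in this report.
TRANSIENT_MARKERS = ("broken pipe",)


def is_transient(stderr: str) -> bool:
    """Return True if the captured stderr looks like a transient network failure."""
    lowered = stderr.lower()
    return any(marker in lowered for marker in TRANSIENT_MARKERS)


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    """Run cmd, retrying with exponential backoff on transient-looking failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        # Give up on the last attempt, or when the failure does not look transient.
        if attempt == attempts or not is_transient(result.stderr):
            raise subprocess.CalledProcessError(
                result.returncode, cmd, output=result.stdout, stderr=result.stderr
            )
        # Wait base_delay, then 2x, 4x, ... before the next attempt.
        sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In a script like ours this could wrap the install call, e.g. `run_with_retry(["uv", "pip", "install", "pyspark==3.5.5"])`. Non-transient failures (such as a resolution error) are raised immediately, so real problems are not hidden behind retries.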
Example
Running command: uv pip install --no-sources -e '.[all-core]' ./airflow-core ./task-sdk --reinstall apache-airflow-providers-airbyte==5.0.1 apache-airflow-providers-alibaba==3.0.1 apache-airflow-providers-amazon==9.4.0 apache-airflow-providers-apache-beam==6.0.3 apache-airflow-providers-apache-cassandra==3.7.1 apache-airflow-providers-apache-drill==3.0.1
apache-airflow-providers-apache-druid==4.1.0 apache-airflow-providers-apache-flink==1.6.1 apache-airflow-providers-apache-hdfs==4.7.1 apache-airflow-providers-apache-hive==9.0.3 apache-airflow-providers-apache-iceberg==1.2.1 apache-airflow-providers-apache-impala==1.6.1 apache-airflow-providers-apache-kafka==1.7.0 apache-airflow-providers-apache-kylin==3.8.1
apache-airflow-providers-apache-livy==4.2.1 apache-airflow-providers-apache-pig==4.6.1 apache-airflow-providers-apache-pinot==4.7.0 apache-airflow-providers-apache-spark==5.0.1 apache-airflow-providers-apprise==2.0.1 apache-airflow-providers-arangodb==2.7.3 apache-airflow-providers-asana==2.9.1 apache-airflow-providers-atlassian-jira==3.0.1 apache-airflow-providers-celery==3.10.3
apache-airflow-providers-cloudant==4.1.1 apache-airflow-providers-cncf-kubernetes==10.3.1 apache-airflow-providers-cohere==1.4.3 apache-airflow-providers-common-compat==1.5.1 apache-airflow-providers-common-io==1.5.1 apache-airflow-providers-common-sql==1.24.0 apache-airflow-providers-databricks==7.2.1 apache-airflow-providers-datadog==3.8.3 apache-airflow-providers-dbt-cloud==4.2.1
apache-airflow-providers-dingding==3.7.3 apache-airflow-providers-discord==3.9.3 apache-airflow-providers-docker==4.2.1 apache-airflow-providers-elasticsearch==6.2.1 apache-airflow-providers-exasol==4.7.3 apache-airflow-providers-facebook==3.7.1 apache-airflow-providers-ftp==3.12.3 apache-airflow-providers-github==2.8.3 apache-airflow-providers-google==14.0.0 apache-airflow-providers-grpc==3.7.3
apache-airflow-providers-hashicorp==4.1.0 apache-airflow-providers-http==5.2.1 apache-airflow-providers-imap==3.8.3 apache-airflow-providers-influxdb==2.8.3 apache-airflow-providers-jdbc==5.0.1 apache-airflow-providers-jenkins==4.0.3 apache-airflow-providers-microsoft-azure==12.2.1 apache-airflow-providers-microsoft-mssql==4.2.1 apache-airflow-providers-microsoft-psrp==3.0.1
apache-airflow-providers-microsoft-winrm==3.9.1 apache-airflow-providers-mongo==5.0.2 apache-airflow-providers-mysql==6.2.0 apache-airflow-providers-neo4j==3.8.2 apache-airflow-providers-odbc==4.9.1 apache-airflow-providers-openai==1.5.2 apache-airflow-providers-openfaas==3.7.1 apache-airflow-providers-openlineage==2.1.1 apache-airflow-providers-opensearch==1.6.2
apache-airflow-providers-opsgenie==5.8.2 apache-airflow-providers-oracle==4.0.2 apache-airflow-providers-pagerduty==4.0.2 apache-airflow-providers-papermill==3.9.2 apache-airflow-providers-pgvector==1.4.1 apache-airflow-providers-pinecone==2.2.2 apache-airflow-providers-postgres==6.1.1 apache-airflow-providers-presto==5.8.2 apache-airflow-providers-qdrant==1.3.2
apache-airflow-providers-redis==4.0.2 apache-airflow-providers-salesforce==5.10.1 apache-airflow-providers-samba==4.9.2 apache-airflow-providers-segment==3.7.2 apache-airflow-providers-sendgrid==4.0.1 apache-airflow-providers-sftp==5.1.1 apache-airflow-providers-singularity==3.7.1 apache-airflow-providers-slack==9.0.2 apache-airflow-providers-smtp==2.0.1 apache-airflow-providers-snowflake==6.1.1
apache-airflow-providers-sqlite==4.0.1 apache-airflow-providers-ssh==4.0.1 apache-airflow-providers-standard==0.1.1 apache-airflow-providers-tableau==5.0.2 apache-airflow-providers-telegram==4.7.2 apache-airflow-providers-teradata==3.0.2 apache-airflow-providers-trino==6.1.0 apache-airflow-providers-vertica==4.0.1 apache-airflow-providers-weaviate==3.0.2 apache-airflow-providers-yandex==4.0.2
apache-airflow-providers-ydb==2.1.1 apache-airflow-providers-zendesk==4.9.1 --resolution highest
Using Python 3.12.9 environment at: /usr/local
Building apache-airflow @ file:///opt/airflow
Built apache-airflow @ file:///opt/airflow
× Failed to download and build `pyspark==3.5.5`
├─▶ Failed to extract archive
├─▶ failed to unpack
│ `/root/.cache/uv/sdists-v9/.tmpK4m8P8/pyspark-3.5.5/deps/jars/hadoop-client-runtime-3.3.4.jar`
├─▶ failed to unpack
│ `pyspark-3.5.5/deps/jars/hadoop-client-runtime-3.3.4.jar` into
│ `/root/.cache/uv/sdists-v9/.tmpK4m8P8/pyspark-3.5.5/deps/jars/hadoop-client-runtime-3.3.4.jar`
├─▶ error decoding response body
├─▶ request or response body error
├─▶ error reading a body from connection
╰─▶ stream closed because of a broken pipe
help: `pyspark` (v3.5.5) was included because
`apache-airflow-providers-apache-spark` (v5.0.1) depends on
`pyspark>=3.1.3`
Traceback (most recent call last):
  File "/opt/airflow/scripts/in_container/run_generate_constraints.py", line 531, in <module>
    generate_constraints()
  File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/rich_click/rich_command.py", line 166, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/scripts/in_container/run_generate_constraints.py", line 517, in generate_constraints
    generate_constraints_pypi_providers(config_params)
  File "/opt/airflow/scripts/in_container/run_generate_constraints.py", line 396, in generate_constraints_pypi_providers
    run_command(
  File "/opt/airflow/scripts/in_container/in_container_utils.py", line 54, in run_command
    result = subprocess.run(cmd, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/subprocess.py", line 573, in run
    raise CalledProcessError(retcode, process.args,
shahar1