Merged
62 commits
9ff62d8
Introduce AWS Executors to the Amazon provider package. The EcsFargat…
aelzeiny Apr 10, 2023
226e6ff
Reducing scope to just ECS, following up with Batch executor later
ferruzzi Jun 20, 2023
aca382a
Initial static checks
ferruzzi Jun 20, 2023
5e02868
Convert provided unit tests from UnitTest to pytest
ferruzzi Jun 20, 2023
22274ea
Docstring fixes
ferruzzi Jun 22, 2023
ca2b7e1
Nested ECS Executor to allow future executors to be added easier
ferruzzi Jun 26, 2023
62b14cc
Remove references to "Fargate" where it wasn't necessary and correct …
ferruzzi Jun 27, 2023
b7457c0
Rewrite TestEcsTaskCollection Tests
ferruzzi Jun 28, 2023
575387c
Remove botocore helpers
ferruzzi Jul 25, 2023
20a0c7c
Break anything not explicitly Executor related into other modules
ferruzzi Jul 25, 2023
a5eef70
Add env var to configure logging for container executors
o-nikolas Jul 26, 2023
455296a
Changes to the executor config files
ferruzzi Jul 7, 2023
3bbc42a
Add unit test for config default values
ferruzzi Aug 10, 2023
99b5d7c
Optimize import speed of ECS Executor
o-nikolas Aug 14, 2023
466bcc7
First draft for ECS Executor README
o-nikolas Aug 10, 2023
7847c85
Improve unit test code coverage up to 98%
ferruzzi Aug 16, 2023
dca2837
Convert to using EcsHook
ferruzzi Aug 16, 2023
08089ea
Add a link to the airflow doc about setting config options
o-nikolas Aug 16, 2023
d0cdc26
Fix broken subnet config test
o-nikolas Aug 17, 2023
3e6a854
Fix ecs attribute on ECS Executor
o-nikolas Aug 17, 2023
94be3b6
Replace mock executor in unit tests
ferruzzi Aug 21, 2023
50e5404
fix typo in docstring
ferruzzi Aug 22, 2023
dd700bb
Add config options to the readme
ferruzzi Aug 23, 2023
7d02806
Add a Dockerfile to build the ECS image that will run Airflow tasks
syedahsn Aug 9, 2023
23db81f
Fixes for README Dockerfile static check failures
o-nikolas Aug 28, 2023
6dc7144
Update the order the options are applied: default < template < explicit
ferruzzi Aug 22, 2023
de69f17
Catch all exceptions in methods that interface with the scheduler
o-nikolas Aug 28, 2023
4b8365e
Add guide to set up ECS Executors
syedahsn Aug 21, 2023
5d6d233
Fixes to ecs executor config handling
o-nikolas Aug 30, 2023
5b02b5b
Add instructions on how to check Python version for Airflow image.
syedahsn Sep 5, 2023
16a1b82
Add intro section to the README
o-nikolas Aug 31, 2023
3a8895f
Setup guide fixes
ferruzzi Sep 6, 2023
2ced018
Add performance section to the README
o-nikolas Sep 8, 2023
c56c096
Update boto user agent to detect Executors as Callers
o-nikolas Sep 7, 2023
3766091
Allow the executor to fail tasks if they are consistently failing
syedahsn Sep 6, 2023
3c1d54e
fix typo
ferruzzi Sep 12, 2023
2d7d1fd
fix whitespace issues
ferruzzi Sep 12, 2023
4c553af
Allow python dependencies to be installed on the image
syedahsn Sep 11, 2023
29ff7eb
Fix whitespaces in ECS executor test module
o-nikolas Sep 14, 2023
603367a
Fixes for build failures
o-nikolas Sep 14, 2023
162b13b
Update airflow/providers/amazon/aws/executors/ecs/Setup_guide.md
o-nikolas Sep 15, 2023
2a3119f
Update airflow/providers/amazon/aws/executors/ecs/ecs_executor_config.py
o-nikolas Sep 25, 2023
3de720f
The max retries config is not a ECS run task kwarg
o-nikolas Sep 26, 2023
5fb00f0
Dockerfile bugfix
ferruzzi Sep 26, 2023
d2d8db3
Rename region to region_name
ferruzzi Sep 25, 2023
dcca29e
Update the version_added fields in config.yml
ferruzzi Oct 5, 2023
6c8cb37
fix string
ferruzzi Oct 10, 2023
0e3b9ea
Suggested changes from reviewers
syedahsn Oct 11, 2023
008695b
Merge branch 'main' into aws_executors/ecs
o-nikolas Oct 12, 2023
40f0458
Use inflection.camelize instead of custom helper
ferruzzi Oct 13, 2023
7abbb97
Move config into provider.yaml
ferruzzi Sep 25, 2023
d507073
Changes for converting docs to rs
syedahsn Oct 10, 2023
e25b084
Add top level Executors index page to amazon docs
o-nikolas Oct 20, 2023
5d42a93
Update version_added
ferruzzi Oct 20, 2023
8945379
Merge branch 'main' into aws_executors/ecs
ferruzzi Oct 21, 2023
b939730
typos
ferruzzi Oct 23, 2023
a8cdf1a
Update doc to clarify consistent configuration
o-nikolas Oct 23, 2023
bcaf9de
Fix config loading
ferruzzi Oct 24, 2023
59e1f6f
Merge branch 'main' into aws_executors/ecs
ferruzzi Oct 24, 2023
e5409c5
Missed a region --> region_name refactor in base aws hook tests
o-nikolas Oct 24, 2023
bdfd8be
Remove Setup_guide readme, the contents have been moved to a docs rst
o-nikolas Oct 24, 2023
5e44514
No ecs executor configs are sensitive, exclude auto detected kwargs conf
o-nikolas Oct 24, 2023
4 changes: 2 additions & 2 deletions airflow/cli/commands/task_command.py
@@ -46,7 +46,7 @@
from airflow.models.operator import needs_expansion
from airflow.models.param import ParamsDict
from airflow.models.taskinstance import TaskReturnCode
-from airflow.settings import IS_K8S_EXECUTOR_POD
+from airflow.settings import IS_EXECUTOR_CONTAINER, IS_K8S_EXECUTOR_POD
Contributor Author:
New mechanism to enable task log output to show up in task logs for containerized executors. A previous mechanism existed for K8s only (a source of executor coupling!); a new one was added, but K8s was not migrated as part of this PR to keep the size down. More changes in settings.py and logging_mixin.py below.

from airflow.ti_deps.dep_context import DepContext
from airflow.ti_deps.dependencies_deps import SCHEDULER_QUEUED_DEPS
from airflow.typing_compat import Literal
@@ -325,7 +325,7 @@ def _move_task_handlers_to_root(ti: TaskInstance) -> Generator[None, None, None]
    console_handler = next((h for h in root_logger.handlers if h.name == "console"), None)
    with LoggerMutationHelper(root_logger), LoggerMutationHelper(ti.log) as task_helper:
        task_helper.move(root_logger)
-        if IS_K8S_EXECUTOR_POD:
+        if IS_K8S_EXECUTOR_POD or IS_EXECUTOR_CONTAINER:
            if console_handler and console_handler not in root_logger.handlers:
                root_logger.addHandler(console_handler)
        yield
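For reference, a minimal sketch of how such a flag can be derived in airflow/settings.py, assuming the executor sets an AIRFLOW_IS_EXECUTOR_CONTAINER environment variable on the containers it launches (the variable name is an assumption here, mirroring the existing K8s pattern; the PR's settings.py changes define the real mechanism):

import os

# Sketch only: mirrors the existing IS_K8S_EXECUTOR_POD flag. The env var
# name is an assumption; the PR's settings.py changes define the real one.
IS_K8S_EXECUTOR_POD = bool(os.environ.get("AIRFLOW_IS_K8S_EXECUTOR_POD", ""))

# True when this process is an Airflow task running inside a container
# launched by a containerized executor (e.g. the ECS executor); used in the
# diff above to attach the console handler so task output reaches the
# container logs.
IS_EXECUTOR_CONTAINER = bool(os.environ.get("AIRFLOW_IS_EXECUTOR_CONTAINER", ""))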
16 changes: 16 additions & 0 deletions airflow/providers/amazon/aws/config_templates/__init__.py
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
131 changes: 131 additions & 0 deletions airflow/providers/amazon/aws/config_templates/config.yml
@@ -0,0 +1,131 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

---

aws_ecs_executor:
  description: |
    This section only applies if you are using the AwsEcsExecutor in
    Airflow's ``[core]`` configuration.
    For more information on any of these execution parameters, see the link below:
    https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs/client/run_task.html
    For boto3 credential management, see
    https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
  options:
    conn_id:
      description: |
        The Airflow connection (i.e. credentials) used by the ECS executor to make API calls to AWS ECS.
      version_added: "2.8"
      type: string
      example: "aws_default"
      default: "aws_default"
Contributor:
Maybe it's better to set the default to None, so that it explicitly uses the boto3 creds strategy, e.g. ECS Task Role / Execution Task Role?

Contributor Author:
aws_default will already fall back to the boto3 strategy, no? It's the default we use in Operators, Sensors, etc.

Contributor:
Yes, it has a fallback, and I think it is one of the dirtiest fallbacks. I believe only the Amazon provider has this behavior with connections: when Airflow can't look up the connection in the metadata database or the secrets backend, it just falls back to the boto3 credentials strategy.

And there is a cost to this fallback:

  1. It needs to look up the non-existent connection every time.
  2. A misconfigured secrets backend could be hidden by the fallback.
  3. An annoying warning message.

In the case of None, it uses the default boto3 strategy without even touching a Connection.

Contributor:
In general I have no objections to keeping this value as aws_default by default.

But since the AWS executors are intended to run in an AWS environment, it might be a good idea to default to boto3.

If users take the default option of using a single connection for everything (logging, operators, executors), in most cases it would end up with an IAM role that allows everything 😢
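To illustrate the tradeoff under discussion, a rough sketch of the two credential paths; BaseHook.get_connection is real Airflow API, everything else here is illustrative and not the provider's actual implementation:

from __future__ import annotations

import boto3

from airflow.hooks.base import BaseHook


def get_session(conn_id: str | None) -> boto3.session.Session:
    # Illustrative sketch only, not the provider's actual code.
    if conn_id is None:
        # No Connection lookup at all: boto3 resolves credentials itself
        # (env vars, shared config files, ECS task role, instance profile).
        return boto3.session.Session()
    try:
        # With a conn_id set, the Airflow Connection is tried first...
        conn = BaseHook.get_connection(conn_id)
    except Exception:
        # ...and only on failure does it fall back to the boto3 default
        # strategy, paying for the lookup and a warning every time.
        return boto3.session.Session()
    return boto3.session.Session(
        aws_access_key_id=conn.login,
        aws_secret_access_key=conn.password,
    )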

    region:
      description: |
        The name of the AWS Region where Amazon ECS is configured. Required.
      version_added: "2.8"
      type: string
      example: "us-east-1"
      default: ~
    assign_public_ip:
      description: |
        Whether to assign a public IP address to the containers launched by the ECS executor.
        For more info see url to Boto3 docs above.
      version_added: "2.8"
      type: boolean
      example: "True"
      default: "False"
    cluster:
      description: |
        Name of the Amazon ECS Cluster. Required.
      version_added: "2.8"
      type: string
      example: "ecs_executor_cluster"
      default: ~
    container_name:
      description: |
        Name of the container that will be used to execute Airflow tasks via the ECS executor.
        The container should be specified in the ECS Task Definition and will receive an airflow
        CLI command as an additional parameter to its entrypoint. For more info see url to Boto3
        docs above. Required.
      version_added: "2.8"
      type: string
      example: "ecs_executor_container"
      default: ~
    launch_type:
      description: |
        Launch type can either be 'FARGATE' or 'EC2'. For more info see url to
        Boto3 docs above.
        If the launch type is EC2, the executor will attempt to place tasks on
        empty EC2 instances. If there are no EC2 instances available, no task
        is placed and this function will be called again in the next heartbeat.
        If the launch type is FARGATE, this will run the tasks on new AWS Fargate
        instances.
      version_added: "2.8"
      type: string
      example: "FARGATE"
      default: "FARGATE"
    platform_version:
      description: |
        The platform version the task uses. A platform version is only specified
        for tasks hosted on Fargate. If one isn't specified, the LATEST platform
        version is used.
      version_added: "2.8"
      type: string
      example: "1.4.0"
      default: "LATEST"
    security_groups:
      description: |
        The comma-separated IDs of the security groups associated with the task. If you
        don't specify a security group, the default security group for the VPC is used.
        There's a limit of 5 security groups. For more info see url to Boto3 docs above.
      version_added: "2.8"
      type: string
      example: "sg-XXXX,sg-YYYY"
      default: ~
    subnets:
      description: |
        The comma-separated IDs of the subnets associated with the task or service.
        There's a limit of 16 subnets. For more info see url to Boto3 docs above.
      version_added: "2.8"
      type: string
      example: "subnet-XXXXXXXX,subnet-YYYYYYYY"
      default: ~
    task_definition:
      description: |
        The family and revision (family:revision) or full ARN of the task definition
        to run. If a revision isn't specified, the latest ACTIVE revision is used.
        For more info see url to Boto3 docs above.
      version_added: "2.8"
      type: string
      example: executor_task_definition:LATEST
      default: ~
    max_run_task_attempts:
      description: |
        The maximum number of times the ECS Executor should attempt to run a task.
      version_added: "2.8"
      type: int
      example: "3"
      default: "3"
    run_task_kwargs:
      description: |
        A JSON string containing arguments to provide the ECS `run_task` API (see url above).
      version_added: "2.8"
      type: string
      example: '{"tags": {"key": "schema", "value": "1.0"}}'
      default: ~
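Pulling these options together, a hedged sketch of configuring the executor through Airflow's standard AIRFLOW__SECTION__KEY environment variables. The values are the example values from the template above, the executor class path is an assumption based on this PR's module layout, and note that a later commit in this PR renames region to region_name:

# Sketch only: example values from the config template, not recommendations.
# The executor module path is an assumption based on this PR's layout.
export AIRFLOW__CORE__EXECUTOR="airflow.providers.amazon.aws.executors.ecs.ecs_executor.AwsEcsExecutor"
export AIRFLOW__AWS_ECS_EXECUTOR__REGION="us-east-1"  # renamed to region_name later in this PR
export AIRFLOW__AWS_ECS_EXECUTOR__CLUSTER="ecs_executor_cluster"
export AIRFLOW__AWS_ECS_EXECUTOR__CONTAINER_NAME="ecs_executor_container"
export AIRFLOW__AWS_ECS_EXECUTOR__TASK_DEFINITION="executor_task_definition:LATEST"
export AIRFLOW__AWS_ECS_EXECUTOR__LAUNCH_TYPE="FARGATE"
export AIRFLOW__AWS_ECS_EXECUTOR__SECURITY_GROUPS="sg-XXXX,sg-YYYY"
export AIRFLOW__AWS_ECS_EXECUTOR__SUBNETS="subnet-XXXXXXXX,subnet-YYYYYYYY"
export AIRFLOW__AWS_ECS_EXECUTOR__RUN_TASK_KWARGS='{"tags": {"key": "schema", "value": "1.0"}}'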
16 changes: 16 additions & 0 deletions airflow/providers/amazon/aws/executors/__init__.py
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
86 changes: 86 additions & 0 deletions airflow/providers/amazon/aws/executors/ecs/Dockerfile
@@ -0,0 +1,86 @@
# hadolint ignore=DL3007
FROM apache/airflow:latest
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends unzip \
    # The below helps to keep the image size down
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
RUN unzip awscliv2.zip && ./aws/install

# Add a script to run the aws s3 sync command when the container is run
COPY <<"EOF" /entrypoint.sh
#!/bin/bash

echo "Downloading DAGs from S3 bucket"
aws s3 sync "$S3_URL" "$CONTAINER_DAG_PATH"

exec "$@"
EOF

RUN chmod +x /entrypoint.sh

USER airflow

## Installing Python Dependencies
# Python dependencies can be installed by providing a requirements.txt.
# If the file is in a different location, use the requirements_path build argument to specify
# the file path.
ARG requirements_path=./requirements.txt
ENV REQUIREMENTS_PATH=$requirements_path

# Uncomment the two lines below to copy the requirements.txt file to the container, and
# install the dependencies.
# COPY --chown=airflow:root $REQUIREMENTS_PATH /opt/airflow/requirements.txt
# RUN pip install --no-cache-dir -r /opt/airflow/requirements.txt


## AWS Authentication
# The image requires access to AWS services. This Dockerfile supports 2 ways to authenticate with AWS.
# The first is using build arguments where you can provide the AWS credentials as arguments
# passed when building the image. The other option is to copy the ~/.aws folder to the container,
# and authenticate using the credentials in that folder.
# If you would like to use an alternative method of authentication, feel free to make the
# necessary changes to this file.

# Use these arguments to provide AWS authentication information
ARG aws_access_key_id
ARG aws_secret_access_key
ARG aws_default_region
ARG aws_session_token

ENV AWS_ACCESS_KEY_ID=$aws_access_key_id
ENV AWS_SECRET_ACCESS_KEY=$aws_secret_access_key
ENV AWS_DEFAULT_REGION=$aws_default_region
ENV AWS_SESSION_TOKEN=$aws_session_token

# Uncomment the line below to authenticate to AWS using the ~/.aws folder
# Keep in mind the docker build context when placing .aws folder
# COPY --chown=airflow:root ./.aws /home/airflow/.aws


## Loading DAGs
# This Dockerfile supports 2 ways to load DAGs onto the container.
# One is to upload all the DAGs onto an S3 bucket, and then
# download them onto the container. The other is to copy a local folder with
# the DAGs onto the container.
# If you would like to use an alternative method of loading DAGs, feel free to make the
# necessary changes to this file.

ARG host_dag_path=./dags
ENV HOST_DAG_PATH=$host_dag_path
# Set host_dag_path to the path of the DAGs on the host
# COPY --chown=airflow:root $HOST_DAG_PATH $CONTAINER_DAG_PATH


# If using an S3 bucket as the source of DAGs, uncommenting the ENTRYPOINT at the bottom of this file will override this one.
ENTRYPOINT []

# Use these arguments to load DAGs onto the container from S3
ARG s3_url
ENV S3_URL=$s3_url
ARG container_dag_path=/opt/airflow/dags
ENV CONTAINER_DAG_PATH=$container_dag_path
# Uncomment the line below if using an S3 bucket as the source of DAGs
# ENTRYPOINT ["/entrypoint.sh"]
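For reference, an example of building this image with the build arguments defined above. Every value shown is a placeholder, and the COPY/ENTRYPOINT lines for your chosen dependency and DAG-loading options must be uncommented first, as described in the comments above:

# Placeholders throughout; supply your own credentials, bucket, and tag.
docker build . \
    --build-arg aws_default_region="us-east-1" \
    --build-arg aws_access_key_id="<access-key-id>" \
    --build-arg aws_secret_access_key="<secret-access-key>" \
    --build-arg s3_url="s3://my-dag-bucket/dags" \
    --build-arg requirements_path="./requirements.txt" \
    -t ecs-executor-airflow-image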