Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Mar 25, 2023

The #30255 introduced "JobRunner" concept and decoupled the job logic
from the ORM polymorphic *Job objects. The change was implemented
in the way to minimise the review effort needed, so it avoided renaming
the modules for the runners (from _job to _job_runner).

Also BaseJob lost its "polymorphism" properties so the package, and class name
can be renamed to simply job.

This PR completes the JobRunner concept introduction by applying the
renames.

Closes: #30296


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:CLI area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues labels Mar 25, 2023
@potiuk potiuk changed the title Change module names for job runners Rename JobRunner modules to *_job_runner Mar 25, 2023
@potiuk
Copy link
Member Author

potiuk commented Mar 25, 2023

Based on #30255, so please check only the last commit.

The apache#30255 introduced "JobRunner" concept and decoupled the job logic
from the ORM polymorphic *Job objects. The change was implemented
in the way to minimise the review effort needed, so it avoided renaming
the modules for the runners (from `_job` to `_job_runner`).

Also BaseJob lost its "polymorphism" properties so the package, and class name
can be renamed to simply job.

This PR completes the JobRunner concept introduction by applying the
renames.

Closes: apache#30296
@potiuk potiuk force-pushed the change-module-names-for-job-runners branch from 1db3239 to e785922 Compare April 10, 2023 20:50
@potiuk potiuk merged commit a1b99fe into apache:main Apr 10, 2023
@potiuk potiuk deleted the change-module-names-for-job-runners branch April 10, 2023 21:33
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Apr 11, 2023
potiuk added a commit to potiuk/airflow that referenced this pull request Apr 11, 2023
This is the final step of decoupling of the job runner from ORM
based BaseJob. After this change, finally we rich the state that
the BaseJob is just a state of the Job being run, but all
the logic is kept in separate "JobRunner" entity which just
keeps the reference to the job. Also it makes sure that
job in each runner is defined as appropriate for each job type:

* SchedulerJobRunner, BackfillJobRunner can only use BaseJob
* DagProcessorJobRunner, TriggererJobRunner and especially the
  LocalTaskJobRunner can keep both BaseJob and it's Pydantic
  BaseJobPydantic representation - for AIP-44 usage.

The highlights of this change:

* Job does not have job_runner reference any more
* Job is a mandatory parameter when creating each JobRunner
* run_job method takes as parameter the job (i.e. where the state
  of the job is called) and executor_callable - i.e. the method
  to run when the job gets executed
* heartbeat callback is also passed a generic callable in order
  to execute the post-heartbeat operation of each of the job
  type
* there is no more need to specify job_type when you create
  BaseJob, the job gets its type by a simply creating a runner
  with the job

This is the final stage of refactoring that was split into
reviewable stages: apache#30255 -> apache#30302 -> apache#30308 -> this PR.

Closes: apache#30325
potiuk added a commit that referenced this pull request Apr 11, 2023
This is the final step of decoupling of the job runner from ORM
based BaseJob. After this change, finally we rich the state that
the BaseJob is just a state of the Job being run, but all
the logic is kept in separate "JobRunner" entity which just
keeps the reference to the job. Also it makes sure that
job in each runner is defined as appropriate for each job type:

* SchedulerJobRunner, BackfillJobRunner can only use BaseJob
* DagProcessorJobRunner, TriggererJobRunner and especially the
  LocalTaskJobRunner can keep both BaseJob and it's Pydantic
  BaseJobPydantic representation - for AIP-44 usage.

The highlights of this change:

* Job does not have job_runner reference any more
* Job is a mandatory parameter when creating each JobRunner
* run_job method takes as parameter the job (i.e. where the state
  of the job is called) and executor_callable - i.e. the method
  to run when the job gets executed
* heartbeat callback is also passed a generic callable in order
  to execute the post-heartbeat operation of each of the job
  type
* there is no more need to specify job_type when you create
  BaseJob, the job gets its type by a simply creating a runner
  with the job

This is the final stage of refactoring that was split into
reviewable stages: #30255 -> #30302 -> #30308 -> this PR.

Closes: #30325
potiuk added a commit to potiuk/airflow that referenced this pull request May 14, 2023
The change #apache#30302 split Job from JobRunner, but it missed the fact
that SchedulerJob had a special case of checking the threshold -
instead of using the standard grace multiplier, it used whatever
has been defined in the `scheduler_helth_check_threshold`. The
`is_alive` method in SchedulerJobRunner has remained unused, and
default 2.1 grace multiplier has been used for both /health
endpoint and `airflow jobs check`.

This PR brings the exception for SchedulerJob back and clarifies
that the same treshold is also used for airflow jobs check in
the documentation.

Fixes: apache#31200
potiuk added a commit that referenced this pull request May 15, 2023
The change ##30302 split Job from JobRunner, but it missed the fact
that SchedulerJob had a special case of checking the threshold -
instead of using the standard grace multiplier, it used whatever
has been defined in the `scheduler_helth_check_threshold`. The
`is_alive` method in SchedulerJobRunner has remained unused, and
default 2.1 grace multiplier has been used for both /health
endpoint and `airflow jobs check`.

This PR brings the exception for SchedulerJob back and clarifies
that the same treshold is also used for airflow jobs check in
the documentation.

Fixes: #31200
ephraimbuddy pushed a commit that referenced this pull request May 15, 2023
The change ##30302 split Job from JobRunner, but it missed the fact
that SchedulerJob had a special case of checking the threshold -
instead of using the standard grace multiplier, it used whatever
has been defined in the `scheduler_helth_check_threshold`. The
`is_alive` method in SchedulerJobRunner has remained unused, and
default 2.1 grace multiplier has been used for both /health
endpoint and `airflow jobs check`.

This PR brings the exception for SchedulerJob back and clarifies
that the same treshold is also used for airflow jobs check in
the documentation.

Fixes: #31200
(cherry picked from commit f366d95)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AIP-44 Airflow Internal API area:API Airflow's REST/HTTP API area:CLI area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

AIP-44 Rename Job Runner modules

6 participants