ARROW-15691: [Dev] Update archery to work with either master or main as default branch #14033
|
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename pull request title in the following format? or See also: |
a0429fa to 60883df
|
This change is ready for review and ready to be considered for running the CI workflows that are awaiting approval. Thank you for your help on this! |
pitrou
left a comment
Thanks @lafiona. Here are a number of comments and suggestions.
dev/archery/archery/crossbow/cli.py
Outdated
Can we avoid doing this at module import?
Yes, thank you for this suggestion, I've moved the code to set the default within the function.
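A minimal sketch of the difference between resolving a default at module import and resolving it inside the function. The names `compute_default_branch` and `submit` are hypothetical stand-ins and are not taken from `dev/archery/archery/crossbow/cli.py`; the `calls` list exists only to make the timing of the lookup visible.

```python
calls = []  # records every lookup so the timing can be observed


def compute_default_branch():
    """Hypothetical stand-in for a lookup that may touch git or the environment."""
    calls.append("lookup")
    return "master"


# An eager module-level assignment such as
#     DEFAULT_BRANCH = compute_default_branch()
# would run the lookup on every import, even when no command is executed.


def submit(branch=None):
    """Resolve the default lazily, only when the caller did not pass a branch."""
    if branch is None:
        branch = compute_default_branch()
    return f"submitting against {branch}"
```

With this shape, importing the module performs no lookup, and an explicit `submit("main")` never triggers one either.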
dev/archery/archery/crossbow/core.py
Outdated
Well... can we instead default to "master" for the time being?
Thanks for this suggestion, instead of erroring, I will set the default to "master" while printing a warning containing the same details as the error message above. It could be helpful for the user to know that the two heuristics that we're using are not working in their environment and it is falling back to a hard-coded default. We've also added a Jira ticket (ARROW-18011) to modify the default value from "master" to "main" after the migration.
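The fallback behaviour described above could look roughly like the sketch below. It is an illustration under assumptions, not the code in `dev/archery/archery/crossbow/core.py`: the function name `choose_default_branch`, the `candidates` argument, and the warning text are all hypothetical.

```python
import warnings

FALLBACK_BRANCH = "master"  # hard-coded default discussed in this thread


def choose_default_branch(candidates):
    """Return the first usable candidate, or warn and fall back.

    `candidates` is an iterable of branch names (or None), one per
    heuristic. The name and shape of this helper are hypothetical.
    """
    for name in candidates:
        if name:
            return name
    # No heuristic produced a value: tell the user instead of raising.
    warnings.warn(
        "Unable to determine the default branch name; "
        f"falling back to '{FALLBACK_BRANCH}'.",
        RuntimeWarning,
    )
    return FALLBACK_BRANCH
```

Warning rather than raising keeps the command usable in environments where the heuristics fail, while still telling the user that a hard-coded value is in effect.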
|
Also @raulcd FYI |
|
Thanks @pitrou for your code review, I've addressed your feedback in the latest commits. |
raulcd
left a comment
thanks @lafiona for the PR!
from my understanding this is eventually used on the install_dask.sh and install_pandas.sh scripts:
https://github.com/apache/arrow/blob/master/ci/scripts/install_dask.sh#L29
and
https://github.com/apache/arrow/blob/master/ci/scripts/install_pandas.sh#L38
I think we will also have to update these if's otherwise. I've tested to run the archery command for pandas from current master:
$ PANDAS=master archery docker run --no-leaf-cache conda-python-pandas
...
=> => # Collecting git+https://github.com/pandas-dev/pandas.git
=> => # Cloning https://github.com/pandas-dev/pandas.git to /tmp/pip-req-build-op7u80ho
=> => # Running command git clone --filter=blob:none --quiet https://github.com/pandas-dev/pandas.git /tmp/pip-req-build-op7u80ho
=> => # Resolved https://github.com/pandas-dev/pandas.git to commit 67d75f3715ed8bfb19edc6d99d16f39daba6e461
=> => # Preparing metadata (pyproject.toml): started
and from your branch:
$ PANDAS=main archery docker run --no-leaf-cache conda-python-pandas
...
#0 33.87 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#0 34.39 ERROR: Could not find a version that satisfies the requirement pandas==main (from versions: 0.1, 0.2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.21.1, 0.22.0, 0.23.0, 0.23.1, 0.23.2, 0.23.3, 0.23.4, 0.24.0, 0.24.1, 0.24.2, 0.25.0, 0.25.1, 0.25.2, 0.25.3, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 1.4.0rc0, 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.5.0rc0, 1.5.0)
#0 34.39 ERROR: No matching distribution found for pandas==main
with these changes we would stop testing the latest git version and try to find main from: pip install pandas==${pandas} instead.
Thank you for noticing this and bringing it up!
I attempted to fix this by modifying the install scripts to use Git to compute the current default branch, however, at the time of running the scripts, the working directory is not within a git repository.
Since the flag value that is used is not dependent on the actual default branch name of the Arrow or Pandas repositories, it could be helpful to change the flag value to default, or one of the following:
devel
upstream_devel
If this sounds good to you, I can make this change and also update the documentation page at: https://arrow.apache.org/docs/developers/continuous_integration/docker.html#usage to reflect the new flag value.
Your approach sounds good to me, as this name refers to the upstream development version of the related project (pandas, dask) it makes more sense to use a flag like upstream_devel instead of master or main to differentiate this vs the latest release, the nightly build or a specific version.
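To illustrate why a dedicated flag value helps, here is a small Python sketch of the selection logic. The real logic lives in the shell scripts `ci/scripts/install_pandas.sh` and `ci/scripts/install_dask.sh`; this is only a model of it. The function name `pandas_pip_requirement` is hypothetical, and the other cases mentioned above (latest release, nightly build) are left out.

```python
UPSTREAM_DEVEL = "upstream_devel"  # flag value proposed in this thread
PANDAS_GIT_URL = "git+https://github.com/pandas-dev/pandas.git"


def pandas_pip_requirement(flag):
    """Map the PANDAS build parameter to a pip requirement string.

    Hypothetical helper: only the two cases discussed here are modelled.
    """
    if flag == UPSTREAM_DEVEL:
        # Install the upstream development version straight from git.
        return PANDAS_GIT_URL
    # Anything else is treated as a release version, which is why a
    # branch name such as "main" ends up as the invalid "pandas==main".
    return f"pandas=={flag}"
```

Because the flag no longer doubles as a branch name, renaming the default branch of Arrow or pandas does not change which value selects the development install.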
Thank you @raulcd! I've made this change and qualified by running the sample command from your original comment and running the affected tests in dev/archery/archery/docker/tests/test_docker.py.
I've also made the changes to the documentation and qualified by performing a sphinx build.
sorry for the late reply. We also have to update the tasks.yml definition to make crossbow run the correct task versions, updating master for upstream_devel on the following:
https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml#L1495
and
https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml#L1515
No worries and thank you! I've made this update.
dev/archery/archery/docker/cli.py
Outdated
can you also update this to upstream_devel?
Yes, I have updated this sample command too.
raulcd
left a comment
I have tested these changes on my fork and triggered a bunch of nightly jobs, and all seems to be working as expected (raulcd#13 (comment)); the failures are current failures on our nightlies. I am happy with it. I have investigated around archery and I can't seem to find other places where we are using hardcoded "master", so I am +1 on it.
@kou @kszucs this is a subtask for the bigger task: https://issues.apache.org/jira/browse/ARROW-15689 it seems to advance towards migrating from master to main, is there anything else you would expect on this?
kou
left a comment
+1
…n archery/archery/crossbow/cli.py
'master' for now in dev/archery/archery/crossbow/core.py
…termined, default to 'master' in dev/archery/archery/release/core.py
…uted by Git rather than hard-coded defaults
… and Dask to 'upstream_devel', and update the documentation.
…ery/docker/cli.py
Co-authored-by: Sutou Kouhei <[email protected]>
…pace. Co-authored-by: Sutou Kouhei <[email protected]>
…sage. Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
|
@github-actions crossbow submit example-cpp-minimal-build-static |
|
Revision: 374f540 Submitted crossbow builds: ursacomputing/crossbow @ actions-05d3f08a75
|
|
Thank you @lafiona ! |
|
Benchmark runs are scheduled for baseline = 21564cf and contender = 8861c0c. 8861c0c is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Overview
The goal of this pull request is to update archery to work with a repository default branch named master or main, as part of the effort to rename the Apache Arrow repository's default branch to main. The parent Jira ticket is ARROW-15689 (https://issues.apache.org/jira/browse/ARROW-15689).

Implementation
- … archery, crossbow, and docker command line interface code to reference the mainline development branch (default git branch) generically.
- … master branch.
- … crossbow benchmarking examples to generically specify the <default-branch> rather than a hard-coded value.
- … .github/workflows/integration.yml, add an environment variable DEFAULT_BRANCH to the archery command in the "Execute Docker Build" step, so that archery can reliably access the default branch value.
- … .github/workflows/archery.yml, add an environment variable DEFAULT_BRANCH for all steps. This environment variable was already used by the Git Fixup step. It will also be used by the Archery Unittests step.
- … default_branch_name, to the Repo class in dev/archery/archery/crossbow/core.py for computing the default branch name.
  - … DEFAULT_BRANCH environment variable, takes precedence in determining the default branch name (this is for overriding the git-based heuristic and qualifying in CI).
  - … pygit2 is used to get the default branch name via the Apache Arrow repository's origin remote HEAD reference. This is a heuristic, but in most cases, the HEAD reference of the remote points to the default branch.
- … Release class in dev/archery/archery/release/core.py for computing the default branch name. Similar to the default_branch_name property for Repo in archery/archery/crossbow/core.py:
  - … DEFAULT_BRANCH environment variable, takes precedence in determining the default branch name (this is for qualifying in CI).
  - … GitPython is used to get the default branch name via the Apache Arrow repository's origin remote HEAD reference.
- … PANDAS and DASK Docker Build Parameter value for indicating the upstream development branch to upstream_devel.

Out of scope:
- … master in the test fixtures files in dev/archery/archery/test/fixtures. It appears that the data only refers to external repositories, such as ursa-labs/ursabot, which currently uses master, so these instances were not modified.

Testing
- … archery and crossbow commands in local clones of both the mathworks/arrow and apache/arrow repositories.
- … release component, but the release tests pass in CI.
- … archery docker command after setting the PANDAS environment variable to confirm that the correct version of Pandas is used.

Future Directions
- … master and main (ARROW-17777)
- … ARCHERY_DEFAULT_BRANCH, or from the repository's remote head reference. (ARROW-18011)

Notes
Thank you @kevingurney for your help with this pull request!
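
The lookup order described under Implementation (environment variable first, then the origin remote HEAD reference) can be sketched as below. This is an illustration only and swaps in a different technique: it shells out to `git symbolic-ref` rather than using pygit2 or GitPython as the pull request does. The function names are hypothetical, and the sketch assumes the environment variable is named `DEFAULT_BRANCH` as in the workflow changes above.

```python
import os
import subprocess

REMOTE_PREFIX = "refs/remotes/origin/"


def branch_from_remote_head(target):
    """Extract the branch name from a target such as 'refs/remotes/origin/main'."""
    if target.startswith(REMOTE_PREFIX):
        return target[len(REMOTE_PREFIX):]
    return None


def detect_default_branch(repo_path=".", env=None):
    """Return the default branch name, or None if it cannot be determined."""
    env = os.environ if env is None else env
    # 1. An explicit override takes precedence (useful in CI).
    override = env.get("DEFAULT_BRANCH")
    if override:
        return override
    # 2. Heuristic: ask git where the origin remote's HEAD points.
    try:
        result = subprocess.run(
            ["git", "-C", repo_path, "symbolic-ref", "refs/remotes/origin/HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, the path is not a repository, or the ref is not set.
        return None
    return branch_from_remote_head(result.stdout.strip())
```

Stripping the full `refs/remotes/origin/` prefix, instead of taking the last path component, keeps branch names that contain slashes intact.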