-
Notifications
You must be signed in to change notification settings - Fork 106
[DO NOT MERGE] Tentative vfs-2.33.0 branch #405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
329bb48 to
ba702dc
Compare
|
Looks good! The And sorry about the At some stage, we should squash all the |
4dce453 to
ee0447c
Compare
aaed16d to
9e4057a
Compare
While using the reset --stdin feature on windows path added may have a \r at the end of the path that wasn't getting removed so didn't match the path in the index and wasn't reset. Signed-off-by: Kevin Willford <[email protected]>
Signed-off-by: Saeed Noursalehi <[email protected]>
Teach STATUS to optionally serialize the results of a status computation to a file. Teach STATUS to optionally read an existing serialization file and simply print the results, rather than actually scanning. This is intended for immediate status results on extremely large repos and assumes the use of a service/daemon to maintain a fresh current status snapshot. Signed-off-by: Jeff Hostetler <[email protected]>
Teach status serialization to take an optional pathname on
the command line to direct that cache data be written there
rather than to stdout. When used this way, normal status
results will still be written to stdout.
When no path is given, only binary serialization data is
written to stdout.
Usage:
git status --serialize[=<path>]
Signed-off-by: Jeff Hostetler <[email protected]>
This adds hard-coded call to GVFS.hooks.exe before and after each Git command runs. To make sure that this is only called on repositories cloned with GVFS, we test for the tell-tale .gvfs. Signed-off-by: Ben Peart <[email protected]>
Teach status deserialize code to reject status cache when printing in porcelain V2 and there are unresolved conflicts in the cache file. A follow-on task might extend the cache format to include this additiona data. See code for longer explanation. Signed-off-by: Jeff Hostetler <[email protected]>
Add trace2 region around read_object_process to collect time spent waiting for missing objects to be dynamically fetched. Signed-off-by: Jeff Hostetler <[email protected]>
Since we really want to be based on a `.vfs.*` tag, let's make sure that there was a new-enough one, i.e. one that agrees with the first three version numbers of the recorded default version. This prevents e.g. v2.22.0.vfs.0.<some-huge-number>.<commit> from being used when the current release train was not yet tagged. It is important to get the first three numbers of the version right because e.g. Scalar makes decisions depending on those (such as assuming that the `git maintenance` built-in is not available, even though it actually _is_ available). Signed-off-by: Johannes Schindelin <[email protected]>
Suggested by Ben Peart. Signed-off-by: Johannes Schindelin <[email protected]>
Fix "git status --deserialize" to correctly report both pathnames
for renames. Add a test case to verify.
A change was made upstream that added an additional "rename_status"
field to the "struct wt_status_change_data" structure. It is used
during the various print routines to decide if 2 pathnames need to
be printed.
5134ccd
wt-status.c: rename rename-related fields in wt_status_change_data
The fix here is to add that field to the status cache data.
Signed-off-by: Jeff Hostetler <[email protected]>
Add trace2 region and data events describing attempts to deserialize
status data using a status cache.
A category:status, label:deserialize region is pushed around the
deserialize code.
Deserialization results when reading from a file are:
category:status, path = <path>
category:status, polled = <number_of_attempts>
category:status, result = "ok" | "reject"
When reading from STDIN are:
category:status, path = "STDIN"
category:status, result = "ok" | "reject"
Status will fallback and run a normal status scan when a "reject"
is reported (unless "--deserialize-wait=fail").
If "ok" is reported, status was able to use the status cache and
avoid scanning the workdir.
Additionally, a cmd_mode is emitted for each step: collection,
deserialization, and serialization. For example, if deserialization
is attempted and fails and status falls back to actually computing
the status, a cmd_mode message containing "deserialize" is issued
and then a cmd_mode for "collect" is issued.
Also, if deserialization fails, a data message containing the
rejection reason is emitted.
Signed-off-by: Jeff Hostetler <[email protected]>
This header file will accumulate GVFS-specific definitions. Signed-off-by: Kevin Willford <[email protected]>
Signed-off-by: Alejandro Pauly <[email protected]>
Changes to the global or repo-local excludes files can change the results returned by "git status" for untracked files. Therefore, it is important that the exclude-file values used during serialization are still current at the time of deserialization. Teach "git status --serialize" to report metadata on the user's global exclude file (which defaults to "$XDG_HOME/git/ignore") and for the repo-local excludes file (which is in ".git/info/excludes"). Serialize will record the pathnames and mtimes for these files in the serialization header (next to the mtime data for the .git/index file). Teach "git status --deserialize" to validate this new metadata. If either exclude file has changed since the serialization-cache-file was written, then deserialize will reject the cache file and force a full/normal status run. Signed-off-by: Jeff Hostetler <[email protected]>
Add trace information around status serialization. Signed-off-by: Jeff Hostetler <[email protected]>
- include `scalar` - build *unsigned* .dmg & .pkg for target OS version 10.6 - upload artifacts to workflow
This adds support for releasing to Ubuntu repositories hosted at http://packages.microsoft.com/ (hosting location for Microsoft's official apt/yum repos). This allows users to install via apt-get on Hirsute/Bionic. Details to configure appropriate repos can be found here: https://docs.microsoft.com/en-us/windows-server/administration/Linux-Package-Repository-for-Microsoft-Software).
Update winget manifest
Commit 58634db ("rebase: Allow merge strategies to be used when rebasing", 2006-06-21) added the --merge option to git-rebase so that renames could be detected (at least when using the `recursive` merge backend). However, git-am -3 gained that same ability in commit 579c9bb ("Use merge-recursive in git-am -3.", 2006-12-28). As such, the comment about being able to detect renames is not particularly noteworthy. Remove it. While tweaking this description, add a quick comment about when --merge became the default. Signed-off-by: Elijah Newren <[email protected]>
- include `scalar` - build & upload unsigned .deb package
Updating `README.md` with instructions for `apt-get` setup and install for Ubuntu Bionic + Hirsute.
Upstream, a20f704 (add: warn when asked to update SKIP_WORKTREE entries, 2021-04-08) modified how 'git add <pathspec>' works with cache entries marked with the SKIP_WORKTREE bit. The intention is to prevent a user from accidentally adding a path that is outside their sparse-checkout definition but somehow matches an existing index entry. A similar change for 'git rm' happened in d5f4b82 (rm: honor sparse checkout patterns, 2021-04-08). This breaks when using the virtual filesystem in VFS for Git. It is rare, but we could be in a scenario where the user has staged a change and then the file is projected away. If the user re-adds the file, then this warning causes the command to fail with the advise message. Disable this logic when core_virtualfilesystem is enabled. Signed-off-by: Derrick Stolee <[email protected]>
Add release-apt-get workflow
Signed-off-by: Elijah Newren <[email protected]>
- sign using Azure-stored certificates & client - sign on Windows agent via python script - job skipped if credentials for accessing certificate aren't present
We are currently using 'Release tag' to describe the required input to our `workflow_dispatch` trigger. This is inaccurate - this field actually requires a 'Release id', which I discovered when testing GCM Core `apt-get` deployments yesterday. Updating so that the description doesn't confuse folks running the workflow for a release that is not 'latest'.
Upstream, a20f704 (add: warn when asked to update SKIP_WORKTREE entries, 04-08-2021) modified how 'git add <pathspec>' works with cache entries marked with the SKIP_WORKTREE bit. The intention is to prevent a user from accidentally adding a path that is outside their sparse-checkout definition but somehow matches an existing index entry. This breaks when using the virtual filesystem in VFS for Git. It is rare, but we could be in a scenario where the user has staged a change and then the file is projected away. If the user re-adds the file, then this warning causes the command to fail with the advise message. Disable this logic when core_virtualfilesystem is enabled. This should allow the VFS for Git functional tests to pass (at least the ones in the default run). I'll create a `-pr` installer build to check before merging this.
There were two locations in the code that referred to 'merge-recursive' but which were also applicable to 'merge-ort'. Update them to more general wording. Signed-off-by: Elijah Newren <[email protected]>
- create release & uploads artifact using Octokit - use job "if" condition to handle uploading signed *or* unsigned .deb
Add instructions for `apt-get` install to `README`
There are a few reasons to switch the default:
* Correctness
* Extensibility
* Performance
I'll provide some summaries about each.
=== Correctness ===
The original impetus for a new merge backend was to fix issues that were
difficult to fix within recursive's design. The success with this goal
is perhaps most easily demonstrated by running the following:
$ git grep -2 KNOWN_FAILURE t/ | grep -A 4 GIT_TEST_MERGE_ALGORITHM
$ git grep test_expect_merge_algorithm.failure.success t/
$ git grep test_expect_merge_algorithm.success.failure t/
In order, these greps show:
* Seven sets of submodule tests (10 total tests) that fail with
recursive but succeed with ort
* 22 other tests that fail with recursive, but succeed with ort
* 0 tests that pass with recursive, but fail with ort
=== Extensibility ===
Being able to perform merges without touching the working tree or index
makes it possible to create new features that were difficult with the
old backend:
* Merging, cherry-picking, rebasing, reverting in bare repositories...
or just on branches that aren't checked out.
* `git diff AUTO_MERGE` -- ability to see what changes the user has
made to resolve conflicts so far (see commit 5291828 ("merge-ort:
write $GIT_DIR/AUTO_MERGE whenever we hit a conflict", 2021-03-20)
* A --remerge-diff option for log/show, used to show diffs for merges
that display the difference between what an automatic merge would
have created and what was recorded in the merge. (This option will
often result in an empty diff because many merges are clean, but for
the non-clean ones it will show how conflicts were fixed including
the removal of conflict markers, and also show additional changes
made outside of conflict regions to e.g. fix semantic conflicts.)
* A --remerge-diff-only option for log/show, similar to --remerge-diff
but also showing how cherry-picks or reverts differed from what an
automatic cherry-pick or revert would provide.
The last three have been implemented already (though only one has been
submitted upstream so far; the others were waiting for performance work
to complete), and I still plan to implement the first one.
=== Performance ===
I'll quote from the summary of my final optimization for merge-ort
(while fixing the testcase name from 'no-renames' to 'few-renames'):
Timings
Infinite
merge- merge- Parallelism
recursive recursive of rename merge-ort
v2.30.0 current detection current
---------- --------- ----------- ---------
few-renames: 18.912 s 18.030 s 11.699 s 198.3 ms
mega-renames: 5964.031 s 361.281 s 203.886 s 661.8 ms
just-one-mega: 149.583 s 11.009 s 7.553 s 264.6 ms
Speedup factors
Infinite
merge- merge- Parallelism
recursive recursive of rename
v2.30.0 current detection merge-ort
---------- --------- ----------- ---------
few-renames: 1 1.05 1.6 95
mega-renames: 1 16.5 29 9012
just-one-mega: 1 13.6 20 565
And, for partial clone users:
Factor reduction in number of objects needed
Infinite
merge- merge- Parallelism
recursive recursive of rename
v2.30.0 current detection merge-ort
---------- --------- ----------- ---------
mega-renames: 1 1 1 181.3
Signed-off-by: Elijah Newren <[email protected]>
Implement workflow to create GitHub release with attached `git` installers
Make it clear that `ort` is the default merge strategy now rather than `recursive`, including moving `ort` to the front of the list of merge strategies. Signed-off-by: Elijah Newren <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
The 'ort' strategy is a new algorithm to replace the 'recursive' merge strategy. I've been reviewing some of the performance patches upstream, many of which are already in Git 2.32.0 (more coming in 2.33.0) and even with the ones already included, it is a clear performance win for our large repos. I tested on the Office monorepo and consistently saw merge times in the 5-6 second range. With the 'recursive' strategy, these would range from 7-20 seconds. My tests reproduced merges found within the commit history, and the ones that succeeded without conflicts matched the committed changes. There were even a few where the 'recursive' strategy did not resolve to the committed change, but the 'ort' version did (probably because of better rename detection). Not only is this a beneficial performance change for our users across `microsoft/git`, it will be a critical step to allowing `git merge` to work quickly with sparse index. In my testing of a prototype, I was able to get `git merge` commands with sparse index and the 'ort' strategy down to 0.5-1.5 seconds in most cases. (Cases with a merge conflict outside of the sparse-checkout definition jumped back up to the 6-7 second range, which is expected, and should be rare.) cc: @newren for awareness. Thanks for the patches! These were applied from those sent to the list via git#1055.
Copy the `index_state->dir_hash` back to the real istate after expanding a sparse index. A crash was observed in `git status` during some hashmap lookups with corrupted hashmap entries. During an index expansion, new cache-entries are added to the `index_state->name_hash` and the `dir_hash` in a temporary `index_state` variable `full`. However, only the `name_hash` hashmap from this temp variable was copied back into the real `istate` variable. The original copy of the `dir_hash` was incorrectly preserved. If the table in the `full->dir_hash` hashmap were realloced, the stale version (in `istate`) would be corrupted. Signed-off-by: Jeff Hostetler <[email protected]>
56b7c85 to
322ee20
Compare
Without this change, the mere conflicts start creating <path>~cruft files on-disk, which is caught by the VFS for Git functional tests. Signed-off-by: Derrick Stolee <[email protected]>
815f10a to
24f9d5c
Compare
|
Just wanted to point out the late addition of 24f9d5c, which is necessary for the ORT strategy to work with VFS for Git. I'm going to update our internal documentation about updating things for VFS for Git, and watching out for |
vdye
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are absolutely right to ask.
|
Here is the first attempt of rebasing the
vfs-2.32.0onto thev2.33.0-rc0tag. There were morefixup!commits than usual here, as well as some interesting merge conflicts with upstream, or locally:As pointed out in [Low Priority] React to ds/write-index-with-hashfile-api #389, we skip computing the trailing hash of the index in VFS for Git. Upstream changes to the index writing code (@derrickstolee's fault) use the
hashfileAPI now, but that caused a conflict with this feature. Resolved by allowing thehashfileAPI to remove the hashing part and just be a buffering API.An upstream change to the loose object cache structure required rewriting how we were adding objects to the cache when downloading loose objects in the
gvfs-helper.I rebased Handle Scalar enlistments without
srcsubdirectory #402 and Make 'ort' the default merge strategy #404. Thanks @vdye for squashing in the changes from Handle Scalar enlistments withoutsrcsubdirectory #402!I updated co-authorship to recognize @vdye's work in this area.
Here is the range-diff as of the rebase onto
v2.33.0.windows.1: