dscho commented Jul 27, 2021

tl;dr: This series contributes the core part of the Scalar command to the Git project. This command provides a convenient way to clone/initialize very large repositories (think: monorepos).

Note: This patch series' focus is entirely on Scalar, on choosing sensible defaults and offering a delightful user experience around working with monorepos, and not about changing any existing paradigms for contrib/ (even if catching up on the mail thread is likely to give interested readers that false impression).

Changes since v9:

  • The patches to build Scalar and run its tests as part of Git's CI/PR builds have been dropped because a recent unrelated patch series does not interact well with them.

Changes since v8:

  • The rebase on top of v2.34.0, which changed the default merge strategy to ORT, should have changed the default for merge.renames to true. This is now the case.
  • Preemptively accommodate ab/ci-updates, which invalidates assumptions made by this patch series. Those assumptions still hold true with v2.34.0 but are no longer valid in seen and would trigger CI build breakages.

Changes since v7:

  • Clarified in the commit message why we cannot easily encapsulate the Scalar part of the CMake configuration in contrib/scalar/.
  • Improved the README.md.

Changes since v6:

  • Rebased on top of v2.34.0.
  • Inserted a commit that adds contrib/scalar/README.md, containing the roadmap of what I have planned for Scalar.
  • The Scalar test's definition of GIT_TEST_MAINT_SCHEDULER has been adjusted to accommodate a change in v2.32.0..v2.34.0.
  • The config setting defaults now include fetch.showForcedUpdates=false, which has been identified as helping with a performance issue in large repositories.
  • To avoid mistaking the current patch series for being feature-complete enough to unleash onto end users, I moved the Makefile rules to build HTML/manual pages to a later patch series.
  • The patch that adds support for -c <key>=<value> and -C <directory> was moved to its own add-on patch series: While it is obvious that those options are valuable to have, an open question is whether there are other "pre-command" options in git that would be useful, too, and I would like to postpone that discussion to that date.
  • I added two patches that I had planned on keeping in an add-on patch series for later, to build and test Scalar as part of the CI. I am still not 100% certain that it is a good idea to do so already now, but let's see what the reviewers have to say.

Changes since v5:

  • Fixed the commit message talking about make -C contrib/scalar/Makefile.
  • Fixed the git ls-tree invocation suggested in the manual for scalar clone.
  • Invoking make -C contrib/scalar, then changing a source file of libgit.a and then immediately invoking make -C contrib/scalar again will now implicitly rebuild libgit.a.

Changes since v4:

  • scalar delete now refuses to delete anything if it was started from within the enlistment.
  • scalar delete releases any handles to the object store before deleting the enlistment.
  • The OBJECTS list in the Makefile will now include Scalar.
  • scalar register now supports secondary worktrees, in addition to the primary worktree.
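
As a rough illustration of the first item in this list (this is not the actual scalar.c code), a "refuse to delete from within the enlistment" check can be sketched as a path-prefix test. `path_is_inside()` is a hypothetical helper, and the sketch assumes that both paths are already absolute and normalized:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `path` is `dir` itself or lies
 * below it, 0 otherwise. Both arguments are assumed to be absolute,
 * normalized paths that use '/' as the separator.
 */
static int path_is_inside(const char *dir, const char *path)
{
	size_t len;

	assert(dir && path);
	len = strlen(dir);

	/* Ignore one trailing slash on `dir`, but keep "/" itself. */
	if (len > 1 && dir[len - 1] == '/')
		len--;

	if (strncmp(dir, path, len))
		return 0;

	/* The root directory contains every absolute path. */
	if (len == 1 && dir[0] == '/')
		return 1;

	/* Require a component boundary, so "repo" does not match "repository". */
	return path[len] == '\0' || path[len] == '/';
}
```

A caller would compare the current working directory against the enlistment root and bail out with an error when `path_is_inside()` returns 1.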

Changes since v3:

  • Moved the "Changes since" section to the top, to make it easier to see what changed.
  • Reworded the commit message of the first patch.
  • Removed the [RFC] prefix because I did not hear any objections against putting this into contrib/.

Changes since v2:

  • Adjusted the description of the list command in the manual page, as suggested by Bagas.
  • Addressed two style nits in cmd_run().
  • The documentation of scalar reconfigure -a was improved.

Changes since v1:

  • A couple typos were fixed
  • The code parsing the output of ls-remote was made more readable
  • The indentation used in scalar.txt now consistently uses tabs
  • We no longer hard-code core.bare = false when registering with Scalar

Background

Microsoft invested a lot of effort into scaling Git to the needs of the Windows operating system source code. Based on the experience of the first approach, VFS for Git, the Scalar project was started. Scalar specifically has as its core goal to funnel all improvements into core Git.

The present

The Scalar project provides a completely functional non-virtual experience for monorepos. But why stop there? The Scalar project was designed to be a self-destructing vehicle to allow those key concepts to be moved into core Git itself for the benefit of all. For example, partial clone, sparse-checkout, and scheduled background maintenance have already been upstreamed and removed from Scalar proper. This patch series provides a C-based implementation of the final remaining portions of the Scalar command. This will make it easier for users to experiment with the Scalar command. It will also make it substantially easier to experiment with moving functionality from Scalar into core Git, while maintaining backwards-compatibility for existing Scalar users.

The C-based Scalar has been shipped to Scalar users, and can be tested by any interested reader: https://github.com/microsoft/git/releases/ (it offers a Git for Windows installer, a macOS package and an Ubuntu package; Scalar has been included since v2.33.0.vfs.0.0).

Next steps

Since there are existing Scalar users, I want to ensure backwards-compatibility with its existing command-line interface. Keeping that in mind, everything in this series is up for discussion.

I obviously believe that Scalar brings a huge benefit, and think that it would be ideal for all of Scalar's learnings to end up in git clone/git init/git maintenance eventually. It is also conceivable, however, that the scalar command could graduate to be a core part of Git at some stage in the future (such a decision would probably depend highly on users' feedback). See also the discussion about the architecture of Scalar, kicked off by Stolee.

On top of this patch series, I have lined up a few more:

  1. Implement a scalar diagnose command.
  2. Use the built-in FSMonitor (that patch series obviously needs to wait for FSMonitor to be integrated).
  3. Modify the config machinery to be more generous about concurrent writes, say, to the user-wide config.
  4. A few patches to optionally build and install scalar as part of a regular Git install (also teaching git help scalar to find the Scalar documentation).

These are included in my vfs-with-scalar branch thicket. On top of that, this branch thicket also includes patches I do not plan on upstreaming, mainly because they are either too specific to VFS for Git or exist only to support Azure Repos (which does not offer partial clones but speaks the GVFS protocol, which can be used to emulate partial clones).

One other thing is very interesting about that vfs-with-scalar branch thicket: it contains a GitHub workflow which will run Scalar's Functional Tests suite. This test suite is quite comprehensive and has caught a lot of bugs for us in the past, not only in the Scalar code, but also in core Git.

Epilogue

Now, to address some questions that I imagine every reader has who made it this far:

  • Why not put the Scalar functionality directly into core Git, even as a built-in? I wanted to provide an easy way for Git contributors to "play with" Scalar, without forcing a new top-level command into Git.
  • Why implement the Scalar command in the Git code base? Apart from simplifying Scalar maintenance in the Microsoft port of Git, the tight version coupling between Git and Scalar reduces the maintenance burden even further. Besides, I believe that it will make it much easier to shift functionality from Scalar into core Git, once we have cleared the hurdle of accepting the Scalar code into the code base.
  • Why contribute Scalar to the Git project? We are biased, of course, yet our data-driven approach provides evidence that Scalar helps handle huge repositories with ease. By contributing it to the core Git project, we are able to share it with more users, especially users who do not want to install Microsoft's fork of Git. We also hope that a lot of Scalar (maybe all of it) will end up in core Git, to benefit even more users.

cc: Derrick Stolee [email protected]
cc: Eric Sunshine [email protected]
cc: Ævar Arnfjörð Bjarmason [email protected]
cc: Elijah Newren [email protected]
cc: Bagas Sanjaya [email protected]
cc: "Theodore Ts'o" [email protected]
cc: Matt Rogers [email protected]
cc: Jeff King [email protected]

derrickstolee left a comment

Really well organized! I found a few points that would be good to fix before sending upstream.

dscho force-pushed the scalar-the-beginning branch 2 times, most recently from 0648bf6 to ee893f2 (July 28, 2021 23:04)
dscho changed the title from "Upstream Scalar/C" to "[RFC] Upstreaming the Scalar command" (Jul 28, 2021)

derrickstolee commented

I'm happy with your latest version. Thanks!

dscho force-pushed the scalar-the-beginning branch 2 times, most recently from 0833924 to 4f609b2 (August 30, 2021 15:23)

dscho commented Aug 30, 2021

> I'm happy with your latest version. Thanks!

Sorry, I had to go through all of the commits, and decided to make extensive (although admittedly only cosmetic) changes. The most important difference is that I integrated @vdye's fix for the Scalar enlistment discovery.

I also decided to squash a couple of commits that could be perceived as "oops, let's correct this" fixups to earlier commits in the topic branch.

In other words, the shape is sufficiently different to merit a re-review...

dscho force-pushed the scalar-the-beginning branch from 4f609b2 to 6455b18 (August 30, 2021 15:29)

dscho commented Aug 30, 2021

/submit

gitgitgadget bot commented Aug 30, 2021

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-1005/dscho/scalar-the-beginning-v1

To fetch this version to local tag pr-1005/dscho/scalar-the-beginning-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-1005/dscho/scalar-the-beginning-v1

gitgitgadget bot commented Aug 31, 2021

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/30/21 5:34 PM, Johannes Schindelin via GitGitGadget wrote:
> tl;dr: This series contributes the Scalar command to the Git project. This
> command provides an opinionated way to create and configure repositories
> with a focus on very large repositories.

I want to give Johannes a big thanks for organizing this RFC. As you
can see from the authorship of the patches, this was an amazingly
collaborative effort, but Johannes led the way by creating a base that
the rest of us could work with, then finally he brought in all of the
gritty details to finish the effort.

> Background
> ==========

...

> The Scalar project
> was created to make that separation, refine the key concepts, and then
> extract those features into the new Scalar command.

When people have asked me how Scalar fits with the core Git client, I
point them to our "Philosophy of Scalar" document [1]. The most concise
summary of our goals since starting Scalar has been that Scalar aligns
with features already within Git that enable scale. I've said several
times that we are constantly making Scalar do less by making Git do more.

[1] https://github.com/microsoft/git/blob/HEAD/contrib/scalar/docs/philosophy.md

Here is an example: when our large, internal customer told us that they
required Linux support for Scalar, we looked at what it would take. We
could have done the necessary platform-specific things to convince .NET
Core to create a long-running process that launched Git maintenance tasks
at different intervals, creating a similar mechanism to the Windows and
macOS services that did those operations. But we also knew that the
existing system was stuck with architectural decisions from VFS for Git
that were not actually in service of how Scalar worked. Instead, we
decided to build background maintenance into Git itself and had our Linux
port of Scalar run "git maintenance start".

Once the Linux port was proven out with Git's background maintenance, we
realized that the window where a user actually interacts with Scalar instead
of Git is extremely narrow: users run "scalar clone" or "scalar register"
and otherwise only run Git commands. The Scalar process does not need to
exist outside of that. (There are some other helpers that can be used in
a pinch to diagnose and fix problems, but they are rarely used. These
commands, such as 'scalar diagnose' can be contributed separately.)

It became clear that for our own needs it would be easier to ship one
installer that included the microsoft/git fork and the Scalar CLI, and
it would be simple to rewrite the Scalar CLI with all of the Git helper
APIs. We organized the code in a way that we thought would be amenable
to an upstream contribution (by placing in contrib/ and using Git code
style).

The thing about these commands is that they are _opinionated_. We rely
on these opinions for important internal users, but we realize that they
are not necessarily optimal for all users. Hence, we did not think it
wise to push those opinions onto the 'git' executable. Having 'scalar'
continue to live as a separate executable made sense to us.

I believe that by contributing Scalar to the full community, we
create opportunities for Git in the future. For one, users and Git
distributors can opt into compiling Scalar so it is more available
to users who are interested. Another hopeful idea is that maybe this
reinvigorates ideas of how to streamline Git clones for large repos
without users needing to learn each and every knob to twist to get
things working. Since the Scalar CLI is contributed in the full
license of the Git project, pieces of it can be adapted into Git
proper as needed.

I look forward to hearing your thoughts.

Thanks,
-Stolee

gitgitgadget bot commented Aug 31, 2021

User Derrick Stolee <[email protected]> has been added to the cc: list.

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Mon, Aug 30, 2021 at 5:35 PM Johannes Schindelin via GitGitGadget
<[email protected]> wrote:
> After a Scalar upgrade, it can come in really handy if there is an easy
> way to reconfigure all Scalar enlistments. This new option offers this
> functionality.
>
> Signed-off-by: Johannes Schindelin <[email protected]>
> ---
> diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
> @@ -121,6 +121,10 @@ After a Scalar upgrade, or when the configuration of a Scalar enlistment
> +With the `--all` option, all enlistments currently registered with Scalar
> +will be reconfigured. This option is meant to to be run every time Scalar
> +was upgraded.

s/was/is/

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Eric,

On Tue, 31 Aug 2021, Eric Sunshine wrote:

> On Mon, Aug 30, 2021 at 5:35 PM Johannes Schindelin via GitGitGadget
> <[email protected]> wrote:
> > After a Scalar upgrade, it can come in really handy if there is an easy
> > way to reconfigure all Scalar enlistments. This new option offers this
> > functionality.
> >
> > Signed-off-by: Johannes Schindelin <[email protected]>
> > ---
> > diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
> > @@ -121,6 +121,10 @@ After a Scalar upgrade, or when the configuration of a Scalar enlistment
> > +With the `--all` option, all enlistments currently registered with Scalar
> > +will be reconfigured. This option is meant to to be run every time Scalar
> > +was upgraded.
>
> s/was/is/

I wanted to convey a temporal order, so I changed it to "every time after
Scalar is upgraded". Okay?

Ciao,
Dscho

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Fri, Sep 3, 2021 at 11:23 AM Johannes Schindelin
<[email protected]> wrote:
> On Tue, 31 Aug 2021, Eric Sunshine wrote:
> > On Mon, Aug 30, 2021 at 5:35 PM Johannes Schindelin via GitGitGadget
> > > +With the `--all` option, all enlistments currently registered with Scalar
> > > +will be reconfigured. This option is meant to to be run every time Scalar
> > > +was upgraded.
> >
> > s/was/is/
>
> I wanted to convey a temporal order, so I changed it to "every time after
> Scalar is upgraded". Okay?

I think I understood the intent of the original, but it causes a
grammatical hiccup. Your revised version can work, although I might
write it this way:

    This option is meant to be run each time Scalar is upgraded.

However, perhaps that is too ambiguous and some users may think that
the process of upgrading Scalar will automatically run this command,
and you'd like to make it clear that it is the user's responsibility.
So, perhaps:

    Use this option after each Scalar upgrade.

or something.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Eric,

On Fri, 3 Sep 2021, Eric Sunshine wrote:

> On Fri, Sep 3, 2021 at 11:23 AM Johannes Schindelin
> <[email protected]> wrote:
> > On Tue, 31 Aug 2021, Eric Sunshine wrote:
> > > On Mon, Aug 30, 2021 at 5:35 PM Johannes Schindelin via GitGitGadget
> > > > +With the `--all` option, all enlistments currently registered with Scalar
> > > > +will be reconfigured. This option is meant to to be run every time Scalar
> > > > +was upgraded.
> > >
> > > s/was/is/
> >
> > I wanted to convey a temporal order, so I changed it to "every time after
> > Scalar is upgraded". Okay?
>
> I think I understood the intent of the original, but it causes a
> grammatical hiccup. Your revised version can work, although I might
> write it this way:
>
>     This option is meant to be run each time Scalar is upgraded.
>
> However, perhaps that is too ambiguous and some users may think that
> the process of upgrading Scalar will automatically run this command,
> and you'd like to make it clear that it is the user's responsibility.
> So, perhaps:
>
>     Use this option after each Scalar upgrade.
>
> or something.

I like the last one best, too.

Thank you,
Dscho

gitgitgadget bot commented Aug 31, 2021

User Eric Sunshine <[email protected]> has been added to the cc: list.

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Mon, Aug 30, 2021 at 5:35 PM Johannes Schindelin via GitGitGadget
<[email protected]> wrote:
> The .NET version of Scalar has a `version` command. This was necessary
> because it was versioned independently of Git.
>
> Since Scalar is now tightly coupled with Git, it does not make sense for
> them to show different versions. Therefore, it shows the same output as
> `git versions`. For backwards-compatibility with the .NET version,

s/versions/version/

> `scalar version` prints to `stderr`, though (`git version` prints to
> `stdout` instead).
>
> Signed-off-by: Johannes Schindelin <[email protected]>

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Eric,

On Tue, 31 Aug 2021, Eric Sunshine wrote:

> On Mon, Aug 30, 2021 at 5:35 PM Johannes Schindelin via GitGitGadget
> <[email protected]> wrote:
> > The .NET version of Scalar has a `version` command. This was necessary
> > because it was versioned independently of Git.
> >
> > Since Scalar is now tightly coupled with Git, it does not make sense for
> > them to show different versions. Therefore, it shows the same output as
> > `git versions`. For backwards-compatibility with the .NET version,
>
> s/versions/version/

Thank you!
Dscho

>
> > `scalar version` prints to `stderr`, though (`git version` prints to
> > `stdout` instead).
> >
> > Signed-off-by: Johannes Schindelin <[email protected]>
>

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Aug 30 2021, Derrick Stolee via GitGitGadget wrote:

> [...]
> +#ifndef WIN32
> +		{ "core.untrackedCache", "true" },
> +#else
> +		/*
> +		 * Unfortunately, Scalar's Functional Tests demonstrated
> +		 * that the untracked cache feature is unreliable on Windows
> +		 * (which is a bummer because that platform would benefit the
> +		 * most from it). For some reason, freshly created files seem
> +		 * not to update the directory's `lastModified` time
> +		 * immediately, but the untracked cache would need to rely on
> +		 * that.
> +		 *
> +		 * Therefore, with a sad heart, we disable this very useful
> +		 * feature on Windows.
> +		 */
> +		{ "core.untrackedCache", "false" },
> +#endif
> [...]

Ok, but why the need to set it to "false" explicitly? Does it need to be
so opinionated as to overwrite existing user-set config in these cases?

> +		{ "core.bare", "false" },

Shouldn't this be set by "git init" already?

> [...]
> +		{ "core.logAllRefUpdates", "true" },

An opinionated thing unrelated to performance?

> [...]
> +		{ "feature.manyFiles", "false" },
> +		{ "feature.experimental", "false" },

Ditto the question about the need to set this, these are false by
default, right?

> [...]
> +		if (git_config_get_string(config[i].key, &value)) {
> +			trace2_data_string("scalar", the_repository, config[i].key, "created");
> +			if (git_config_set_gently(config[i].key,
> +						  config[i].value) < 0)
> +				return error(_("could not configure %s=%s"),
> +					     config[i].key, config[i].value);
> +		} else {
> +			trace2_data_string("scalar", the_repository, config[i].key, "exists");
> +			free(value);
> +		}

The commit message doesn't discuss these trace2 additions. These in
particular seem like they might be useful, but would be better done as
some more general trace2 integration in config.c, i.e. if the functions
being called here did the same logging on config set/get.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/31/2021 4:11 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Aug 30 2021, Derrick Stolee via GitGitGadget wrote:
> 
>> [...]
>> +#ifndef WIN32
>> +		{ "core.untrackedCache", "true" },
>> +#else
>> +		/*
>> +		 * Unfortunately, Scalar's Functional Tests demonstrated
>> +		 * that the untracked cache feature is unreliable on Windows
>> +		 * (which is a bummer because that platform would benefit the
>> +		 * most from it). For some reason, freshly created files seem
>> +		 * not to update the directory's `lastModified` time
>> +		 * immediately, but the untracked cache would need to rely on
>> +		 * that.
>> +		 *
>> +		 * Therefore, with a sad heart, we disable this very useful
>> +		 * feature on Windows.
>> +		 */
>> +		{ "core.untrackedCache", "false" },
>> +#endif
>> [...]
> 
> Ok, but why the need to set it to "false" explicitly? Does it need to be
> so opinionated as to overwrite existing user-set config in these cases?

Users can overwrite this local config value, but this is placed to avoid
a global config value from applying specifically within Scalar-created
repos.
 
>> +		{ "core.bare", "false" },
> 
> Shouldn't this be set by "git init" already?

This one is probably a bit _too_ defensive. It can be removed.

>> [...]
>> +		{ "core.logAllRefUpdates", "true" },
> 
> An opinionated thing unrelated to performance?

It's an opinionated thing related to supporting monorepo users. It helps
us diagnose issues they have by recreating a sequence of events.

>> [...]
>> +		{ "feature.manyFiles", "false" },
>> +		{ "feature.experimental", "false" },
> 
> Ditto the question about the need to set this, these are false by
> default, right?

But if a user has them on globally, then we don't want them to apply
locally (in favor of the settings that we set explicitly).

>> [...]
>> +		if (git_config_get_string(config[i].key, &value)) {
>> +			trace2_data_string("scalar", the_repository, config[i].key, "created");
>> +			if (git_config_set_gently(config[i].key,
>> +						  config[i].value) < 0)
>> +				return error(_("could not configure %s=%s"),
>> +					     config[i].key, config[i].value);
>> +		} else {
>> +			trace2_data_string("scalar", the_repository, config[i].key, "exists");
>> +			free(value);
>> +		}
> 
> The commit message doesn't discuss these trace2 additions. These in
> particular seem like they might be useful, but would be better done as
> some more general trace2 integration in config.c, i.e. if the functions
> being called here did the same logging on config set/get.

If we want to do such a tracing change within git_config_set*(), then
that would be an appropriate replacement. The biggest reason to include
them here is to trace that an existing value already exists, for the
case of running 'scalar reconfigure' during an upgrade. That part
doesn't make much sense to put into config.c.

Thanks,
-Stolee
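
The behavior discussed in this exchange can be sketched with a toy in-memory config store. This is only an illustration under stated assumptions: `toy_config`, `toy_get()`, `toy_lookup()` and `toy_set_if_unset()` are hypothetical names, not Git APIs; the patch itself uses `git_config_get_string()`, `git_config_set_gently()` and `trace2_data_string()` as quoted above. The sketch shows two things: a default is written only when the key is not yet set locally, and an explicit local `false` shadows a global `true`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16

/* Toy stand-in for one config scope (hypothetical, not a Git type). */
struct toy_entry {
	const char *key;
	const char *value;
};

struct toy_config {
	struct toy_entry entries[TOY_MAX_ENTRIES];
	size_t nr;
};

/* Return the value stored for `key` in this scope, or NULL if unset. */
static const char *toy_get(const struct toy_config *cfg, const char *key)
{
	size_t i;

	for (i = 0; i < cfg->nr; i++)
		if (!strcmp(cfg->entries[i].key, key))
			return cfg->entries[i].value;
	return NULL;
}

/* Effective value: the local scope wins over the global one. */
static const char *toy_lookup(const struct toy_config *local,
			      const struct toy_config *global,
			      const char *key)
{
	const char *value = toy_get(local, key);

	return value ? value : toy_get(global, key);
}

/*
 * Write `key=value` into `cfg` only if `key` is not set there yet.
 * Returns 1 if the entry was created, 0 if an existing value was
 * kept, and -1 if the toy store is full. The messages on stderr
 * mimic the "created"/"exists" trace events from the quoted patch.
 */
static int toy_set_if_unset(struct toy_config *cfg,
			    const char *key, const char *value)
{
	assert(key && value);

	if (toy_get(cfg, key)) {
		fprintf(stderr, "trace: %s exists\n", key);
		return 0;
	}
	if (cfg->nr == TOY_MAX_ENTRIES)
		return -1;

	cfg->entries[cfg->nr].key = key;
	cfg->entries[cfg->nr].value = value;
	cfg->nr++;
	fprintf(stderr, "trace: %s created\n", key);
	return 1;
}
```

With a global `feature.manyFiles=true`, calling `toy_set_if_unset()` on the local scope with `"false"` makes `toy_lookup()` return `"false"`; a second call with any other value leaves the local setting untouched, which mirrors what a repeated `scalar reconfigure` is described as doing.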

gitgitgadget bot commented Aug 31, 2021

User Ævar Arnfjörð Bjarmason <[email protected]> has been added to the cc: list.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote:

> To test the Scalar command, create a test script in contrib/scalar/t
> that is executed as `make -C contrib/scalar test`. Since Scalar has no
> meaningful capabilities yet, the only test is rather simple. We will add
> more tests in subsequent commits that introduce corresponding, new
> functionality.

As a comment on 01..03/15: I'd really prefer if we stop using this
pattern of sub-Makefile, the dependencies are a pain to manage, and we
end up copy/pasting large sets of functionality.

That would mean just adding the build of this command to the top-level
Makefile behind some "CONTRIB_SCALAR" flag or whatever, but I find that
much cleaner than....

> @@ -21,7 +22,7 @@ include ../../config.mak.uname
>  TARGETS = scalar$(X) scalar.o
>  GITLIBS = ../../common-main.o ../../libgit.a ../../xdiff/lib.a
>  
> -all: scalar$X
> +all: scalar$X ../../bin-wrappers/scalar
>  
> [...]
> +../../bin-wrappers/scalar: ../../wrap-for-bin.sh Makefile
> [...]
>  scalar.html: | scalar.1 # prevent them from trying to build `doc.dep` in parallel

...things like this, which refer to assets built by other Makefiles, and
need to plaster over the dependency issues...

> +++ b/contrib/scalar/t/Makefile
> @@ -0,0 +1,78 @@
> +# Run scalar tests
> +#
> +# Copyright (c) 2005,2021 Junio C Hamano, Johannes Schindelin
> +#
> +
> +-include ../../../config.mak.autogen
> +-include ../../../config.mak
> +
> +SHELL_PATH ?= $(SHELL)
> +PERL_PATH ?= /usr/bin/perl
> +RM ?= rm -f
> +PROVE ?= prove
> +DEFAULT_TEST_TARGET ?= test
> +TEST_LINT ?= test-lint
> +
> +ifdef TEST_OUTPUT_DIRECTORY
> +TEST_RESULTS_DIRECTORY = $(TEST_OUTPUT_DIRECTORY)/test-results
> +else
> +TEST_RESULTS_DIRECTORY = ../../../t/test-results
> +endif
> +
> +# Shell quote;
> +SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
> +PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH))
> +TEST_RESULTS_DIRECTORY_SQ = $(subst ','\'',$(TEST_RESULTS_DIRECTORY))
> +
> +T = $(sort $(wildcard t[0-9][0-9][0-9][0-9]-*.sh))
> +
> +all: $(DEFAULT_TEST_TARGET)
> +
> +test: $(TEST_LINT)
> +	$(MAKE) aggregate-results-and-cleanup
> +
> +prove: $(TEST_LINT)
> +	@echo "*** prove ***"; GIT_CONFIG=.git/config $(PROVE) --exec '$(SHELL_PATH_SQ)' $(GIT_PROVE_OPTS) $(T) :: $(GIT_TEST_OPTS)
> +	$(MAKE) clean-except-prove-cache
> +
> +$(T):
> +	@echo "*** $@ ***"; GIT_CONFIG=.git/config '$(SHELL_PATH_SQ)' $@ $(GIT_TEST_OPTS)
> +
> +clean-except-prove-cache:
> +	$(RM) -r 'trash directory'.* '$(TEST_RESULTS_DIRECTORY_SQ)'
> +	$(RM) -r valgrind/bin
> +
> +clean: clean-except-prove-cache
> +	$(RM) .prove
> +
> +test-lint: test-lint-duplicates test-lint-executable test-lint-shell-syntax
> +
> +test-lint-duplicates:
> +	@dups=`echo $(T) | tr ' ' '\n' | sed 's/-.*//' | sort | uniq -d` && \
> +		test -z "$$dups" || { \
> +		echo >&2 "duplicate test numbers:" $$dups; exit 1; }
> +
> +test-lint-executable:
> +	@bad=`for i in $(T); do test -x "$$i" || echo $$i; done` && \
> +		test -z "$$bad" || { \
> +		echo >&2 "non-executable tests:" $$bad; exit 1; }
> +
> +test-lint-shell-syntax:
> +	@'$(PERL_PATH_SQ)' ../../../t/check-non-portable-shell.pl $(T)
> +
> +aggregate-results-and-cleanup: $(T)
> +	$(MAKE) aggregate-results
> +	$(MAKE) clean
> +
> +aggregate-results:
> +	for f in '$(TEST_RESULTS_DIRECTORY_SQ)'/t*-*.counts; do \
> +		echo "$$f"; \
> +	done | '$(SHELL_PATH_SQ)' ../../../t/aggregate-results.sh
> +
> +valgrind:
> +	$(MAKE) GIT_TEST_OPTS="$(GIT_TEST_OPTS) --valgrind"
> +
> +test-results:
> +	mkdir -p test-results
> +
> +.PHONY: $(T) aggregate-results clean valgrind

...and this entire copy/pasting & adjusting of t/Makefile.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote:

> This implements Scalar's opinionated `clone` command: it tries to use a
> partial clone and sets up a sparse checkout by default. In contrast to
> `git clone`, `scalar clone` sets up the worktree in the `src/`
> subdirectory, to encourage a separation between the source files and the
> build output (which helps Git tremendously because it avoids untracked
> files that have to be specifically ignored when refreshing the index).

Perhaps nobody else wondered this while reading this, but I thought this
might be some sparse/worktree magic where cloning into "foo" would have
"foo/.git", but the worktree was somehow magically mapped at "foo/src/".

But no, it just takes your "scalar clone <url> foo" and translates it to
"foo/src", so you'll get a directory at "foo".
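
To make that translation concrete, here is a minimal sketch (it is not the code from the patch). `worktree_path()` is a hypothetical helper that appends `src` to the directory the user named:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<enlistment>/src" into `out`.
 * Returns 0 on success, or -1 if the result does not fit into
 * `outlen` bytes (including the terminating NUL).
 */
static int worktree_path(const char *enlistment, char *out, size_t outlen)
{
	int n;

	assert(enlistment && out);
	n = snprintf(out, outlen, "%s/src", enlistment);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Under this sketch, `scalar clone <url> foo` would end up handing a path like `foo/src` to the underlying clone, so the repository metadata would live at `foo/src/.git` and `foo` is the plain directory around it.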

> Note: We intentionally use a slightly wasteful `set_config()` function
> (which does not reuse a single `strbuf`, for example, though performance
> _really_ does not matter here) for convenience and readability.

FWIW I think the commit message could do without this, that part of the
code is obviously not performance sensitive at all. But maybe an
explicit note helps anyway...

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Tue, Aug 31, 2021 at 8:04 AM Ævar Arnfjörð Bjarmason
<[email protected]> wrote:
> On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote:
> > Note: We intentionally use a slightly wasteful `set_config()` function
> > (which does not reuse a single `strbuf`, for example, though performance
> > _really_ does not matter here) for convenience and readability.
>
> FWIW I think the commit message could do without this, that part of the
> code is obviously not performance sensitive at all. But maybe an
> explicit note helps anyway...

FWIW, I also found this distracting; it takes the reader's attention
away from more important aspects of the patch. (But it alone is not
worth a re-roll; it was just a minor hiccup.)

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Eric,

On Tue, 31 Aug 2021, Eric Sunshine wrote:

> On Tue, Aug 31, 2021 at 8:04 AM Ævar Arnfjörð Bjarmason
> <[email protected]> wrote:
> > On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote:
> > > Note: We intentionally use a slightly wasteful `set_config()` function
> > > (which does not reuse a single `strbuf`, for example, though performance
> > > _really_ does not matter here) for convenience and readability.
> >
> > FWIW I think the commit message could do without this, that part of the
> > code is obviously not performance sensitive at all. But maybe an
> > explicit note helps anyway...
>
> FWIW, I also found this distracting; it takes the reader's attention
> away from more important aspects of the patch. (But it alone is not
> worth a re-roll; it was just a minor hiccup.)

Since I reworked the remote default branch parsing anyway, I removed this
paragraph from the commit message.

Ciao,
Dscho

@@ -0,0 +1,652 @@
/*

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Aug 30 2021, Derrick Stolee via GitGitGadget wrote:

> +	const char *usagestr[] = { NULL, NULL };

Missing usage strings?

> +	if (argc == 0)

Style nit (per style guide): s/argc == 0/!argc/g.

> +	if (!strcmp("all", argv[0]))
> +		i = -1;

Style nit (per style guide): missing braces here.

(Just noting the style nits once, but more in this patch, and presumably
the rest of the series...)

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Ævar,

On Tue, 31 Aug 2021, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Aug 30 2021, Derrick Stolee via GitGitGadget wrote:
>
> > +	const char *usagestr[] = { NULL, NULL };
>
> Missing usage strings?

This command will show a generated usage, i.e. a non-static string. It
therefore cannot be specified here already. See the `strbuf_*()` calls
populating `buf` and the `usagestr[0] = buf.buf;` assignment.

> > +	if (argc == 0)
>
> Style nit (per style guide): s/argc == 0/!argc/g.

It is true that we often do this, but in this instance it would be
misleading: `argc` is a counter, not a Boolean.

> > +	if (!strcmp("all", argv[0]))
> > +		i = -1;
>
> Style nit (per style guide): missing braces here.

The style guide specifically allows my preference to leave single-line
blocks without curlies.

Ciao,
Johannes

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <[email protected]> writes:

> Hi Ævar,
>
> On Tue, 31 Aug 2021, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, Aug 30 2021, Derrick Stolee via GitGitGadget wrote:
>>
>> > +	const char *usagestr[] = { NULL, NULL };
>>
>> Missing usage strings?
>
> This command will show a generated usage, i.e. a non-static string. It
> therefore cannot be specified here already. See the `strbuf_*()` calls
> populating `buf` and the `usagestr[0] = buf.buf;` assignment.
>
>> > +	if (argc == 0)
>>
>> Style nit (per style guide): s/argc == 0/!argc/g.
>
> It is true that we often do this, but in this instance it would be
> misleading: `argc` is a counter, not a Boolean.

That argument could be a plausible excuse to deviate from the style
if it were

	if (argc == 0)
		do no args case;
	else if (argc == 1)
		do one arg case;
	else if (argc == 2)
		do two args case;
	...

Replacing the first one with "if (!argc)" may make it less readable.

But I do not think the reasoning applies here

	if (argc == 0)
		do a thing that applies only to no args case;

without "else".  This is talking about "do we have any argument? Yes
or no?" Boolean here.

>> > +	if (!strcmp("all", argv[0]))
>> > +		i = -1;
>>
>> Style nit (per style guide): missing braces here.
>
> The style guide specifically allows my preference to leave single-line
> blocks without curlies.

Actually, the exception goes the other way, no?

We generally want to avoid such unnecessary braces around a
single statement block.  But when we have an else clause that has a
block with multiple statements (hence braces are required), as an
exception, the guide asks you to write braces around the body of the
if side for consistency.

When you only have just a couple of lines on the "else {}" side, I
do not think it matters too much either way for readability, though.
I cannot see the "else" side in the above clause, but IIRC it wasn't
just a few lines, was it?

Thanks.
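The two style points discussed in this message can be illustrated together. The following is a minimal sketch, assuming a hypothetical function `select_target()` whose `else` arm is invented for illustration; it is not the code under review:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical selector: returns -2 when no argument was given,
 * -1 for "all", and the length of the argument otherwise.
 */
int select_target(int argc, const char **argv)
{
	int i;

	assert(argc >= 0);

	/* "Do we have any argument?" is a yes/no question, hence `!argc`. */
	if (!argc)
		return -2;

	/*
	 * The else arm holds several statements and therefore needs
	 * braces, so the single-statement if arm gets braces as well.
	 */
	if (!strcmp("all", argv[0])) {
		i = -1;
	} else {
		size_t len = strlen(argv[0]);

		i = (int)len;
	}
	return i;
}
```

A lone `if` with a single statement and no `else`, such as the `!argc` check, stays without braces.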

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Fri, 3 Sep 2021, Junio C Hamano wrote:

> Johannes Schindelin <[email protected]> writes:
>
> > Hi Ævar,
> >
> > On Tue, 31 Aug 2021, Ævar Arnfjörð Bjarmason wrote:
> >
> >> On Mon, Aug 30 2021, Derrick Stolee via GitGitGadget wrote:
> >>
> >> > +	const char *usagestr[] = { NULL, NULL };
> >>
> >> Missing usage strings?
> >
> > This command will show a generated usage, i.e. a non-static string. It
> > therefore cannot be specified here already. See the `strbuf_*()` calls
> > > populating `buf` and the `usagestr[0] = buf.buf;` assignment.
> > >
> > >> > +	if (argc == 0)
> > >>
> > >> Style nit (per style guide): s/argc == 0/!argc/g.
> >
> > It is true that we often do this, but in this instance it would be
> > misleading: `argc` is a counter, not a Boolean.
>
> That argument could be a plausible excuse to deviate from the style
> if it were
>
> 	if (argc == 0)
> 		do no args case;
> 	else if (argc == 1)
> 		do one arg case;
> 	else if (argc == 2)
> 		do two args case;
> 	...
>
> Replacing the first one with "if (!argc)" may make it less readable.
>
> But I do not think the reasoning applies here
>
> 	if (argc == 0)
> 		do a thing that applies only to no args case;
>
> without "else".  This is talking about "do we have any argument? Yes
> or no?" Boolean here.

Well, I offer a differing opinion. But you're right, we are at least
consistent in Git's source code in using `!i` where other projects would
use `i == 0`, and consistency is definitely something I'd like to see more
in Git, not less.

So I changed it as you suggested.

>
> >> > +	if (!strcmp("all", argv[0]))
> >> > +		i = -1;
> >>
> >> Style nit (per style guide): missing braces here.
> >
> > The style guide specifically allows my preference to leave single-line
> > blocks without curlies.
>
> Actually, the exception goes the other way, no?
>
> We generally want to avoid such unnecessary braces around a
> single statement block.  But when we have an else clause that has a
> block with multiple statements (hence braces are required), as an
> exception, the guide asks you to write braces around the body of the
> if side for consistency.

You're right. I am somehow still using the previous style where we
_required_ single-line blocks _not_ to have curly brackets (see e.g.
aa1c48df817 ([PATCH] ls-tree enhancements, 2005-04-15), the `else` part of
the added `if (! eltbuf)` block).

>
> When you only have just a couple of lines on the "else {}" side, I
> do not think it matters too much either way for readability, though.
> I cannot see the "else" side in the above clause, but IIRC it wasn't
> just a few lines, was it?

It depends what you count as "just a few lines". There are seven lines
enclosed within the curly brackets of the `else` block.

But as much as I enjoy thorough reviews of the Scalar code, I am failing
at getting excited about code style discussions, therefore I simply went
with your suggestion to enclose even the single-line block in curly
brackets.

Thanks,
Dscho

@@ -0,0 +1,675 @@
/*

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote:

> This comes in handy during Scalar upgrades, or when config settings were
> messed up by mistake.

> [...]
>  		const char *key;
>  		const char *value;
> +		int overwrite_on_reconfigure;

If you make this a "keep_on_reconfigure", then ...

>  	} config[] = {
> -		{ "am.keepCR", "true" },
> -		{ "core.FSCache", "true" },
> -		{ "core.multiPackIndex", "true" },
> -		{ "core.preloadIndex", "true" },
> +		/* Required */
> +		{ "am.keepCR", "true", 1 },
> +		{ "core.FSCache", "true", 1 },
> +		{ "core.multiPackIndex", "true", 1 },
> +		{ "core.preloadIndex", "true", 1 },

You won't need the churn/boilerplate of adding "1" to everything here,
but can just change the initial patch to use designated initializers.

That along with a throwaway macro like:

#define SCALAR_CFG_TRUE(k) (.key = k, .value = "true")
#define SCALAR_CFG_FALSE(k) (.key = k, .value = "false")

Might (or might not) make this even easier to eyeball...
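The suggestion can be made concrete with a small sketch. As quoted, the macros wrap the initializer in parentheses, which a C compiler would reject as an initializer; the version below uses braces instead. The struct name `scalar_config`, the helper `count_overwritten()`, and the key `example.optionalSetting` are assumptions made up for illustration and do not appear in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct scalar_config {
	const char *key;
	const char *value;
	int keep_on_reconfigure; /* zero unless set explicitly */
};

/* Throwaway macros; braces (not parentheses) form the initializer. */
#define SCALAR_CFG_TRUE(k) { .key = (k), .value = "true" }
#define SCALAR_CFG_FALSE(k) { .key = (k), .value = "false" }

static const struct scalar_config config[] = {
	/* Required: the omitted field defaults to 0, i.e. "overwrite" */
	SCALAR_CFG_TRUE("am.keepCR"),
	SCALAR_CFG_TRUE("core.FSCache"),
	SCALAR_CFG_TRUE("core.multiPackIndex"),
	SCALAR_CFG_TRUE("core.preloadIndex"),
	/* Optional (hypothetical entry, for illustration only) */
	{
		.key = "example.optionalSetting",
		.value = "false",
		.keep_on_reconfigure = 1,
	},
	{ NULL }
};

/* Count the entries that a reconfigure would overwrite. */
size_t count_overwritten(const struct scalar_config *cfg)
{
	size_t n = 0;

	assert(cfg);
	for (; cfg->key; cfg++)
		if (!cfg->keep_on_reconfigure)
			n++;
	return n;
}
```

Because fields omitted from a designated initializer are zero-initialized, inverting the flag's meaning lets the required entries drop the trailing `, 1` altogether.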

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Ævar,

On Tue, 31 Aug 2021, Ævar Arnfjörð Bjarmason wrote:

>
> On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote:
>
> > This comes in handy during Scalar upgrades, or when config settings were
> > messed up by mistake.
>
> > [...]
> >  		const char *key;
> >  		const char *value;
> > +		int overwrite_on_reconfigure;
>
> If you make this a "keep_on_reconfigure", then ...

I do not think that this would be a better name, or that renaming this
field would do anything except cause more work for me.

>
> >  	} config[] = {
> > -		{ "am.keepCR", "true" },
> > -		{ "core.FSCache", "true" },
> > -		{ "core.multiPackIndex", "true" },
> > -		{ "core.preloadIndex", "true" },
> > +		/* Required */
> > +		{ "am.keepCR", "true", 1 },
> > +		{ "core.FSCache", "true", 1 },
> > +		{ "core.multiPackIndex", "true", 1 },
> > +		{ "core.preloadIndex", "true", 1 },
>
> You won't need the churn/boilerplate of adding "1" to everything here,
> but can just change the initial patch to use designated initializers.
>
> That along with a throwaway macro like:
>
> #define SCALAR_CFG_TRUE(k) (.key = k, .value = "true")
> #define SCALAR_CFG_FALSE(k) (.key = k, .value = "false")
>
> Might (or might not) make this even easier to eyeball...

To me, it makes things less readable. There is an entire section with the
header `/* Optional */` below, and I want this list to stay as readable as
it is now.

Ciao,
Dscho

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Fri, Sep 03 2021, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Tue, 31 Aug 2021, Ævar Arnfjörð Bjarmason wrote:
>
>>
>> On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote:
>>
>> > This comes in handy during Scalar upgrades, or when config settings were
>> > messed up by mistake.
>>
>> > [...]
>> >  		const char *key;
>> >  		const char *value;
>> > +		int overwrite_on_reconfigure;
>>
>> If you make this a "keep_on_reconfigure", then ...
>
> I do not think that this would be a better name, or that renaming this
> field would do anything except cause more work for me.

It would also result in more readable code, i.e. why add boilerplate ",
1" to a boolean field in this case if every single setting is set to
"1"? Doesn't it make more sense to invert the variable name & save on
the verbosity?

>>
>> >  	} config[] = {
>> > -		{ "am.keepCR", "true" },
>> > -		{ "core.FSCache", "true" },
>> > -		{ "core.multiPackIndex", "true" },
>> > -		{ "core.preloadIndex", "true" },
>> > +		/* Required */
>> > +		{ "am.keepCR", "true", 1 },
>> > +		{ "core.FSCache", "true", 1 },
>> > +		{ "core.multiPackIndex", "true", 1 },
>> > +		{ "core.preloadIndex", "true", 1 },
>>
>> You won't need the churn/boilerplate of adding "1" to everything here,
>> but can just change the initial patch to use designated initializers.
>>
>> That along with a throwaway macro like:
>>
>> #define SCALAR_CFG_TRUE(k) (.key = k, .value = "true")
>> #define SCALAR_CFG_FALSE(k) (.key = k, .value = "false")
>>
>> Might (or might not) make this even easier to eyeball...
>
> To me, it makes things less readable. There is an entire section with the
> header `/* Optional */` below, and I want this list to stay as readable as
> it is now.

Yeah, I think those macros are probably less readable too. I should have
phrased that as a "one could even...", but just the smaller change of
avoiding the ", 1" everywhere seems worthwhile.

@@ -0,0 +1,844 @@
/*

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote:

> The `git` executable has these two very useful options:
>
> -C <directory>:
> 	switch to the specified directory before performing any actions
>
> -c <key>=<value>:
> 	temporarily configure this setting for the duration of the
> 	specified scalar subcommand
>
> With this commit, we teach the `scalar` executable the same trick.
> [...]
> +	while (argc > 1 && *argv[1] == '-') {
> +		if (!strcmp(argv[1], "-C")) {
> +			if (argc < 3)
> +				die(_("-C requires a <directory>"));
> +			if (chdir(argv[2]) < 0)
> +				die_errno(_("could not change to '%s'"),
> +					  argv[2]);
> +			argc -= 2;
> +			argv += 2;
> +		} else if (!strcmp(argv[1], "-c")) {
> +			if (argc < 3)
> +				die(_("-c requires a <key>=<value> argument"));
> +			git_config_push_parameter(argv[2]);
> +			argc -= 2;
> +			argv += 2;
> +		} else
> +			break;
> +	}

This along with my earlier comment about the Makefile copy/pasting makes
me wonder if an easier way to integrate this wouldn't be to refactor
git.c a bit to have it understand either "git" or "scalar", then instead
of "ls-tree" etc. as "git" the subcommands would become "built-ins".

Which would give us both "[git|scalar] [-c ...] <cmd>" for free, and
eliminate the need for the inevitable future divergence of wanting -p,
-P, --exec-path etc. in both.
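For readers who want to experiment with the option loop quoted above outside of Git, here is a minimal standalone sketch. It records the option arguments instead of calling `chdir()` or `git_config_push_parameter()`, and it returns an error code instead of calling `die()`; the names `struct global_opts` and `parse_global_opts()` are assumptions introduced for illustration:

```c
#include <assert.h>
#include <string.h>

struct global_opts {
	const char *directory; /* argument of the last -C, or NULL */
	const char *config;    /* argument of the last -c, or NULL */
};

/*
 * Consume leading "-C <directory>" and "-c <key>=<value>" pairs.
 * On success, returns 0 and advances *argc / *argv so that (*argv)[1]
 * is the first unparsed argument. Returns -1 if an option lacks its
 * argument.
 */
int parse_global_opts(int *argc, const char ***argv, struct global_opts *opts)
{
	assert(argc && argv && opts);
	opts->directory = opts->config = NULL;

	while (*argc > 1 && *(*argv)[1] == '-') {
		const char *arg = (*argv)[1];

		if (!strcmp(arg, "-C")) {
			if (*argc < 3)
				return -1;
			opts->directory = (*argv)[2];
		} else if (!strcmp(arg, "-c")) {
			if (*argc < 3)
				return -1;
			opts->config = (*argv)[2];
		} else {
			break;
		}
		*argc -= 2;
		*argv += 2;
	}
	return 0;
}
```

As in the quoted loop, element 0 of the argument vector is never inspected, so after shifting by two it simply holds the previous option's argument.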

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/31/2021 4:32 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote:
> 
>> The `git` executable has these two very useful options:
>>
>> -C <directory>:
>> 	switch to the specified directory before performing any actions
>>
>> -c <key>=<value>:
>> 	temporarily configure this setting for the duration of the
>> 	specified scalar subcommand
>>
>> With this commit, we teach the `scalar` executable the same trick.
>> [...]
>> +	while (argc > 1 && *argv[1] == '-') {
>> +		if (!strcmp(argv[1], "-C")) {
>> +			if (argc < 3)
>> +				die(_("-C requires a <directory>"));
>> +			if (chdir(argv[2]) < 0)
>> +				die_errno(_("could not change to '%s'"),
>> +					  argv[2]);
>> +			argc -= 2;
>> +			argv += 2;
>> +		} else if (!strcmp(argv[1], "-c")) {
>> +			if (argc < 3)
>> +				die(_("-c requires a <key>=<value> argument"));
>> +			git_config_push_parameter(argv[2]);
>> +			argc -= 2;
>> +			argv += 2;
>> +		} else
>> +			break;
>> +	}
> 
> This along with my earlier comment about the Makefile copy/pasting makes
> me wonder if an easier way to integrate this wouldn't be to refactor
> git.c a bit to have it understand either "git" or "scalar", then instead
> of "ls-tree" etc. as "git" the subcommands would become "built-ins".
> 
> Which would give us both "[git|scalar] [-c ...] <cmd>" for free, and
> eliminate the need for the inevitable future divergence of wanting -p,
> -P, --exec-path etc. in both.
 
Such a change would likely eliminate the ability to not include Scalar
when building the Git codebase, which we tried to avoid by keeping it
within contrib and have it be compiled via an opt-in flag.

If we want to talk about integrating Scalar into Git in a deeper way,
then that is an interesting discussion to have, but it lives at a much
higher level than Makefile details.

The questions we are really looking to answer in this RFC are:

1. Will the Git project accept Scalar into its codebase?

2. What is the best place for Scalar to live in the Git codebase?

We erred on the side of keeping Scalar as optional as possible. If
the community is more interested in a deeper integration, then that
could be an interesting direction.

In my opinion, I think the current tactic is safest. We could always
decide on a deeper integration later by moving the code around. It
seems harder to do the reverse.

Thanks,
-Stolee

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Tue, Aug 31 2021, Derrick Stolee wrote:

> On 8/31/2021 4:32 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Mon, Aug 30 2021, Johannes Schindelin via GitGitGadget wrote:
>> 
>>> The `git` executable has these two very useful options:
>>>
>>> -C <directory>:
>>> 	switch to the specified directory before performing any actions
>>>
>>> -c <key>=<value>:
>>> 	temporarily configure this setting for the duration of the
>>> 	specified scalar subcommand
>>>
>>> With this commit, we teach the `scalar` executable the same trick.
>>> [...]
>>> +	while (argc > 1 && *argv[1] == '-') {
>>> +		if (!strcmp(argv[1], "-C")) {
>>> +			if (argc < 3)
>>> +				die(_("-C requires a <directory>"));
>>> +			if (chdir(argv[2]) < 0)
>>> +				die_errno(_("could not change to '%s'"),
>>> +					  argv[2]);
>>> +			argc -= 2;
>>> +			argv += 2;
>>> +		} else if (!strcmp(argv[1], "-c")) {
>>> +			if (argc < 3)
>>> +				die(_("-c requires a <key>=<value> argument"));
>>> +			git_config_push_parameter(argv[2]);
>>> +			argc -= 2;
>>> +			argv += 2;
>>> +		} else
>>> +			break;
>>> +	}
>> 
>> This along with my earlier comment about the Makefile copy/pasting makes
>> me wonder if an easier way to integrate this wouldn't be to refactor
>> git.c a bit to have it understand either "git" or "scalar", then instead
>> of "ls-tree" etc. as "git" the subcommands would become "built-ins".
>> 
>> Which would give us both "[git|scalar] [-c ...] <cmd>" for free, and
>> eliminate the need for the inevitable future divergence of wanting -p,
>> -P, --exec-path etc. in both.
>  
> Such a change would likely eliminate the ability to not include Scalar
> when building the Git codebase, which we tried to avoid by keeping it
> within contrib and have it be compiled via an opt-in flag.

I mean to still have it behind a flag, but to handle it similar to how
we handle NO_CURL, EXCLUDED_PROGRAMS and the like, i.e. not requiring
parallel maintenance of copy/pasted Makefile logic in contrib/.

> If we want to talk about integrating Scalar into Git in a deeper way,
> then that is an interesting discussion to have, but it lives at a much
> higher level than Makefile details.

To be clear I'm proposing no change at all in terms of what happens when
you run "make install", just commenting on the implementation details of
how we arrange for things to be built and configured before that step.

I realize that this is following some prior art of
e.g. contrib/subtree/Makefile, but IMNSHO that approach is a historical
mistake we should be backing out of. There was some recent discussion of
this here:
https://lore.kernel.org/git/[email protected]/

E.g. now we have some painful management of the dependency graph between
/Makefile and Documentation/Makefile requiring fixes like 56550ea7180
(Makefile: add missing dependencies of 'config-list.h', 2021-04-08),
adding yet another Makefile into the mix which (to take one example)
depends on doc.dep, which in turn depends on ...; It's all a bunch of
needless complexity we can avoid.

> The questions we are really looking to answer in this RFC are:
>
> 1. Will the Git project accept Scalar into its codebase?
>
> 2. What is the best place for Scalar to live in the Git codebase?
>
> We erred on the side of keeping Scalar as optional as possible. If
> the community is more interested in a deeper integration, then that
> could be an interesting direction.

Indeed, to be clear I realize I'm entirely punting on the real questions
you're interested in. I just gave this an initial cursory skimming for
now, I have not formed an informed opinion on your #1, but just a little
bit of #2.

My initial reaction to #1 without having looked into it deeply is some
combination of "sure, why not?", and that the people/group contributing
major scalability work to git.git should be given the benefit of the
doubt. Maybe we won't keep "scalar" long-term, or change its UI etc.,
all of that can be handled in some carefully worded documentation
somewhere.

Of course all these suggestions I'm making about Makefile arrangement
are rather pointless if there isn't consensus to get past the hurdle of
your #1.

> In my opinion, I think the current tactic is safest. We could always
> decide on a deeper integration later by moving the code around. It
> seems harder to do the reverse.

I think "deeper integration" is the reverse of what you think it is.

I.e. if I'm patching or maintaining part of the Makefile logic, to me it's
deeper (or perhaps "gnarlier" is the right word?) integration to need to
duplicate that work in two places, or to always take into account that some
not-built-by-default-but-quite-common command's *.txt docs and *.sh
tests live in some unusual place for the purposes of CI, lint, tooling
etc.

In other words, it's a question of how much net complexity is being
added to the (build) system. That complexity doesn't automatically
reduce just because some files live in another directory, sometimes
that's an increase in complexity.

Whereas just conditionally adding it to some list in the top-level
Makefile (or Documentation/Makefile) is relatively maintenance-free, and
to our users / packagers the result should be the same or near enough.
It won't matter to them if building the optional thing is another "make"
command or just a flag to the existing "make" command.

@gitgitgadget

gitgitgadget bot commented Sep 1, 2021

On the Git mailing list, Elijah Newren wrote (reply to this):

On Mon, Aug 30, 2021 at 5:52 PM Derrick Stolee <[email protected]> wrote:
>
> On 8/30/21 5:34 PM, Johannes Schindelin via GitGitGadget wrote:
> > tl;dr: This series contributes the Scalar command to the Git project. This
> > command provides an opinionated way to create and configure repositories
> > with a focus on very large repositories.
>
> I want to give Johannes a big thanks for organizing this RFC. As you
> can see from the authorship of the patches, this was an amazingly
> collaborative effort, but Johannes led the way by creating a base that
> the rest of us could work with, then finally he brought in all of the
> gritty details to finish the effort.
>
> > Background
> > ==========
>
> ...
>
> > The Scalar project
> > was created to make that separation, refine the key concepts, and then
> > extract those features into the new Scalar command.
>
> When people have asked me how Scalar fits with the core Git client, I
> point them to our "Philosophy of Scalar" document [1]. The most concise
> summary of our goals since starting Scalar has been that Scalar aligns
> with features already within Git that enable scale. I've said several
> times that we are constantly making Scalar do less by making Git do more.
>
> [1] https://github.com/microsoft/git/blob/HEAD/contrib/scalar/docs/philosophy.md
>
> Here is an example: when our large, internal customer told us that they
> required Linux support for Scalar, we looked at what it would take. We
> could have done the necessary platform-specific things to convince .NET
> Core to create a long-running process that launched Git maintenance tasks
> at different intervals, creating a similar mechanism to the Windows and
> macOS services that did those operations. But we also knew that the
> existing system was stuck with architectural decisions from VFS for Git
> that were not actually in service of how Scalar worked. Instead, we
> decided to build background maintenance into Git itself and had our Linux
> port of Scalar run "git maintenance start".
>
> Once the Linux port was proven out with Git's background maintenance, we
> realized that the window where a user actually interacts with Scalar instead
> of Git is extremely narrow: users run "scalar clone" or "scalar register"
> and otherwise only run Git commands. The Scalar process does not need to
> exist outside of that. (There are some other helpers that can be used in
> a pinch to diagnose and fix problems, but they are rarely used. These
> commands, such as 'scalar diagnose' can be contributed separately.)
>
> It became clear that for our own needs it would be easier to ship one
> installer that included the microsoft/git fork and the Scalar CLI, and
> it would be simple to rewrite the Scalar CLI with all of the Git helper
> APIs. We organized the code in a way that we thought would be amenable
> to an upstream contribution (by placing in contrib/ and using Git code
> style).
>
> The thing about these commands is that they are _opinionated_. We rely
> on these opinions for important internal users, but we realize that they
> are not necessarily optimal for all users. Hence, we did not think it
> wise to push those opinions onto the 'git' executable. Having 'scalar'
> continue to live as a separate executable made sense to us.
>
> I believe that by contributing Scalar to the full community, that we
> create opportunities for Git in the future. For one, users and Git
> distributors can opt into compiling Scalar so it is more available
> to users who are interested. Another hopeful idea is that maybe this
> reinvigorates ideas of how to streamline Git clones for large repos
> without users needing to learn each and every knob to twist to get
> things working. Since the Scalar CLI is contributed in the full
> license of the Git project, pieces of it can be adapted into Git
> proper as needed.
>
> I look forward to hearing your thoughts.
>
> Thanks,
> -Stolee

Looks like exciting stuff, you two.  I'm behind on review as it is; I
still need to get back to Stolee's sparse-index add/rm/mv series, but
I'll try to circle back and take a look.

@gitgitgadget

gitgitgadget bot commented Sep 1, 2021

User Elijah Newren <[email protected]> has been added to the cc: list.

@@ -0,0 +1,292 @@
/*

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes:

> +static void setup_enlistment_directory(int argc, const char **argv,
> +				       const char * const *usagestr,
> +				       const struct option *options,
> +				       struct strbuf *enlistment_root)
> +{
> +	struct strbuf path = STRBUF_INIT;
> +	char *root;
> +	int enlistment_found = 0;
> +
> +	if (startup_info->have_repository)
> +		BUG("gitdir already set up?!?");
> +
> +	if (argc > 1)
> +		usage_with_options(usagestr, options);
> +
> +	/* find the worktree, determine its corresponding root */
> +	if (argc == 1)
> +		strbuf_add_absolute_path(&path, argv[0]);
> +	else if (strbuf_getcwd(&path) < 0)
> +		die(_("need a working directory"));
> +
> +	strbuf_trim_trailing_dir_sep(&path);
> +	do {
> +		const size_t len = path.len;
> +
> +		/* check if currently in enlistment root with src/ workdir */
> +		strbuf_addstr(&path, "/src/.git");
> +		if (is_git_directory(path.buf)) {
> +			strbuf_strip_suffix(&path, "/.git");
> +
> +			if (enlistment_root)
> +				strbuf_add(enlistment_root, path.buf, len);
> +
> +			enlistment_found = 1;
> +			break;
> +		}

This special casing of "normally the top of the working tree is
enlisted, but if the repository is called src/, then we enlist
one level up" is a bit of eyesore because

 (1) it is unclear why such a directory with 'src/' subdirectory is
     so special, and

 (2) it fails to serve those who have the same need but named their
     source subdirectory differently (like 'source/').

"The design decisions we made are all part of being opinionated" can
all explain it away, but at least we should let the users know where
the opinionated choices scalar makes want to lead them to, and this
"src/" stuff needs a bit of clarification.  Perhaps a documentation
will be added in later steps?
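The lookup rule summarized above ("normally the top of the working tree is enlisted, but if the repository is called src/, then we enlist one level up") can be sketched without any Git internals. This is an illustration under stated assumptions, not the patch's implementation: `find_enlistment_root()` and `demo_is_gitdir()` are hypothetical names, the predicate stands in for `is_git_directory()`, and fixed-size buffers replace `strbuf`:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical predicate type standing in for is_git_directory(). */
typedef int (*is_gitdir_fn)(const char *path);

/* Stand-in for a real filesystem check: one repository at /work/foo/src. */
int demo_is_gitdir(const char *path)
{
	return !strcmp(path, "/work/foo/src/.git");
}

/*
 * Walk from `start` towards the top. At each level, look first for
 * "<dir>/src/.git" (dir is the enlistment root), then for "<dir>/.git"
 * (dir is the worktree; a trailing "/src" is stripped to get the root).
 * Writes the enlistment root to `root` and returns 0, or -1 if not found.
 */
int find_enlistment_root(const char *start, is_gitdir_fn is_gitdir,
			 char *root, size_t rootlen)
{
	char dir[1024], probe[1100];
	char *slash;

	assert(start && is_gitdir && root);
	if (snprintf(dir, sizeof(dir), "%s", start) >= (int)sizeof(dir))
		return -1;

	for (;;) {
		snprintf(probe, sizeof(probe), "%s/src/.git", dir);
		if (is_gitdir(probe))
			break;

		snprintf(probe, sizeof(probe), "%s/.git", dir);
		if (is_gitdir(probe)) {
			size_t len = strlen(dir);

			if (len > 4 && !strcmp(dir + len - 4, "/src"))
				dir[len - 4] = '\0';
			break;
		}

		slash = strrchr(dir, '/');
		if (!slash || slash == dir)
			return -1; /* only the leading slash is left */
		*slash = '\0';
	}
	return snprintf(root, rootlen, "%s", dir) < (int)rootlen ? 0 : -1;
}
```

The hard-coded `"/src"` is exactly what point (2) objects to: a worktree named `source/` would be treated as its own enlistment root.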

> +	for (i = 0; config[i].key; i++) {
> +		if (git_config_get_string(config[i].key, &value)) {
> +			trace2_data_string("scalar", the_repository, config[i].key, "created");
> +			if (git_config_set_gently(config[i].key,
> +						  config[i].value) < 0)
> +				return error(_("could not configure %s=%s"),
> +					     config[i].key, config[i].value);
> +		} else {
> +			trace2_data_string("scalar", the_repository, config[i].key, "exists");
> +			free(value);
> +		}

I wonder if we should have a table of configuration variables and
their default values.  The above code implements a skewed "we only
avoid overriding what is explicitly configured".  A variable that
the user left unconfigured because the user found its default
satisfactory will be overridden, and if the value scalar wants to
use happens to be the default value, we leave an explicit
configuration to that default value in the resulting configuration
file.

But I think the above is the best we can do without such a central
registry of configuration variables.
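The "only avoid overriding what is explicitly configured" behavior described above can be modeled with a toy in-memory store. The sketch below assumes hypothetical names (`struct config_store`, `store_get()`, `apply_defaults()`) and does not use Git's config API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct setting {
	const char *key;
	const char *value;
};

#define MAX_SETTINGS 16

/* Toy in-memory store standing in for the repository's config file. */
struct config_store {
	struct setting entries[MAX_SETTINGS];
	size_t nr;
};

/* Return the stored value for `key`, or NULL if it is not configured. */
const char *store_get(const struct config_store *store, const char *key)
{
	size_t i;

	for (i = 0; i < store->nr; i++)
		if (!strcmp(store->entries[i].key, key))
			return store->entries[i].value;
	return NULL;
}

/*
 * Write each wanted setting unless the key is already configured.
 * Returns the number of settings written, or -1 if the store is full.
 */
int apply_defaults(struct config_store *store, const struct setting *wanted)
{
	int created = 0;

	assert(store && wanted);
	for (; wanted->key; wanted++) {
		if (store_get(store, wanted->key))
			continue; /* explicitly configured: leave it alone */
		if (store->nr == MAX_SETTINGS)
			return -1;
		store->entries[store->nr++] = *wanted;
		created++;
	}
	return created;
}
```

The store only distinguishes "configured" from "not configured". It has no notion of a variable's built-in default, which is why a value equal to that default would still be written out explicitly.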

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Wed, 1 Sep 2021, Junio C Hamano wrote:

> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
>
> > +static void setup_enlistment_directory(int argc, const char **argv,
> > +				       const char * const *usagestr,
> > +				       const struct option *options,
> > +				       struct strbuf *enlistment_root)
> > +{
> > +	struct strbuf path = STRBUF_INIT;
> > +	char *root;
> > +	int enlistment_found = 0;
> > +
> > +	if (startup_info->have_repository)
> > +		BUG("gitdir already set up?!?");
> > +
> > +	if (argc > 1)
> > +		usage_with_options(usagestr, options);
> > +
> > +	/* find the worktree, determine its corresponding root */
> > +	if (argc == 1)
> > +		strbuf_add_absolute_path(&path, argv[0]);
> > +	else if (strbuf_getcwd(&path) < 0)
> > +		die(_("need a working directory"));
> > +
> > +	strbuf_trim_trailing_dir_sep(&path);
> > +	do {
> > +		const size_t len = path.len;
> > +
> > +		/* check if currently in enlistment root with src/ workdir */
> > +		strbuf_addstr(&path, "/src/.git");
> > +		if (is_git_directory(path.buf)) {
> > +			strbuf_strip_suffix(&path, "/.git");
> > +
> > +			if (enlistment_root)
> > +				strbuf_add(enlistment_root, path.buf, len);
> > +
> > +			enlistment_found = 1;
> > +			break;
> > +		}
>
> This special casing of "normally the top of the working tree is
> enlisted, but if the repository is called src/, then we enlist
> one level up" is a bit of an eyesore because
>
>  (1) it is unclear why such a directory with a 'src/' subdirectory is
>      so special, and
>
>  (2) it fails to serve those who have the same need but named their
>      source subdirectory differently (like 'source/').

All true. I wish we had come up with a better way, or with a way to
override this via an option.

Unfortunately, we are now bound by the fact that there are already users
out there...

> "The design decisions we made are all part of being opinionated" can
> explain it all away, but at least we should let the users know where
> the opinionated choices scalar makes want to lead them to, and this
> "src/" stuff needs a bit of clarification.  Perhaps documentation
> will be added in later steps?

I had hoped that the initial blurb of the manual page was sufficient, but
you're right, the `register` subcommand is peculiar in that it allows one to
force Scalar to consider the worktree to be identical to the Scalar
enlistment. I added this:

	diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
	index 1593da45eae..568987064b2 100644
	--- a/contrib/scalar/scalar.txt
	+++ b/contrib/scalar/scalar.txt
	@@ -40,6 +40,10 @@ register [<enlistment>]::
		and starts background maintenance. If `<enlistment>` is not provided,
		then the enlistment associated with the current working directory is
		registered.
	++
	+Note: when this subcommand is called in a worktree that is called `src/`, its
	+parent directory is considered to be the Scalar enlistment. If the worktree is
	+_not_ called `src/`, it itself will be considered to be the Scalar enlistment.
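
To make the rule in that note concrete, here is a minimal sketch of the
upward walk. It is an illustration under stated assumptions, not the code
from scalar.c: `fake_is_git_directory()` is a hypothetical stand-in for
`is_git_directory()` backed by a fixed list, `find_enlistment_root()` and
the paths are made up, and the handling of a worktree named `src` follows
the documentation note above rather than any patch hunk quoted in this
thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: paths that this sketch treats as valid git directories. */
static const char *fake_git_dirs[] = {
	"/home/me/big/src/.git",
	"/home/me/plain/.git",
	NULL
};

/* Hypothetical stand-in for is_git_directory(); no filesystem access. */
static int fake_is_git_directory(const char *path)
{
	int i;

	for (i = 0; fake_git_dirs[i]; i++)
		if (!strcmp(fake_git_dirs[i], path))
			return 1;
	return 0;
}

/*
 * Walk upward from `start` until an enlistment is found.
 * Writes the enlistment root into `root` and returns 1, or returns 0
 * if no enlistment is found.
 */
static int find_enlistment_root(const char *start, char *root, size_t n)
{
	char path[1024], probe[1024 + 16];
	size_t len = strlen(start);
	char *slash;

	assert(root && n > 0);
	if (!len || len >= sizeof(path))
		return 0;
	memcpy(path, start, len + 1);
	while (len > 1 && path[len - 1] == '/')
		path[--len] = '\0'; /* trim trailing separators */

	for (;;) {
		/* Case 1: we are in an enlistment root that has a src/ worktree. */
		snprintf(probe, sizeof(probe), "%s/src/.git", path);
		if (fake_is_git_directory(probe)) {
			snprintf(root, n, "%s", path);
			return 1;
		}

		/* Case 2: we are at the top of a worktree. */
		snprintf(probe, sizeof(probe), "%s/.git", path);
		if (fake_is_git_directory(probe)) {
			len = strlen(path);
			/* A worktree named src/ enlists its parent directory. */
			if (len > 4 && !strcmp(path + len - 4, "/src"))
				path[len - 4] = '\0';
			snprintf(root, n, "%s", path);
			return 1;
		}

		/* Neither matched: move one directory up, stopping at "/". */
		slash = strrchr(path, '/');
		if (!slash || !strcmp(path, "/"))
			return 0;
		if (slash == path)
			slash[1] = '\0';
		else
			*slash = '\0';
	}
}
```

With the list above, starting anywhere below `/home/me/big/src` yields
`/home/me/big`, while starting below `/home/me/plain` yields
`/home/me/plain` itself. A directory named `source/` gets no special
treatment, which is exactly point (2) of the review comment.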

> > +	for (i = 0; config[i].key; i++) {
> > +		if (git_config_get_string(config[i].key, &value)) {
> > +			trace2_data_string("scalar", the_repository, config[i].key, "created");
> > +			if (git_config_set_gently(config[i].key,
> > +						  config[i].value) < 0)
> > +				return error(_("could not configure %s=%s"),
> > +					     config[i].key, config[i].value);
> > +		} else {
> > +			trace2_data_string("scalar", the_repository, config[i].key, "exists");
> > +			free(value);
> > +		}
>
> I wonder if we should have a table of configuration variables and
> their default values.  The above code implements a skewed "we only
> avoid overriding what is explicitly configured".  A variable that
> the user left unconfigured because the user found its default
> satisfactory will be overridden, and if the value scalar wants to
> use happens to be the default value, we leave an explicit
> configuration to that default value in the resulting configuration
> file.
>
> But I think the above is the best we can do without such a central
> registry of configuration variables.

Even with such a central registry, there would still be the question
whether the user, by staying with the default, wanted Git (or in this
instance, Scalar) to keep using the old default. The intention is
unfortunately not clear just from setting the variable.

So I think this is the best we can do.
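
For readers following along, the "set only what is not explicitly
configured" behavior can be sketched with an in-memory store. This is an
illustration only: `struct fake_config`, `fake_get()`, `fake_set()` and
`apply_defaults()` are hypothetical names, not Git API. The two keys are
the ones named in this series' cover letter.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 16

struct kv {
	const char *key;
	const char *value;
};

/* Hypothetical stand-in for the explicit entries of a config file. */
struct fake_config {
	struct kv entries[MAX_ENTRIES];
	int nr;
};

/* Returns the explicitly configured value, or NULL if the key is unset. */
static const char *fake_get(const struct fake_config *c, const char *key)
{
	int i;

	for (i = 0; i < c->nr; i++)
		if (!strcmp(c->entries[i].key, key))
			return c->entries[i].value;
	return NULL;
}

/* Appends an entry; returns -1 if the store is full. */
static int fake_set(struct fake_config *c, const char *key, const char *value)
{
	if (c->nr >= MAX_ENTRIES)
		return -1;
	c->entries[c->nr].key = key;
	c->entries[c->nr].value = value;
	c->nr++;
	return 0;
}

/*
 * Writes every default whose key has no explicit entry yet.
 * `defaults` is terminated by an entry with a NULL key.
 * Returns the number of keys written, or -1 on error.
 */
static int apply_defaults(struct fake_config *c, const struct kv *defaults)
{
	int i, created = 0;

	for (i = 0; defaults[i].key; i++) {
		if (fake_get(c, defaults[i].key))
			continue; /* "exists": leave the user's value alone */
		if (fake_set(c, defaults[i].key, defaults[i].value) < 0)
			return -1;
		created++; /* "created": written explicitly from now on */
	}
	return created;
}
```

The store only knows about explicit entries, so an unset key looks the
same whether the user never considered it or was content with the
built-in default. Every unset key also ends up written explicitly, even
when the value matches what Git would have used anyway. Those are the
two effects Junio describes above.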

Ciao,
Dscho

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <[email protected]> writes:

>> "The design decisions we made are all part of being opinionated" can
>> explain it all away, but at least we should let the users know where
>> the opinionated choices scalar makes want to lead them to, and this
>> "src/" stuff needs a bit of clarification.  Perhaps documentation
>> will be added in later steps?
>
> I had hoped that the initial blurb of the manual page was sufficient, but
> you're right, the `register` subcommand is peculiar in that it allows one to
> force Scalar to consider the worktree to be identical to the Scalar
> enlistment. I added this:

Sorry if it wasn't clear that I was commenting on each step as I
read along without peeking at later steps.  I think I saw it was
written somewhere that this was to encourage use of a read-only
directory that keeps the sources, with build artifacts and cruft
created outside it (so forests of projects will not have the source
directories, each of which has its own .git/, next to each other---
instead we would have shell directories, each with its own src/ and
src/.git, next to each other).  The additional documentation below
is a good thing to have handy when readers learn how to use
"register" (or more generally, what an "enlistment" is).  As long as
the motivation behind that design is given somewhere (not necessarily
here) for readers to discover, I am OK with the design.

> 	diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
> 	index 1593da45eae..568987064b2 100644
> 	--- a/contrib/scalar/scalar.txt
> 	+++ b/contrib/scalar/scalar.txt
> 	@@ -40,6 +40,10 @@ register [<enlistment>]::
> 		and starts background maintenance. If `<enlistment>` is not provided,
> 		then the enlistment associated with the current working directory is
> 		registered.
> 	++
> 	+Note: when this subcommand is called in a worktree that is called `src/`, its
> 	+parent directory is considered to be the Scalar enlistment. If the worktree is
> 	+_not_ called `src/`, it itself will be considered to be the Scalar enlistment.

Thanks.

gitgitgadget bot commented Dec 10, 2021

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Dec 09, 2021 at 09:57:59AM -0800, Junio C Hamano wrote:

> > So I think this is as likely to cause somebody a headache due to a dumb
> > portability problem or random bitrot as it is to actually find a bug. I
> > guess test-extra wouldn't be run by default, but only via CI, so maybe
> > that limits the blast radius sufficiently.
> 
> Yeah, that is the exact thought I had when I did it.  Anybody who is
> not aware of test target other than 'test' will not be hurt, and we
> explicitly make the CI aware of 'test-all' to trigger it.  But as
> long as somebody bothered to write the tests, exercising them to
> reveal bitrot-bugs either in the tested contrib stuff or the tests
> themselves to be fixed or removed would be a good thing to do.

I don't have strong feelings on it either way. But if we think those
tests are worth running in CI, then...

> So I am tempted to do
> 
> test-extra: all
> 	$(MAKE) -C contrib/credential/netrc test
> 	$(MAKE) -C contrib/diff-highlight test
> 	: $(MAKE) -C contrib/mw-to-git test
> 	$(MAKE) -C contrib/subtree test

...we'd probably want to keep running mw-to-git tests, and teach one of
the CI environments to install the appropriate perl modules to avoid
skipping them.

-Peff

gitgitgadget bot commented Dec 10, 2021

On the Git mailing list, Jeff King wrote (reply to this):

On Fri, Dec 10, 2021 at 03:38:53AM +0100, Ævar Arnfjörð Bjarmason wrote:

> I just don't think it makes any sense that I edit say refs.[ch], run
> "make test" locally, but only see that something broke in scalar's
> specific use of libgit.a later when I look at GitHub CI.

I'm definitely sympathetic to this. Having been surprised by CI failure
on something that worked locally is annoying at best, and downright
frustrating when you can't easily reproduce the problem.

But isn't that already true for most of the value that CI provides?
While part of its purpose may be a back-stop for folks who don't run
"make test" locally, I think the biggest value is that it covers a much
wider variety of platforms and scenarios that you don't get out of "make
test" already.

In some of those cases you can reproduce the problem locally by tweaking
build or test knobs. But in others it can be quite a bit more
challenging (e.g., something that segfaults only on Windows). At least
in the proposed change here you'd only be a "make test-all" away from
reproducing the problem locally.

I dunno. I don't feel that strongly either way about whether scalar
tests should be part of "make test". Mostly just observing that this is
not exactly a new case.

> If I'm preparing patches for submission I'll need to get CI passing, so
> I'll need to fix those tests & behavior either way as it's
> in-tree. Knowing about the failures later-not-sooner wastes more time,
> not less.

I think there's probably a tradeoff here. How often you get a "late"
notification of a bug (and how much of your time that wastes) versus how
much time you spend locally running tests that you don't care about.

I do agree that CI presents a bit of a conundrum for stuff at the edge
of the project. It's become a de facto requirement for it to pass. In
general that's good. But it means that features which were introduced
under the notion of "the people who care about this area will tend to
its maintenance" slowly become _everybody's_ problem as soon as they
have any CI coverage. Another example here is the cmake stuff. Or the
recent discussion about "-x" and bash.

I wonder if there's a good way to make some CI results informational,
rather than "failing". I.e., run scalar tests via CI, but if you're not
working on scalar, you don't have to care. Folks who are interested in
the area would keep tabs on those results and make sure that Junio's
tree stays passing.

That view disagrees with the final paragraph here, though:

> The reason we do that with the completion is because some changes to
> e.g. tweak getopts will need to have a corresponding change to the
> completion.
> 
> The reason we've not done that with contrib/{subtree,mw-to-git}/ is
> because those are thoroughly in the category of only incidentally being
> in-tree.
> [...]
> Scalar is thoroughly on the "completion" side of that divide, not
> "subtree".

I haven't followed the discussion closely, but in my mind "scalar" was
still in the "it may live in-tree for convenience, but people who aren't
working on it don't necessarily need to care about it" camp. Maybe
that's not the plan, though.

-Peff

gitgitgadget bot commented Dec 10, 2021

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Fri, Dec 10 2021, Jeff King wrote:

> On Fri, Dec 10, 2021 at 03:38:53AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> I just don't think it makes any sense that I edit say refs.[ch], run
>> "make test" locally, but only see that something broke in scalar's
>> specific use of libgit.a later when I look at GitHub CI.
>
> I'm definitely sympathetic to this. Having been surprised by CI failure
> on something that worked locally is annoying at best, and downright
> frustrating when you can't easily reproduce the problem.
>
> But isn't that already true for most of the value that CI provides?
> While part of its purpose may be a back-stop for folks who don't run
> "make test" locally, I think the biggest value is that it covers a much
> wider variety of platforms and scenarios that you don't get out of "make
> test" already.
>
> In some of those cases you can reproduce the problem locally by tweaking
> build or test knobs. But in others it can be quite a bit more
> challenging (e.g., something that segfaults only on Windows). At least
> in the proposed change here you'd only be a "make test-all" away from
> reproducing the problem locally.
>
> I dunno. I don't feel that strongly either way about whether scalar
> tests should be part of "make test". Mostly just observing that this is
> not exactly a new case.

Yes. I'm not saying that "make test" should always run what a full CI
run covers.

Just that a proposed change that's really only adding one-more-test-file
testing a thing in contrib in the sense that we test
t/t9902-completion.sh should similarly be part of "make test".

>> If I'm preparing patches for submission I'll need to get CI passing, so
>> I'll need to fix those tests & behavior either way as it's
>> in-tree. Knowing about the failures later-not-sooner wastes more time,
>> not less.
>
> I think there's probably a tradeoff here. How often you get a "late"
> notification of a bug (and how much of your time that wastes) versus how
> much time you spend locally running tests that you don't care about.
>
> I do agree that CI presents a bit of a conundrum for stuff at the edge
> of the project. It's become a de facto requirement for it to pass. In
> general that's good. But it means that features which were introduced
> under the notion of "the people who care about this area will tend to
> its maintenance" slowly become _everybody's_ problem as soon as they
> have any CI coverage. Another example here is the cmake stuff. Or the
> recent discussion about "-x" and bash.
>
> I wonder if there's a good way to make some CI results informational,
> rather than "failing". I.e., run scalar tests via CI, but if you're not
> working on scalar, you don't have to care. Folks who are interested in
> the area would keep tabs on those results and make sure that Junio's
> tree stays passing.

I think if we're not caring about its failures in combination with
git.git changes there wouldn't be much point in having it in-tree at
all. That would just be like what we've got with git-cinnabar.git.

I would like it in tree. I just don't think the test/CI setup needs to be
a special snowflake.

> That view disagrees with the final paragraph here, though:
>
>> The reason we do that with the completion is because some changes to
>> e.g. tweak getopts will need to have a corresponding change to the
>> completion.
>> 
>> The reason we've not done that with contrib/{subtree,mw-to-git}/ is
>> because those are thoroughly in the category of only incidentally being
>> in-tree.
>> [...]
>> Scalar is thoroughly on the "completion" side of that divide, not
>> "subtree".
>
> I haven't followed the discussion closely, but in my mind "scalar" was
> still in the "it may live in-tree for convenience, but people who aren't
> working on it don't necessarily need to care about it" camp. Maybe
> that's not the plan, though.

Since v1 of the series[1] it's been compiled unconditionally, and there
have been tests. We just didn't run the tests.

In v6 the tests started being run as part of CI, which was ejected in
v10 due to "[an] unrelated patch series does not interact well with
them", which as I noted upthread in [2] isn't accurate, so I think the
stated reason for ejecting the CI from the proposed topic doesn't
reflect reality.

Since then 1d855a6b335 (Merge branch 'ab/ci-updates' into next,
2021-12-07) landed, so I'd think that any narrow tweaks to get the CI
working could be based on top of that topic.

1. https://lore.kernel.org/git/[email protected]/
2. https://lore.kernel.org/git/[email protected]/

gitgitgadget bot commented Dec 10, 2021

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Wed, 8 Dec 2021, Junio C Hamano wrote:

> We ship contrib/ stuff within our primary source tree but except for
> the completion scripts that are tested from our primary test suite,
> their test suites are not run in the CI.
>
> Teach the main Makefile a "test-extra" target, which goes into each
> package in contrib/ whose Makefile has its own "test" target and
> runs "make test" there.  Add a "test-all" target to make it easy to
> drive both the primary tests and these contrib tests from CI and use
> it.

That sends a strong message that the stuff in contrib/ is now fully under
your maintenance, i.e. first-class supported.

If I were you, I wouldn't.

> Junio C Hamano <[email protected]> writes:
>
> > That is an interesting way to demonstrate how orthogonal the issues
> > are, which in turn means that it is not such a big deal to add back
> > the coverage to the part that goes to contrib/scalar/.

I'd rather focus, _some_ focus, on the actual Scalar idea and code.

> > As the actual implementation, it is a bit too icky, though.
>
> So, how about doing it this way?  This is based on 'master' and does
> not cover contrib/scalar, but if we want to go this route, it should
> be trivial to do it on top of a merge of ab/ci-updates and js/scalar
> into 'master'.  Good idea?  Terrible idea?  Not good enough?

Peff mentioned a couple of times how tedious it is to address CI failures
e.g. in the Windows part of Git's CI runs.

So it only makes sense to avoid the same problem with contrib/scalar/
altogether, especially as long as you keep saying that you are still
uncertain whether it will make it into Git as a top-level command.

Which is a strong argument in favor of just leaving the CI part of
contrib/scalar/ out for now, and let it remain _my_ responsibility to
react to any build/test problems arising from unrelated patch series
entering `seen`.

Doing it that way would also have the benefit of allowing more focus on
the actual code in contrib/scalar/scalar.c.

Not that it needs more review, I don't think, as both Stolee and Elijah
gave their thumbs-up already, and I've not received any feedback that
would require further changes to `scalar.c`, at least as of _this_ patch
series.

Ciao,
Dscho

gitgitgadget bot commented Dec 10, 2021

On the Git mailing list, Elijah Newren wrote (reply to this):

On Thu, Dec 9, 2021 at 10:12 AM Junio C Hamano <[email protected]> wrote:
>
> Ævar Arnfjörð Bjarmason <[email protected]> writes:
>
> >> So, how about doing it this way?  This is based on 'master' and does
> >> not cover contrib/scalar, but if we want to go this route, it should
> >> be trivial to do it on top of a merge of ab/ci-updates and js/scalar
> >> into 'master'.  Good idea?  Terrible idea?  Not good enough?
> >
> > With the caveat that I think the greater direction here makes no sense,
> > i.e. scalar didn't need its own build system etc. in the first place, so
> > having hack-upon-hack to fix various integration issues is clearly worse
> > than just having it behave like everything else....
>
> We decided to start Scalar in contrib/, as it hasn't been proven
> that Scalar is in a good enough shape to deserve to be in this tree,
> and we are giving it a chance by adding it to contrib/ first, hoping
> that it may graduate to the more official status someday [*].

Is that the hope?  I thought the wish was for it to eventually
"disappear" rather than "graduate", as per the following bits of
Dscho's cover letter:

"""
The Scalar project was designed to be a self-destructing vehicle...For
example, partial clone, sparse-checkout, and scheduled background
maintenance have already been upstreamed and removed from Scalar
proper...[Adding Scalar to contrib will] make it substantially easier
to experiment with moving functionality from Scalar into core Git.
"""

gitgitgadget bot commented Dec 10, 2021

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Peff,

On Fri, 10 Dec 2021, Jeff King wrote:

> On Fri, Dec 10, 2021 at 03:38:53AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
> > I just don't think it makes any sense that I edit say refs.[ch], run
> > "make test" locally, but only see that something broke in scalar's
> > specific use of libgit.a later when I look at GitHub CI.
>
> I'm definitely sympathetic to this. Having been surprised by CI failure
> on something that worked locally is annoying at best, and downright
> frustrating when you can't easily reproduce the problem.

I feel your frustration. Same here.

> But isn't that already true for most of the value that CI provides?
> While part of its purpose may be a back-stop for folks who don't run
> "make test" locally, I think the biggest value is that it covers a much
> wider variety of platforms and scenarios that you don't get out of "make
> test" already.
>
> In some of those cases you can reproduce the problem locally by tweaking
> build or test knobs. But in others it can be quite a bit more
> challenging (e.g., something that segfaults only on Windows). At least
> in the proposed change here you'd only be a "make test-all" away from
> reproducing the problem locally.
>
> I dunno. I don't feel that strongly either way about whether scalar
> tests should be part of "make test". Mostly just observing that this is
> not exactly a new case.

It isn't a new case.

What is new is that we are talking about CI for patches targeting contrib/
specifically to introduce something cautiously that still has a chance of
not ending up in Git proper (for whatever reasons), as Junio seems to
be anxious to not give any premature "go" to integrate Scalar fully.

In that light, I am somewhat surprised that we are still discussing
putting a burden on contributors having to adapt contrib/scalar/ to
their changes, when Junio still entertains the option of not accepting
that to-be-adapted code into core Git, after all.

I fully expected everybody to be on board with leaving the responsibility
to keep contrib/scalar/ building and passing the tests to _me_, until the
day Scalar is accepted as a full Git command (which might not happen).

> > If I'm preparing patches for submission I'll need to get CI passing, so
> > I'll need to fix those tests & behavior either way as it's
> > in-tree. Knowing about the failures later-not-sooner wastes more time,
> > not less.
>
> I think there's probably a tradeoff here. How often you get a "late"
> notification of a bug (and how much of your time that wastes) versus how
> much time you spend locally running tests that you don't care about.
>
> I do agree that CI presents a bit of a conundrum for stuff at the edge
> of the project. It's become a de facto requirement for it to pass. In
> general that's good. But it means that features which were introduced
> under the notion of "the people who care about this area will tend to
> its maintenance" slowly become _everybody's_ problem as soon as they
> have any CI coverage. Another example here is the cmake stuff. Or the
> recent discussion about "-x" and bash.
>
> I wonder if there's a good way to make some CI results informational,
> rather than "failing". I.e., run scalar tests via CI, but if you're not
> working on scalar, you don't have to care. Folks who are interested in
> the area would keep tabs on those results and make sure that Junio's
> tree stays passing.
>
> That view disagrees with the final paragraph here, though:
>
> > The reason we do that with the completion is because some changes to
> > e.g. tweak getopts will need to have a corresponding change to the
> > completion.
> >
> > The reason we've not done that with contrib/{subtree,mw-to-git}/ is
> > because those are thoroughly in the category of only incidentally being
> > in-tree.
> > [...]
> > Scalar is thoroughly on the "completion" side of that divide, not
> > "subtree".
>
> I haven't followed the discussion closely, but in my mind "scalar" was
> still in the "it may live in-tree for convenience, but people who aren't
> working on it don't necessarily need to care about it" camp. Maybe
> that's not the plan, though.

I had hoped for a clearer answer from Junio where he sees Scalar in the
long term; for now he seems to be undecided.

As a consequence, I kept targeting contrib/scalar/ with this first patch
series, to leave the door open for keeping it in contrib/ as a "not
maintained by Junio!" part of the tree.

That is independent, of course, of my intention to keep maintaining
Scalar's code (once we get a few steps further, that is, because we're
still quite stuck here, the Scalar patch series has not seen any concerns
in the last half dozen iterations about its design nor about its actual
code). I intend to keep maintaining the Scalar code no matter whether it
lives in contrib/ or whether it will be turned into a first-class command
whose source code lives in the top-level directory.

So yes, from my side I do not understand at all where this notion comes
from that contrib/scalar/ should be treated any different than
contrib/subtree/ for now. At least until contrib/scalar/ is
feature-complete, that won't change.

But of course, we can keep discussing back and forth the build process of
Scalar, whether it should be tested in CI or not, whether it should be in
contrib/ or in the top-level directory or not in Git at all, without
getting the Scalar patches anywhere, for the next few years, in which case
the outcome of that discussion will be completely moot because the Scalar
patches would still be as stuck as they are right now. In which case it
would be super annoying for any contributor who had to adapt the code in
contrib/scalar/ to code changes in libgit.a, for no value in return
whatsoever. So far, that contributor has been me.

I sincerely hope that it won't come to that, and that we can move forward
with this here patch series, with the next ones I have lined up to make
Scalar feature-complete, and _then_ discuss the merits of making Scalar a
first-class Git command or not. At that point we will automatically have
the answer whether to build Scalar and run its tests as part of Git's CI.

Ciao,
Dscho

gitgitgadget bot commented Dec 11, 2021

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Wed, 8 Dec 2021, Junio C Hamano wrote:

> Johannes Schindelin <[email protected]> writes:
>
> > The Scalar Functional Tests were designed with Azure Repos in mind, i.e.
> > they specifically verify that the `gvfs-helper` (emulating Partial Clone
> > using the predecessor of Partial Clone, the GVFS protocol) manages to
> > access the repositories in the intended way.
> > ...
> > I do realize, though, that clarity of intention has been missing from this
> > mail thread all around, so let me ask point blank: Junio, do you want me
> > to include upstreaming `gvfs-helper` in the overall Scalar plan?
>
> Sorry, I do not follow.

In
https://lore.kernel.org/git/CABPp-BGpe9Q5k22Yu8a=1xwu=pZYSeNQoqEgf+DN0[email protected]/
(i.e. in the great great grand parent of this mail), you specifically
replied to my mentioning Scalar's Functional Test suite:

	> > One other thing is very interesting about that vfs-with-scalar
	> > branch thicket: it contains a GitHub workflow which will run
	> > Scalar's quite extensive Functional Tests suite. This test
	> > suite is quite comprehensive and caught us a lot of bugs in
	> > the past, not only in the Scalar code, but also core Git.
	>
	> From your wording it sounds like the plan might not include
	> moving these tests over.  Perhaps it doesn't make sense to move
	> them all over, but since they've caught problems in both Scalar
	> and core Git, it would be nice to see many of those tests come
	> to Git as well as part of a future follow on series.

I had mentioned a couple of times that I had no intention to move Scalar's
Functional Tests into contrib/scalar/, and your wording "it would be nice to
see many of those tests come to Git as well" made it sound as if you
disagreed with that intention.

But it was not a clear "please do port them over" nor a "nah, we don't
want that test suite implemented in C# and requiring, for the most part,
access to a dedicated Azure Repo".

Hence I was asking for a clear answer to the question whether you want me
to spend time on preparing a patch series to contribute Scalar's
Functional Tests to contrib/scalar/ as well.

I _suspect_ your clear answer, if you are willing to give it as clearly,
to be "no, we do not do integration tests here, and besides, C# is not a
language we want to add to Git's tree".

> What I was lamenting about was the lack of CI test coverage of stuff
> that is already being considered to go 'next'.  Specifically, since
> contrib/scalar/Makefile in 'seen' has a 'test' target, it would be a
> shame not to exercise it, when we should be able to do so in the CI
> fairly easily.

We do have a very different understanding of "fairly easily" in that case.
Three iterations, and three weeks' time spent on implementing what you
suggest, only to see it broken by the merge of the `ab/ci-updates` patch
series, suggesting a fixup for the incorrect merge, seeing that fixup
rejected, and then more discussing, all of that does not strike me as
"fairly easily". It strikes me as "a lot of time and effort was spent,
mostly stepping on toes".

Granted, if `ab/ci-updates` had not happened, it would have been
much easier. Or if `ab/ci-updates` had waited until `js/scalar` advanced
to `next`. But the way it happened was (unnecessarily?) un-easy.

> I fail to see what gvfs-helper has to do with anything in the
> context of advancing the js/scalar topic as we have today.

Okay, okay! I was just asking about gvfs-helper because that would be
required to port over Scalar's Functional Tests. The same Functional Tests
that I heard you mentioning would be "nice to see" to "come to Git as
well".

> If "The Scalar Functional Tests" that were designed with Azure Repos in
> mind is not a good fit to come into contrib/scalar/, it is fine not to
> have it here---lack of it would not make the test target you have in
> contrib/scalar/Makefile any less valuable, I would think.

The test target won't go anywhere, no worries. Just like the test target
in contrib/subtree/ does not go anywhere.

And just like `contrib/subtree/`, it does not have to be run as part of
Git's CI build.

> Unless you are saying that "make -C contrib/scalar test" is useless,
> that is.  But I do not think that is the case.

It is as useful as `make -C contrib/subtree test`. Which, as Ævar will
readily offer, is broken, because it does not ensure that top-level `make
all` is executed and therefore in a fresh checkout will fail.

Of course, I disagree that it is "broken". It works as designed. It is in
the contrib/ part of the tree, i.e. safely in the realm of "you have to
build Git first, and then the thing in contrib/". In other words, the idea
to "fix" this kind of "broken"ness is a solution in search of a problem.

And as I have said multiple times, I still think that having Scalar's code
in contrib/ is a good spot to experiment with it. It sends the right
signal of "this is not really something we promise to maintain just yet".
It is a logical place for code that developers can build themselves, but
that is not built and installed with Git by default.

Having it in the Git tree will give interested developers who want to
clone a large repository on Linux a chance to do so without having to
touch anything with "Microsoft" in its repository name.

Having it in the Git tree will give interested developers a chance to
experiment with things like "let's try to let `scalar clone` _not_
clone into `<enlistment>/src/`, but instead create a bare clone in
`<enlistment>/.git` and make `<enlistment>/src/` a worktree". Things like
that.

I would find those things quite a bit more useful than to force regular
Git contributors who want to change libgit.a (even if it is just pointless
refactoring) to pay attention to contrib/scalar/ in CI, when there is
still no clear answer whether Scalar will even become a first-class Git
command eventually (which I hope it will, of course).

Ciao,
Dscho



gitgitgadget bot commented Dec 11, 2021

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Sat, Dec 11 2021, Johannes Schindelin wrote:

> Hi Junio,
> [...]
> We do have a very different understanding of "fairly easily" in that case.
> Three iterations, and three weeks time spent on implementing what you
> suggest, only to see it broken by the merge of the `ab/ci-updates` patch
> series, suggesting a fixup for the incorrect merge, seeing that fixup
> rejected, and then more discussing, all of that does not strike me as
> "fairly easily". It strikes me as "a lot of time and effort was spent,
> mostly stepping on toes".

I sent you a working path to a fixup in [1] on the 23rd of November
where we won't go from running zero tests in compile-only to running
just the scalar test.

Junio replied[2] ("the above" referring to [1]):

    I think the above shows that it is a bug in the topic itself,

You didn't reply further in that fixup thread, and then your v9 re-roll
a week later still had the same issue[3] discussed therein. I again
pointed that out[4]:

    Is it intentional that the previously compile-only "pedantic" job is now
    running the scalar tests?

You didn't reply, but in your v10 decided to make the current iteration
of this series have no CI testing at all, and cited the interaction with
ab/ci-updates[4]:

    because a recent unrelated patch series does not interact well with them.

Which I think is clearly inaccurate, because...

> Granted, if `ab/ci-updates` would not have happened, it would have been
> much easier. Or if `ab/ci-updates` had waited until `js/scalar` advanced
> to `next`. But the way it happened was (unnecessarily?) un-easy.

...your initial patch to run the scalar tests in CI[5] was part of v7, and
had the issue described above. It pre-dates the v1 of ab/ci-updates
being on-list by a couple of days[6].

So yes, I do think it was "easy", as in that was an easy fix-up. You
just didn't follow up on it and submitted re-rolls with the already
noted breakage.

I don't blame you for that, maybe you were busy, it slipped through
etc.

But I don't accept that delays in this topic are my fault, or something
to the effect that this whole saga represents some failure of the
review process.

Our topics textually/semantically conflicted, it happens. I offered a
fixup & way forward. Fixing it was trivial, and still is. You just
didn't follow-up.

> [...]
>> If "The Scalar Functional Tests" that were designed with Azure Repos in
>> mind is not a good fit to come into contrib/scalar/, it is fine not to
>> have it here---lack of it would not make the test target you have in
>> contrib/scalar/Makefile any less valuable, I would think.
>
> The test target won't go anywhere, no worries. Just like the test target
> in contrib/subtree/ does not go anywhere.
>
> And just like `contrib/subtree/`, it does not have to be run as part of
> Git's CI build.

But unlike contrib/completion, which we do run as part of Git's CI
build[7]?

>> Unless you are saying that "make -C contrib/scalar test" is useless,
>> that is.  But I do not think that is the case.
>
> It is as useful as `make -C contrib/subtree test`. Which, as Ævar will
> readily offer, is broken, because it does not ensure that top-level `make
> all` is executed and therefore in a fresh checkout will fail.

Before the scalar topic there was only one "make" entry point to build
libgit.a, contrib/scalar/Makefile makes that two. That was the immediate
prompt for the fixup discussion in [1].

So no, I won't offer that "make -C contrib/subtree test" is broken, it
doesn't try to build libgit.a and errors out right away if git isn't
built.

Your scalar patches do try, get most of the way there, and fail.

Your bicycle isn't broken if it doesn't make coffee, but if your fridge
has a built-in coffee maker and it doesn't work it's broken, at least as
it pertains to its coffee making function.

I think I made that distinction clear in [8], but apparently not clear
enough, as you seem to be under the impression that I was conveying the
opposite of the idea I was trying to get across.

> Of course, I disagree that it is "broken". It works as designed. It is in
> the contrib/ part of the tree, i.e. safely in the realm of "you have to
> build Git first, and then the thing in contrib/". In other words, the idea
> to "fix" this kind of "broken"ness is a solution in search of a problem.

I agree with that, but it's your proposed patches that contain the build
integration you're describing as unnecessary for "contrib/subtree/". In
v8->v8 of the series you changed the CI integration from:

    make -C contrib/scalar test

To:

    make && make -C contrib/scalar test

While keeping the bits in contrib/scalar/Makefile that made it go most
of the way towards a working "libgit.a" useful for testing, but it
breaks before we get everything we need to run the "test" target.

Which I find to be odd given the above comparison to contrib/subtree/. If
you have to build git first at the top level why is it trying and
failing to build git? "contrib/subtree" doesn't.
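
To make the ordering under discussion concrete, here is a minimal shell sketch of a CI step that builds Git at the top level before entering any contrib/ directory. The helper name `run_contrib_tests` and the `MAKE` override are illustrative assumptions; they are not part of the patch series or of Git's CI scripts.

```shell
# Hypothetical helper (not from the patch series): build Git at the top
# level first, then run "make test" in each named contrib/ directory.
# MAKE can be overridden, e.g. MAKE="echo make" for a dry run.
run_contrib_tests () {
	# Top-level build; the contrib/ tests assume it already happened.
	${MAKE:-make} || return 1
	for dir in "$@"
	do
		# Stop at the first failing contrib test suite.
		${MAKE:-make} -C "contrib/$dir" test || return 1
	done
}
```

With `MAKE="echo make"` the function only prints the commands it would run, so `MAKE="echo make" run_contrib_tests scalar` shows `make` followed by `make -C contrib/scalar test`.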

> [...]
> I would find those things quite a bit more useful than to force regular
> Git contributors who want to change libgit.a (even if it is just pointless
> refactoring) to pay attention to contrib/scalar/ in CI, when there is
> still no clear answer whether Scalar will even become a first-class Git
> command eventually (which I hope it will, of course).

It's in-tree, scalar.c is compiled by default, so they'll have no choice
but to pay attention to it.

The question is whether we should have test and CI coverage for code in
that state.

1. https://lore.kernel.org/git/[email protected]/
2. https://lore.kernel.org/git/[email protected]/
3. https://lore.kernel.org/git/[email protected]/
4. https://lore.kernel.org/git/[email protected]/
5. https://lore.kernel.org/git/1b0328fa236a35c2427b82f53c32944e513580d3.1637158762.git.gitgitgadget@gmail.com/
6. https://lore.kernel.org/git/[email protected]/
7. https://lore.kernel.org/git/[email protected]/
8. https://lore.kernel.org/git/[email protected]/
9. https://lore.kernel.org/git/[email protected]/


gitgitgadget bot commented Dec 11, 2021

There was a status update in the "Cooking" section about the branch js/scalar on the Git mailing list:

Add pieces from "scalar" to contrib/.

Will merge to 'master'.
source: <[email protected]>


gitgitgadget bot commented Dec 11, 2021

On the Git mailing list, Elijah Newren wrote (reply to this):

Hi Dscho,

On Fri, Dec 10, 2021 at 4:29 PM Johannes Schindelin
<[email protected]> wrote:
>
> Hi Junio,
>
> On Wed, 8 Dec 2021, Junio C Hamano wrote:
>
> > Johannes Schindelin <[email protected]> writes:
> >
> > > The Scalar Functional Tests were designed with Azure Repos in mind, i.e.
> > > they specifically verify that the `gvfs-helper` (emulating Partial Clone
> > > using the predecessor of Partial Clone, the GVFS protocol) manages to
> > > access the repositories in the intended way.
> > > ...
> > > I do realize, though, that clarity of intention has been missing from this
> > > mail thread all around, so let me ask point blank: Junio, do you want me
> > > to include upstreaming `gvfs-helper` in the overall Scalar plan?
> >
> > Sorry, I do not follow.
>
> In
> https://lore.kernel.org/git/CABPp-BGpe9Q5k22Yu8a=1xwu=pZYSeNQoqEgf+DN07cU4EB1ew@mail.gmail.com/
> (i.e. in the great great grand parent of this mail), you specifically
> replied to my mentioning Scalar's Functional Test suite:
>
>         > > One other thing is very interesting about that vfs-with-scalar
>         > > branch thicket: it contains a GitHub workflow which will run
>         > > Scalar's quite extensive Functional Tests suite. This test
>         > > suite is quite comprehensive and caught us a lot of bugs in
>         > > the past, not only in the Scalar code, but also core Git.
>         >
>         > From your wording it sounds like the plan might not include
>         > moving these tests over.  Perhaps it doesn't make sense to move
>         > them all over, but since they've caught problems in both Scalar
>         > and core Git, it would be nice to see many of those tests come
>         > to Git as well as part of a future follow on series.

This is me and my email you are quoting; these aren't Junio's words.
I'm afraid my confusion may have snowballed for others here.  Sorry
about that.

I simply misunderstood at the time -- I thought there were scalar-only
tests (rather than scalar+gvfs tests) that were not being considered
for upstreaming.  As I mentioned before[1], I'm sorry for the
confusion and seemingly opening an unrelated can of worms.  I agree
that we don't need gvfs tests, or tests that combine gvfs with other
things like scalar, or c# tests.

[1] https://lore.kernel.org/git/CABPp-BFmNiqY=NfN7Ys3XE8wYBn1EQ_War+0QLq96Tk7FO6zfg@mail.gmail.com/


gitgitgadget bot commented Dec 11, 2021

On the Git mailing list, Bagas Sanjaya wrote (reply to this):

On 09/12/21 03.04, Junio C Hamano wrote:
> We ship contrib/ stuff within our primary source tree but except for
> the completion scripts that are tested from our primary test suite,
> their test suites are not run in the CI.
> 
> Teach the main Makefile a "test-extra" target, which goes into each
> package in contrib/ whose Makefile has its own "test" target and
> runs "make test" there.  Add a "test-all" target to make it easy to
> drive both the primary tests and these contrib tests from CI and use
> it.
> 
> Signed-off-by: Junio C Hamano <[email protected]>

No test failures found with test-all on my system.

Tested-by: Bagas Sanjaya <[email protected]>

-- 
An old man doll... just what I always wanted! - Clara


gitgitgadget bot commented Dec 11, 2021

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Elijah,

On Fri, 10 Dec 2021, Elijah Newren wrote:

> On Fri, Dec 10, 2021 at 4:29 PM Johannes Schindelin
> <[email protected]> wrote:
> >
> > On Wed, 8 Dec 2021, Junio C Hamano wrote:
> >
> > > Johannes Schindelin <[email protected]> writes:
> > >
> > > > The Scalar Functional Tests were designed with Azure Repos in mind, i.e.
> > > > they specifically verify that the `gvfs-helper` (emulating Partial Clone
> > > > using the predecessor of Partial Clone, the GVFS protocol) manages to
> > > > access the repositories in the intended way.
> > > > ...
> > > > I do realize, though, that clarity of intention has been missing from this
> > > > mail thread all around, so let me ask point blank: Junio, do you want me
> > > > to include upstreaming `gvfs-helper` in the overall Scalar plan?
> > >
> > > Sorry, I do not follow.
> >
> > In
> > https://lore.kernel.org/git/CABPp-BGpe9Q5k22Yu8a=1xwu=pZYSeNQoqEgf+DN07cU4EB1ew@mail.gmail.com/
> > (i.e. in the great great grand parent of this mail), you specifically
> > replied to my mentioning Scalar's Functional Test suite:
> >
> >         > > One other thing is very interesting about that vfs-with-scalar
> >         > > branch thicket: it contains a GitHub workflow which will run
> >         > > Scalar's quite extensive Functional Tests suite. This test
> >         > > suite is quite comprehensive and caught us a lot of bugs in
> >         > > the past, not only in the Scalar code, but also core Git.
> >         >
> >         > From your wording it sounds like the plan might not include
> >         > moving these tests over.  Perhaps it doesn't make sense to move
> >         > them all over, but since they've caught problems in both Scalar
> >         > and core Git, it would be nice to see many of those tests come
> >         > to Git as well as part of a future follow on series.
>
> This is me and my email you are quoting; these aren't Junio's words.
> I'm afraid my confusion may have snowballed for others here.  Sorry
> about that.
>
> I simply misunderstood at the time -- I thought there were scalar-only
> tests (rather than scalar+gvfs tests) that were not being considered
> for upstreaming.  As I mentioned before[1], I'm sorry for the
> confusion and seemingly opening an unrelated can of worms.  I agree
> that we don't need gvfs tests, or tests that combine gvfs with other
> things like scalar, or c# tests.
>
> [1] https://lore.kernel.org/git/CABPp-BFmNiqY=NfN7Ys3XE8wYBn1EQ_War+0QLq96Tk7FO6zfg@mail.gmail.com/

No worries, I am glad it is sorted out now.

Ciao,
Dscho


gitgitgadget bot commented Dec 13, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <[email protected]> writes:

>> Teach the main Makefile a "test-extra" target, which goes into each
>> package in contrib/ whose Makefile has its own "test" target and
>> runs "make test" there.  Add a "test-all" target to make it easy to
>> drive both the primary tests and these contrib tests from CI and use
>> it.
>
> That sends a strong message that the stuff in contrib/ is now fully under
> your maintenance, i.e. first-class supported.

I do not think running tests on stuff in contrib/ sends any such
message.  It primarily helps _us_ to catch more regressions than we
may otherwise miss.  By the way, this is not limited to contrib/; if
we had tests for gitk, we would have caught the recent regression in
"diff -m" before it got inflicted on the general public, but that
would not have been just to help "gitk", but to help keep "diff -m"
sane and stable [*].

By running tests on in-tree contrib/ like scalar, at least we would
notice when we are making breaking changes.  At least, the need for
scalar (either for the API broken by such a change to be kept
unchanged or done in a different way, or the code that uses the API
on the scalar side to be updated) would be noticed earlier than
stuff totally outside and not even in contrib/.

Of course, you have to bear the burden of (A) changing the way
scalar uses the API, or (B) participating in the design of the
change to the API that may break scalar's use so that everybody
including scalar would be happy, or both.  It's not like I am
responsible for everything that happens in the tree, and it is our
shared responsibility to maintain the health of the codebase.  It is
not limited to stuff inside or outside contrib/.

There are projects that want to use libgit.a by binding us as a
submodule and without interacting with us very much.  And they are
on their own when we change the internals.  Do you mean that you
want to make scalar into the same status as they are?

> Not that it needs more review, I don't think, as both Stolee and Elijah
> gave their thumbs-up already, and I've not received any feedback that
> would require further changes to `scalar.c`, at least as of _this_ patch
> series.

So that argues even more to have a way to make sure we catch
unintended breakages by any future mindless tree-wide "clean-ups"
and interface changes, no?


[Footnote]

* I just double checked the candidates for "test-extra" to see if
  they are meant to run with a random Git they happen to see on the
  $PATH, or they are designed to test with the version of Git we
  just built, and it seems it is the latter for the ones nominated
  in the test-extra patch.  Otherwise it would indeed reduce the
  benefit in half---we are not helping to catch regressions in the
  core stuff in such a case.
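
The distinction drawn in the footnote, testing against the Git that was just built rather than whatever `git` is on `$PATH`, can be sketched in shell as follows. This is only an illustration under stated assumptions: the helper name `use_built_git` is hypothetical, and the sketch assumes the build tree contains a `bin-wrappers/git` wrapper, as a built Git source tree does.

```shell
# Hypothetical helper: prefer the Git found in a build tree over any Git
# that happens to be on $PATH. Assumes the tree was already built, so
# that <build-dir>/bin-wrappers/git exists.
use_built_git () {
	build_dir=$1
	if ! test -x "$build_dir/bin-wrappers/git"
	then
		echo "no built git in $build_dir; run 'make' there first" >&2
		return 1
	fi
	# Put the wrappers first so that "git" resolves to the fresh build.
	PATH="$build_dir/bin-wrappers:$PATH"
	export PATH
}
```

A contrib test script that called something like this before running would exercise the freshly built Git, which is the case where the extra tests can also catch regressions in core Git.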


gitgitgadget bot commented Dec 13, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

Elijah Newren <[email protected]> writes:

> On Thu, Dec 9, 2021 at 10:12 AM Junio C Hamano <[email protected]> wrote:
>>
>> Ævar Arnfjörð Bjarmason <[email protected]> writes:
>>
>> >> So, how about doing it this way?  This is based on 'master' and does
>> >> not cover contrib/scalar, but if we want to go this route, it should
>> >> be trivial to do it on top of a merge of ab/ci-updates and js/scalar
>> >> into 'master'.  Good idea?  Terrible idea?  Not good enough?
>> >
>> > With the caveat that I think the greater direction here makes no sense,
>> > i.e. scalar didn't need its own build system etc. in the first place, so
>> > having hack-upon-hack to fix various integration issues is clearly worse
>> > than just having it behave like everything else....
>>
>> We decided to start Scalar in contrib/, as it hasn't been proven
>> that Scalar is in a good enough shape to deserve to be in this tree,
>> and we are giving it a chance by adding it to contrib/ first, hoping
>> that it may graduate to the more official status someday [*].
>
> Is that the hope?  I thought the wish was for it to eventually
> "disappear" rather than "graduate", as per the following bits of
> Dscho's cover letter:
>
> """
> The Scalar project was designed to be a self-destructing vehicle...For
> example, partial clone, sparse-checkout, and scheduled background
> maintenance have already been upstreamed and removed from Scalar
> proper...[Adding Scalar to contrib will] make it substantially easier
> to experiment with moving functionality from Scalar into core Git.
> """

I can go either way, but my impression from Dscho's messages has
always been that there is no strong reason to switch existing scalar
users to say "git clone <options that give behaviour like scalar>"
when their fingers and scripts are used to say "scalar <this>", and
a very thin shell may remain in some form in the ideal world.



gitgitgadget bot commented Dec 13, 2021

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <[email protected]> writes:

> I don't have strong feelings on it either way. But if we think those
> tests are worth running in CI, then...
>
>> So I am tempted to do
>> 
>> test-extra: all
>> 	$(MAKE) -C contrib/credential/netrc test
>> 	$(MAKE) -C contrib/diff-highlight test
>> 	: $(MAKE) -C contrib/mw-to-git test
>> 	$(MAKE) -C contrib/subtree test
>
> ...we'd probably want to keep running mw-to-git tests, and teach one of
> the CI environments to install the appropriate perl modules to avoid
> skipping them.

I saw netrc credential helper break on one of the jobs that lack
Perl, so the test there needs to be fixed before we can include it
in test-extra.
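
The recipe quoted above lists the contrib directories by hand. As a rough sketch of the alternative wording in the proposed commit message ("each package in contrib/ whose Makefile has its own test target"), the candidates could be enumerated like this. The function name `contrib_dirs_with_tests` is hypothetical, and the sketch does nothing about prerequisites such as Perl, which is the problem noted for the netrc helper.

```shell
# Hypothetical helper: list directories under a root (default: contrib)
# whose Makefile declares a "test" target. Only a literal "test:" at the
# start of a line is detected. Nested Makefiles (for example in a t/
# subdirectory) are listed too, so a real rule would need to filter.
contrib_dirs_with_tests () {
	root=${1:-contrib}
	find "$root" -type f -name Makefile | sort |
	while IFS= read -r makefile
	do
		if grep -q '^test:' "$makefile"
		then
			dirname "$makefile"
		fi
	done
}
```

Each printed directory is a candidate for `make -C <dir> test`; a hand-maintained list, as in the quoted recipe, trades this automation for explicit control over which suites run.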




gitgitgadget bot commented Dec 14, 2021

On the Git mailing list, Jeff King wrote (reply to this):

On Mon, Dec 13, 2021 at 12:42:37AM -0800, Junio C Hamano wrote:

> Johannes Schindelin <[email protected]> writes:
> 
> >> Teach the main Makefile a "test-extra" target, which goes into each
> >> package in contrib/ whose Makefile has its own "test" target and
> >> runs "make test" there.  Add a "test-all" target to make it easy to
> >> drive both the primary tests and these contrib tests from CI and use
> >> it.
> >
> > That sends a strong message that the stuff in contrib/ is now fully under
> > your maintenance, i.e. first-class supported.
> 
> I do not think running tests on stuff in contrib/ sends any such
> message.  It primarily helps _us_ to catch more regressions than we
> may otherwise miss.  By the way, this is not limited to contrib/; if
> we had tests for gitk, we would have caught the recent regression in
> "diff -m" before it got inflicted on the general public, but that
> would not have been just to help "gitk", but to help keep "diff -m"
> sane and stable [*].

I'd actually be a lot more sympathetic to automatically running gitk
tests, because it's just consuming the public API of git (i.e., the
scriptable plumbing interface). If we accidentally break that, it is the
problem of the person who made the breaking change, and we would want
them to know it as soon as possible.

With something like scalar, though, it is adding new callers of the
private API. It might be useful for somebody doing tree-wide refactoring
to know they've broken something there. But it might also be a hassle,
because now they have to care about fixing it, if they are interested in
un-breaking their build (or un-breaking CI). The scalar code is now
their problem, even though it's "just" in contrib/.

In other words, it comes down to a question of where the burden for
fixing things lies. Of course it is nice if somebody doing tree-wide
refactoring fixes up scalar, too. But by making it optional to build
and/or test stuff in contrib/ (rather than tying it to "make all" or to
CI), it lets people decide how nice they want to be.

For other stuff in contrib/, I'm not sure to what degree it applies.
diff-highlight is pretty standalone for instance. I guess it _could_ be
broken by a public-API change in Git, but I find it pretty unlikely.

> Of course, you have to bear the burden of (A) changing the way
> scalar uses the API, or (B) participating in the design of the
> change to the API that may break scalar's use so that everybody
> including scalar would be happy, or both.  It's not like I am
> responsible for everything that happens in the tree, and it is our
> shared responsibility to maintain the health of the codebase.  It is
> not limited to stuff inside or outside contrib/.
> 
> There are projects that want to use libgit.a by binding us as a
> submodule and without interacting with us very much.  And they are
> on their own when we change the internals.  Do you mean that you
> want to make scalar into the same status as they are?

I kind of thought that final paragraph was the plan, at least to start
with.

-Peff


gitgitgadget bot commented Dec 14, 2021

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Dec 14, 2021 at 08:16:26AM -0500, Jeff King wrote:

> > There are projects that want to use libgit.a by binding us as a
> > submodule and without interacting with us very much.  And they are
> > on their own when we change the internals.  Do you mean that you
> > want to make scalar into the same status as they are?
> 
> I kind of thought that final paragraph was the plan, at least to start
> with.

Oh, and just to be clear: I am really OK with either direction. I'm only
claiming that I think both approaches are self-consistent and are making
a tradeoff (finding bugs earlier, versus shifting burden of bug-fixing
around).

-Peff


gitgitgadget bot commented Dec 14, 2021

This patch series was integrated into seen via git@383d342.


gitgitgadget bot commented Dec 14, 2021

This patch series was integrated into next via git@6248603.

@gitgitgadget gitgitgadget bot added the next label Dec 14, 2021

gitgitgadget bot commented Dec 15, 2021

This patch series was integrated into seen via git@ae4fb07.


gitgitgadget bot commented Dec 15, 2021

This patch series was integrated into seen via git@b46ea1e.


gitgitgadget bot commented Dec 16, 2021

There was a status update in the "Cooking" section about the branch js/scalar on the Git mailing list:

Add pieces from "scalar" to contrib/.

Will merge to 'master'.
source: <[email protected]>


gitgitgadget bot commented Dec 21, 2021

This patch series was integrated into seen via git@62e83d4.


gitgitgadget bot commented Dec 21, 2021

This patch series was integrated into next via git@62e83d4.


gitgitgadget bot commented Dec 21, 2021

This patch series was integrated into master via git@62e83d4.

@gitgitgadget gitgitgadget bot added the master label Dec 21, 2021
@gitgitgadget gitgitgadget bot closed this Dec 21, 2021

gitgitgadget bot commented Dec 21, 2021

Closed via 62e83d4.

@dscho dscho deleted the scalar-the-beginning branch December 22, 2021 11:43