kpd: add optional mirror support #16

Closed
15 changes: 15 additions & 0 deletions README.md
@@ -39,6 +39,21 @@ poetry install
poetry run python -m unittest
```

### Mirror setup

To make more efficient use of network bandwidth, consider keeping a mirror of your target git tree
under /mirror/ or a similar location, and set the "mirror_dir" configuration attribute to the
path where mirrored git trees can be found.

If your git tree is a Linux clone, set "linux_clone" to true. In that case, if the mirror for your
target's exact repo basename, for example {{ mirror_dir }}/linux-subsystem.git, is not in the mirror path,
the fallback path {{ mirror_dir }}/linux.git is used as the reference target.

A reference target mirror path is only used if it exists. The mirror takes effect through
the git clone --reference option when cloning. This can save considerable bandwidth and
disk space, allowing kpd to run on thin guests in a corporate environment with, for example, an NFS
mount for local git trees on a network.
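
The lookup described above can be sketched as a small pure helper. This is a minimal sketch under stated assumptions, not the code in this PR: `resolve_reference` and its injectable `exists` parameter are hypothetical names used for illustration; only `mirror_dir`, `linux_clone`, and the `linux.git` fallback come from the text.

```python
import os
from typing import Callable, Optional


def resolve_reference(
    upstream_url: str,
    mirror_dir: Optional[str],
    linux_clone: bool = False,
    exists: Callable[[str], bool] = os.path.exists,  # hypothetical hook, eases testing
) -> Optional[str]:
    """Return the mirror path to pass to `git clone --reference`, or None."""
    if not mirror_dir:
        return None
    # Preferred target: a mirror named after the upstream repo's basename.
    candidate = os.path.join(mirror_dir, os.path.basename(upstream_url))
    if exists(candidate):
        return candidate
    # Optional fallback: a generic linux.git mirror, only when linux_clone is set.
    if linux_clone:
        fallback = os.path.join(mirror_dir, "linux.git")
        if exists(fallback):
            return fallback
    # No usable mirror: the caller should do a plain clone.
    return None
```

A caller would pass the returned path to the clone step only when it is not `None`, which matches the "only used if it exists" rule.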

Contributor

I don't think this section belongs in the README file.

It would be great to have a separate document about the config, if you don't mind making a draft of that.

Contributor Author

Sure

## Running
```
poetry run python -m kernel_patches_daemon --config <config_path> --label-color configs/labels.json
4 changes: 3 additions & 1 deletion configs/kpd.json
@@ -41,5 +41,7 @@
"github_oauth_token": "<TOKEN>"
}
},
-"base_directory": "/tmp/repos"
+"base_directory": "/tmp/repos",
+"mirror_dir": "/mirror/",
+"linux_clone": true
Contributor

"linux_clone" is a confusing name for this flag, and also hardcoding "linux.git" seems unnecessary.

How about we change this option to something like "mirror_fallback_repo", and set it to the path or git url of the fallback repository?

Contributor Author

Sure..
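
One way the suggested option could look, as a rough sketch only: `reference_candidates` is a hypothetical helper, and `mirror_fallback_repo` is the name proposed in the comment above, not an option that exists in this PR. Because `git clone --reference` expects a local repository, the sketch assumes the option holds a path (relative to `mirror_dir` or absolute) and does not cover the git URL case.

```python
import os
from typing import List, Optional


def reference_candidates(
    upstream_url: str,
    mirror_dir: str,
    mirror_fallback_repo: Optional[str] = None,  # proposed option, hypothetical here
) -> List[str]:
    """List reference paths in priority order: exact mirror first, then the fallback."""
    candidates = [os.path.join(mirror_dir, os.path.basename(upstream_url))]
    if mirror_fallback_repo:
        # os.path.join returns its last argument unchanged when that argument is
        # absolute, so both "linux.git" and "/srv/git/linux.git" are accepted.
        candidates.append(os.path.join(mirror_dir, mirror_fallback_repo))
    return candidates
```

The caller would then pick the first candidate that exists on disk, which removes the hardcoded "linux.git" name from the worker.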

}
25 changes: 24 additions & 1 deletion kernel_patches_daemon/branch_worker.py
@@ -547,6 +547,8 @@ def __init__(
app_auth: Optional[Auth.AppInstallationAuth] = None,
email: Optional[EmailConfig] = None,
http_retries: Optional[int] = None,
linux_clone: bool = False,
mirror_dir: Optional[str] = None,
) -> None:
super().__init__(
repo_url=repo_url,
@@ -559,6 +561,8 @@ def __init__(
self.email = email

self.log_extractor = log_extractor
self.mirror_dir = mirror_dir
self.linux_clone = linux_clone
self.ci_repo_url = ci_repo_url
self.ci_repo_dir = _uniq_tmp_folder(ci_repo_url, ci_branch, base_directory)
self.ci_branch = ci_branch
@@ -682,9 +686,28 @@ def do_sync(self) -> None:
     def full_sync(self, path: str, url: str, branch: str) -> git.Repo:
         logging.info(f"Doing full clone from {redact_url(url)}, branch: {branch}")

+        multi_opts: Optional[List[str]] = None
+        if self.mirror_dir:
+            upstream_name = os.path.basename(self.upstream_url)
+            reference_path = os.path.join(self.mirror_dir, upstream_name)
+            fallback = None
+            if self.linux_clone:
+                fallback = os.path.join(self.mirror_dir, "linux.git")
+            if (
+                not os.path.exists(reference_path)
+                and fallback
+                and os.path.exists(fallback)
+            ):
+                reference_path = fallback
+            if os.path.exists(reference_path):
+                multi_opts = ["--reference", reference_path]
Contributor

TIL git clone --reference, very cool.

nit: Is it more appropriate to use --reference-if-able (man git-clone)? Although we already checked the path in python, so it probably doesn't matter.

Contributor Author

Was not aware of --reference-if-able. yes. Good.
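
The `--reference-if-able` variant discussed here could be wired in roughly as follows. This is a sketch, not the change that was made: `clone_options` is a hypothetical helper name. The option itself is documented in `man git-clone`, and the resulting list is meant for the `multi_options` argument of `git.Repo.clone_from`, which the diff already uses.

```python
from typing import List, Optional


def clone_options(reference_path: Optional[str]) -> List[str]:
    """Build extra `git clone` options for an optional reference repository."""
    if not reference_path:
        return []
    # With --reference-if-able, git warns and continues without the reference
    # when the repository is missing or unusable, instead of aborting the clone.
    return ["--reference-if-able", reference_path]
```

With this, the `os.path.exists` check in Python becomes optional: `git.Repo.clone_from(url, path, multi_options=clone_options(reference_path))` would fall back to a normal clone on its own.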


         with HistogramMetricTimer(git_clone_duration, {"branch": branch}):
             shutil.rmtree(path, ignore_errors=True)
-            repo = git.Repo.clone_from(url, path)
+            if multi_opts:
+                repo = git.Repo.clone_from(url, path, multi_options=multi_opts)
+            else:
+                repo = git.Repo.clone_from(url, path)
Comment on lines +707 to +710
Contributor

I think it makes a lot of sense to always have mirror_dir and use the option only to specify a path. And then effectively use it as a cache.

If a mirror doesn't exist, clone it from scratch to mirror_dir. And then always use --reference when syncing.

What do you think?

Contributor Author

Makes perfect sense to me. I'll see if codex can parse, "look at this URL and make all the suggested changes".
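
The "mirror as a cache" idea from this thread might look like the sketch below. It is an illustration under assumptions, not an agreed design: `clone_via_mirror`, the injectable `run` and `exists` parameters, and the layout of one bare mirror per upstream basename are all hypothetical. Refreshing an existing mirror with `git remote update` is an extra step that the comment does not ask for.

```python
import os
import subprocess
from typing import Callable, List, Sequence


def _run(cmd: Sequence[str]) -> None:
    # Default runner: execute the git command and raise on failure.
    subprocess.run(cmd, check=True)


def clone_via_mirror(
    url: str,
    dest: str,
    mirror_dir: str,
    run: Callable[[Sequence[str]], None] = _run,  # hypothetical hook for testing
    exists: Callable[[str], bool] = os.path.exists,
) -> List[List[str]]:
    """Create or refresh a bare mirror, then clone `dest` using it as a reference."""
    mirror_path = os.path.join(mirror_dir, os.path.basename(url))
    commands: List[List[str]] = []
    if exists(mirror_path):
        # Mirror is already cached: refresh it (an assumption of this sketch).
        commands.append(["git", "-C", mirror_path, "remote", "update", "--prune"])
    else:
        # First use: populate the cache with a bare mirror clone.
        commands.append(["git", "clone", "--mirror", url, mirror_path])
    # The working clone always borrows objects from the mirror.
    commands.append(["git", "clone", "--reference", mirror_path, url, dest])
    for cmd in commands:
        run(cmd)
    return commands
```

Passing a recording function as `run` makes it possible to check the planned commands without touching the network, in the same spirit as the mocks used in this PR's tests.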

_reset_repo(repo, f"origin/{branch}")

git_clone_counter.add(1, {"branch": branch})
4 changes: 4 additions & 0 deletions kernel_patches_daemon/config.py
@@ -171,6 +171,8 @@ class KPDConfig:
branches: Dict[str, BranchConfig]
tag_to_branch_mapping: Dict[str, List[str]]
base_directory: str
mirror_dir: Optional[str] = None
linux_clone: bool = False

@classmethod
def from_json(cls, json: Dict) -> "KPDConfig":
@@ -203,6 +205,8 @@ def from_json(cls, json: Dict) -> "KPDConfig":
for name, json_config in json["branches"].items()
},
base_directory=json["base_directory"],
mirror_dir=json.get("mirror_dir"),
linux_clone=json.get("linux_clone", False),
)

@classmethod
2 changes: 2 additions & 0 deletions kernel_patches_daemon/github_sync.py
@@ -114,6 +114,8 @@ def __init__(
ci_branch=branch_config.ci_branch,
log_extractor=_log_extractor_from_project(kpd_config.patchwork.project),
base_directory=kpd_config.base_directory,
mirror_dir=kpd_config.mirror_dir,
linux_clone=kpd_config.linux_clone,
Comment on lines +117 to +118
Contributor

Since new config options are parameters of BranchWorker, let's make them branch-worker specific in the configuration file too.

Contributor Author

Sure.
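
Moving the options under each branch entry could be parsed along these lines. This is a sketch only: `BranchMirrorConfig` and `mirror_configs` are hypothetical names and are not the project's `BranchConfig`. The `mirror_fallback_repo` key follows the earlier review suggestion and is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class BranchMirrorConfig:
    """Hypothetical per-branch mirror settings."""

    mirror_dir: Optional[str] = None
    mirror_fallback_repo: Optional[str] = None

    @classmethod
    def from_json(cls, json: Dict[str, Any]) -> "BranchMirrorConfig":
        # Both keys are optional, so branches without a mirror need no changes.
        return cls(
            mirror_dir=json.get("mirror_dir"),
            mirror_fallback_repo=json.get("mirror_fallback_repo"),
        )


def mirror_configs(kpd_json: Dict[str, Any]) -> Dict[str, BranchMirrorConfig]:
    """Read mirror settings from each entry under the top-level "branches" key."""
    return {
        name: BranchMirrorConfig.from_json(branch_json)
        for name, branch_json in kpd_json.get("branches", {}).items()
    }
```

Each worker would then receive the settings for its own branch instead of a daemon-wide value.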

http_retries=http_retries,
github_oauth_token=branch_config.github_oauth_token,
app_auth=github_app_auth_from_branch_config(branch_config),
53 changes: 53 additions & 0 deletions tests/test_branch_worker.py
@@ -68,6 +68,7 @@
TEST_CI_REPO_URL = f"https://user:[email protected]/ci-org/{TEST_CI_REPO}"
TEST_CI_BRANCH = "test_ci_branch"
TEST_BASE_DIRECTORY = "/repos"
TEST_MIRROR_DIRECTORY = "/mirror"
TEST_BRANCH = "test-branch"
TEST_CONFIG: Dict[str, Any] = {
"version": 2,
@@ -124,6 +125,8 @@ def __init__(self, *args: Any, **kwargs: Any) -> None:
"ci_branch": TEST_CI_BRANCH,
"log_extractor": DefaultGithubLogExtractor(),
"base_directory": TEST_BASE_DIRECTORY,
"mirror_dir": None,
"linux_clone": False,
}
presets.update(kwargs)

@@ -464,6 +467,56 @@ def test_fetch_repo_path_exists_git_exception(self) -> None:
self._bw.fetch_repo(*fetch_params)
fr.assert_called_once_with(*fetch_params)

    def test_full_sync_with_mirror_dir(self) -> None:
        bw = BranchWorkerMock(mirror_dir=TEST_MIRROR_DIRECTORY)
        reference = os.path.join(
            TEST_MIRROR_DIRECTORY, os.path.basename(TEST_UPSTREAM_REPO_URL)
        )
        with (
            patch("kernel_patches_daemon.branch_worker.os.path.exists") as exists,
            patch("kernel_patches_daemon.branch_worker.shutil.rmtree") as rm,
        ):
            exists.side_effect = lambda p: p == reference
            bw.upstream_url = TEST_UPSTREAM_REPO_URL
            bw.full_sync("somepath", "giturl", "branch")
            self._git_repo_mock.clone_from.assert_called_once_with(
                "giturl",
                "somepath",
                multi_options=["--reference", reference],
            )

    def test_full_sync_with_linux_mirror_fallback(self) -> None:
        bw = BranchWorkerMock(mirror_dir=TEST_MIRROR_DIRECTORY, linux_clone=True)
        fallback = os.path.join(TEST_MIRROR_DIRECTORY, "linux.git")
        with (
            patch("kernel_patches_daemon.branch_worker.os.path.exists") as exists,
            patch("kernel_patches_daemon.branch_worker.shutil.rmtree") as rm,
        ):
            exists.side_effect = lambda p: p == fallback
            bw.upstream_url = TEST_UPSTREAM_REPO_URL
            bw.full_sync("somepath", "giturl", "branch")
            self._git_repo_mock.clone_from.assert_called_once_with(
                "giturl",
                "somepath",
                multi_options=["--reference", fallback],
            )

    def test_full_sync_without_linux_mirror_fallback(self) -> None:
        bw = BranchWorkerMock(mirror_dir=TEST_MIRROR_DIRECTORY, linux_clone=False)
        fallback = os.path.join(TEST_MIRROR_DIRECTORY, "linux.git")
        with (
            patch("kernel_patches_daemon.branch_worker.os.path.exists") as exists,
            patch("kernel_patches_daemon.branch_worker.shutil.rmtree") as rm,
        ):
            exists.side_effect = lambda p: p == fallback
            bw.upstream_url = TEST_UPSTREAM_REPO_URL
            bw.full_sync("somepath", "giturl", "branch")
            # Without linux_clone the fallback must not be used
            self._git_repo_mock.clone_from.assert_called_once_with(
                "giturl",
                "somepath",
            )

def test_expire_branches(self) -> None:
"""Only the branch that matches pattern and is expired should be deleted"""
not_expired_time = datetime.fromtimestamp(3 * BRANCH_TTL)
11 changes: 11 additions & 0 deletions tests/test_config.py
@@ -208,5 +208,16 @@ def test_valid(self) -> None:
),
},
base_directory="/repos",
mirror_dir=None,
linux_clone=False,
)
self.assertEqual(config, expected_config)

    def test_linux_clone_enabled(self) -> None:
        kpd_config_json = read_fixture("fixtures/kpd_config.json")
        kpd_config_json["linux_clone"] = True

        with patch("builtins.open", mock_open(read_data="TEST_KEY_FILE_CONTENT")):
            config = KPDConfig.from_json(kpd_config_json)

        self.assertTrue(config.linux_clone)
14 changes: 14 additions & 0 deletions tests/test_github_sync.py
@@ -126,6 +126,20 @@ class TestCase:
gh.workers[TEST_BRANCH].ci_repo_dir.startswith(case.prefix),
)

    def test_init_with_mirror_dir(self) -> None:
        config = copy.copy(TEST_CONFIG)
        config["mirror_dir"] = "/mirror"
        kpd_config = KPDConfig.from_json(config)
        gh = GithubSyncMock(kpd_config=kpd_config)
        self.assertEqual("/mirror", gh.workers[TEST_BRANCH].mirror_dir)

    def test_init_with_linux_clone(self) -> None:
        config = copy.copy(TEST_CONFIG)
        config["linux_clone"] = True
        kpd_config = KPDConfig.from_json(config)
        gh = GithubSyncMock(kpd_config=kpd_config)
        self.assertTrue(gh.workers[TEST_BRANCH].linux_clone)

def test_close_existing_prs_for_series(self) -> None:
matching_pr_mock = MagicMock()
matching_pr_mock.title = "matching"