Skip to content

Conversation

@edolstra
Copy link
Collaborator

@edolstra edolstra commented Nov 3, 2025

Motivation

For repos with a lot of non-linearity in the commit graph (like Nixpkgs), this speeds up getting the revcount a lot, e.g. nix flake metadata /path/to/nixpkgs?rev=9dc7035bbee85ffc740d893e02cb64460f11989f went from 9.1s to 3.7s.

Context

Summary by CodeRabbit

  • Performance Improvements
    • Repository operations now benefit from concurrent processing, resulting in improved performance during commit traversal and repository scans.

For repos with a lot of non-linearity in the commit graph (like
Nixpkgs), this speeds up getting the revcount a lot, e.g. `nix flake
metadata /path/to/nixpkgs?rev=9dc7035bbee85ffc740d893e02cb64460f11989f` went
from 9.1s to 3.7s.
@coderabbitai
Copy link

coderabbitai bot commented Nov 3, 2025

Walkthrough

The code refactors repository revision counting to use concurrent multithreading. It introduces a getPool() method on GitRepoImpl, replaces single-threaded commit traversal with thread pool-based parallel processing, and switches from boost::unordered_flat_set to boost::concurrent_flat_set for thread-safe commit tracking.

Changes

Cohort / File(s) Change Summary
Concurrent Git Revision Counting
src/libfetchers/git-utils.cc
Adds getPool() method to GitRepoImpl; refactors getRevCount() to use concurrent traversal with ThreadPool and boost::concurrent_flat_set instead of single-threaded processing; updates GitFileSystemObjectSinkImpl to utilize the new pool-based approach; adds includes for concurrent containers and ranges utilities

Sequence Diagram

sequenceDiagram
    participant Caller as Caller
    participant GRI as GitRepoImpl
    participant Pool as ThreadPool
    participant RepoPool as RepoPool
    participant Worker as Worker Lambda

    Caller->>GRI: getRevCount(start)
    activate GRI
    
    Note over GRI: Initialize
    GRI->>GRI: getPool() → RepoPool
    GRI->>RepoPool: create concurrent_flat_set
    GRI->>RepoPool: insert startOid → done set
    GRI->>GRI: create ThreadPool
    
    Note over GRI: Parallel Traversal
    GRI->>Pool: spawn workers
    activate Pool
    
    loop Concurrent Processing
        Pool->>Worker: process commit parent
        activate Worker
        Worker->>Worker: retrieve parentOID
        Worker->>Worker: check if seen
        alt Not yet processed
            Worker->>RepoPool: insert into done set
            Worker->>Pool: enqueue next parent
        else Already processed
            Worker->>Worker: skip
        end
        deactivate Worker
    end
    
    Pool->>GRI: synchronize (wait completion)
    deactivate Pool
    
    GRI->>Caller: return revision count
    deactivate GRI
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Threading logic: Introduces ThreadPool-based concurrent processing; ensure proper synchronization and no race conditions exist beyond the concurrent container's guarantees
  • Concurrent data structures: Migration from unordered_flat_set to concurrent_flat_set requires validation of semantics (performance, contention, rollback behavior if needed)
  • New public API: getPool() method addition affects API surface; verify it aligns with broader design patterns in the codebase
  • Error handling alignment: Changes to parentOID retrieval and validation within the lambda; ensure error propagation remains correct in multithreaded context
  • Integration point: GitFileSystemObjectSinkImpl changes; verify the pool usage doesn't introduce new failure modes

Poem

🐰 Hopping through commits in parallel stride,
ThreadPools dancing, none left behind,
Concurrent rabbits, checking git's tide,
Faster together, what treasures we'll find!

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'GitRepo::getRevCount(): Compute revcount in parallel' directly and specifically describes the main change in the changeset. It clearly identifies the method being modified (getRevCount), the class it belongs to (GitRepo), and the key improvement (parallel computation). This aligns perfectly with the documented objectives of adding parallel computation to speed up revcount calculation and the implemented changes that introduce multithreaded capability for repository operations.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch parallel-revcount

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Nov 3, 2025

@github-actions github-actions bot temporarily deployed to pull request November 3, 2025 14:35 Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 236ebef and 2533c66.

📒 Files selected for processing (1)
  • src/libfetchers/git-utils.cc (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/libfetchers/git-utils.cc (2)
src/libutil/include/nix/util/pool.hh (2)
  • Pool (66-75)
  • Pool (93-99)
src/libfetchers/include/nix/fetchers/git-utils.hh (5)
  • rev (31-31)
  • rev (33-33)
  • rev (85-85)
  • rev (92-92)
  • rev (107-107)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build_aarch64-darwin / build
  • GitHub Check: build_x86_64-linux / build

@edolstra edolstra added this pull request to the merge queue Nov 3, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 3, 2025
@cole-h cole-h added this pull request to the merge queue Nov 3, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Nov 3, 2025
@edolstra edolstra added this pull request to the merge queue Nov 4, 2025
Merged via the queue into main with commit 2ea3231 Nov 4, 2025
35 checks passed
@edolstra edolstra deleted the parallel-revcount branch November 4, 2025 14:23
@cole-h
Copy link
Member

cole-h commented Nov 4, 2025

upstream ref NixOS#14462

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants