dulwich==0.23.1 breaks libgit2 index parsing #1643

@skshetry

The recent dulwich release (0.23.1) has started breaking DVC.

https://github.com/iterative/dvc/actions/runs/15990598700/job/45103035168#step:6:177

DVC uses multiple Git backends (dulwich, pygit2, and GitPython) to cover feature gaps across implementations and sometimes for performance reasons.

I bisected the issue to commit 9596a2c, which started writing index extensions even when the extension data is empty.

dulwich/dulwich/index.py

Lines 795 to 798 in 9596a2c

# Write extensions
if extensions:
    for extension in extensions:
        write_index_extension(f, extension)
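To make the on-disk shape concrete, here is a minimal sketch of what an index carrying an empty TREE (cache tree) extension looks like. It is not dulwich or libgit2 code. It assumes a version-2 index with zero entries and a SHA-1 trailer, and `build_index` and `list_extensions` are hypothetical helper names used only for illustration.

```python
import hashlib
import struct


def build_index(extensions):
    """Build a version-2 index with zero entries and the given extensions.

    `extensions` is a list of (4-byte signature, data) pairs.
    """
    # 12-byte header: "DIRC" signature, version, number of entries.
    body = b"DIRC" + struct.pack(">II", 2, 0)
    for signature, data in extensions:
        # Each extension: 4-byte signature, 32-bit big-endian size, then data.
        body += signature + struct.pack(">I", len(data)) + data
    # The file ends with a SHA-1 checksum over everything before it.
    return body + hashlib.sha1(body).digest()


def list_extensions(raw):
    """Return (signature, size) pairs. Only handles an index with zero entries."""
    magic, _version, count = struct.unpack(">4sII", raw[:12])
    if magic != b"DIRC" or count != 0:
        raise ValueError("expected an index with zero entries")
    found = []
    offset, end = 12, len(raw) - 20  # stop before the trailing checksum
    while offset < end:
        signature, size = struct.unpack(">4sI", raw[offset:offset + 8])
        found.append((signature, size))
        offset += 8 + size
    return found


raw = build_index([(b"TREE", b"")])
print(list_extensions(raw))  # [(b'TREE', 0)]
```

The extension header is present, but its size field is zero, which is the case pygit2/libgit2 rejects in the reproducer below.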

The index written by dulwich appears to be technically valid per the index-format spec, and Git handles it without issue. However, pygit2/libgit2 seems to expect the Cache Tree extension to contain at least one entry.

Relevant code for reference:

I understand that this isn’t strictly a bug in dulwich since the format is spec-compliant. Still, would you be open to a patch that simply skips writing an index extension when the data is empty?
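As a rough sketch of the idea, and not the actual patch: the writer would simply omit any extension whose data is empty. Here `write_extensions` and the `(signature, data)` tuple representation are hypothetical stand-ins. dulwich's real code goes through `write_index_extension(f, extension)` with its own extension objects, and whether skipping is appropriate for every extension type is not established here.

```python
import io
import struct


def write_extensions(f, extensions):
    """Write (signature, data) pairs, skipping any whose data is empty."""
    for signature, data in extensions:
        if not data:
            continue  # nothing to record, so omit the header as well
        f.write(signature + struct.pack(">I", len(data)) + data)


buf = io.BytesIO()
# b"payload" is placeholder bytes, not real extension data.
write_extensions(buf, [(b"TREE", b""), (b"REUC", b"payload")])
print(buf.getvalue())  # b'REUC\x00\x00\x00\x07payload'
```

With this behavior the empty TREE extension never reaches the file, so readers that reject a zero-length cache tree would not see it.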

Minimal reproducer

# /// script
# dependencies = [
#   "dulwich==0.23.1",
#   "pygit2==1.18.0",
# ]
# ///
import tempfile

from dulwich.index import TreeExtension
from dulwich.porcelain import init
from pygit2 import Repository

path = tempfile.mkdtemp()
print(path)
with init(path) as repo:
    index = repo.open_index()
    index._extensions.append(TreeExtension.from_bytes(b""))
    index.write()


pyg_repo = Repository(path)
pyg_repo.index.read()

$ uv run script.py
/var/folders/3g/1vds4g8d4p3909hrwr65j6300000gn/T/tmp3u5lf5iw
Traceback (most recent call last):
  File "/Users/user/projects/dvcorg/dvc/script.py", line 22, in <module>
    pyg_repo.index.read()
    ^^^^^^^^^^^^^^
  File "/Users/user/.cache/uv/environments-v2/script-f78ce4ac73dbb248/lib/python3.13/site-packages/pygit2/repository.py", line 649, in index
    check_error(err, io=True)
    ~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/Users/user/.cache/uv/environments-v2/script-f78ce4ac73dbb248/lib/python3.13/site-packages/pygit2/errors.py", line 66, in check_error
    raise GitError(message)
_pygit2.GitError: corrupted TREE extension in index

Let me know if you’d like a PR for this.
