Fix Git pack-file parsing issue #5062

nikita-tkachenko-datadog · 2023-04-13T10:34:18Z

What Does This Do

This change fixes an issue that happens when parsing some Git pack-files.
Currently the logic reads Git object size from the pack file, then reads the number of bytes equal to that size, then decompresses the data that was read.
The problem is that the number that's being read in the first step is the size of the Git object AFTER decompression.
So the second steps reads more bytes than it needs, which is fine in most cases, as decompression stops once the end of the compressed data has been reached.
This is a problem, however, when the Git object is located at the end of the file - in that case the logic tries to read more bytes than there are available, and fails with an error.
The quick fix is to stop reading once the end of the file has been reached.

Motivation

Because of this issue, Git metadata cannot be extracted for some repositories (one example is Mybatis-3, which was used for testing and uncovered this issue).

… are available in the pack file

smola

If I understand the issue correctly, while this PR fixes the issue with end of file, the byte buffer allocated here is generally much larger than needed?

nikita-tkachenko-datadog · 2023-04-13T14:17:48Z

If I understand the issue correctly, while this PR fixes the issue with end of file, the byte buffer allocated here is generally much larger than needed?

Correct, the size of the buffer is equal to the size of the decompressed object, while the data that we read into it is the compressed object.

It is a quick-and-dirty fix, but fixing this "properly" would require much more changes in the code (which originally was not written by us, but ported from an opensource library).

I wouldn't say that allocating a larger buffer is much of an issue here - the buffer size is typically around 1KB, and we do this parsing only once during agent startup

Fix Git pack-file parsing issue: avoid trying to read more bytes than…

60cfa0f

… are available in the pack file

nikita-tkachenko-datadog added type: bug Bug report and fix comp: ci visibility Continuous Integration Visibility labels Apr 13, 2023

fix tests

e4d58e3

nikita-tkachenko-datadog marked this pull request as ready for review April 13, 2023 13:31

nikita-tkachenko-datadog requested a review from a team as a code owner April 13, 2023 13:31

smola approved these changes Apr 13, 2023

View reviewed changes

nikita-tkachenko-datadog merged commit 612fbd7 into master Apr 13, 2023

nikita-tkachenko-datadog deleted the nikita-tkachenko/git-pack-parse-issue branch April 13, 2023 14:59

github-actions bot added this to the 1.12.0 milestone Apr 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix Git pack-file parsing issue #5062

Fix Git pack-file parsing issue #5062

Uh oh!

nikita-tkachenko-datadog commented Apr 13, 2023

Uh oh!

smola left a comment

Uh oh!

nikita-tkachenko-datadog commented Apr 13, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix Git pack-file parsing issue #5062

Fix Git pack-file parsing issue #5062

Uh oh!

Conversation

nikita-tkachenko-datadog commented Apr 13, 2023

What Does This Do

Motivation

Uh oh!

smola left a comment

Choose a reason for hiding this comment

Uh oh!

nikita-tkachenko-datadog commented Apr 13, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants