Skip to content

Conversation

@nikita-tkachenko-datadog
Copy link
Contributor

What Does This Do

This change fixes an issue that happens when parsing some Git pack-files.
Currently the logic reads Git object size from the pack file, then reads the number of bytes equal to that size, then decompresses the data that was read.
The problem is that the number that's being read in the first step is the size of the Git object AFTER decompression.
So the second steps reads more bytes than it needs, which is fine in most cases, as decompression stops once the end of the compressed data has been reached.
This is a problem, however, when the Git object is located at the end of the file - in that case the logic tries to read more bytes than there are available, and fails with an error.
The quick fix is to stop reading once the end of the file has been reached.

Motivation

Because of this issue, Git metadata cannot be extracted for some repositories (one example is Mybatis-3, which was used for testing and uncovered this issue).

@nikita-tkachenko-datadog nikita-tkachenko-datadog added type: bug Bug report and fix comp: ci visibility Continuous Integration Visibility labels Apr 13, 2023
@nikita-tkachenko-datadog nikita-tkachenko-datadog marked this pull request as ready for review April 13, 2023 13:31
@nikita-tkachenko-datadog nikita-tkachenko-datadog requested a review from a team as a code owner April 13, 2023 13:31
Copy link
Member

@smola smola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the issue correctly, while this PR fixes the issue with end of file, the byte buffer allocated here is generally much larger than needed?

@nikita-tkachenko-datadog
Copy link
Contributor Author

If I understand the issue correctly, while this PR fixes the issue with end of file, the byte buffer allocated here is generally much larger than needed?

Correct, the size of the buffer is equal to the size of the decompressed object, while the data that we read into it is the compressed object.

It is a quick-and-dirty fix, but fixing this "properly" would require much more changes in the code (which originally was not written by us, but ported from an opensource library).

I wouldn't say that allocating a larger buffer is much of an issue here - the buffer size is typically around 1KB, and we do this parsing only once during agent startup

@nikita-tkachenko-datadog nikita-tkachenko-datadog merged commit 612fbd7 into master Apr 13, 2023
@nikita-tkachenko-datadog nikita-tkachenko-datadog deleted the nikita-tkachenko/git-pack-parse-issue branch April 13, 2023 14:59
@github-actions github-actions bot added this to the 1.12.0 milestone Apr 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp: ci visibility Continuous Integration Visibility type: bug Bug report and fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants