@iremyux iremyux commented Sep 23, 2025

This PR improves the experience of reading compressed tar archives by detecting common compression formats before parsing. It now provides clear, localized messages that explain whether the archive is compressed with a supported format (e.g., GZIP, ZLIB) or an unsupported one (e.g., BZIP2, LZ4, XZ). For supported formats, it guides users to wrap the stream with the right .NET class, and for unsupported formats, it advises decompression first.
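As an illustration of the "wrap the stream" guidance for a supported format, here is a minimal sketch, not code from this PR. It builds a tiny GZIP-compressed tar in memory (the entry name `hello.txt` and its contents are made up for the example) and then reads it back by placing a `GZipStream` between the raw stream and `TarReader`.

```csharp
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;
using System.IO.Compression;
using System.Text;

// Build a small .tar.gz in memory so the sketch runs on its own.
using var compressed = new MemoryStream();
using (var gzipOut = new GZipStream(compressed, CompressionLevel.Optimal, leaveOpen: true))
using (var writer = new TarWriter(gzipOut, TarEntryFormat.Pax, leaveOpen: true))
{
    // Hypothetical entry, for illustration only.
    var entry = new PaxTarEntry(TarEntryType.RegularFile, "hello.txt")
    {
        DataStream = new MemoryStream(Encoding.UTF8.GetBytes("hello"))
    };
    writer.WriteEntry(entry);
}
compressed.Position = 0;

// Reading: decompress first, so TarReader only ever sees plain tar data.
var names = new List<string>();
using var gzipIn = new GZipStream(compressed, CompressionMode.Decompress);
using var reader = new TarReader(gzipIn);
while (reader.GetNextEntry() is TarEntry e)
{
    names.Add(e.Name);
}

Console.WriteLine(string.Join(", ", names));
```

Passing `compressed` straight to `TarReader` instead of `gzipIn` is the mistake the new error message is meant to explain.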

Fixes #89056

@Copilot Copilot AI review requested due to automatic review settings September 23, 2025 14:41

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances the System.Formats.Tar library by adding compression format detection at the beginning of tar archive reading. When users attempt to read a compressed tar file directly without decompression, the library now provides helpful error messages that identify the compression type and guide users on the appropriate solution.

  • Adds magic number detection for common compression formats (GZIP, ZLIB, BZIP2, LZ4, XZ, etc.)
  • Provides differentiated error messages for supported vs unsupported compression formats
  • Integrates compression checking into both synchronous and asynchronous tar reading workflows
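The magic number detection described above can be sketched roughly as follows. This is an illustrative stand-in, not the code in the PR: the helper name `DetectCompression` and the table layout are assumptions, and the ZLIB check covers only the four most common two-byte headers (deflate with a 32 KiB window and no preset dictionary).

```csharp
#nullable enable
using System;

// Hypothetical helper: returns a format name if the buffer starts with a
// well-known compression signature, or null if nothing matches.
static string? DetectCompression(ReadOnlySpan<byte> header)
{
    (byte[] Magic, string Name)[] signatures =
    {
        (new byte[] { 0x1F, 0x8B }, "GZIP"),
        (new byte[] { 0x42, 0x5A, 0x68 }, "BZIP2"),                  // "BZh"
        (new byte[] { 0x04, 0x22, 0x4D, 0x18 }, "LZ4"),              // LZ4 frame format
        (new byte[] { 0x89, 0x4C, 0x5A, 0x4F }, "LZO"),
        (new byte[] { 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00 }, "XZ"),
    };

    foreach ((byte[] magic, string name) in signatures)
    {
        if (header.StartsWith((ReadOnlySpan<byte>)magic))
        {
            return name;
        }
    }

    // ZLIB has no fixed signature; match only the common 0x78 headers.
    if (header.Length >= 2 && header[0] == 0x78 &&
        (header[1] == 0x01 || header[1] == 0x5E || header[1] == 0x9C || header[1] == 0xDA))
    {
        return "ZLIB";
    }

    return null;
}

// Usage: a GZIP stream starts with 1F 8B; a tar header starts with the entry name.
Console.WriteLine(DetectCompression(new byte[] { 0x1F, 0x8B, 0x08, 0x00 }) ?? "none");
Console.WriteLine(DetectCompression("file.txt"u8) ?? "none");
```

Because a tar header begins with the entry name, short signatures made of printable ASCII (such as `x^` for ZLIB) can in principle collide with a real file name, which is the same concern the review raises for LZMA.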

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| TarHeader.Read.cs | Adds compression detection logic and calls to check magic numbers during tar header reading |
| Strings.resx | Adds localized error message templates for supported and unsupported compression detection |

Comment on lines 811 to 814
ThrowIfUnsupportedCompression(buffer, [0x5D, 0x00, 0x00], "LZMA");
ThrowIfUnsupportedCompression(buffer, [0x89, 0x4C, 0x5A, 0x4F], "LZO");
ThrowIfUnsupportedCompression(buffer, [0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00], "XZ");
}

Copilot AI Sep 23, 2025

The LZMA check is too loose. The legacy .lzma format has no dedicated magic number: its 13-byte header consists of a properties byte (commonly 0x5D), a 4-byte dictionary size, and an 8-byte uncompressed size, so [0x5D, 0x00, 0x00] is only a common prefix. Matching just these 3 bytes may produce false positives, since such a short pattern could appear at the start of valid tar data.

Suggested change
-     ThrowIfUnsupportedCompression(buffer, [0x5D, 0x00, 0x00], "LZMA");
-     ThrowIfUnsupportedCompression(buffer, [0x89, 0x4C, 0x5A, 0x4F], "LZO");
-     ThrowIfUnsupportedCompression(buffer, [0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00], "XZ");
- }
+     ThrowIfLzmaCompression(buffer);
+     ThrowIfUnsupportedCompression(buffer, [0x89, 0x4C, 0x5A, 0x4F], "LZO");
+     ThrowIfUnsupportedCompression(buffer, [0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00], "XZ");
+ }
+ // Checks for LZMA header: first 3 bytes are [0x5D, 0x00, 0x00], and length >= 13
+ static void ThrowIfLzmaCompression(ReadOnlySpan<byte> buffer)
+ {
+     if (buffer.Length >= 13 &&
+         buffer[0] == 0x5D &&
+         buffer[1] == 0x00 &&
+         buffer[2] == 0x00)
+     {
+         throw new InvalidDataException(SR.Format(SR.TarUnsupportedCompressionDetected, "LZMA"));
+     }
+ }
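
Note that the length check alone does not reject a tar entry whose name is `]` (0x5D followed by NUL padding). A stricter plausibility test could validate the remaining header fields. The sketch below is one possible approach, not part of the PR or the suggestion: the name `LooksLikeLzmaAlone` is hypothetical, the 13-byte layout is the legacy .lzma header (1 properties byte, 4-byte little-endian dictionary size, 8-byte little-endian uncompressed size), and the 1 TiB size cap is an arbitrary assumption.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: true if the first 13 bytes are plausible as a legacy
// .lzma header. This is a heuristic, not a full format validation.
static bool LooksLikeLzmaAlone(ReadOnlySpan<byte> header)
{
    const int HeaderSize = 13;
    if (header.Length < HeaderSize)
    {
        return false;
    }

    // Properties byte encodes (pb * 5 + lp) * 9 + lc, so it must be below 9 * 5 * 5.
    if (header[0] >= 9 * 5 * 5)
    {
        return false;
    }

    // A tar name such as "]" is NUL-padded, which would give a zero dictionary size.
    uint dictionarySize = BinaryPrimitives.ReadUInt32LittleEndian(header.Slice(1, 4));
    if (dictionarySize == 0)
    {
        return false;
    }

    // All 0xFF means "size unknown"; otherwise require a size under an assumed 1 TiB cap.
    ulong uncompressedSize = BinaryPrimitives.ReadUInt64LittleEndian(header.Slice(5, 8));
    return uncompressedSize == ulong.MaxValue || uncompressedSize < (1UL << 40);
}

// Usage: an 8 MiB dictionary with unknown size, versus a tar entry named "]".
byte[] lzmaHeader = { 0x5D, 0x00, 0x00, 0x80, 0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };
byte[] tarNameField = new byte[13];
tarNameField[0] = 0x5D;

Console.WriteLine(LooksLikeLzmaAlone(lzmaHeader));
Console.WriteLine(LooksLikeLzmaAlone(tarNameField));
```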

Development

Successfully merging this pull request may close these issues.

Tar: Detect magic numbers in archives compressed with popular algorithms