Add compression magic number detection to System.Formats.Tar with helpful error messages #119996
base: main
Pull Request Overview
This PR enhances the System.Formats.Tar library by adding compression format detection at the beginning of tar archive reading. When users attempt to read a compressed tar file directly without decompression, the library now provides helpful error messages that identify the compression type and guide users on the appropriate solution.
- Adds magic number detection for common compression formats (GZIP, ZLIB, BZIP2, LZ4, XZ, etc.)
- Provides differentiated error messages for supported vs unsupported compression formats
- Integrates compression checking into both synchronous and asynchronous tar reading workflows
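The detection described above amounts to comparing the first bytes of the stream against known signatures. The following is a minimal sketch of that idea, not the PR's actual code: `CompressionSniffer` and `DetectCompression` are hypothetical names, and the GZIP, BZIP2, and LZ4 byte values are the widely published signatures for those formats rather than values taken from this PR (only the XZ bytes appear in the diff).

```csharp
#nullable enable
using System;

static class CompressionSniffer
{
    // Well-known file signatures (assumed here, not copied from the PR,
    // except XZ, which matches the bytes shown in the diff).
    private static readonly byte[] GzipMagic = { 0x1F, 0x8B };
    private static readonly byte[] Bzip2Magic = { 0x42, 0x5A, 0x68 };              // "BZh"
    private static readonly byte[] Lz4Magic = { 0x04, 0x22, 0x4D, 0x18 };          // LZ4 frame, little-endian 0x184D2204
    private static readonly byte[] XzMagic = { 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00 };

    // Hypothetical helper: returns a format name when the buffer starts with
    // one of the signatures above, or null when none of them matches.
    public static string? DetectCompression(ReadOnlySpan<byte> buffer)
    {
        if (buffer.StartsWith(GzipMagic)) return "GZIP";
        if (buffer.StartsWith(Bzip2Magic)) return "BZIP2";
        if (buffer.StartsWith(Lz4Magic)) return "LZ4";
        if (buffer.StartsWith(XzMagic)) return "XZ";
        return null;
    }
}
```

A caller could turn a non-null result into an exception whose message names the format, which is the behavior the overview describes.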
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| TarHeader.Read.cs | Adds compression detection logic and calls to check magic numbers during tar header reading |
| Strings.resx | Adds localized error message templates for supported and unsupported compression detection |
ThrowIfUnsupportedCompression(buffer, [0x5D, 0x00, 0x00], "LZMA");
ThrowIfUnsupportedCompression(buffer, [0x89, 0x4C, 0x5A, 0x4F], "LZO");
ThrowIfUnsupportedCompression(buffer, [0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00], "XZ");
}
Copilot AI, Sep 23, 2025
The LZMA magic number check is incomplete. A legacy .lzma file starts with a 13-byte header: a properties byte (commonly 0x5D), a 4-byte dictionary size, and an 8-byte uncompressed size. Checking only the first 3 bytes [0x5D, 0x00, 0x00] may produce false positives, since such a short pattern could appear in valid tar data.
Suggested change
- ThrowIfUnsupportedCompression(buffer, [0x5D, 0x00, 0x00], "LZMA");
- ThrowIfUnsupportedCompression(buffer, [0x89, 0x4C, 0x5A, 0x4F], "LZO");
- ThrowIfUnsupportedCompression(buffer, [0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00], "XZ");
- }
+ ThrowIfLzmaCompression(buffer);
+ ThrowIfUnsupportedCompression(buffer, [0x89, 0x4C, 0x5A, 0x4F], "LZO");
+ ThrowIfUnsupportedCompression(buffer, [0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00], "XZ");
+ }
+ // Checks for LZMA header: first 3 bytes are [0x5D, 0x00, 0x00], and length >= 13
+ static void ThrowIfLzmaCompression(ReadOnlySpan<byte> buffer)
+ {
+     if (buffer.Length >= 13 &&
+         buffer[0] == 0x5D &&
+         buffer[1] == 0x00 &&
+         buffer[2] == 0x00)
+     {
+         throw new InvalidDataException(SR.Format(SR.TarUnsupportedCompressionDetected, "LZMA"));
+     }
+ }
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs
…n supported/unsuported formats
This PR improves the experience of reading compressed tar archives by detecting common compression formats before parsing. It now provides clear, localized messages that explain whether the archive is compressed with a supported format (e.g., GZIP, ZLIB) or an unsupported one (e.g., BZIP2, LZ4, XZ). For supported formats, it guides users to wrap the stream with the right .NET class, and for unsupported formats, it advises decompression first.
Fixes #89056
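For a supported format such as GZIP, the guidance in the description comes down to placing a decompression stream between the source and `TarReader`. The sketch below illustrates this with existing .NET APIs (`GZipStream`, `TarWriter`, `TarReader`, `PaxTarEntry`); the class name `TarGzExample`, its method names, and the in-memory archive are illustrative assumptions used only to keep the example self-contained.

```csharp
#nullable enable
using System;
using System.Formats.Tar;
using System.IO;
using System.IO.Compression;
using System.Text;

static class TarGzExample
{
    // Builds a small .tar.gz in memory so the example needs no files on disk.
    public static byte[] CreateTarGz()
    {
        using var compressed = new MemoryStream();
        using (var gzip = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
        using (var writer = new TarWriter(gzip, leaveOpen: true))
        {
            var entry = new PaxTarEntry(TarEntryType.RegularFile, "hello.txt")
            {
                DataStream = new MemoryStream(Encoding.UTF8.GetBytes("hi"))
            };
            writer.WriteEntry(entry);
        }
        return compressed.ToArray();
    }

    // Wraps the compressed bytes in a GZipStream before handing them to TarReader.
    // Passing the MemoryStream directly would give TarReader gzip bytes, not tar data.
    public static string ReadFirstEntryName(byte[] tarGz)
    {
        using var source = new MemoryStream(tarGz);
        using var gzip = new GZipStream(source, CompressionMode.Decompress);
        using var reader = new TarReader(gzip);
        TarEntry? entry = reader.GetNextEntry();
        return entry?.Name ?? string.Empty;
    }
}
```

Calling `TarGzExample.ReadFirstEntryName(TarGzExample.CreateTarGz())` reads the archive back through the decompression stream and returns the name of its single entry.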