@rzikm rzikm commented Aug 11, 2025

Fixes #117455.

TarReader now throws when encountering checksum failures.

This PR also vectorizes the checksum calculation already present on the write path of TarHeader, which seems to speed up some benchmarks.
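For context, here is a minimal scalar sketch of how a tar header checksum is conventionally computed and checked. It assumes the usual 512-byte header layout, with the checksum stored as octal text in the 8 bytes at offset 148. The names `TarChecksumSketch`, `Compute`, and `IsValid` are hypothetical and do not come from this PR; the actual TarHeader code may differ.

```csharp
using System;

static class TarChecksumSketch
{
    const int HeaderSize = 512;
    const int ChecksumOffset = 148; // assumed standard position of the checksum field
    const int ChecksumLength = 8;

    // Sums all header bytes as unsigned values, counting the checksum
    // field itself as eight ASCII spaces (0x20).
    public static int Compute(ReadOnlySpan<byte> header)
    {
        if (header.Length != HeaderSize)
            throw new ArgumentException("Expected a 512-byte header block.", nameof(header));

        int sum = 0;
        for (int i = 0; i < header.Length; i++)
        {
            bool inChecksumField = i >= ChecksumOffset && i < ChecksumOffset + ChecksumLength;
            sum += inChecksumField ? (byte)' ' : header[i];
        }
        return sum;
    }

    // Parses the stored octal checksum and compares it with the computed one.
    public static bool IsValid(ReadOnlySpan<byte> header)
    {
        int computed = Compute(header);

        int stored = 0;
        bool sawDigit = false;
        foreach (byte b in header.Slice(ChecksumOffset, ChecksumLength))
        {
            if (b >= (byte)'0' && b <= (byte)'7')
            {
                stored = stored * 8 + (b - '0');
                sawDigit = true;
            }
            else if (b == (byte)' ' && !sawDigit)
            {
                continue; // leading padding
            }
            else
            {
                break; // NUL or trailing space ends the number
            }
        }
        return sawDigit && stored == computed;
    }
}
```

In this sketch a checksum field with no octal digits (for example, all NUL bytes) is reported as invalid; that is a choice made for the example, not a statement about how TarReader treats such headers.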

Benchmark data
BenchmarkDotNet v0.14.1-nightly.20250107.205, Windows 11 (10.0.26100.6899)
Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK 10.0.100-rc.1.25451.107
  [Host]     : .NET 10.0.0 (10.0.25.45207), X64 RyuJIT AVX2
  Job-HZMVFR : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-ZKEGXV : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-BWJLCP : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=500ms  MaxIterationCount=100  
MinIterationCount=15  WarmupCount=1  
| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CreateFromDirectory_Path | Job-ZKEGXV | \main\corerun.exe | 1,407.4 μs | 52.51 μs | 148.10 μs | 1,403.5 μs | 1,132.3 μs | 1,824.4 μs | 1.01 | 0.15 | - | 8.21 KB | 1.00 |
| CreateFromDirectory_Path | Job-BWJLCP | \opt-write\corerun.exe | 1,433.3 μs | 28.66 μs | 82.23 μs | 1,431.0 μs | 1,252.3 μs | 1,605.6 μs | 1.03 | 0.12 | - | 8.21 KB | 1.00 |
| CreateFromDirectory_Path_Async | Job-ZKEGXV | \main\corerun.exe | 1,646.2 μs | 32.42 μs | 71.16 μs | 1,644.6 μs | 1,529.1 μs | 1,854.2 μs | 1.00 | 0.06 | - | 11.27 KB | 1.00 |
| CreateFromDirectory_Path_Async | Job-BWJLCP | \opt-write\corerun.exe | 1,685.2 μs | 33.63 μs | 77.27 μs | 1,684.7 μs | 1,542.7 μs | 1,906.9 μs | 1.03 | 0.06 | - | 11.27 KB | 1.00 |
| ExtractToDirectory_Path | Job-ZKEGXV | \main\corerun.exe | 1,739.3 μs | 34.04 μs | 48.82 μs | 1,747.7 μs | 1,611.4 μs | 1,819.7 μs | 1.00 | 0.04 | - | 8.64 KB | 1.00 |
| ExtractToDirectory_Path | Job-BWJLCP | \opt-write\corerun.exe | 1,746.2 μs | 34.37 μs | 79.66 μs | 1,743.1 μs | 1,593.1 μs | 1,916.6 μs | 1.00 | 0.05 | - | 8.63 KB | 1.00 |
| ExtractToDirectory_Path_Async | Job-ZKEGXV | \main\corerun.exe | 1,768.3 μs | 34.78 μs | 67.02 μs | 1,773.2 μs | 1,641.4 μs | 1,899.2 μs | 1.00 | 0.05 | - | 10.42 KB | 1.00 |
| ExtractToDirectory_Path_Async | Job-BWJLCP | \opt-write\corerun.exe | 1,760.9 μs | 34.99 μs | 55.50 μs | 1,765.6 μs | 1,667.1 μs | 1,860.3 μs | 1.00 | 0.05 | - | 10.42 KB | 1.00 |
| CreateFromDirectory_Stream | Job-ZKEGXV | \main\corerun.exe | 767.1 μs | 14.97 μs | 14.00 μs | 770.6 μs | 739.9 μs | 788.5 μs | 1.00 | 0.03 | - | 11.51 KB | 1.00 |
| CreateFromDirectory_Stream | Job-BWJLCP | \opt-write\corerun.exe | 768.2 μs | 7.48 μs | 7.00 μs | 769.7 μs | 755.2 μs | 779.3 μs | 1.00 | 0.02 | - | 11.5 KB | 1.00 |
| CreateFromDirectory_Stream_Async | Job-ZKEGXV | \main\corerun.exe | 838.3 μs | 12.84 μs | 12.01 μs | 837.7 μs | 816.8 μs | 858.2 μs | 1.00 | 0.02 | 1.5244 | 13.58 KB | 1.00 |
| CreateFromDirectory_Stream_Async | Job-BWJLCP | \opt-write\corerun.exe | 842.9 μs | 15.01 μs | 13.30 μs | 842.3 μs | 822.4 μs | 870.7 μs | 1.01 | 0.02 | - | 13.57 KB | 1.00 |
| ExtractToDirectory_Stream | Job-ZKEGXV | \main\corerun.exe | 1,708.4 μs | 33.93 μs | 71.58 μs | 1,689.1 μs | 1,575.4 μs | 1,860.3 μs | 1.00 | 0.06 | - | 8.49 KB | 1.00 |
| ExtractToDirectory_Stream | Job-BWJLCP | \opt-write\corerun.exe | 1,693.4 μs | 33.82 μs | 72.81 μs | 1,677.4 μs | 1,561.6 μs | 1,830.4 μs | 0.99 | 0.06 | - | 8.49 KB | 1.00 |
| ExtractToDirectory_Stream_Async | Job-ZKEGXV | \main\corerun.exe | 1,746.8 μs | 34.50 μs | 62.21 μs | 1,749.3 μs | 1,614.3 μs | 1,899.9 μs | 1.00 | 0.05 | - | 10.15 KB | 1.00 |
| ExtractToDirectory_Stream_Async | Job-BWJLCP | \opt-write\corerun.exe | 1,743.3 μs | 34.26 μs | 58.17 μs | 1,740.4 μs | 1,647.7 μs | 1,856.9 μs | 1.00 | 0.05 | - | 10.17 KB | 1.00 |
| V7TarEntry_WriteEntry | Job-ZKEGXV | \main\corerun.exe | 247.6 ns | 8.55 ns | 25.21 ns | 251.0 ns | 189.1 ns | 286.9 ns | 1.01 | 0.15 | 0.0249 | 264 B | 1.00 |
| V7TarEntry_WriteEntry | Job-BWJLCP | \opt-write\corerun.exe | 248.3 ns | 12.10 ns | 35.69 ns | 254.4 ns | 177.8 ns | 335.8 ns | 1.01 | 0.18 | 0.0250 | 264 B | 1.00 |
| V7TarEntry_WriteEntry_Async | Job-ZKEGXV | \main\corerun.exe | 344.4 ns | 11.38 ns | 33.56 ns | 353.6 ns | 259.9 ns | 397.4 ns | 1.01 | 0.15 | 0.0249 | 264 B | 1.00 |
| V7TarEntry_WriteEntry_Async | Job-BWJLCP | \opt-write\corerun.exe | 270.7 ns | 2.23 ns | 2.09 ns | 270.1 ns | 266.2 ns | 274.6 ns | 0.79 | 0.08 | 0.0248 | 264 B | 1.00 |
| UstarTarEntry_WriteEntry | Job-ZKEGXV | \main\corerun.exe | 258.6 ns | 8.21 ns | 24.20 ns | 263.6 ns | 186.6 ns | 291.5 ns | 1.01 | 0.14 | 0.0251 | 264 B | 1.00 |
| UstarTarEntry_WriteEntry | Job-BWJLCP | \opt-write\corerun.exe | 192.6 ns | 1.95 ns | 1.82 ns | 192.8 ns | 190.3 ns | 195.7 ns | 0.75 | 0.08 | 0.0250 | 264 B | 1.00 |
| UstarTarEntry_WriteEntry_Async | Job-ZKEGXV | \main\corerun.exe | 364.1 ns | 10.49 ns | 30.92 ns | 369.7 ns | 276.3 ns | 413.6 ns | 1.01 | 0.13 | 0.0248 | 264 B | 1.00 |
| UstarTarEntry_WriteEntry_Async | Job-BWJLCP | \opt-write\corerun.exe | 286.0 ns | 2.36 ns | 2.21 ns | 285.8 ns | 282.8 ns | 289.7 ns | 0.79 | 0.08 | 0.0252 | 264 B | 1.00 |
| PaxTarEntry_WriteEntry | Job-ZKEGXV | \main\corerun.exe | 1,565.0 ns | 51.03 ns | 150.48 ns | 1,592.7 ns | 1,204.0 ns | 1,800.2 ns | 1.01 | 0.14 | 0.1873 | 1983 B | 1.00 |
| PaxTarEntry_WriteEntry | Job-BWJLCP | \opt-write\corerun.exe | 1,244.5 ns | 3.69 ns | 3.08 ns | 1,245.0 ns | 1,239.4 ns | 1,249.5 ns | 0.80 | 0.08 | 0.1892 | 1983 B | 1.00 |
| PaxTarEntry_WriteEntry_Async | Job-ZKEGXV | \main\corerun.exe | 1,825.4 ns | 56.38 ns | 166.24 ns | 1,843.4 ns | 1,397.5 ns | 2,107.6 ns | 1.01 | 0.13 | 0.1881 | 1983 B | 1.00 |
| PaxTarEntry_WriteEntry_Async | Job-BWJLCP | \opt-write\corerun.exe | 1,458.5 ns | 21.53 ns | 20.14 ns | 1,456.9 ns | 1,412.5 ns | 1,492.9 ns | 0.81 | 0.08 | 0.1879 | 1983 B | 1.00 |
| GnuTarEntry_WriteEntry | Job-ZKEGXV | \main\corerun.exe | 271.8 ns | 8.80 ns | 25.95 ns | 273.5 ns | 199.3 ns | 306.7 ns | 1.01 | 0.14 | 0.0248 | 264 B | 1.00 |
| GnuTarEntry_WriteEntry | Job-BWJLCP | \opt-write\corerun.exe | 206.2 ns | 1.16 ns | 1.03 ns | 206.5 ns | 203.3 ns | 207.4 ns | 0.77 | 0.08 | 0.0250 | 264 B | 1.00 |
| GnuTarEntry_WriteEntry_Async | Job-ZKEGXV | \main\corerun.exe | 376.9 ns | 12.04 ns | 35.49 ns | 382.4 ns | 287.3 ns | 438.2 ns | 1.01 | 0.14 | 0.0247 | 264 B | 1.00 |
| GnuTarEntry_WriteEntry_Async | Job-BWJLCP | \opt-write\corerun.exe | 305.5 ns | 2.11 ns | 1.97 ns | 305.8 ns | 302.6 ns | 309.1 ns | 0.82 | 0.08 | 0.0252 | 264 B | 1.00 |

rzikm commented Aug 11, 2025

There seem to be two test files where we expect to make progress even with an invalid checksum?

node-tar/bad-cksum.tar - this one seems to be extracted correctly with GNU tar but fails with bsdtar on Windows

node-tar$ tar xf bad-cksum.tar -C test/
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
node-tar$ ls test/
one-byte.txt
node-tar$ cat test/one-byte.txt
a

❯ tar xf .\bad-cksum.tar -C .\test\
tar.exe: Damaged tar archive
tar.exe: Retrying...
tar.exe: Damaged tar archive
tar.exe: Retrying..

golang_tar/issue12435.tar - this file seems to be garbage: the checksum field contains null bytes.

The System.Formats.Tar.Tests.TarWriter_Tests.Verify_Compatibility_RegularFile_EmptyFile_NoSizeStored test seems to edit some header data, so it might just need the checksum fixed in the test data.
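As an illustration of what fixing the checksum in edited test data could involve, the sketch below recomputes the checksum of a 512-byte header after some of its bytes have been changed and writes it back in the common "six octal digits, NUL, space" form. `TarHeaderFixupSketch` and `RewriteChecksum` are hypothetical names, the field offset of 148 is the conventional layout, and this is not the change actually made to the test.

```csharp
using System;

static class TarHeaderFixupSketch
{
    // Recomputes the checksum of an edited header block in place.
    public static void RewriteChecksum(Span<byte> header)
    {
        const int ChecksumOffset = 148; // assumed standard field position
        const int ChecksumLength = 8;

        if (header.Length != 512)
            throw new ArgumentException("Expected a 512-byte header block.", nameof(header));

        Span<byte> field = header.Slice(ChecksumOffset, ChecksumLength);
        field.Fill((byte)' '); // the field counts as spaces while summing

        int sum = 0;
        foreach (byte b in header)
            sum += b;

        // The largest possible sum (512 * 255) fits in six octal digits.
        string octal = Convert.ToString(sum, 8).PadLeft(6, '0');
        for (int i = 0; i < 6; i++)
            field[i] = (byte)octal[i];
        field[6] = 0;
        field[7] = (byte)' ';
    }
}
```

A test that patches a header field would call `RewriteChecksum` on the header block afterwards, so that a reader which validates checksums still accepts the archive.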

@rzikm rzikm added the breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. label Oct 7, 2025
@dotnet-policy-service dotnet-policy-service bot added the needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet label Oct 7, 2025
dotnet-policy-service bot commented Oct 7, 2025

Added needs-breaking-change-doc-created label because this PR has the breaking-change label.

When you commit this breaking change:

  1. Create a matching issue in the dotnet/docs repo using the breaking change documentation template, link it to this PR and the issue, then remove this needs-breaking-change-doc-created label.
  2. Ask a committer to mail the .NET Breaking Change Notification DL.

Tagging @dotnet/compat for awareness of the breaking change.

@rzikm rzikm reopened this Oct 7, 2025
@rzikm rzikm force-pushed the 117455-Tar-reader-checksum-checking branch from b3f26e8 to ee81e70 Compare October 10, 2025 13:36
@rzikm rzikm marked this pull request as ready for review October 10, 2025 13:37
Copilot AI review requested due to automatic review settings October 10, 2025 13:37
Copilot AI left a comment

Pull Request Overview

This PR adds checksum validation to TAR archive reading to improve data integrity. The TarReader now throws InvalidDataException when encountering checksum failures instead of silently ignoring them.

Key changes:

  • Adds checksum validation logic to the TAR header reading process
  • Updates tests to expect exceptions instead of null returns for invalid checksums
  • Adds new error message resource for checksum validation failures
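From a caller's point of view, the change described above could be handled roughly as follows. This is a sketch that assumes, per the overview, that a checksum mismatch surfaces as `InvalidDataException` from `TarReader.GetNextEntry`; `TarReadSketch` and `TryListEntries` are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;

static class TarReadSketch
{
    // Collects entry names from the archive. Returns false if the reader
    // rejects the archive as invalid (for example, on a bad header checksum).
    public static bool TryListEntries(Stream archive, List<string> names)
    {
        try
        {
            using var reader = new TarReader(archive, leaveOpen: true);
            while (reader.GetNextEntry() is TarEntry entry)
            {
                names.Add(entry.Name);
            }
            return true;
        }
        catch (InvalidDataException)
        {
            // Names gathered before the failure remain in the list.
            return false;
        }
    }
}
```

Code that previously relied on the reader quietly stopping at a corrupted header would need a `catch` like this one, or should let the exception propagate.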

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs | Implements checksum validation and adds helper method to calculate header checksums |
| src/libraries/System.Formats.Tar/src/Resources/Strings.resx | Adds new error message resource for checksum validation failures |
| src/libraries/System.Formats.Tar/tests/TarReader/TarReader.Tests.cs | Adds comprehensive test for invalid checksum scenarios |
| src/libraries/System.Formats.Tar/tests/TarReader/TarReader.File.Tests.cs | Updates existing tests to expect exceptions for checksum failures |
| src/libraries/System.Formats.Tar/tests/TarReader/TarReader.File.Async.Tests.cs | Updates async tests to expect exceptions for checksum failures |
| src/libraries/System.Formats.Tar/tests/TarWriter/TarWriter.Tests.cs | Fixes checksum in existing test to maintain validity |
| src/libraries/System.Formats.Tar/tests/TarTestsBase.cs | Removes "bad-cksum" test case from exclusion list |

@ericstj ericstj added this to the 11.0.0 milestone Oct 15, 2025
ericstj commented Oct 15, 2025

📋 Breaking Change Documentation Required

Create a breaking change issue with AI-generated content

Generated by Breaking Change Documentation Tool - 2025-10-15 09:30:09

rzikm commented Oct 21, 2025

@ericstj I vectorized the implementation and ran some benchmarks (see PR description). There may be a small impact on read, but we are getting significant gains on the write path.
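To give a rough idea of what vectorizing a byte sum can look like, here is a sketch using `Vector128` from `System.Runtime.Intrinsics` (.NET 7 or later). It is not the implementation from this PR: `VectorSumSketch` and `SumBytes` are hypothetical names, and the code only sums raw bytes without any special handling of the checksum field.

```csharp
using System;
using System.Runtime.Intrinsics;

static class VectorSumSketch
{
    // Sums bytes as unsigned values, 16 at a time where Vector128 is
    // hardware accelerated, with a scalar loop for the remainder.
    public static uint SumBytes(ReadOnlySpan<byte> data)
    {
        uint total = 0;
        int i = 0;

        if (Vector128.IsHardwareAccelerated)
        {
            Vector128<uint> acc = Vector128<uint>.Zero;
            for (; i <= data.Length - Vector128<byte>.Count; i += Vector128<byte>.Count)
            {
                Vector128<byte> v = Vector128.Create(data.Slice(i, Vector128<byte>.Count));

                // Widen byte -> ushort so adding two lanes cannot overflow.
                Vector128<ushort> pairs = Vector128.WidenLower(v) + Vector128.WidenUpper(v);

                // Widen ushort -> uint before accumulating across iterations.
                acc += Vector128.WidenLower(pairs) + Vector128.WidenUpper(pairs);
            }
            total = Vector128.Sum(acc);
        }

        for (; i < data.Length; i++)
            total += data[i];

        return total;
    }
}
```

For a tar header, one way to apply such a routine would be to sum the whole 512-byte block, subtract the bytes of the checksum field, and add 8 * 0x20 for the spaces that field is counted as.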

PTAL, and if it is fine then let's merge this.

@rzikm rzikm requested a review from ericstj October 21, 2025 12:54
@rzikm rzikm force-pushed the 117455-Tar-reader-checksum-checking branch from 63eb2fb to 2534092 Compare October 21, 2025 12:56
@ericstj ericstj left a comment

Looks good. Thank you for the perf tests.

@ericstj ericstj merged commit a1c4bad into dotnet:main Oct 21, 2025
83 of 85 checks passed
rzikm commented Oct 22, 2025

Created breaking change doc: dotnet/docs#49394

@ericstj ericstj removed the needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet label Oct 22, 2025
Labels

area-System.Formats.Tar breaking-change Issue or PR that represents a breaking API or functional change over a prerelease.

Development

Successfully merging this pull request may close these issues.

Tar reader checksum checking

4 participants