Releases: iipc/webarchive-commons

webarchive-commons-3.0.2

14 Nov 01:33
@ato

Fixes

  • Avoid relying on the default locale or charset. #128
  • BasicURLCanonicalizer: more efficient normalization of dots in host names. #129
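The note for #128 does not list which call sites changed. As a general illustration of the practice, the sketch below passes Locale.ROOT and StandardCharsets.UTF_8 explicitly instead of relying on JVM defaults. The LocaleIndependent class and its methods are hypothetical and are not part of webarchive-commons.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class illustrating locale- and charset-independent calls.
public class LocaleIndependent {
    /** Lower-cases an ASCII identifier (e.g. a header name) identically on every JVM. */
    static String lowerAscii(String s) {
        // Locale.ROOT avoids locale-specific rules such as Turkish mapping 'I' to dotless 'ı'
        return s.toLowerCase(Locale.ROOT);
    }

    /** Decodes bytes with an explicit charset instead of the platform default. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Locale-sensitive: the result depends on the locale passed in
        System.out.println("WARC-IP-ADDRESS".toLowerCase(turkish));
        // Locale-independent
        System.out.println(lowerAscii("WARC-IP-ADDRESS")); // warc-ip-address
    }
}
```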

Dependency upgrades

  • commons-cli: 1.10.0 → 1.11.0
  • commons-codec: 1.19.0 → 1.20.0
  • commons-io: 2.20.0 → 2.21.0
  • junit-jupiter: 5.13.3 → 5.14.1
  • maven-release-plugin: 3.1.1 → 3.2.0

webarchive-commons-3.0.1

27 Oct 00:53
@ato

Fixes

  • Fixed a file handle leak in FileUtils.pagedLines() and FileUtils.appendTo() that could occur during I/O errors.
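The release note does not show the fix itself. The following is a minimal sketch of the usual remedy for this class of leak, assuming the standard try-with-resources pattern. The AppendWithoutLeaks class and its append(Path, Path) method are hypothetical and do not reproduce the signature of FileUtils.appendTo().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; not the webarchive-commons FileUtils code.
public class AppendWithoutLeaks {
    /** Appends the contents of source to dest, closing both streams even if I/O fails. */
    static void append(Path source, Path dest) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(dest,
                     StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } // both streams are closed here, in reverse order, even if read() or write() throws
    }

    public static void main(String[] args) throws IOException {
        append(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```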

Dependency upgrades

  • commons-codec: 1.18.0 → 1.19.0
  • commons-lang3: 3.18.0 → 3.19.0
  • commons-cli: 1.9.0 → 1.10.0
  • guava: 33.4.8-jre → 33.5.0-jre
  • hadoop: 3.4.1 → 3.4.2
  • pig: 0.17.0 → 0.18.0

webarchive-commons-3.0.0

21 Jul 07:44
@ato

Changes

FileUtils.pagedLines() and FileUtils.expandRange() now return the Apache Commons Lang 3 version of LongRange.
Users of these methods may need to make the following changes:

  • import org.apache.commons.lang.math.LongRange → import org.apache.commons.lang3.LongRange
  • new LongRange(min, max) → LongRange.of(min, max)
  • longRange.getMaximumLong() → longRange.getMaximum()
  • longRange.getMinimumLong() → longRange.getMinimum()
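A short sketch of the migrated calls, assuming commons-lang3 (3.18.0, the version this release upgrades to) is on the classpath. The values 10 and 20 are arbitrary placeholders; in real code the LongRange would come from FileUtils.pagedLines() or FileUtils.expandRange().

```java
import org.apache.commons.lang3.LongRange;

public class LongRangeMigration {
    public static void main(String[] args) {
        // commons-lang 2.x: new LongRange(10, 20)
        LongRange range = LongRange.of(10L, 20L);

        // commons-lang 2.x: getMinimumLong() / getMaximumLong()
        long min = range.getMinimum(); // returns Long, unboxed here
        long max = range.getMaximum();

        System.out.println(min + ".." + max);    // 10..20
        System.out.println(range.contains(15L)); // true
    }
}
```

Note that getMinimum() and getMaximum() return boxed Long values, so assigning them to long variables unboxes them.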

Dependency upgrades

  • commons-io: 2.19.0 → 2.20.0
  • commons-lang: 2.6 → 3.18.0

webarchive-commons-2.0.2

15 Jul 02:07
@ato

Fixes

  • Fixes for org.archive.net.PublicSuffixes #110
    • Updated to the latest version of the public suffix list.
    • Fixed parsing failures with newer list versions.
    • Moved effective_tld_names.dat to org/archive/effective_tld_names.dat to prevent conflict with crawler-commons.

webarchive-commons-2.0.1

21 May 07:56
@ato

Changes

  • Re-added Reporter.shortReportLineTo(PrintWriter) as it turned out to be important to Heritrix.

webarchive-commons-2.0.0

21 May 06:04
@ato

New features

  • Added RecordingInputStream.asOutputStream() for direct writing of recorded data without an input stream. #108

Removals

Removed Apache HttpClient 3.1

HTTPSeekableLineReaderFactory and ZipNumBlockLoader now default to HttpClient 4.3.

  • org.apache.commons.httpclient.URIException → org.archive.url.URIException
  • org.apache.commons.httpclient.Header → org.archive.format.http.HttpHeader
  • org.archive.httpclient.HttpRecorderGetMethod (no replacement)
  • org.archive.httpclient.HttpRecorderMethod (no replacement)
  • org.archive.httpclient.HttpRecorderPostMethod (no replacement)
  • org.archive.httpclient.SingleHttpConnectionManager (no replacement)
  • org.archive.httpclient.ThreadLocalHttpConnectionManager (no replacement)

Removed deprecated versions of renamed classes

  • org.archive.io.ArchiveFileConstants → org.archive.format.ArchiveFileConstants
  • org.archive.io.GzipHeader → org.archive.util.zip.GzipHeader
  • org.archive.io.GZIPMembersInputStream → org.archive.util.zip.GZIPMembersInputStream
  • org.archive.io.NoGzipMagicException → org.archive.util.zip.NoGzipMagicException
  • org.archive.io.arc.ARCConstants → org.archive.format.arc.ARCConstants
  • org.archive.io.warc.WARCConstants → org.archive.format.warc.WARCConstants
  • org.archive.url.DefaultIACanonicalizerRules → org.archive.url.AggressiveIACanonicalizerRules
  • org.archive.url.DefaultIAURLCanonicalizer → org.archive.url.AggressiveIAURLCanonicalizer
  • org.archive.url.GoogleURLCanonicalizer → org.archive.url.BasicURLCanonicalizer

Removed deprecated methods

  • ANVLRecord(int) → ANVLRecord()
  • DevUtils.betterPrintStack(RuntimeException) → Throwable.printStackTrace()
  • Recorder.getReplayCharSequence() → Recorder.getContentReplayCharSequence()
  • Reporter.shortReportLineTo(PrintWriter) → Reporter.reportTo(PrintWriter)

Removed usages of constant interfaces

Code that accessed the constants through the classes below should use static imports instead.

  • ArchiveFileConstants is no longer implemented by:
    • ArchiveReader
    • ArchiveReaderFactory
    • WARCWriter
    • WriterPool
    • WriterPoolMember
  • ARCConstants is no longer implemented by:
    • ARCReader
    • ARCReaderFactory
    • ARCRecord
    • ARCRecordMetaData
    • ARCUtils
    • ARCWriter
  • WARCConstants is no longer implemented by:
    • WARCReader
    • WARCReaderFactory
    • WARCRecord
    • WARCWriter
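For example, code that relied on ARCReader implementing ARCConstants would now add import static org.archive.format.arc.ARCConstants.*; (or import individual constants). So that it compiles without webarchive-commons, the sketch below shows the same before/after pattern with a JDK constant interface, java.io.ObjectStreamConstants, which ObjectOutputStream implements. The StaticImportDemo class is hypothetical.

```java
// After: import only the constants that are needed.
import static java.io.ObjectStreamConstants.STREAM_MAGIC;
import static java.io.ObjectStreamConstants.STREAM_VERSION;

// Before (constant-interface style): "public class StaticImportDemo implements ObjectStreamConstants"
// made every constant an inherited member of the class. After: the class no longer
// implements the interface, and the static imports above supply the names.
public class StaticImportDemo {
    static String describe() {
        // STREAM_MAGIC is the short (short) 0xaced; mask it to print the unsigned hex value
        return String.format("magic=%04x version=%d", STREAM_MAGIC & 0xffff, STREAM_VERSION);
    }

    public static void main(String[] args) {
        System.out.println(describe()); // magic=aced version=5
    }
}
```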

Dependency upgrades

  • commons-io: 2.18.0 → 2.19.0
  • guava: 33.3.1-jre → 33.4.8-jre
  • json: 20240303 → 20250517
  • junit: 4.13.2 → 5.12.2

webarchive-commons-1.3.0

20 Dec 05:21
@ato

URL Canonicalization Changed

The output of WaybackURLKeyMaker and other canonicalizers based on BasicURLCanonicalizer has changed for URLs that
contain non-UTF-8 percent-encoded sequences. For example, when a URL contains "%C3%23" it is now normalized to
"%c3%23", whereas previous releases produced "%25c3%23". This change brings webarchive-commons more in line with pywb,
surt (Python), warcio.js and RFC 3986. While CDX files should be more compatible with these newer tools, note that CDX
files generated by the new release which contain such URLs may not work correctly with existing versions of
OpenWayback that use the older webarchive-commons. #102
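To make "non-UTF-8 percent-encoded sequence" concrete: %C3 is a UTF-8 lead byte that must be followed by a continuation byte in the range 0x80 to 0xBF, but %23 ('#') is not one, so the pair does not decode as UTF-8. The hypothetical helper below decodes the escapes and asks the JDK's strict UTF-8 decoder whether the result is valid. It only illustrates the distinction and is not how BasicURLCanonicalizer is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of webarchive-commons.
public class PercentUtf8Check {
    /** Returns true if the percent-escaped bytes in s form valid UTF-8. */
    static boolean escapesAreUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    bytes.write(hi * 16 + lo); // decoded escape byte
                    i += 2;
                    continue;
                }
            }
            // Unescaped characters (ASCII in a well-formed URL) are kept as their UTF-8 bytes
            byte[] raw = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
            bytes.write(raw, 0, raw.length);
        }
        try {
            // A decoder set to REPORT throws instead of substituting U+FFFD
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes.toByteArray()));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(escapesAreUtf8("%C3%23")); // false: 0x23 is not a continuation byte
        System.out.println(escapesAreUtf8("%C3%A9")); // true: UTF-8 for 'é'
    }
}
```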

Bug fixes

  • WAT: Fixed duplicated payload metadata values for "Actual-Content-Length" and "Trailing-Slop-Length" #103
  • ObjectPlusFilesOutputStream.hardlinkOrCopy now uses Files.createLink() instead of executing ln. This
    removes the risk of command-line option injection and improves portability.
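A minimal sketch of the link-or-copy pattern using only java.nio.file, assuming a copy is an acceptable fallback when linking fails (for example across file systems). Because no external process is started, a file name beginning with "-" can never be parsed as an option to ln. The HardlinkOrCopy class is hypothetical and does not reproduce ObjectPlusFilesOutputStream's code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical example class; not the webarchive-commons implementation.
public class HardlinkOrCopy {
    /**
     * Creates target as a hard link to source, falling back to a plain copy if the
     * link cannot be made. Returns true if a link was created, false if copied.
     */
    static boolean linkOrCopy(Path source, Path target) throws IOException {
        try {
            // Note the argument order: Files.createLink(newLink, existingFile)
            Files.createLink(target, source);
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            // e.g. no hard-link support or a cross-device link; an existing target
            // is not overwritten, so Files.copy throws FileAlreadyExistsException then
            Files.copy(source, target);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        boolean linked = linkOrCopy(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(linked ? "linked" : "copied");
    }
}
```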

Dependency upgrades

  • fastutil removed
  • dsiutils removed

Deprecations

The following classes and enum members have been marked deprecated as a step towards removal of the dependency on
Apache Commons HttpClient 3.1.

  • org.archive.httpclient.HttpRecorderGetMethod
  • org.archive.httpclient.HttpRecorderMethod
  • org.archive.httpclient.HttpRecorderPostMethod
  • org.archive.httpclient.SingleHttpConnectionManager
  • org.archive.httpclient.ThreadLocalHttpConnectionManager
  • org.archive.util.binsearch.impl.http.ApacheHttp31SLR
  • org.archive.util.binsearch.impl.http.ApacheHttp31SLRFactory
  • org.archive.util.binsearch.impl.http.HTTPSeekableLineReaderFactory.HttpLibs.APACHE_31

webarchive-commons-1.2.0

29 Nov 07:43
@ato

New features

  • MetaData is now multivalued to support repeated WARC and HTTP headers. #98
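The release note does not show the MetaData API, so the hypothetical class below only illustrates what "multivalued" means here: each field name maps to a list of values in arrival order, so a repeated header such as Set-Cookie is kept rather than overwritten. Case-insensitive lookup is an assumption based on HTTP field names being case-insensitive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical multivalued header store; not the webarchive-commons MetaData class.
public class MultiHeaders {
    // Case-insensitive keys: "Set-Cookie" and "set-cookie" name the same field
    private final Map<String, List<String>> fields =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Appends a value instead of replacing earlier values with the same name. */
    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Returns all values in the order they were added; empty if the field is absent. */
    List<String> getAll(String name) {
        return Collections.unmodifiableList(fields.getOrDefault(name, Collections.emptyList()));
    }

    public static void main(String[] args) {
        MultiHeaders h = new MultiHeaders();
        h.add("Set-Cookie", "a=1");
        h.add("set-cookie", "b=2");
        System.out.println(h.getAll("SET-COOKIE")); // [a=1, b=2]
    }
}
```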

Dependency upgrades

  • commons-io 2.18.0
  • commons-lang 2.6
  • guava 33.3.1-jre
  • hadoop 3.4.1
  • htmlparser 2.1
  • httpcore 4.4.16
  • json 20240303
  • junit 4.13.2

webarchive-commons-1.1.11

27 Nov 13:05
@ato

Bug fixes

  • Fixed URLParser and WaybackURLKeyMaker failing on URLs with IPv6 address hostnames #100
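The release note does not describe the failure. One plausible cause with IPv6 hosts such as [2001:db8::1] is splitting the authority at the first colon to find the port, which yields "[2001". The hypothetical helper below shows bracket-aware host extraction following RFC 3986's IP-literal syntax. It is an illustration, not the URLParser code.

```java
// Hypothetical helper illustrating IPv6-aware authority parsing; not URLParser's code.
public class AuthorityHost {
    /** Returns the host of an authority such as "user@[2001:db8::1]:8080". */
    static String hostOf(String authority) {
        // Drop any userinfo ("user:pass@"); a host cannot contain '@'
        int at = authority.lastIndexOf('@');
        String hostPort = at >= 0 ? authority.substring(at + 1) : authority;
        if (hostPort.startsWith("[")) {
            // IPv6 literal: colons inside the brackets belong to the address
            int close = hostPort.indexOf(']');
            return close >= 0 ? hostPort.substring(0, close + 1) : hostPort;
        }
        // Otherwise the first colon, if any, separates host and port
        int colon = hostPort.indexOf(':');
        return colon >= 0 ? hostPort.substring(0, colon) : hostPort;
    }

    public static void main(String[] args) {
        System.out.println(hostOf("[2001:db8::1]:8080")); // [2001:db8::1]
        System.out.println(hostOf("example.com:80"));     // example.com
    }
}
```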

webarchive-commons-1.1.10

15 Oct 08:46
@ato

Fixes

Dependency upgrades

  • commons-collections 3.2.2
  • commons-io 2.14.0
  • dsiutils 2.2.8
  • guava 33.3.0-jre
  • hadoop 3.4.0 (now optional)
  • pig 0.17.0
  • org.json 20231013

Dependency Removals

  • joda-time (was unused)