test wikipedia: CI has failed #203

@tikkss

All CI jobs have failed for the same reason. For example, Ruby 3.0 on macos-latest:

https://github.com/red-data-tools/red-datasets/actions/runs/10639536353/job/29497680645?pr=202#step:8:11

Run bundle exec rake
  bundle exec rake
  shell: /bin/bash -e {0}
  env:
    CACHE_VERSION: 2022-08-27
    PATH: /Users/runner/.local/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/runner/.cargo/bin:/usr/local/opt/curl/bin:/usr/local/bin:/usr/local/sbin:/Users/runner/bin:/Users/runner/.yarn/bin:/Users/runner/Library/Android/sdk/tools:/Users/runner/Library/Android/sdk/platform-tools:/Library/Frameworks/Python.framework/Versions/Current/bin:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/usr/bin:/bin:/usr/sbin:/sbin:/Users/runner/.dotnet/tools
/Users/runner/hostedtoolcache/Ruby/3.0.7/arm64/bin/ruby test/run-test.rb
/Users/runner/work/red-datasets/red-datasets/lib/datasets/tar-gz-readable.rb:7: warning: attempt to close unfinished zstream; reset forced.

bzcat: Data integrity error when decompressing.
	Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

Failed to read bzcat input: Errno::EPIPE: Broken pipe
===============================================================================
Failure: test: #each(WikipediaTest::en::articles)
/Users/runner/work/red-datasets/red-datasets/test/test-wikipedia.rb:36:in `block (3 levels) in <class:WikipediaTest>'
     33:         page.restrictions = nil
     34:         page.redirect = "Computer accessibility"
     35:         page.revision = revision
  => 36:         assert_equal(page, @dataset.each.first)
     37:       end
     38: 
     39:       sub_test_case("#metadata") do
<#<struct Datasets::Wikipedia::Page
 title="AccessibleComputing",
 namespace=0,
 id=10,
 restrictions=nil,
 redirect="Computer accessibility",
 revision=
  #<struct Datasets::Wikipedia::Revision
   id=1219062925,
   parent_id=1219062840,
   timestamp=2024-04-15 14:38:04 UTC,
   contributor=
    #<struct Datasets::Wikipedia::Contributor
     user_name="Asparagusus",
     id=43603280>,
   minor=nil,
   comment=
    "Restored revision 1002250816 by [[Special:Contributions/Elli|Elli]] ([[User talk:Elli|talk]]): Unexplained redirect breaking",
   model="wikitext",
   format="text/x-wiki",
   text=
    "#REDIRECT [[Computer accessibility]]\n" +
    "\n" +
    "{{rcat shell|\n" +
    "{{R from move}}\n" +
    "{{R from CamelCase}}\n" +
    "{{R unprintworthy}}\n" +
    "}}",
   sha1="kmysdltgexdwkv2xsml3j44jb56dxvn">>> expected but was
<nil>

diff:
- #<struct Datasets::Wikipedia::Page
-  title="AccessibleComputing",
-  namespace=0,
-  id=10,
-  restrictions=nil,
-  redirect="Computer accessibility",
-  revision=
-   #<struct Datasets::Wikipedia::Revision
-    id=1219062925,
-    parent_id=1219062840,
-    timestamp=2024-04-15 14:38:04 UTC,
-    contributor=
-     #<struct Datasets::Wikipedia::Contributor
-      user_name="Asparagusus",
-      id=43603280>,
?    minor=nil,
-    comment=
-     "Restored revision 1002250816 by [[Special:Contributions/Elli|Elli]] ([[User talk:Elli|talk]]): Unexplained redirect breaking",
-    model="wikitext",
-    format="text/x-wiki",
-    text=
-     "#REDIRECT [[Computer accessibility]]\n" +
-     "\n" +
-     "{{rcat shell|\n" +
-     "{{R from move}}\n" +
-     "{{R from CamelCase}}\n" +
-     "{{R unprintworthy}}\n" +
-     "}}",
-    sha1="kmysdltgexdwkv2xsml3j44jb56dxvn">>
===============================================================================
Finished in 22.855097 seconds.
199 tests, 240 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
rake aborted!
Command failed with status (1): [/Users/runner/hostedtoolcache/Ruby/3.0.7/arm64/bin/ruby test/run-test.rb]
/Users/runner/work/red-datasets/red-datasets/Rakefile:20:in `block in <top (required)>'
/Users/runner/hostedtoolcache/Ruby/3.0.7/arm64/bin/bundle:23:in `load'
/Users/runner/hostedtoolcache/Ruby/3.0.7/arm64/bin/bundle:23:in `<main>'
Tasks: TOP => default => test
(See full trace by running task with --trace)
Error: Process completed with exit code 1.

Could the Wikipedia cache on GitHub Actions be corrupted?
We can't download the cache, so debugging is difficult.
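
If a copy of the cached file ever becomes available, the integrity test that the bzcat message suggests could confirm or rule out corruption. A minimal sketch, assuming the `bzip2` command is installed; `bzip2_intact?` is a hypothetical helper name, not part of red-datasets, and the path in the usage comment is a placeholder:

```ruby
require "open3"

# Runs `bzip2 -t` (test integrity, write no output) on the given file.
# Returns true when bzip2 accepts the file, false when it reports a
# problem (corruption, bad magic number, unreadable file), and nil when
# the bzip2 command itself is not installed.
def bzip2_intact?(path)
  _output, status = Open3.capture2e("bzip2", "-t", path.to_s)
  status.success?
rescue Errno::ENOENT
  nil
end

# Usage (placeholder path):
#   bzip2_intact?("path/to/cached-dump.xml.bz2")
```

A `false` result for the cached file would point to a truncated or damaged download rather than a bug in the reader.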

By the way, the Wikipedia test passes on my local machine:

$ ruby -I lib -e 'require "datasets"; Datasets::Wikipedia.new.clear_cache!' && ruby test/run-test.rb -t WikipediaTest --progress-row-max=72
Loaded suite test
Started
Failed to read bzcat input: Errno::EPIPE: Broken pipe
Finished in 2.234664 seconds.
------------------------------------------------------------------------
4 tests, 4 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
------------------------------------------------------------------------
1.79 tests/s, 1.79 assertions/s

Should we clear the cache on GitHub Actions?

diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index c673b80..1fdd77b 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -27,7 +27,7 @@ jobs:
     runs-on: ${{ matrix.runs-on }}
     env:
       # We can invalidate the current cache by updating this.
-      CACHE_VERSION: "2022-08-27"
+      CACHE_VERSION: "2024-08-31"
     steps:
       - uses: actions/checkout@v4
       - uses: ruby/setup-ruby@v1
