Skip to content
This repository was archived by the owner on Jan 22, 2025. It is now read-only.
This repository was archived by the owner on Jan 22, 2025. It is now read-only.

validator version 1.16.16 crashed with "unable to freeze a block with bankhash" #33740

@bji

Description

@bji

Problem

I am running a primary/secondary validator setup with two physically separate validators. Both validators were running 1.16.16-jito w/local vote mods (the vote mods are not believed to alter any aspect of accounts state, because the mods only alter which slots are voted on in replay_stage.rs).

Both primary and secondary crashed simultaneously with the same error message. This is two separate validators experiencing the same issue. Note that the secondary likely took the tower from the primary shortly before the primary crashed as is common behavior when a secondary's tower gets out of sync with the primary.

The log message on both validators was:

[2023-10-16T22:47:33.894214958Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="solReplayStage" one=1i message="panicked at 'We have tried to repair duplicate slot: 224116920 more than 10 times and are unable to freeze a block with bankhash 38rFNeVT1FGHResoXsgTN2cyyjBiJd6ZFcXWCPnXzZn7, instead we have a block with bankhash Some(GVZoNMHn1o8CB76i3LLRHjTA1p8f4iyTa6hn9eopWJh1). This is most likely a bug in the runtime. At this point manual intervention is needed to make progress. Exiting', core/src/replay_stage.rs:1585:25" location="core/src/replay_stage.rs:1585:25" version="1.16.16 (src:00000000; feat:4033350765, client:JitoLabs)"

I have collected the following files from one of the validators:

snapshot:
https://s3.us-west-1.amazonaws.com/shinobi-systems.com/incident-2023.10.16/snapshot-224107039-GTo6rcFciwNew8Bx9Hggv7n9oS5AZVeQGSXQMkaFfkfy.tar.zst

incremental snapshot:
https://s3.us-west-1.amazonaws.com/shinobi-systems.com/incident-2023.10.16/incremental-snapshot-224107039-224116213-qg6B13UGAAMKDm6XqG6p6PchnZ7RAgjhbe759igiFce.tar.zst

ledger (copied 10,000 slots leading up to the crash):
https://s3.us-west-1.amazonaws.com/shinobi-systems.com/incident-2023.10.16/ledger.tar.gz

validator logs leading up to the crash:
https://s3.us-west-1.amazonaws.com/shinobi-systems.com/incident-2023.10.16/validator.log.gz

Metadata

Metadata

Assignees

No one assigned

    Labels

    communityCommunity contribution

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions