Contributor

@carllin carllin commented Feb 18, 2025

Problem

Poh hashing is unnecessary and overly complicated for Alpenglow.

Summary of Changes

  1. On Alpenglow signal, PohService shuts down its current tick producer and migrates to a simple Alpenglow version that only records and ticks once to signal the end of the slot.
  2. Poh sets the bank's tick_count to max_tick_height - 1 right before it ticks, to mimic the bank being complete.

I tested the above. It is sufficient for a single leader to continually produce Alpenglow blocks on a single-node network, because replay won't verify those blocks.
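
As an illustration of the second change, here is a minimal sketch of fast-forwarding the tick count so that one tick completes the slot. `MockBank`, its fields, and `end_slot_with_single_tick` are hypothetical stand-ins for illustration, not the real `Bank` or `PohRecorder` API, and the sketch assumes at least one tick per slot.

```rust
/// Hypothetical stand-in for a bank; only the tick bookkeeping is modeled.
struct MockBank {
    tick_count: u64,
    max_tick_height: u64,
}

impl MockBank {
    /// Assumes `ticks_per_slot >= 1`.
    fn new(starting_tick_count: u64, ticks_per_slot: u64) -> Self {
        Self {
            tick_count: starting_tick_count,
            max_tick_height: starting_tick_count + ticks_per_slot,
        }
    }

    fn register_tick(&mut self) {
        self.tick_count += 1;
    }

    /// The bank counts as complete once it has seen every tick of its slot.
    fn is_complete(&self) -> bool {
        self.tick_count == self.max_tick_height
    }
}

/// Ends the slot with one tick: jump to one tick short of the maximum,
/// then register the single end-of-slot tick.
fn end_slot_with_single_tick(bank: &mut MockBank) {
    bank.tick_count = bank.max_tick_height.saturating_sub(1);
    bank.register_tick();
}
```

Calling `end_slot_with_single_tick` on a fresh `MockBank::new(64, 64)` leaves `is_complete()` returning true after a single registered tick.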

TODO:
Change replay verification on migration so that it stops counting ticks and verifying hashes. The only things replay needs are the entry count and the last tick, which signals the end of the block.
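
One way the simplified replay check could look is sketched below. The `Entry` type, the `BlockStatus` enum, and `check_block` are hypothetical simplifications and not the replay code in this PR; the sketch assumes that a tick is an entry carrying no transactions and that nothing may follow the end-of-slot tick.

```rust
/// Hypothetical simplified entry: a tick is an entry with no transactions.
struct Entry {
    num_transactions: usize,
}

impl Entry {
    fn is_tick(&self) -> bool {
        self.num_transactions == 0
    }
}

#[derive(Debug, PartialEq)]
enum BlockStatus {
    /// No tick seen yet, so the block has not ended.
    Incomplete,
    /// The last entry is the single end-of-slot tick.
    Complete { entry_count: usize },
    /// Entries appear after the end-of-slot tick.
    Invalid,
}

/// Checks only the entry count and the position of the first tick;
/// no hashes are verified and ticks are not counted against a target.
fn check_block(entries: &[Entry]) -> BlockStatus {
    match entries.iter().position(Entry::is_tick) {
        None => BlockStatus::Incomplete,
        Some(index) if index + 1 == entries.len() => BlockStatus::Complete {
            entry_count: entries.len(),
        },
        Some(_) => BlockStatus::Invalid,
    }
}
```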

Fixes #

}
}

if let Some(CurrentLeaderBank { bank, start }) = &current_leader_bank {
Contributor

let Some(CurrentLeaderBank { bank, start }) = &current_leader_bank else {
    continue;
};

would be nice here to reduce nesting
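
A self-contained sketch of the `let ... else` pattern inside a loop follows. The `CurrentLeaderBank` struct here is a hypothetical simplification (it carries a `slot` instead of a bank), and `slots_with_leader_bank` is an illustrative helper, not code from this PR.

```rust
use std::time::Instant;

/// Hypothetical simplification: the real struct holds a bank, not a slot.
struct CurrentLeaderBank {
    slot: u64,
    start: Instant,
}

/// Collects the slots of every iteration that had a leader bank.
fn slots_with_leader_bank(polls: &[Option<CurrentLeaderBank>]) -> Vec<u64> {
    let mut seen = Vec::new();
    for current_leader_bank in polls {
        // Early `continue` keeps the main path at one indentation level.
        let Some(CurrentLeaderBank { slot, start }) = current_leader_bank else {
            continue;
        };
        let _elapsed = start.elapsed(); // `start` is usable without extra nesting
        seen.push(*slot);
    }
    seen
}
```

Compared with `if let Some(..) = .. { .. }`, the body of the loop no longer sits inside an extra block.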

Contributor Author

replaced!

last_reported_slot_for_pending_fork: Arc<Mutex<Slot>>,
pub is_exited: Arc<AtomicBool>,
pub is_alpenglow_enabled: bool,
pub is_poh_service_migrated: bool,
Contributor

nit: not sure what "is_poh_service_migrated" means here, or how it differs from is_alpenglow_enabled above. How about a name like "use_alpenglow_tick_producer"?

Contributor Author

replaced!

Contributor Author

replaced!

))
}

pub fn migrate_poh_to_alpenglow(&mut self) {
Contributor

nit: Are we doing anything other than setting PoH to low power mode here? This name is vague.

Contributor Author

it's just migrating to the alpenglow poh tick producer, renamed to migrate_to_alpenglow_poh

self.tick_lock_contention_us += tick_lock_contention_us;

if let Some(poh_entry) = poh_entry {
    self.tick_height = slot_max_tick_height;
Contributor

I'm confused, so the plan is we will only have one tick forever in all Alpenglow blocks? Do we need to change tick verification in replay at the same time?

Contributor Author

yeah just one tick in all alpenglow blocks to signal the end of a slot

Yeah, need to remove tick verification in replay when alpenglow is enabled, that can be done in another PR

}
Self::alpenglow_tick_producer(poh_recorder, &poh_exit, record_receiver);
}
//}
Contributor

nit: remove this line?

Contributor Author

removed

while !poh_exit.load(Ordering::Relaxed) {
    // Wait for a new leader bank to be set in PohRecorder
    let leader_bank = leader_bank_notifier
        .get_or_wait_for_in_progress(Duration::from_millis(50))
Contributor

Since you can't do much if you don't have a leader bank, is 50ms a good time interval to wait here?

Contributor Author

I think other components like banking_stage use 50ms, so I used that. It just needs to be short enough that we can periodically check the exit condition for poh_service.
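
The trade-off can be sketched with a bounded wait that rechecks the exit flag on every timeout. This is only an illustration: an `mpsc` channel of slot numbers stands in for `leader_bank_notifier`, and `wait_for_leader_slot` is a hypothetical helper, not the function used in the PR.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Waits for the next leader slot, waking every `poll_interval` to recheck
/// the exit flag. Returns `None` if exit was requested or the sender is gone.
fn wait_for_leader_slot(
    leader_slots: &Receiver<u64>, // stand-in for the leader bank notifier
    exit: &AtomicBool,
    poll_interval: Duration,
) -> Option<u64> {
    while !exit.load(Ordering::Relaxed) {
        match leader_slots.recv_timeout(poll_interval) {
            Ok(slot) => return Some(slot),
            // Nothing yet: loop around and look at the exit flag again.
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => return None,
        }
    }
    None
}
```

With this shape, the interval only bounds how long shutdown can be delayed; a new leader slot wakes the waiter immediately regardless of the interval.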

let tick_producer = Builder::new()
    .name("solPohTickProd".to_string())
    .spawn(move || {
        if poh_config.hashes_per_tick.is_none() {
Contributor

If Alpenglow is already enabled, should we skip this if statement?

Contributor Author

yup, that's a good point. We should skip the migration if, on startup, we have a frozen bank whose slot is greater than the first alpenglow slot. Added!
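
A minimal sketch of that startup decision, under stated assumptions: the function name, its parameters, and the use of `Option` for a not-yet-known first Alpenglow slot are all hypothetical, and the strict `>` comparison simply follows the wording above.

```rust
/// Decides at startup whether to begin directly with the Alpenglow tick
/// producer, skipping the migration step.
fn start_with_alpenglow_tick_producer(
    highest_frozen_slot: u64,
    first_alpenglow_slot: Option<u64>, // `None`: Alpenglow not yet scheduled
) -> bool {
    match first_alpenglow_slot {
        // A frozen bank past the first Alpenglow slot means the migration
        // already happened before this restart.
        Some(first) => highest_frozen_slot > first,
        None => false,
    }
}
```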

bank: leader_bank.clone(),
// By this point the leader should have committed their certificates,
// so it's safe to start the timer
start: Instant::now(),
Contributor

If we are packing a bank and the validator gets restarted in the middle, what happens? I'm guessing we just repack the whole thing, so start will be correct again?

Is it possible that we already sent out some shreds then the validator is restarted before the leader bank finishes? Would the start be wrong here?

Contributor Author

Just like today, we always check blockstore for existing shreds before we create a leader bank.

// so it's safe to start the timer
start: Instant::now(),
});
info!("current ns per slot: {}", leader_bank.ns_per_slot);
Contributor

Does this value change? Do we need to log every time we start a leader bank?

Contributor Author

We can adjust Poh difficulty today, so I think it would be nice to be able to adjust slot times if we ever want to shorten them, which was on the roadmap
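
To illustrate how a per-bank slot duration could drive the timer started at `start`, here is a small sketch. `remaining_in_slot` is a hypothetical helper, the `u64` type for `ns_per_slot` is an assumption made so it fits `Duration::from_nanos`, and passing `now` explicitly is only there to make the function easy to test.

```rust
use std::time::{Duration, Instant};

/// Time left in the slot, given when the leader bank's timer started.
/// Saturates at zero once the slot duration has elapsed.
fn remaining_in_slot(start: Instant, now: Instant, ns_per_slot: u64) -> Duration {
    let slot_duration = Duration::from_nanos(ns_per_slot);
    let elapsed = now.saturating_duration_since(start);
    slot_duration.saturating_sub(elapsed)
}
```

Because the slot duration is read from the bank each time, shortening slots later would not require changing the timer logic, only the value it reads.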

Contributor

@AshwinSekar AshwinSekar left a comment

logic looks correct for prototype, :shipit:

@carllin carllin merged commit 59469e6 into anza-xyz:master Feb 27, 2025
7 checks passed
carllin added a commit that referenced this pull request Mar 1, 2025
PohService needs to set `use_alpenglow_tick_producer` flag on startup  (#59)
@AshwinSekar AshwinSekar moved this to Pending migration in Alpenglow May 28, 2025
bw-solana pushed a commit to bw-solana/alpenglow that referenced this pull request Aug 1, 2025
PohService needs to set `use_alpenglow_tick_producer` flag on startup  (anza-xyz#59)
bw-solana pushed a commit to bw-solana/alpenglow that referenced this pull request Aug 2, 2025
PohService needs to set `use_alpenglow_tick_producer` flag on startup  (anza-xyz#59)