Contributor

@cody-littley cody-littley commented Jul 31, 2025

Why are these changes needed?

Adds the ability to override LittDB lock files in the event of an emergency.
https://linear.app/eigenlabs/issue/EGDA-1795/flag-to-ignore-littdb-lock-files

@cody-littley cody-littley requested review from dmanc and litt3 July 31, 2025 16:47
@cody-littley cody-littley self-assigned this Jul 31, 2025

github-actions bot commented Jul 31, 2025

The latest Buf updates on your PR. Results from workflow Buf Proto / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Aug 1, 2025, 6:58 PM |

@@ -136,6 +136,10 @@ type Config struct {
// Directories do not need to be on the same filesystem.
LittDBStoragePaths []string

// If true, then purge LittDB locks on startup. Potentially useful to get rid of zombie lock files,
Contributor

I didn't quite understand the scenario in which zombie lock files appear. Is it due to some sort of ungraceful container termination?

Contributor Author

Expanded the documentation here:

	// If true, then purge LittDB locks on startup. Potentially useful to get rid of zombie lock files,
	// but also dangerous (multiple LittDB processes operating on the same files can lead to data corruption).
	//
	// When LittDB starts up, it attempts to create lock files. When a validator is forcefully shut down, lock files 
	// may be left behind. At startup time, if LittDB observes existing lock files, it first checks to see
	// if the process that created the lock files is still running. The lock files contain the creator's PID, and so 
	// LittDB checks to see if there is any process with that PID still running.
	//
	// Although it should be rare, it's possible that another process may be started with the same PID as the
	// PID used to create the lock files. When this happens, LittDB will be prevented from starting up out of
	// fear of another process trying to access the same files, even though the original process that created the 
	// lock files is no longer running. If that happens, this flag is a safe way to force LittDB to start up
	// without being blocked by those lock files. BE VERY CERTAIN THAT THE OTHER PROCESS IS ACTUALLY DEAD!
	// If two instances of LittDB are running on the same files, it WILL lead to data corruption.
	//
	// An alternate way to clear the LittDB lock files is via the LittDB CLI with the "litt unlock" command.
	// Run "litt unlock --help" for more information.
	LittUnsafePurgeLocks bool

@@ -267,6 +277,15 @@ func buildCLIParser(logger logging.Logger) *cli.App {
},
Action: nil, // syncCommand, // TODO this will be added in a follow up PR
},
{
Name: "unlock",
Usage: "Manually delete LittDB lock files. Dangerous if used improperly, use with caution.",
Contributor

Should we add some sort of confirmation flow where the user has to say "yes", with a force-unlock or --force flag for skipping that flow?

Contributor Author

Added. You must either type "I know what I am doing" or include the --force flag.

Contributor

@pschork pschork left a comment

This change seems to be trying to serve two purposes:

  1. deleting zombie locks that prevent the node from starting
  2. allowing operators to back up LittDB

In the case of 1, is it possible for the operator to just manually delete the zombie lock file?

litt3
litt3 previously approved these changes Aug 1, 2025
@cody-littley
Contributor Author

cody-littley commented Aug 1, 2025

@pschork

> 1. deleting zombie locks that prevent the node from starting
> 2. allowing operators to backup littdb

This change is not needed for operators to back up LittDB. I'm currently merging a series of PRs that add that functionality; this change is tangential to them.

> In the case of 1, is it possible for the operator to just manually delete the zombie lock file?

Yes, it's completely possible to do this manually without this tool. I automated the process because the lock file is buried in the LittDB file structure, and there may be multiple lock files that need to be deleted. If the user configures LittDB to store its data in N physical volumes, then there will be one lock file for each physical volume.

A secondary use case for this utility is when an operator can't easily modify the validator's file system. The other day when Daniel and I encountered a zombie lock, it was a pain to find and delete the lock file. We ended up just deleting the entire volume, since it was just preprod data and we don't care about data loss there. If we had had a flag like the one added by this PR, we could have just redeployed the validator with the purge-locks flag turned on, and the problem would have been resolved.

dmanc
dmanc previously approved these changes Aug 1, 2025
@cody-littley cody-littley dismissed stale reviews from dmanc and litt3 via b9a8f6d August 1, 2025 18:56
@cody-littley cody-littley added this pull request to the merge queue Aug 4, 2025
Merged via the queue into master with commit 71e7d73 Aug 4, 2025
21 of 23 checks passed
@cody-littley cody-littley deleted the 1795-litt-unlock branch August 4, 2025 18:00