Support node recovery after ZooKeeper SESSION_EXPIRED event #43

@hsestupin

Hey, if one of the registered Galaxy nodes receives a SESSION_EXPIRED event from the ZooKeeper client, then all ephemeral information about that node is deleted from the ZooKeeper cluster. Moreover, this breaks the consistency of the Galaxy cluster, because that particular node still thinks it is alive while all the other nodes are sure it is dead. This situation is easy to reproduce by setting sessionTimeoutMs to a small value such as 500 ms. Either way, there should be a valid recovery strategy for ZooKeeper session expiration.
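
To make the failure mode concrete, here is a minimal, hypothetical simulation. `FakeEnsemble`, `NaiveNode` and the `/members/<name>` path layout are illustrative assumptions, not Galaxy's or ZooKeeper's actual API. In the real Java client the expiry would surface as a `Watcher` event with `KeeperState.Expired`. The sketch only models the essential rule: an ephemeral znode belongs to the session that created it and disappears when that session expires, while the node's own belief that it is alive is never revised.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a ZooKeeper ensemble, NOT the real client API.
// Each ephemeral znode is owned by the session that created it.
class FakeEnsemble {
    private final Map<String, Long> ephemeralOwners = new HashMap<>();
    private final Set<Long> liveSessions = new HashSet<>();
    private long nextSessionId = 1;

    long openSession() {
        long id = nextSessionId++;
        liveSessions.add(id);
        return id;
    }

    void createEphemeral(long sessionId, String path) {
        if (!liveSessions.contains(sessionId)) {
            throw new IllegalStateException("session " + sessionId + " has expired");
        }
        ephemeralOwners.put(path, sessionId);
    }

    // The ensemble expires a session when it hears nothing from the client
    // within the session timeout; every ephemeral znode it owned is deleted.
    void expireSession(long sessionId) {
        liveSessions.remove(sessionId);
        ephemeralOwners.values().removeIf(owner -> owner == sessionId);
    }

    boolean exists(String path) {
        return ephemeralOwners.containsKey(path);
    }
}

// Hypothetical Galaxy-like member that never reacts to session expiry.
class NaiveNode {
    final String memberPath;
    final long sessionId;
    final boolean thinksItIsAlive = true; // local belief, never revised

    NaiveNode(FakeEnsemble ensemble, String name) {
        this.memberPath = "/members/" + name; // hypothetical path layout
        this.sessionId = ensemble.openSession();
        ensemble.createEphemeral(sessionId, memberPath);
    }
}

class SessionExpiryDemo {
    public static void main(String[] args) {
        FakeEnsemble zk = new FakeEnsemble();
        NaiveNode node = new NaiveNode(zk, "node-1");
        // Simulate a pause longer than sessionTimeoutMs (e.g. 500 ms).
        zk.expireSession(node.sessionId);
        System.out.println("node thinks it is alive: " + node.thinksItIsAlive); // true
        System.out.println("cluster sees the node:   " + zk.exists(node.memberPath)); // false
    }
}
```

The two printed values disagree, which is exactly the split view described above: the node considers itself a member, but the cluster no longer has any record of it.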

For more background on ZooKeeper internals, see https://wiki.apache.org/hadoop/ZooKeeper/FAQ#A3

If you can give me any pointers on where to look in order to fix this issue, I may try to do it myself. It seems that recreating all the ephemeral nodes would be enough as the simplest solution.
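
The "recreate all ephemeral nodes" idea can be sketched as follows, again against a hypothetical in-memory ensemble rather than the real ZooKeeper or Galaxy API (`SimEnsemble`, `RecoveringNode` and `onSessionExpired` are illustrative names). The node remembers every ephemeral znode it registered and, when notified of expiry, opens a new session and replays them. Opening a new session first mirrors the real client, where an expired `ZooKeeper` handle cannot be reused and a new instance has to be created.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory ensemble, NOT the real ZooKeeper client API.
class SimEnsemble {
    private final Map<String, Long> owners = new HashMap<>();
    private final Set<Long> liveSessions = new HashSet<>();
    private long nextSessionId = 1;

    long openSession() {
        long id = nextSessionId++;
        liveSessions.add(id);
        return id;
    }

    void createEphemeral(long sessionId, String path) {
        if (!liveSessions.contains(sessionId)) {
            throw new IllegalStateException("session " + sessionId + " has expired");
        }
        owners.put(path, sessionId);
    }

    // Expiry drops the session and every ephemeral znode it owned.
    void expireSession(long sessionId) {
        liveSessions.remove(sessionId);
        owners.values().removeIf(owner -> owner == sessionId);
    }

    boolean exists(String path) {
        return owners.containsKey(path);
    }
}

// Member that remembers the ephemeral znodes it registered so it can
// recreate them under a new session after SESSION_EXPIRED.
class RecoveringNode {
    private final SimEnsemble ensemble;
    private final Set<String> ephemeralPaths = new LinkedHashSet<>();
    private long sessionId;
    private int recoveries = 0;

    RecoveringNode(SimEnsemble ensemble) {
        this.ensemble = ensemble;
        this.sessionId = ensemble.openSession();
    }

    long sessionId() { return sessionId; }

    int recoveries() { return recoveries; }

    void registerEphemeral(String path) {
        ensemble.createEphemeral(sessionId, path);
        ephemeralPaths.add(path);
    }

    // Would be triggered by the client's session-expired notification.
    // An expired session cannot be resumed, so a new one is opened first.
    void onSessionExpired() {
        sessionId = ensemble.openSession();
        for (String path : ephemeralPaths) {
            ensemble.createEphemeral(sessionId, path);
        }
        recoveries++;
    }
}

class SessionRecoveryDemo {
    public static void main(String[] args) {
        SimEnsemble zk = new SimEnsemble();
        RecoveringNode node = new RecoveringNode(zk);
        node.registerEphemeral("/members/node-1"); // hypothetical path
        zk.expireSession(node.sessionId());
        System.out.println("after expiry:   " + zk.exists("/members/node-1")); // false
        node.onSessionExpired();
        System.out.println("after recovery: " + zk.exists("/members/node-1")); // true
    }
}
```

One caveat with this approach: between the expiry and the replay, the other nodes may already have acted on the node's apparent death, so recreating the znodes restores the membership record but not necessarily any state the cluster reassigned in the meantime.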
