Timeout for monitor operation #2730

@kvaps

Description
When a node has stuck operations, it can break OpenNebula itself. The cause may be, for example, a broken disk subsystem, a disconnected target, or some other problem.
OpenNebula launches many /var/lib/one/remotes/tm/<driver>/monitor operations, but they hang forever.

To Reproduce
For example, right now I have a broken LUN and every lvm command hangs for ages. To reproduce:

  • Connect iSCSI target
  • Create LVM group
  • Create VM in this LVM storage
  • Run VM
  • Disconnect the LUN

Now the host is broken, and any lvm command will hang forever.
Wait for a while, then check ps aux on the OpenNebula server: you will see a lot of hung monitor commands.
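
To make the ps aux check repeatable, here is a minimal Python sketch that filters process listings for long-running monitor scripts. It is not part of OpenNebula. The function names, the 300-second threshold, and the choice of ps columns are assumptions made for illustration, and it assumes a Linux ps (procps) that supports the etimes column.

```python
import re
import subprocess

# Matches the monitor script path from this report: /remotes/tm/<driver>/monitor
MONITOR_RE = re.compile(r"/remotes/tm/[^/\s]+/monitor")


def parse_ps(text, min_age_s=300):
    """Return (pid, elapsed_seconds, state, args) for old monitor processes.

    `text` is expected to hold lines of the form "PID ETIMES STAT ARGS...".
    The 300 s default is an arbitrary illustrative threshold.
    """
    hung = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # skip headers and malformed lines
        pid, etimes, stat, args = parts
        if MONITOR_RE.search(args) and int(etimes) >= min_age_s:
            hung.append((int(pid), int(etimes), stat, args))
    return hung


def find_hung_monitors(min_age_s=300):
    """Query ps on the local machine and filter its output with parse_ps."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,etimes=,stat=,args="],  # "=" suppresses the headers
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out, min_age_s)
```

A state starting with D (uninterruptible sleep) in the result suggests the process is blocked on I/O, which would be consistent with a disconnected LUN.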

Expected behavior
OpenNebula should return ERROR for this host's monitoring and continue monitoring the remaining hosts.
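
The requested behavior could look roughly like the following Python sketch, which runs a command with a deadline and reports a failure instead of blocking. This is only an illustration of the idea, not OpenNebula's driver code: run_with_timeout, the status strings, and the one-second grace period are all hypothetical.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, output).

    status is "OK", "ERROR" (non-zero exit) or "TIMEOUT" (deadline exceeded).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        # Do not wait without a bound: a process in uninterruptible sleep
        # (for example, blocked on a dead LUN) may not exit even on SIGKILL.
        try:
            proc.communicate(timeout=1)
        except subprocess.TimeoutExpired:
            pass
        return ("TIMEOUT", "")
    return ("OK" if proc.returncode == 0 else "ERROR", out)
```

The caller never blocks longer than the timeout plus the short grace period, so one broken host cannot hold up monitoring of the others. A process stuck in uninterruptible I/O may still linger on the node until the storage problem is resolved.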

Details

  • Affected Component: Storage Drivers
  • Hypervisor: KVM
  • Version: 5.6.1

Progress Status

  • Branch created
  • Code committed to development branch
  • Testing - QA
  • Documentation
  • Release notes - resolved issues, compatibility, known issues
  • Code committed to upstream release/hotfix branches
  • Documentation committed to upstream release/hotfix branches
