3.12.8 Shovel plugin crashes after upgrade with existing shovels #9894

@gomoripeti

Describe the bug

Commit ccc22cb changed the id format of the children of the mirrored supervisor rabbit_shovel_dyn_worker_sup_sup. However, the child specs of a mirrored supervisor are stored in Mnesia and survive a rolling restart. During an upgrade with existing dynamic shovels, the crash below was observed on the first node that was upgraded, because the new code encounters ids in the old format.

BOOT FAILED
===========
Error during startup: {error,
                       {rabbitmq_shovel,
                        {{shutdown,
                          {failed_to_start_child,
                           rabbit_shovel_dyn_worker_sup_sup,
                           {'EXIT',
                            {function_clause,
                             [{rabbit_shovel_dyn_worker_sup_sup,id,
                               [<<"shovel1">>],
                               [{file,"rabbit_shovel_dyn_worker_sup_sup.erl"},
                                {line,100}]},
                              {rabbit_shovel_dyn_worker_sup_sup,
                               '-cleanup_specs/0-fun-2-',2,
                               [{file,"rabbit_shovel_dyn_worker_sup_sup.erl"},
                                {line,90}]},
                              {sets,fold_bucket,3,
                               [{file,"sets.erl"},{line,503}]},
                              {sets,fold_seg,4,[{file,"sets.erl"},{line,499}]},
                              {sets,fold_segs,4,
                               [{file,"sets.erl"},{line,495}]},
                              {rabbit_shovel_dyn_worker_sup_sup,
                               cleanup_specs,0,
                               [{file,"rabbit_shovel_dyn_worker_sup_sup.erl"},
                                {line,93}]},
                              {rabbit_shovel_dyn_worker_sup_sup,start_child,
                               2,
                               [{file,"rabbit_shovel_dyn_worker_sup_sup.erl"},
                                {line,42}]},
                              {rabbit_shovel_dyn_worker_sup_sup,
                               '-start_link/0-lc$^0/1-0-',1,
                               [{file,"rabbit_shovel_dyn_worker_sup_sup.erl"},
                                {line,28}]}]}}}},
                         {rabbit_shovel,start,[normal,[]]}}}}

For the record, on another node:

1> mirrored_supervisor:which_children(rabbit_shovel_dyn_worker_sup_sup).
[{{<<"/">>,<<"shovel1">>},
  <49058.4184.0>,worker,
  [rabbit_shovel_dyn_worker_sup]}]

I upgraded from 3.11.24, but I think this can be reproduced starting from any version prior to 3.12.8.

EDIT: I believe it only happens on multi-node clusters.
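
The failure mode can be illustrated with a minimal, self-contained sketch. The module and function names below (`shovel_id_sketch`, `strict_id/1`) are hypothetical and are not the code in `rabbit_shovel_dyn_worker_sup_sup.erl`; the returned id shape is also an assumption, since this report does not show the new format. The sketch only demonstrates how a function with a single clause raises `function_clause` when it is handed a value of a different shape, such as the bare binary `<<"shovel1">>` seen in the stack trace.

```erlang
-module(shovel_id_sketch).
-export([strict_id/1, demo/0]).

%% Hypothetical id function with a single clause: it only accepts a
%% {VHost, Name} tuple. The returned shape is an assumption made for
%% illustration, not the format used by the plugin.
strict_id({VHost, Name}) when is_binary(VHost), is_binary(Name) ->
    {[VHost, Name], {VHost, Name}}.

%% Calls strict_id/1 with a bare binary, as in the stack trace, and
%% reports the error instead of crashing the caller.
demo() ->
    try
        strict_id(<<"shovel1">>)
    catch
        error:function_clause ->
            {crashed, function_clause}
    end.
```

Calling `shovel_id_sketch:demo()` returns `{crashed, function_clause}`. In the report the error is not caught, so it propagates up through the supervisor start and aborts the boot.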

Reproduction steps

  1. Create a multi-node cluster running a version prior to 3.12.8 (e.g. 3.11.24).
  2. Create a dynamic shovel that is not auto-deleted (e.g. one shovelling between 2 local queues).
  3. Upgrade the first node to 3.12.8 and restart it.
  4. Observe the shovel plugin crash on the first node, which prevents boot.

Expected behavior

Existing shovels should still work after upgrade to 3.12.8, possibly by executing a DB migration converting the child IDs.
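
As a rough sketch of what such a conversion could look like, the hypothetical module below maps ids in the old `{VHost, Name}` shape (as printed by `which_children` in this report) to a new shape. The new shape used here, `{[VHost, Name], {VHost, Name}}`, is an assumption for illustration, and the function names are made up. The sketch works on plain lists of ids; reading and rewriting the child specs stored in Mnesia is deliberately left out.

```erlang
-module(shovel_id_migration_sketch).
-export([to_new_id/1, migrate_ids/1]).

%% Assumed new id shape, for illustration only:
%% {[VHost, Name], {VHost, Name}}.
%% Ids already in that shape are returned unchanged, so running the
%% conversion twice is harmless.
to_new_id({[VHost, Name], {VHost, Name}} = NewId)
  when is_binary(VHost), is_binary(Name) ->
    NewId;
%% Old id shape, as shown in the which_children output: {VHost, Name}.
to_new_id({VHost, Name}) when is_binary(VHost), is_binary(Name) ->
    {[VHost, Name], {VHost, Name}}.

%% Converts a list of child ids, leaving already-converted ones alone.
migrate_ids(Ids) ->
    [to_new_id(Id) || Id <- Ids].
```

For example, `shovel_id_migration_sketch:migrate_ids([{<<"/">>, <<"shovel1">>}])` returns `[{[<<"/">>,<<"shovel1">>],{<<"/">>,<<"shovel1">>}}]`. Making the conversion idempotent matters in a rolling upgrade, where it may run on more than one node against the same stored specs.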

Additional context

No response
