Parameter fusion support in Gluon

## Description
It's common that the parameters declared by a Block in Gluon don't exactly match the format used by operators in the backend. As a result, some Blocks concatenate their parameters on every forward pass, for example:
- *RNN*
  https://github.com/apache/incubator-mxnet/blob/c3b0baaa27e2215eae7ed7676009ea5f4bf49013/python/mxnet/gluon/rnn/rnn_layer.py#L278
- *BERT*
  https://github.com/dmlc/gluon-nlp/pull/1136#discussion_r377480471

A naive approach is to refactor the respective Gluon Blocks to declare the concatenated version of the parameter directly. This does not work in all cases, because different parameters may need to be initialized differently. For example, RNN biases should be initialized differently from RNN weights.

The status quo, in which concatenation / fusion has to happen on every forward pass in such cases, is not acceptable either.

Proposed solution: Introduce `Block.fuse()` and `Block.unfuse()` APIs. By default, both are no-ops. Users can override `fuse` and `unfuse` to declare how the Block's parameters are fused into a new set of parameters (or a single parameter) and how that fusion is reversed. `fuse` is called once, after `infer_shape` and before the first `forward`.
`export` will require fused parameters. Before `save_parameters` or `load_parameters`, the Block is unfused.
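
To make the proposed lifecycle concrete, here is a minimal sketch in plain Python. It is not MXNet code: `ToyBlock`, `ToyDense`, and `concat_count` are hypothetical names invented for illustration, plain lists stand in for parameter arrays, and the `infer_shape` step is omitted. It only shows the ordering described above: fuse once before the first forward pass, unfuse before saving.

```python
class ToyBlock:
    """Hypothetical stand-in for a Gluon Block (not the real MXNet class)."""

    def __init__(self):
        self._is_fused = False

    def fuse(self):
        """No-op by default, as in the proposal."""

    def unfuse(self):
        """No-op by default, as in the proposal."""

    def forward(self, x):
        raise NotImplementedError

    def params(self):
        return {}

    def __call__(self, x):
        # Fuse lazily, once, before the first forward pass.
        if not self._is_fused:
            self.fuse()
            self._is_fused = True
        return self.forward(x)

    def save_parameters(self):
        # Unfuse first so the saved format uses the declared parameters.
        if self._is_fused:
            self.unfuse()
            self._is_fused = False
        return self.params()


class ToyDense(ToyBlock):
    """Declares weight and bias separately so each gets its own initializer."""

    def __init__(self, units_in):
        super().__init__()
        self.weight = [0.5] * units_in  # assumed weight initializer
        self.bias = [0.0]               # assumed (different) bias initializer
        self.fused = None
        self.concat_count = 0           # counts how often concatenation runs

    def fuse(self):
        # Concatenate once into the layout a backend operator would expect.
        self.fused = self.weight + self.bias
        self.concat_count += 1

    def unfuse(self):
        # Split the fused buffer back into the declared parameters.
        n = len(self.weight)
        self.weight, self.bias = self.fused[:n], self.fused[n:]
        self.fused = None

    def params(self):
        return {"weight": self.weight, "bias": self.bias}

    def forward(self, x):
        n = len(x)
        # Read directly from the fused buffer; no per-call concatenation.
        return sum(w * xi for w, xi in zip(self.fused[:n], x)) + self.fused[n]
```

In this sketch, calling a `ToyDense` instance several times triggers the concatenation only on the first call, and `save_parameters` returns the separate `weight` and `bias` entries rather than the fused buffer. How the real API would track fused state, or interact with `load_parameters` and `export`, is left open here.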

Parameter fusion support in Gluon #18077
