@Nandan91 @rajatsaini0294 Hi,
For each subspace, the input is HxWxG. It passes through DW + MaxPool + PW, producing an intermediate attention map of size HxWx1, which then goes through Softmax + Expand to give the final HxWxG attention map.
Since the output dimension of the PW operation is 1, the final attention map is effectively a single weight map shared by all G channels. Why use this PW at all? Why is it designed so that all channels share one weight?
If the PW operation were removed, i.e. the output of the MaxPool were treated directly as the attention map, then every spatial position and every channel would have its own independent weight. Why not design it this way? (A minimal sketch of both variants is given below.)
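For concreteness, here is a minimal PyTorch sketch of the two variants I am comparing: the described pipeline (DW -> MaxPool -> PW -> Softmax -> Expand), where all G channels of a subspace share one attention map, and the alternative without the PW step, where each channel keeps its own map. Kernel sizes, padding, and the final multiplication with the input are my assumptions for illustration, not necessarily the exact implementation in this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SubspaceAttentionShared(nn.Module):
    """Sketch of the described pipeline: the PW conv collapses the G channels
    to a single HxW map, which is then softmax-normalized and expanded,
    so all channels share one attention map."""

    def __init__(self, g: int):
        super().__init__()
        self.dw = nn.Conv2d(g, g, kernel_size=1, groups=g)       # DW (assumed 1x1)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)  # keeps HxW (assumed)
        self.pw = nn.Conv2d(g, 1, kernel_size=1)                  # PW: G -> 1 channel

    def forward(self, x):                          # x: (B, G, H, W)
        b, g, h, w = x.shape
        a = self.pw(self.pool(self.dw(x)))         # (B, 1, H, W)
        a = F.softmax(a.view(b, 1, -1), dim=-1).view(b, 1, h, w)
        a = a.expand(-1, g, -1, -1)                # shared across all G channels
        return a * x                               # applying the map (assumed)


class SubspaceAttentionPerChannel(nn.Module):
    """Alternative raised in the question: drop the PW conv and use the pooled
    (B, G, H, W) tensor directly, so every channel gets its own attention map."""

    def __init__(self, g: int):
        super().__init__()
        self.dw = nn.Conv2d(g, g, kernel_size=1, groups=g)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):                          # x: (B, G, H, W)
        b, g, h, w = x.shape
        a = self.pool(self.dw(x))                  # (B, G, H, W), no collapse to 1 channel
        a = F.softmax(a.view(b, g, -1), dim=-1).view(b, g, h, w)
        return a * x                               # independent weight per channel
```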
Many thanks!