
about MaxPool #10

@foralliance

Description

@Nandan91 @rajatsaini0294 Hi,
For each subspace, the H×W×G input goes through DW + MaxPool + PW, which yields an intermediate attention map of size H×W×1; Softmax + Expand then produce the final H×W×G attention map.
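For concreteness, here is a minimal PyTorch sketch of that per-subspace pipeline as I read it; the class name, kernel sizes, pooling stride/padding, and the choice of a spatial softmax are my assumptions, not taken from this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubspaceAttention(nn.Module):
    """Sketch of one subspace: DW + MaxPool + PW -> Softmax -> Expand."""
    def __init__(self, g: int):
        super().__init__()
        # DW: depthwise 3x3 over the G channels of the subspace (assumed kernel size)
        self.dw = nn.Conv2d(g, g, kernel_size=3, padding=1, groups=g, bias=False)
        # PW: pointwise conv collapsing G channels to a single H x W map
        self.pw = nn.Conv2d(g, 1, kernel_size=1, bias=False)
        self.g = g

    def forward(self, x):                                   # x: (B, G, H, W)
        a = self.dw(x)                                       # DW
        a = F.max_pool2d(a, 3, stride=1, padding=1)          # MaxPool (stride-1, padded; assumed)
        a = self.pw(a)                                       # PW -> (B, 1, H, W)
        b, _, h, w = a.shape
        a = F.softmax(a.view(b, 1, -1), dim=-1).view(b, 1, h, w)  # Softmax over H*W
        return a.expand(-1, self.g, -1, -1)                  # Expand -> (B, G, H, W)
```

Presumably the map is then applied to the subspace with something like `out = x * att(x)` (possibly plus a residual), but that part is outside my question.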

Because the output dimension of this PW operation is 1, the final attention map amounts to a single weight map shared by all G channels of the subspace. Why use this PW at all? Why is it designed so that all channels share one weight?

If the PW operation were removed, i.e., the output of MaxPool were treated directly as the final attention map, then every spatial position in every channel would get its own independent weight. Why was it not designed this way?
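A corresponding sketch of that PW-free variant (again only my assumption of what I mean, not code from this repo): drop the 1×1 conv and apply the softmax per channel, so each of the G channels keeps its own H×W attention map:

```python
import torch.nn as nn
import torch.nn.functional as F

class SubspaceAttentionNoPW(nn.Module):
    """Hypothetical variant: no PW, so each channel gets its own attention map."""
    def __init__(self, g: int):
        super().__init__()
        self.dw = nn.Conv2d(g, g, kernel_size=3, padding=1, groups=g, bias=False)

    def forward(self, x):                                       # x: (B, G, H, W)
        a = F.max_pool2d(self.dw(x), 3, stride=1, padding=1)    # (B, G, H, W)
        b, g, h, w = a.shape
        # softmax over spatial positions, independently for each channel
        return F.softmax(a.view(b, g, -1), dim=-1).view(b, g, h, w)
```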

Many thanks!
