[OpenCL][Kernel] Use FC replace conv1x1 #6365
Merged
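【What this PR does】
In _specific cases_, conv2d_1x1 is converted into an FC computation. When input_channel is large, a single thread would otherwise have to perform input_channel multiply-accumulate operations. To avoid this, the number of threads is increased 4×. input_channel is split into 4 parts, each thread computes the multiply-accumulates for one part, and the 4 threads then add their intermediate results together through local memory.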
Key differences from the previous approach:
【Results】

MobileNetV3_small_x1_0_infer model: 19 of its conv1x1 ops can be replaced by FC. The overall model speedup and the per-kernel speedup are as follows:
MobileNetV3_small_x1_0_infer kernel profiling on armv7 on 845

MobileNetV3_large_x1_0_infer model: 17 of its conv1x1 ops can be replaced by FC. The overall model speedup is as follows:
