tanh avx512 mask optimization #6096
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6096      +/-   ##
==========================================
+ Coverage   95.59%   95.70%   +0.10%
==========================================
  Files         827      827
  Lines      270116   270122       +6
==========================================
+ Hits       258226   258527     +301
+ Misses      11890    11595     -295
The binary size change of libncnn.so (bytes)
Pull Request Overview
This PR optimizes the TanH function for x86 by introducing an AVX512 mask-based remainder handling block to process elements that do not fill a complete 16-element vector. Key changes include:
- Using AVX512 mask load/store instructions for the remainder elements.
- Removing the previously nested SSE2/AVX preprocessor directives for this block.
- Maintaining backwards compatibility with a fallback to SSE2/AVX code when AVX512F is not available.
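The remainder handling described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names `tanh_inplace` and `tanh512_placeholder` are hypothetical, and the placeholder applies scalar `std::tanh` lane by lane where the real layer would presumably call a vectorized tanh routine. When AVX512F is not enabled at compile time, the sketch simply falls through to a scalar loop rather than an SSE2/AVX path.

```cpp
#include <cmath>
#include <cstddef>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

#if defined(__AVX512F__)
// Hypothetical stand-in for a vectorized tanh: spills the 16 lanes to
// memory and applies scalar std::tanh to each one. Assumption for
// illustration only.
static inline __m512 tanh512_placeholder(__m512 v)
{
    alignas(64) float tmp[16];
    _mm512_store_ps(tmp, v);
    for (int k = 0; k < 16; k++)
        tmp[k] = std::tanh(tmp[k]);
    return _mm512_load_ps(tmp);
}
#endif

// Applies tanh in place to `size` floats starting at `ptr`.
void tanh_inplace(float* ptr, int size)
{
    int i = 0;
#if defined(__AVX512F__)
    // Full 16-element vectors.
    for (; i + 15 < size; i += 16)
    {
        __m512 p = _mm512_loadu_ps(ptr + i);
        _mm512_storeu_ps(ptr + i, tanh512_placeholder(p));
    }
    // Remainder of 1..15 elements, handled with one masked load/store.
    if (i < size)
    {
        const unsigned int remain = size - i;
        // Set the low `remain` bits so only valid lanes are touched.
        const __mmask16 mask = (__mmask16)((1u << remain) - 1);
        __m512 p = _mm512_maskz_loadu_ps(mask, ptr + i); // masked-off lanes read as 0
        _mm512_mask_storeu_ps(ptr + i, mask, tanh512_placeholder(p));
        i = size;
    }
#endif
    // Scalar fallback: runs for all elements when AVX512F is unavailable,
    // and for nothing when the masked block already finished the tail.
    for (; i < size; i++)
        ptr[i] = std::tanh(ptr[i]);
}
```

Because masked-off lanes are neither read from nor written to memory, the tail is processed in a single vector operation without touching elements past the end of the buffer.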
Comments suppressed due to low confidence (1)
src/layer/x86/tanh_x86.cpp:53
- [nitpick] Consider renaming 'remain' to 'remaining_elements' to enhance clarity.
const unsigned int remain = size - i;
Thanks for your contribution!
#6061