Feature/use cudnn #7141
Conversation
- cc_library(init SRCS init.cc DEPS gflags device_context place stringpiece)
+ cc_library(init SRCS init.cc DEPS gflags device_context place stringpiece operator)
init does rely on operator
void DummyTrans(const platform::DeviceContext* ctx,
                const KernelTypePair& kernel_pair, const Variable& in,
                Variable* out) {
  PADDLE_ENFORCE(in.IsType<Tensor>(), "Only Support Tensor transform!.");
Since all Tensors are actually LoDTensors, this should be in.IsType<LoDTensor>().
OK. Will fix it in the next PR.
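For reference, a minimal sketch of the suggested check (illustrative only; the signature is copied from the diff above, not the merged code):

```cpp
// Sketch of the suggested fix: check for LoDTensor, since every Tensor held
// in a Variable is actually a LoDTensor.
void DummyTrans(const platform::DeviceContext* ctx,
                const KernelTypePair& kernel_pair, const Variable& in,
                Variable* out) {
  PADDLE_ENFORCE(in.IsType<LoDTensor>(), "Only support LoDTensor transform!");
  // ... transform logic unchanged ...
}
```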
#endif
}

void UseALL() {
Since UseCUDNN calls UseCUDA, and UseCUDA calls UseMKLDNN, and so on, UseALL is not needed. We can call UseCUDNN directly.
Actually, each UseXXX recursively calls the previous UseXXX. But having UseALL simply call UseCUDNN would look odd, so this is just to make the intent clearer. This interface should be removed in the future; we should ONLY allow users to configure ops through their attributes.
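A rough sketch of the chaining described here (the bodies are illustrative placeholders, not the actual implementation): each UseXXX registers its own kernel preference and then delegates to the previous level, so the most specific call covers the rest.

```cpp
// Illustrative only: each UseXXX pushes its preferred kernel keys and then
// falls back to the previous level in the chain.
void UseCPU()    { /* push plain CPU kernel keys */ }
void UseMKLDNN() { /* push MKLDNN kernel keys */ UseCPU(); }
void UseCUDA()   { /* push CUDA kernel keys */ UseMKLDNN(); }
void UseCUDNN()  { /* push cuDNN kernel keys */ UseCUDA(); }
// UseALL() exists only for readability; it simply forwards to UseCUDNN().
void UseALL()    { UseCUDNN(); }
```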
if ((actual_kernel_key == candidate_key) ||
    (kernels.count(candidate_key) &&
     trans_map.GetNullable(candidate_pair))) {
  expected_kernel_key = candidate_key;
The default Priority will overwrite the user's configuration. We should strictly obey the user's configuration first; only if the user does not provide a preference should we fall back to finding a kernel key guided by the default Priority.
Yes, this does not obey the user-configuration-first rule. Will fix it in the next PR.
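A hedged sketch of the user-configuration-first rule under discussion (kernel keys are reduced to strings and all names are hypothetical):

```cpp
#include <set>
#include <string>
#include <vector>

// Illustrative only: try the user's configured kernel keys first, and fall
// back to the default priority list only when no user preference matches.
std::string ChooseKernel(const std::string& actual_key,
                         const std::vector<std::string>& user_preferred,
                         const std::vector<std::string>& default_priority,
                         const std::set<std::string>& registered_kernels) {
  for (const auto& key : user_preferred) {
    if (registered_kernels.count(key)) return key;  // user's choice wins
  }
  for (const auto& key : default_priority) {
    if (registered_kernels.count(key)) return key;  // otherwise default order
  }
  return actual_key;  // last resort: keep the actual kernel key
}
```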
The costs of different DataTrans operations differ; from smallest to largest they are: DataType, Layout, CPU<->GPU. When choosing candidate_key, these costs should be taken into account.
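Purely as an illustration of that ordering (a sketch, not a proposal for an actual cost model):

```cpp
#include <algorithm>
#include <vector>

// Illustrative only: rank data transforms by cost, smallest to largest.
enum TransCost { kDataTypeCast = 0, kLayoutTransform = 1, kDeviceCopy = 2 };

// Among candidates that all require a transform, prefer the cheapest one.
TransCost CheapestTransform(const std::vector<TransCost>& candidates) {
  return *std::min_element(candidates.begin(), candidates.end());
}
```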
I think we are making the problem too complicated. So far the only needs are CPU <-> GPU and MKLDNNLayout <-> kPlain. In the next PR it will be enough to let the user choose via the op's attribute; taking cost into account would make this just like TensorFlow's cost model.
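A purely hypothetical sketch of the per-op attribute approach mentioned here (the struct and attribute name are made up for illustration, not the actual framework API):

```cpp
#include <map>
#include <string>

// Hypothetical: each op carries a boolean attribute and the kernel chooser
// reads it, instead of relying on a process-wide UseCUDNN() switch.
struct OpConfig {
  std::map<std::string, bool> attrs;
};

bool PrefersCUDNN(const OpConfig& op) {
  auto it = op.attrs.find("use_cudnn");
  return it != op.attrs.end() && it->second;
}
```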
premature optimization is the root of all evil.
Since this PR blocks another one, some of the fixes will be done together in the follow-up PR.
Thanks! These fixes will be done in #6660.