Commit 1ff9cf1

YuanRisheng, chenwhql, MingMingShangTian, zyfncg, and Shixiaowei02 authored and committed
Add Intermediate Kernel API for refactor Tensor Lib (PaddlePaddle#36914)
* initial tensor design & sign kernel demo
* add move constructor for meta & add lodtensor
* add dirs & sign xpu kernel
* add mean cpu&cuda kernel impl
* move sign & mean xpu & npu kernel
* add selected_rows basic impl
* refactor design, BaseTensor to DenseTensor, etc.
* add scale mkldnn kernel
* polish xpu & npu impl details
* fix mkldnn reuse compile failed
* change tensor operation lib name
* rename util filename
* add more comments
* change TensorImplInterface to TensorInterface
* add kernel key and factory
* remove MKLDNNTensorMeta, add MKLDNNDenseTensor
* change XXDeviceContext to XXContext
* add base kernel registrar utils & test on sign
* replace boost::any by paddle::any
* fix several ci failed
* fix npu compile error
* add ordered map util
* fix multiple ordered_map compile errors
* move dev into include dir
* support sign op in static op run
* fix static op run error
* fix new executor compile failed
* add dygraph branch & remove sign_op.h
* fix test_infer_no_need_buffer_slots
* fix rocm compile link error
* fix unitybuild error & clear glog
* fix npu compile failed
* skip quant trans test
* fix part windows compile problem
* fix xpu enforce error
* fix inference test failed
* remove ordered_map to solve quant failed
* fix part of rocm compile failed
* add more register kernels
* revert scale kernel temporarily
* fix code format error
* add new kernel registrar macro
* rename top to tcmpt
* revert xpu, npu, mkldnn impl & remove op def
* add kernel args parse functor to auto parse args
* revert some change & add scale kernels
* add op proto in dygraph kernelcontext building
* polish kernel dispatch logic & naming rule
* fix scale kernel match error
* fix scale test failed
* add mean API and unittest
* test mean api success
* add branch to solve compiled error
* skip clang format error
* add mean skip rule in op_library
* add dot kernel, api and unittest (PaddlePaddle#6)
* remove old kernel and add symbol link
* fix dot compiled failed
* add macro for module declare
* fix npu and xpu compile error
* revert sign, mean, scale, dot kernel removing
* add comment for keeping old kernel impl
* fix mutable_data error
* fix bfloat16 conflict
* fix inference undef error
* adapt to msvc compile rules
* polish comment for template inst
* add cmake template instantiation for win
* fix backend to place device id bug
* fix ifdef error
* Op2functor (PaddlePaddle#7)
* add kernel args maker class
* make args maker non-const
* remove debug log
* modify codes by review options
* split constructPrKernelContext function
* fix output name bug
* fix test_mean_op test_sign_op failed
* fill_any_like kernel refactor (PaddlePaddle#10)
* fill_any_like kernel refactor
* remove useless code of full_like c++ api
* skip dtype for fill_any_like
* add attrs for kernel key construct
* add use_pt_kernel Flags to control whether to use pt kernel (PaddlePaddle#13)
* add use_pt_kernel Flags to control whether to use pt kernel
* change the default value to true for checking pt kernels
* fix mutable_data cuda place error
* move high level apis into hapi
* remove selectedrows adapting temporarily
* Support Scalar in Tensor Compute Library (PaddlePaddle#14)
* fill_any_like kernel refactor
* remove useless code of full_like c++ api
* Support Scalar in Tensor Compute Library
* add scalar in dygraph and static graph mode
* keep the basic type for attr, instead of using scalar for all
* merge the code
* remove mkldnn tensor & polish details
* use flat_hash_map and small_vector in kernel factory
* Refactor flatten kernel (PaddlePaddle#12)
* refactor flatten kernel
* update infershape function
* fix compile bugs
* fix bugs when merge
* fix compiler bugs
* fix bugs when run test_flatten_api
* fix bugs when run test
* Revert "use flat_hash_map and small_vector in kernel factory"
  This reverts commit 2309149.
* Move cpu, cuda and other device code into kernels (PaddlePaddle#15)
* fill_any_like kernel refactor
* remove useless code of full_like c++ api
* Support Scalar in Tensor Compute Library
* add scalar in dygraph and static graph mode
* keep the basic type for attr, instead of using scalar for all
* merge the code
* start refactor matmul
* move cpu, cuda and other device modules into kernels
* merge code
* polish code in operator.cc
* Perfect unittests (PaddlePaddle#16)
* perfect unittest
* update license
* replace with flat_hash_map, small_vector (PaddlePaddle#19)
* fix small_vector build error on windows platform
* replace with flat_hash_map, small_vector
* remove todo
* Perfect unittests (PaddlePaddle#20)
* perfect unittest
* update license
* fix bug when run tcmpt_utils_test
* refactor execution adapting impl
* fix insert conflict
* Fix CI bug of test_yolov3 (PaddlePaddle#21)
* fill_any_like kernel refactor
* remove useless code of full_like c++ api
* Support Scalar in Tensor Compute Library
* add scalar in dygraph and static graph mode
* keep the basic type for attr, instead of using scalar for all
* merge the code
* start refactor matmul
* move cpu, cuda and other device modules into kernels
* merge code
* polish code in operator.cc
* Fix CI bug of test_yolov3
* add the tensor base class, test=develop (PaddlePaddle#17)
* update the tensor base class, test=develop
* remove two funcs, test=develop
* update the error msg, test=develop
  Co-authored-by: Chen Weihang <[email protected]>
* [no-verify] commit backend and tensor signature changes
* Rename tcmpt to pten (PaddlePaddle#23)
* rename tcmpt to pten
* update omitted files for rename to pten
* update omitted file for rename to pten
* remove k of all enum var
* remove kernel_instantiate (PaddlePaddle#26)
* remove symbols and spatial_tensor
* change common to functions
* readd share tensor impl methods
* add a candidate dense tensor class, test=develop (PaddlePaddle#28)
* change all Pt to Pten
* resolve conflict with xiaowei
* Op2functor opt1 (PaddlePaddle#27)
* replace to small vector and change to const &
* add std::move
  Co-authored-by: Chen Weihang <[email protected]>
* polish kernel factory and kernel registry
* fix operator test error msg mismatch
* remove tensor signature and backend set member
* move scalar and polish enforce
* revert dtype layout change to fix error
* fix enum operator override error
* Add Intermediate API layer
* add several base unittests
* add pten utils tests
* polish some details
* Dev/op2func refactor 3 (PaddlePaddle#30)
* add a candidate dense tensor class, test=develop
* remove TensorBase::backend(), test=develop
* remove some ops, test=develop
* cherry-pick the pr of tensor meta, test=develop
* moves the dense tensor and some ops, test=develop
* update the linalg operator, test=develop
* update other operators, test=develop
* fix errors, test=develop
* fix bugs, test=develop
* try to resolve the problem of windows ci, test=develop
* updates codes, test=develop
* fix the tensor_utils.cc, test=develop
* modify the dense tensor, test=develop
* fix the data type, test=develop
  Co-authored-by: shixiaowei02 <[email protected]>
* intermediate api adapt to new dense tensor
* add some TODO and delete include header

Co-authored-by: Chen Weihang <[email protected]>
Co-authored-by: chentianyu03 <[email protected]>
Co-authored-by: zyfncg <[email protected]>
Co-authored-by: 石晓伟 <[email protected]>
1 parent 79203ec commit 1ff9cf1

15 files changed, 413 additions and 8 deletions
paddle/fluid/framework/operator.cc

Lines changed: 0 additions & 1 deletion
@@ -23,7 +23,6 @@ limitations under the License. */
 #include "paddle/fluid/framework/data_type_transform.h"
 #include "paddle/fluid/framework/details/nan_inf_utils.h"
 #include "paddle/fluid/framework/op_call_stack.h"
-#include "paddle/fluid/framework/pten_utils.h"
 #include "paddle/fluid/framework/shape_inference.h"
 #include "paddle/fluid/framework/transfer_scope_cache.h"
 #include "paddle/fluid/framework/unused_var_check.h"

paddle/fluid/imperative/prepared_operator.cc

Lines changed: 0 additions & 1 deletion
@@ -16,7 +16,6 @@
 
 #include "paddle/fluid/framework/data_type_transform.h"
 #include "paddle/fluid/framework/details/nan_inf_utils.h"
-#include "paddle/fluid/framework/pten_utils.h"
 #include "paddle/fluid/imperative/infer_shape_context.h"
 #include "paddle/pten/common/scalar.h"
 #include "paddle/utils/small_vector.h"

paddle/pten/api/include/creation.h

Lines changed: 21 additions & 0 deletions
@@ -14,5 +14,26 @@
 
 #pragma once
 
+#include "paddle/pten/api/include/infershape.h"
+#include "paddle/pten/hapi/lib/utils/allocator.h"
 #include "paddle/pten/kernels/cpu/creation.h"
 #include "paddle/pten/kernels/cuda/creation.h"
+
+namespace pten {
+
+// TODO(YuanRisheng) This function name should be same as User API name.
+// TODO(zyfncg) Automatic code generation
+template <typename T, typename ContextT>
+DenseTensor FillAnyLike(const ContextT& dev_ctx,
+                        const DenseTensor& x,
+                        const Scalar& val) {
+  auto out_meta = UnchangedInferShape(x.meta());
+  const auto allocator =
+      std::make_shared<paddle::experimental::DefaultAllocator>(
+          dev_ctx.GetPlace());
+  pten::DenseTensor dense_out(allocator, out_meta);
+  FillAnyLike<T>(dev_ctx, x, val, &dense_out);
+  return dense_out;
+}
+
+}  // namespace pten
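Every wrapper added in this commit follows the same three steps: infer the output metadata, allocate an output tensor from it, then call the existing kernel overload that writes into an output pointer. The following is a minimal, self-contained sketch of that pattern. `Meta`, `Tensor`, and `CpuContext` are hypothetical stand-ins for the pten types, not the real classes, and the allocator is replaced by a plain `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the tensor metadata (dims only).
struct Meta {
  std::vector<std::size_t> dims;
};

// Hypothetical stand-in for a dense tensor that owns its storage.
struct Tensor {
  Meta meta;
  std::vector<float> data;

  explicit Tensor(const Meta& m) : meta(m), data(NumElements(m), 0.0f) {}

  static std::size_t NumElements(const Meta& m) {
    std::size_t n = 1;
    for (std::size_t d : m.dims) n *= d;
    return n;
  }
};

// Hypothetical stand-in for a device context.
struct CpuContext {};

// Shape inference for element-wise ops: output metadata equals input metadata.
inline Meta UnchangedInferShape(const Meta& x_meta) { return x_meta; }

// Kernel-style overload: writes into a caller-provided output tensor.
template <typename T>
void FillAnyLike(const CpuContext&, const Tensor& x, T val, Tensor* out) {
  (void)x;  // only the shape of x matters, and out already carries it
  for (float& v : out->data) v = static_cast<float>(val);
}

// Wrapper-style overload: infers metadata, allocates, calls the kernel,
// and returns the result by value.
template <typename T, typename ContextT>
Tensor FillAnyLike(const ContextT& ctx, const Tensor& x, T val) {
  Tensor out(UnchangedInferShape(x.meta));
  FillAnyLike<T>(ctx, x, val, &out);  // resolves to the 4-argument kernel
  return out;
}

}  // namespace sketch
```

Because the two overloads take a different number of arguments, a call such as `sketch::FillAnyLike<float>(ctx, x, 1.5f)` picks the wrapper, while the call inside it picks the kernel.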

paddle/pten/api/include/linalg.h

Lines changed: 19 additions & 0 deletions
@@ -15,5 +15,24 @@
 #pragma once
 
 // See Note: [ How do we organize the kernel directory ]
+#include "paddle/pten/api/include/infershape.h"
+#include "paddle/pten/hapi/lib/utils/allocator.h"
 #include "paddle/pten/kernels/cpu/linalg.h"
 #include "paddle/pten/kernels/cuda/linalg.h"
+
+namespace pten {
+
+template <typename T, typename ContextT>
+DenseTensor Dot(const ContextT& dev_ctx,
+                const DenseTensor& x,
+                const DenseTensor& y) {
+  auto out_meta = DotInferShape(x.meta(), y.meta());
+  const auto allocator =
+      std::make_shared<paddle::experimental::DefaultAllocator>(
+          dev_ctx.GetPlace());
+  pten::DenseTensor dense_out(allocator, out_meta);
+  Dot<T>(dev_ctx, x, y, &dense_out);
+  return dense_out;
+}
+
+}  // namespace pten
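The diff does not show the body of `DotInferShape`, so the sketch below only illustrates how a shape-inference step and a kernel can be paired in a `Dot` wrapper. It assumes, purely for illustration, that both inputs have identical dims and that the output keeps the leading dims with the last dim set to 1. `Tensor` and both functions are hypothetical stand-ins, not the pten implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a dense tensor with row-major float storage.
struct Tensor {
  std::vector<std::size_t> dims;
  std::vector<float> data;
};

// Assumed rule (not taken from the diff): same dims as x, last dim set to 1.
inline std::vector<std::size_t> DotInferShape(
    const std::vector<std::size_t>& x_dims,
    const std::vector<std::size_t>& y_dims) {
  assert(!x_dims.empty() && x_dims == y_dims);
  std::vector<std::size_t> out = x_dims;
  out.back() = 1;
  return out;
}

// Kernel-style overload: one dot product per row of the last dimension.
inline void Dot(const Tensor& x, const Tensor& y, Tensor* out) {
  const std::size_t width = x.dims.back();
  for (std::size_t row = 0; row < out->data.size(); ++row) {
    float acc = 0.0f;
    for (std::size_t k = 0; k < width; ++k) {
      acc += x.data[row * width + k] * y.data[row * width + k];
    }
    out->data[row] = acc;
  }
}

// Wrapper-style overload: infer shape, allocate, call the kernel.
inline Tensor Dot(const Tensor& x, const Tensor& y) {
  Tensor out;
  out.dims = DotInferShape(x.dims, y.dims);
  std::size_t n = 1;
  for (std::size_t d : out.dims) n *= d;
  out.data.assign(n, 0.0f);
  Dot(x, y, &out);
  return out;
}

}  // namespace sketch
```

Under these assumptions, two inputs of dims `{2, 3}` produce an output of dims `{2, 1}` holding one dot product per row.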

paddle/pten/api/include/manipulation.h

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,5 +15,25 @@
1515
#pragma once
1616

1717
// See Note: [ How do we organize the kernel directory ]
18+
#include "paddle/pten/api/include/infershape.h"
19+
#include "paddle/pten/hapi/lib/utils/allocator.h"
1820
#include "paddle/pten/kernels/cpu/manipulation.h"
1921
#include "paddle/pten/kernels/cuda/manipulation.h"
22+
23+
namespace pten {
24+
25+
template <typename T, typename ContextT>
26+
DenseTensor Flatten(const ContextT& dev_ctx,
27+
const DenseTensor& x,
28+
int start_axis,
29+
int stop_axis) {
30+
auto out_meta = FlattenInferShape(x.meta(), start_axis, stop_axis);
31+
const auto allocator =
32+
std::make_shared<paddle::experimental::DefaultAllocator>(
33+
dev_ctx.GetPlace());
34+
pten::DenseTensor dense_out(allocator, out_meta);
35+
Flatten<T>(dev_ctx, x, start_axis, stop_axis, &dense_out);
36+
return dense_out;
37+
}
38+
39+
} // namespace pten

paddle/pten/api/include/math.h

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,5 +15,62 @@ limitations under the License. */
1515
#pragma once
1616

1717
// See Note: [ How do we organize the kernel directory ]
18+
#include "paddle/pten/api/include/infershape.h"
19+
#include "paddle/pten/hapi/lib/utils/allocator.h"
1820
#include "paddle/pten/kernels/cpu/math.h"
1921
#include "paddle/pten/kernels/cuda/math.h"
22+
23+
namespace pten {
24+
25+
template <typename T, typename ContextT>
26+
DenseTensor Sign(const ContextT& dev_ctx, const DenseTensor& x) {
27+
auto out_meta = UnchangedInferShape(x.meta());
28+
const auto allocator =
29+
std::make_shared<paddle::experimental::DefaultAllocator>(
30+
dev_ctx.GetPlace());
31+
pten::DenseTensor dense_out(allocator, out_meta);
32+
Sign<T>(dev_ctx, x, &dense_out);
33+
return dense_out;
34+
}
35+
36+
template <typename T, typename ContextT>
37+
DenseTensor Mean(const ContextT& dev_ctx, const DenseTensor& x) {
38+
auto out_meta = ReductionInferShape(x.meta());
39+
const auto allocator =
40+
std::make_shared<paddle::experimental::DefaultAllocator>(
41+
dev_ctx.GetPlace());
42+
pten::DenseTensor dense_out(allocator, out_meta);
43+
Mean<T>(dev_ctx, x, &dense_out);
44+
return dense_out;
45+
}
46+
47+
template <typename T, typename ContextT>
48+
DenseTensor Scale(const ContextT& dev_ctx,
49+
const DenseTensor& x,
50+
float scale,
51+
float bias,
52+
bool bias_after_scale) {
53+
auto out_meta = UnchangedInferShape(x.meta());
54+
const auto allocator =
55+
std::make_shared<paddle::experimental::DefaultAllocator>(
56+
dev_ctx.GetPlace());
57+
pten::DenseTensor dense_out(allocator, out_meta);
58+
Scale<T>(dev_ctx, x, scale, bias, bias_after_scale, &dense_out);
59+
return dense_out;
60+
}
61+
62+
template <typename T, typename ContextT>
63+
DenseTensor Scale(const ContextT& dev_ctx,
64+
const DenseTensor& x,
65+
const DenseTensor& scale,
66+
float bias,
67+
bool bias_after_scale) {
68+
auto out_meta = UnchangedInferShape(x.meta());
69+
const auto allocator =
70+
std::make_shared<paddle::experimental::DefaultAllocator>(
71+
dev_ctx.GetPlace());
72+
pten::DenseTensor dense_out(allocator, out_meta);
73+
ScaleHost<T>(dev_ctx, x, scale, bias, bias_after_scale, &dense_out);
74+
return dense_out;
75+
}
76+
} // namespace pten
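The two `Scale` overloads above differ only in where the scale factor comes from: a plain `float`, or a `DenseTensor` that is forwarded to `ScaleHost`. A minimal sketch of that overload pair follows. It assumes the formula documented for Paddle's scale operator (`x * scale + bias` when `bias_after_scale` is true, otherwise `(x + bias) * scale`). Using `std::vector<float>` as the tensor type, and reading the tensor-valued scale as a single element, are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a dense float tensor.
using Tensor = std::vector<float>;

// Scalar overload: the scale factor is a plain float.
inline Tensor Scale(const Tensor& x, float scale, float bias,
                    bool bias_after_scale) {
  Tensor out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = bias_after_scale ? x[i] * scale + bias : (x[i] + bias) * scale;
  }
  return out;
}

// Tensor overload: the scale factor is read from a one-element tensor
// (an assumption about what a host-side scale tensor would hold).
inline Tensor Scale(const Tensor& x, const Tensor& scale, float bias,
                    bool bias_after_scale) {
  assert(scale.size() == 1);
  return Scale(x, scale[0], bias, bias_after_scale);
}

}  // namespace sketch
```

Passing a `float` selects the first overload and passing a `Tensor` selects the second, which mirrors how the C++ API in the diff lets the caller choose between an attribute-style and a tensor-style scale without a different function name.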

paddle/pten/hapi/lib/utils/tensor_utils.cc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,7 @@ std::unique_ptr<pten::DenseTensor> MakePtenDenseTensor(
   SetLoD(&meta.lod, src.lod());
   auto shared_storage =
       pten::make_intrusive<SharedStorage>(src.Holder(), src.offset());
+
   return std::make_unique<pten::DenseTensor>(std::move(shared_storage),
                                              std::move(meta));
 }

paddle/pten/kernels/cpu/manipulation.cc

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -24,10 +24,9 @@ void Flatten(const CPUContext& dev_ctx,
              int start_axis,
              int stop_axis,
              DenseTensor* out) {
-  auto out_meta = FlattenInferShape(x.meta(), start_axis, stop_axis);
+  auto out_dims = out->dims();
   pten::Copy(dev_ctx, x, out);
-  out->set_lod(out_meta.lod);
-  out->Resize(out_meta.dims);
+  out->Resize(out_dims);
 }
 
 // TODO(yuanrisheng): this kernel is for training and xshape is a Intermediate
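After this change the kernel no longer infers the output shape itself. It expects the caller (here, the new `Flatten` wrapper) to have set the output dims already, saves them, copies the data, and then restores them. The sketch below illustrates that division of labour with hypothetical stand-ins. It assumes that the copy overwrites the destination dims with the source dims, which is why they are saved first, and that both axes are non-negative and in range.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a dense tensor.
struct Tensor {
  std::vector<int64_t> dims;
  std::vector<float> data;
};

// Collapses dims[start_axis..stop_axis] (inclusive) into one dimension.
// Assumes 0 <= start_axis <= stop_axis < in.size().
inline std::vector<int64_t> FlattenDims(const std::vector<int64_t>& in,
                                        int start_axis, int stop_axis) {
  std::vector<int64_t> out(in.begin(), in.begin() + start_axis);
  int64_t merged = 1;
  for (int i = start_axis; i <= stop_axis; ++i) merged *= in[i];
  out.push_back(merged);
  out.insert(out.end(), in.begin() + stop_axis + 1, in.end());
  return out;
}

// Copies data and dims, so the destination dims are overwritten (assumed).
inline void Copy(const Tensor& src, Tensor* dst) { *dst = src; }

// Kernel-style function: trusts the dims already set on `out`.
inline void Flatten(const Tensor& x, Tensor* out) {
  const std::vector<int64_t> out_dims = out->dims;  // save before the copy
  Copy(x, out);
  out->dims = out_dims;  // restore the flattened shape
}

// Wrapper-style function: infers the shape, then calls the kernel.
inline Tensor FlattenApi(const Tensor& x, int start_axis, int stop_axis) {
  Tensor out;
  out.dims = FlattenDims(x.dims, start_axis, stop_axis);
  Flatten(x, &out);
  return out;
}

}  // namespace sketch
```

With this split, shape inference runs once in the wrapper instead of once in the wrapper and again in every device-specific kernel.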

paddle/pten/kernels/cuda/manipulation.cu

Lines changed: 2 additions & 3 deletions
@@ -24,10 +24,9 @@ void Flatten(const CUDAContext& dev_ctx,
              int start_axis,
              int stop_axis,
              DenseTensor* out) {
-  auto out_meta = FlattenInferShape(x.meta(), start_axis, stop_axis);
+  auto out_dims = out->dims();
   pten::Copy(dev_ctx, x, out);
-  out->set_lod(out_meta.lod);
-  out->Resize(out_meta.dims);
+  out->Resize(out_dims);
 }
 
 // TODO(yuanrisheng): this kernel is for training and xshape is a Intermediate

paddle/pten/tests/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -12,3 +12,4 @@ cc_test(test_matmul_api SRCS test_matmul_api.cc DEPS linalg_api pten_hapi_utils)
 cc_test(test_fill_api SRCS test_fill_api.cc DEPS creation_api pten_hapi_utils)
 cc_test(test_copy_api SRCS test_copy_api.cc DEPS utils_cpu pten_hapi_utils)
 cc_test(test_flatten_api SRCS test_flatten_api.cc DEPS utils_cpu manipulation_api pten_hapi_utils)
+cc_test(test_scale_api SRCS test_scale_api.cc DEPS math_api pten_hapi_utils)
