Contributor

@wanghuancoder wanghuancoder commented Apr 25, 2021

PR types

Performance optimization

PR changes

Others

Describe

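The dynamic graph no longer uses pybind11 to expose core.ops; it now uses the Python C API. The Python C API performs somewhat better, which lowers the scheduling overhead of the dynamic graph.
Key points of the changes in this PR:

  • op_function_generator.cc is still used to generate the "skeleton" of the code, with minor adjustments to the format and content of what it generates.
  • Only the dynamic graph API is switched to the Python C API in this PR. Everything else still uses pybind11.
    • With the Python C API, the dynamic graph API still has to be exposed through the core.ops module declared with pybind11.
    • VarBase is still exposed to Python through pybind11, so VarBase inputs and outputs need special handling.
  • User-supplied attrs come in many types and the new code must support all of them, including special types such as numpy.int64, numpy.ndarray, and Tensor.
  • Error messages were previously raised to the Python side through pybind11 in exception.cc, so the Python C API path must also be able to catch exceptions and raise them to Python.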
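
The last point, catching C++ exceptions at the language boundary and turning them into Python errors, can be sketched in plain C++ as follows. This is only an illustration of the pattern and not the PR's code: `PendingError`, `TranslateException`, and `CallAtBoundary` are hypothetical names, and `PendingError` stands in for the interpreter's error state that a real extension would set through the Python C API.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the interpreter's pending-error state.
struct PendingError {
  std::string type;     // name of the Python exception class to raise
  std::string message;  // text shown to the user
};

// Rethrow the captured exception and dispatch on its dynamic type.
inline PendingError TranslateException(std::exception_ptr p) {
  try {
    if (p) std::rethrow_exception(p);
  } catch (const std::invalid_argument& e) {
    return {"ValueError", e.what()};
  } catch (const std::out_of_range& e) {
    return {"IndexError", e.what()};
  } catch (const std::exception& e) {
    return {"RuntimeError", e.what()};
  } catch (...) {
    return {"RuntimeError", "unknown C++ exception"};
  }
  return {"", ""};  // null exception_ptr: nothing to report
}

// Run `fn` at the boundary so that no C++ exception escapes into C code.
// Returns true on success; on failure fills `err` and returns false.
template <typename Fn>
bool CallAtBoundary(Fn&& fn, PendingError* err) {
  try {
    fn();
    return true;
  } catch (...) {
    *err = TranslateException(std::current_exception());
    return false;
  }
}
```

A caller would wrap the body of each exported function in `CallAtBoundary` and, on a `false` result, report `err` to the interpreter and return its error sentinel.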

The auto-generated op_function_impl.h code can be found at:
https://gist.github.com/wanghuancoder/65be9f6d527e7da036f9ff5bc347c9bc

In testing, dynamic graph performance improved slightly:

| Benchmark | develop | develop + Python_C_API | Improvement |
|---|---|---|---|
| core.ops.elementwise_add | 184.046 s | 137.275 s | 25.4% |
| Linear+relu (GPU) | 57.824 s | 52.177 s | 9.8% |
| Linear+relu (CPU) | 64.545 s | 59.769 s | 7.4% |
| MaskRCNN_BS4_FP16 | 9.770 img/s | 10.096 img/s | 3.3% |
| PTB small | 496.863 words/s | 535.241 words/s | 7.7% |
| Pahelix | 237.306 s | 235.473 s | 0.8% |
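
The PR does not say how the improvement column was computed. The figures are consistent with a relative change against the develop baseline: time saved for rows measured in seconds, throughput gained for rows measured in img/s or words/s. A minimal sketch under that assumption (the function names are illustrative):

```cpp
#include <cmath>

// Relative reduction in elapsed time, in percent (lower time is better).
inline double TimeReductionPercent(double before_s, double after_s) {
  return (before_s - after_s) / before_s * 100.0;
}

// Relative increase in throughput, in percent (higher throughput is better).
inline double ThroughputGainPercent(double before, double after) {
  return (after - before) / before * 100.0;
}

// Round to one decimal place, the precision used in the table.
inline double RoundToTenth(double x) { return std::round(x * 10.0) / 10.0; }
```

For example, `TimeReductionPercent(184.046, 137.275)` is about 25.41 and `ThroughputGainPercent(496.863, 535.241)` is about 7.72.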

Breakdown of where core.ops.elementwise_add spends its time:

Cost item pybind11 Python_C_API
VarBase parsing 4,757,120 2,031,758
Attr parsing 38,639,099 7,841,931
GetCurrentTracer 2,391,581 1,569,099
Out map construction 16,708,953 14,194,468
In map construction 7,118,836 7,032,639
TraceOp 89,742,151 78,438,589
Return value packing 9,546,960
Python interaction 40,413,613 25,644,535

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot-old bot commented May 6, 2021

Sorry to inform you that 090a540's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

const char* IN_VAR_LIST_TYPE = R"(py::handle)";

const char* OUT_VAR_TYPE = R"(std::shared_ptr<imperative::VarBase>)";
const char* OUT_VAR_LIST_TYPE = R"(std::vector<std::shared_ptr<imperative::VarBase>>)";
Contributor

In this file, where the outputs are created, make_shared looks like it would be faster. It may be worth a try.

const char* OUT_INITIALIZER_TEMPLATE =
    R"({"%s", {std::shared_ptr<imperative::VarBase>(new imperative::VarBase(tracer->GenerateUniqueName()))}})";

https://stackoverflow.com/questions/20895648/difference-in-make-shared-and-normal-shared-ptr-in-c

Contributor Author

After making the change I measured with core.ops.elementwise_add and saw no noticeable gain. Since make_shared is better in theory, it should still help in some places; the benefit just does not show up in a test as simple as core.ops.elementwise_add.
The code now uses make_shared.
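
The theoretical advantage discussed here is the number of heap allocations. Common standard library implementations place the object and the control block in a single allocation for `std::make_shared`, while `std::shared_ptr<T>(new T)` allocates them separately. The sketch below counts allocations by replacing the global `operator new`; `Node` is a hypothetical stand-in for a VarBase-like object, and the exact counts depend on the standard library and optimization level.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Bumped by every call to the replaced global operator new below.
static std::size_t g_alloc_count = 0;

void* operator new(std::size_t size) {
  ++g_alloc_count;
  if (void* p = std::malloc(size ? size : 1)) return p;
  throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical stand-in for a VarBase-like object.
struct Node {
  int value = 0;
};

// Allocations made by `shared_ptr<T>(new T)`: object plus control block.
inline std::size_t AllocationsForNewThenWrap() {
  const std::size_t before = g_alloc_count;
  std::shared_ptr<Node> p(new Node());
  return g_alloc_count - before;
}

// Allocations made by `make_shared<T>()`: usually one combined block.
inline std::size_t AllocationsForMakeShared() {
  const std::size_t before = g_alloc_count;
  auto p = std::make_shared<Node>();
  return g_alloc_count - before;
}
```

Saving one small allocation per output can easily disappear into the noise when the rest of the call dominates, which would be consistent with the measurement reported above.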

namespace paddle {
namespace pybind {

std::unordered_map<
Contributor

The coding guidelines include a rule that forbids defining objects with static storage duration. I suggest managing this with the singleton pattern so that the initialization order is explicit.
https://zh-google-styleguide.readthedocs.io/en/latest/google-cpp-styleguide/scoping/#static-and-global-variables

Contributor Author

done,thx!
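
A function-local static is one common way to write the singleton suggested here. The sketch below is modeled on the `OpAttrTypeMap::Instance().Map()` call that appears later in this thread, but `AttrTypeRegistry` and its simplified `int` value type are assumptions for illustration, not the PR's actual class.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical registry: op type -> (attribute name -> attribute type id).
class AttrTypeRegistry {
 public:
  using AttrTypes = std::unordered_map<std::string, int>;

  // Constructed on first use; initialization of a function-local static
  // is thread-safe since C++11 and avoids cross-file ordering issues.
  static AttrTypeRegistry& Instance() {
    static AttrTypeRegistry instance;
    return instance;
  }

  std::unordered_map<std::string, AttrTypes>& Map() { return map_; }

  AttrTypeRegistry(const AttrTypeRegistry&) = delete;
  AttrTypeRegistry& operator=(const AttrTypeRegistry&) = delete;

 private:
  AttrTypeRegistry() = default;
  std::unordered_map<std::string, AttrTypes> map_;
};
```

Every caller that goes through `Instance()` sees the same map, and the map is guaranteed to exist before its first use.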

if (PyObject_CheckLong(obj)) {
attrs[key] = (int)PyLong_AsLong(obj); // NOLINT
} else {
PADDLE_THROW(platform::errors::InvalidArgument(
Contributor

These THROW statements are all duplicated. How about moving them into one place and calling that? It would be easier to maintain if the error messages are improved later.

Contributor Author

Some of the THROWs are indeed duplicated, but some differ in small ways. Also, the line number where the exception is thrown is important debugging information; if the THROW is moved to one shared place, that line number is lost. I have left the code as it is.

Contributor

It could also be written as a macro, which keeps the line number.
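
A macro keeps the line number because `__FILE__` and `__LINE__` are expanded at the call site rather than inside a shared helper. The sketch below uses `std::invalid_argument` in place of Paddle's own error machinery; `FormatError`, `THROW_INVALID_ARGUMENT`, and `CheckedToInt` are hypothetical names.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: the macro passes in the caller's file and line.
inline std::string FormatError(const char* file, int line,
                               const std::string& msg) {
  return std::string(file) + ":" + std::to_string(line) + ": " + msg;
}

// Expanded where it is used, so __FILE__ and __LINE__ name the caller.
#define THROW_INVALID_ARGUMENT(msg) \
  throw std::invalid_argument(FormatError(__FILE__, __LINE__, (msg)))

// Example call site: converts a value that has already been type-checked.
inline int CheckedToInt(bool is_integer, long value) {
  if (!is_integer) {
    THROW_INVALID_ARGUMENT("expected an integer attribute");
  }
  return static_cast<int>(value);
}
```

The message text is still supplied per call site, so call sites whose wording differs slightly can keep their own messages.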

return result;
}

void init_ops_attrtype_map() {
Contributor

@chenwhql chenwhql May 14, 2021

Please keep the function naming style consistent and use camel case here as well.

Contributor Author

done,thx!

PyObject* EnforceNotMetException =
PyErr_NewException("paddle.EnforceNotMet", PyExc_Exception, NULL);

void throw_exception_to_python(std::exception_ptr p) {
Contributor

Same as above.

Contributor Author

done,thx!

namespace paddle {
namespace pybind {

PyTypeObject *g_VarBase_PyType = NULL;
Contributor

Better to use either camel case (variableName) or underscores (variable_name), not a mix of the two.

Contributor Author

done,thx!

Comment on lines 17 to 18
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#define PY_ARRAY_UNIQUE_SYMBOL MY_PyArray_API
Contributor

These macros do not seem to be used anywhere?

Contributor Author

This works around a compile error caused by including the numpy headers.

Comment on lines 19 to 22
#define INIT_NUMPY_ARRAY_CPP
#ifndef INIT_NUMPY_ARRAY_CPP
#define NO_IMPORT_ARRAY // for usual translation units
#endif
Contributor

Why is this macro needed?

Contributor Author

The unnecessary ones have been removed.

#include "paddle/fluid/imperative/type_defs.h"
#include "paddle/fluid/pybind/imperative.h"
#pragma GCC diagnostic ignored "-Wconversion-null"
#pragma GCC diagnostic ignored "-Wunused-variable"
Contributor

Why are these warnings ignored?

Contributor Author

Some earlier implementation approaches needed this. It is no longer needed and has been removed.

Comment on lines 43 to 47
int init_numpy() {
import_array();
return 0;
}
const static int numpy_initialized = init_numpy(); // NOLINT
Contributor

I suggest adding some comments here; it is not easy to follow.

Contributor Author

Some earlier implementation approaches needed this. It is no longer needed and has been removed.

}

inline bool PyObject_CheckFloat(PyObject* obj) {
return PyFloat_Check(obj) || PyLong_Check(obj) ||
Contributor

Better comment on why PyLong_Check is added.

Contributor Author

done,thx!

PADDLE_ENFORCE_EQ(
(attr_end - attr_start + 1) % 2, 0,
platform::errors::InvalidArgument(
"The number of arguments for arributes should be even."));
Contributor

typo: arributes -> attributes

Contributor Author

done,thx!

const std::string& op_type, PyObject* args, ssize_t attr_start,
ssize_t attr_end, paddle::framework::AttributeMap& attrs) { // NOLINT
PADDLE_ENFORCE_EQ(
(attr_end - attr_start + 1) % 2, 0,
Contributor

Not important, but I suggest passing attr_end+1 directly to avoid some problems. The half-open range [start, end) is the common convention (start is included, end is not).

Contributor Author

done,thx!
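
With a half-open range the even-count check and the loop bound no longer need a `+ 1`. The sketch below shows the convention on a plain vector of strings rather than a Python tuple; `CollectPairs` is a hypothetical helper, not the PR's `ConstructAttrMapFromPyArgs`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Collect (key, value) pairs from args[start, end): start is included,
// end is one past the last element.
inline std::vector<std::pair<std::string, std::string>> CollectPairs(
    const std::vector<std::string>& args, std::size_t start,
    std::size_t end) {
  if (start > end || end > args.size()) {
    throw std::out_of_range("range must satisfy start <= end <= size");
  }
  // The element count is simply end - start.
  if ((end - start) % 2 != 0) {
    throw std::invalid_argument(
        "The number of arguments for attributes should be even.");
  }
  std::vector<std::pair<std::string, std::string>> pairs;
  for (std::size_t pos = start; pos < end; pos += 2) {
    pairs.emplace_back(args[pos], args[pos + 1]);
  }
  return pairs;
}
```

An empty range is expressed as `start == end`, which an inclusive end index cannot represent for unsigned positions without special-casing.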

auto attr_type_map = &(OpAttrTypeMap::Instance().Map()[op_type]);

PyObject* obj = nullptr;
for (ssize_t arg_pos = attr_start; arg_pos < attr_end + 1; arg_pos += 2) {
Contributor

Same for attr_end.

Contributor Author

done,thx!

py::gil_scoped_release release;
%s
framework::AttributeMap attrs;
ConstructAttrMapFromPyArgs("%s", args, %d, PyTuple_GET_SIZE(args)-1 , attrs);
Contributor

Better pass PyTuple_GET_SIZE(args).

Contributor Author

done,thx!

const char* key_prt;
obj = PyTuple_GET_ITEM(args, arg_pos);
if (PyObject_CheckString(obj)) {
key_prt = PyUnicode_AsUTF8AndSize(obj, &key_len);
Contributor

Not sure, key_prt -> key_ptr ?

Contributor Author

done,thx!

if (PyObject_CheckLong(obj)) {
attrs[key] = (int)PyLong_AsLong(obj); // NOLINT
} else {
PADDLE_THROW(platform::errors::InvalidArgument(
Contributor

It could also be written as a macro, which keeps the line number.

break;
}
}
}
Contributor

Suggest adding a function for each case to make this function short.

Contributor Author

A function has now been created for each case. As for PADDLE_THROW, it is not only a matter of line numbers: if you read the error messages carefully, the different PADDLE_THROW calls raise exceptions whose content and format differ slightly, so they are not well suited to being wrapped in shared exception-throwing code.

op_type, arg_name, arg_idx,
((PyTypeObject*)((PyObject*)item)->ob_type)->tp_name)); // NOLINT
}
void** vh = item->simple_layout ? item->simple_value_holder
Contributor

Better add comments on simple_layout?

void** vh = item->simple_layout ? item->simple_value_holder
: &item->nonsimple.values_and_holders[0];
result.emplace_back(
reinterpret_cast<std::shared_ptr<paddle::imperative::VarBase>&>(
Contributor

As discussed offline, what happens to the lifetime of VarBase?
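
As general background on `std::shared_ptr` semantics, and not a statement of how the PR resolved this question: `emplace_back` on a vector of `shared_ptr` given a reference to an existing `shared_ptr` makes a copy, so the vector shares ownership and the object outlives the original holder. `Var` and `AppendCopy` below are hypothetical names.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for a VarBase-like object.
struct Var {
  int id;
};

// Copy the shared_ptr referred to by `holder` into `out`. The copy shares
// ownership, so the reference count goes up by one.
inline void AppendCopy(const std::shared_ptr<Var>& holder,
                       std::vector<std::shared_ptr<Var>>* out) {
  out->emplace_back(holder);
}
```

After `AppendCopy`, resetting the original holder leaves the object alive for as long as the vector element exists. Whether a reference obtained by `reinterpret_cast` from pybind11's instance storage really refers to a `std::shared_ptr` depends on the holder type the class was registered with.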

Contributor

@jzhang533 jzhang533 left a comment

instead of using NULL macro, using nullptr would be better.

@wanghuancoder
Contributor Author

instead of using NULL macro, using nullptr would be better.

done.thx!
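
One concrete reason to prefer `nullptr` is overload resolution: `nullptr` has its own type, `std::nullptr_t`, which converts to pointer types but not to integers, whereas `NULL` is an integer-like constant on many platforms. A minimal sketch with a hypothetical `Describe` overload set:

```cpp
#include <string>

// Two overloads that a null pointer constant could plausibly reach.
inline std::string Describe(int) { return "int"; }
inline std::string Describe(const char*) { return "pointer"; }

// nullptr can only convert to the pointer overload.
inline std::string DescribeNullptr() { return Describe(nullptr); }

// A literal 0 is an int first, so it selects the integer overload.
inline std::string DescribeZero() { return Describe(0); }
```

Passing `NULL` to the same overload set is either ambiguous or resolves to the `int` overload, depending on how the platform defines the macro.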

@phlrain phlrain self-requested a review June 11, 2021 06:43

5 participants