- 
                Notifications
    You must be signed in to change notification settings 
- Fork 5.9k
fea/init tensorrt engine #10003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fea/init tensorrt engine #10003
Conversation
dependency has been installed in docker image.
c64c5ac    to
    4da8cbd      
    Compare
  
    f7d80a2    to
    74ea1f6      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
建议issue里面对engine的设计初衷描述的详细些。因为没有设计文档,很难判断所有的接口是不是都是必须的。
        
          
                paddle/fluid/inference/engine.h
              
                Outdated
          
        
      | distributed under the License is distributed on an "AS IS" BASIS, | ||
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| See the License for the specific language governing permissions and | ||
| limitations under the License. */ | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copyright的格式调整一下吧。另外新加入的文件copyright里面的年份应该是2018年。
| #include "paddle/fluid/framework/framework.pb.h" | ||
|  | ||
| namespace paddle { | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
按照Paddle对于命名空间的使用规则,这里应该还有一层namespace inference。
另外,有个问题确认一下,TensorRT实现相关的所有代码,包括tensorrt_op,都放置在inference目录下吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
可能会有下面的目录
- inference/engine.h, engine 宏观接口
- inference/tensorrt 装tensorrt 相关的内容
- engine_op[.h/.cc], 包含 TensorrtEngineOp
- convert[.h/.cc] 帮convert fluid op -> tensorrt layer
 
- inference/alajin 装 alajin的
 类似 tensorrt的
| virtual void Build(const PbType& paddle_model) = 0; | ||
|  | ||
| // Execute the engine, that will run the inference network. | ||
| virtual void Execute(int batch_size) = 0; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Execute函数是不是最好以Paddle的LoDTensor类型作为参数?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个 engine 只是个工具类,会由 TensorrtEngineOp 调用。
我的理解, TensorrtEngineOp 应该会把 engine 封装成 fluid op
engine 的 input 和 output 以及个数都不固定,有 DeclInput 和  DeclOutput 两个分别来创建 tensorrt 的 Input  和 output 节点。
这个类会暴露出比较多的小接口,这些接口会帮助构建 tensorrt 的network 以及 runtime engine, 中间小接口基本是不可少的。 最重要的用处是在 TensorrtEngineOp 里,额外的会在各种 UT 里使用, 比如 fluid_op 与convert后的 tensorrt layer 之间做无diff, 用这个类可以帮助跑 tensorrt layer和取结果。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeclInput 和 DeclOutput 两个分别来创建 tensorrt 的 Input 和 output 节点
这两个和Convert类里的ConvertInput和ConvertOutput里,除了转的那步,功能很类似。有更好的设计办法么?
Decl这四个字母含义不够清晰,能否就叫Add呢?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
会改成 DeclareInput,表示在 TensorRT network中添加 data 节点
| 这个engine是工具类,会尽快加一个宏观的design,但这个类应该不需要涉及到接口设计,只会在  | 
        
          
                paddle/fluid/inference/engine.h
              
                Outdated
          
        
      |  | ||
| /* | ||
| * EngineBase is the base class of all inference engines. An inference engine | ||
| * takes a paddle program as input, and output the result in paddle Tensor | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- output->outputs
- paddle Tensor format->fluid tensor format
        
          
                paddle/fluid/inference/engine.h
              
                Outdated
          
        
      | * EngineBase is the base class of all inference engines. An inference engine | ||
| * takes a paddle program as input, and output the result in paddle Tensor | ||
| * format. It can be used to optimize performance of computation subgraphs, for | ||
| * example, break down the original model into subgraphs and execute each | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
使用block概念是不是更为统一,下同
- original model:original block?
- subgraphs->sub-block?
        
          
                paddle/fluid/inference/engine.h
              
                Outdated
          
        
      | * When inference, the resnet50 model can put most of the model into subgraph | ||
| * and run it on a TensorRT engine. | ||
| * | ||
| * There are several engines such as TensorRT and other internal frameworks, so | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other internal frameworks,可以去掉internal
        
          
                paddle/fluid/inference/engine.h
              
                Outdated
          
        
      | */ | ||
| class EngineBase { | ||
| public: | ||
| // TODO fix it latter | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
请问39行是要fix什么呢?能否写的详细一点
PbType是指什么?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里 PbType 表示 desc 的格式,会换一个明确的名字
| // them, and an macro like this is more extensible when underlying TensorRT | ||
| // library add new layer supports. | ||
| #define TRT_ENGINE_ADD_LAYER(engine__, layer__, ARGS...) \ | ||
| engine__->network()->add##layer__(ARGS); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
请问这个宏定义可以去掉么?
- 直接用原来的函数也很清晰;
- 因为convert类里面也需要add不同的layer,那么convert类需要包含engine类的头文件,是不是不太合理?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个宏会
- 提供统一add layer 的接口,而不需要为每种layer增加一个函数,比如 addFullyConnected等
| PADDLE_ENFORCE(output != nullptr); | ||
| output->setName(name.c_str()); | ||
| infer_network_->markOutput(*output); | ||
| buffer_sizes_[name] = 0; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
为什么156行设成0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+ comment
| } | ||
|  | ||
| void*& TensorrtEngine::buffer(const std::string& name) { | ||
| PADDLE_ENFORCE(infer_engine_ != nullptr, "call freezenetwork first."); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
freeze network 中间加空格
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FreeNetwork
| } | ||
|  | ||
| void TensorrtEngine::DeclOutput(nvinfer1::ILayer* layer, int offset, | ||
| const std::string& name) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
前两个参数也是const类型
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const layer
| } | ||
|  | ||
| void TensorrtEngine::SetInputFromCPU(const std::string& name, void* data, | ||
| size_t size) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t size也是const类型
| TensorrtEngine::Weight bias(TensorrtEngine::data_type::kFLOAT, raw_bias, | ||
| size); | ||
| auto* x = engine_->DeclInput("x", TensorrtEngine::data_type::kFLOAT, | ||
| nvinfer1::DimsCHW{1, 1, 1}); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- data_type没有必要包装一遍,可以直接用nvinfer原来的格式,更加清晰。这里nvinfer1::DimsCHW也是用的nvinfer的格式。
- 如果要包装的话,要在TensorrtEngine类下么,那convert类使用这些type的时候,要调用TensorrtEngine类?
- weight类也是如此。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
engine 的.cc 里应该要调用 convert,两者交叉调用应该没有问题。
| float x_v = 1234; | ||
| engine_->SetInputFromCPU("x", (void*)&x_v, 1 * sizeof(float)); | ||
| LOG(INFO) << "to execute"; | ||
| engine_->Execute(1); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the model to create an engine and an execution context这一部分不封装了么?
Paddle/paddle/fluid/inference/tensorrt/test_tensorrt.cc
Lines 138 to 149 in 1866597
| Logger logger; | |
| nvinfer1::IRuntime* runtime = createInferRuntime(logger); | |
| nvinfer1::ICudaEngine* engine = | |
| runtime->deserializeCudaEngine(model->data(), model->size(), nullptr); | |
| model->destroy(); | |
| nvinfer1::IExecutionContext* context = engine->createExecutionContext(); | |
| // Execute the network. | |
| float input = 1234; | |
| float output; | |
| Execute(*context, &input, &output); | |
| EXPECT_EQ(output, input * 2 + 3); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里就不需要 serialize 了
| * There are two alternative ways to use it, one is to build from a paddle | ||
| * protobuf model, another way is to manully construct the network. | ||
| */ | ||
| class TensorrtEngine : public EngineBase { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TensorrtEngine->TensorRTEngine,rt大写,和TensorRT一致。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| @@ -1 +1,4 @@ | |||
| nv_test(test_tensorrt SRCS test_tensorrt.cc DEPS dynload_cuda device_context dynamic_loader) | |||
| if(WITH_TESTING) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里可以不加if(WITH_TESTING),因为在nv_test里面会做判断。可以之后的PR修改。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
| 4 // kINT32 | ||
| }; | ||
|  | ||
| // The following two API are implemented in TensorRT's header file, cannot load | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API-》APIs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
fixes: #10004