27 changes: 27 additions & 0 deletions contrib/inference/README.md
@@ -0,0 +1,27 @@
# Embed Paddle Inference in Your Application

Paddle inference offers APIs in the `C` and `C++` languages.
Contributor

Is it necessary to split this into C and C++? Right now there is only a C++ API; can we start by documenting only the C++ API?

Contributor Author

Yes, a C API will be added separately, probably in another PR.

Contributor

If C is not needed for now, let's leave it out for the time being.


One can easily deploy a model trained by Paddle by following the steps below:
Contributor

Paddle->PaddlePaddle


1. Optimize the native model;
2. Write some code for deployment.


Let's explain the steps in detail.

## Optimize the native Fluid Model

The native model obtained from the training phase needs to be optimized before it can be deployed.
Contributor

We take the model saved by save_inference_model in the training phase, which inserts the feed and fetch ops and already does some pruning optimization. If we take the training-phase model directly, it has no feed and fetch ops and cannot run.

The strategies 1, 2, and 3 mentioned here should already have been applied when save_inference_model was called.
Should this section only list additional optimization strategies, such as third-party engines, fusing operators, etc.?

Contributor Author

Right, this part only explains why the tool is necessary.


- Remove noise such as cost operators that are not needed for inference;
- Prune unnecessary computation branches that have nothing to do with the output;
- Remove extraneous variables;
- Reuse memory for the native Fluid executor;
- Translate the model storage format into a third-party engine's format, so that the inference API can utilize the engine for acceleration;

We have an official tool to do the optimization; call `paddle_inference_optimize --help` for more information.
Contributor

Is paddle_inference_optimize a binary or a Python script?
For example, would `python paddle_inference_optimize src_model_dir dst_model_dir --inference_optimize_method=2` mean using the second optimization strategy?

Contributor Author

Either a binary or a script.


## Write some code

Read `paddle_inference_api.h` for more information.
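
As a rough sketch, a minimal program built against the `Predictor` API declared in `paddle_inference_api.h` (shown in the next file) might look like the following. The model path, variable names, and shapes are placeholders, and the value passed for `output_shapes` is an assumption, since the header does not document its semantics:

```c++
#include <string>
#include <vector>

#include "paddle_inference_api.h"

int main() {
  // Describe the optimized model; the path is a placeholder.
  paddle::Predictor::Attr attr;
  attr.model_dir = "./my_model";
  attr.engine_kind = paddle::Predictor::Attr::EngineKind::kNone;  // native Fluid

  paddle::Predictor predictor;
  if (!predictor.Init(attr)) return 1;

  // One input variable "x" of shape [1, 3] and one output variable "y";
  // the real names and shapes depend on the model.
  std::vector<std::vector<float>> input_data = {{1.f, 2.f, 3.f}};
  std::vector<std::vector<float>> output_data;
  if (!predictor.Run({"x"}, {"y"},
                     /*input_shapes=*/{{1, 3}},
                     /*output_shapes=*/{{1, 1}},
                     input_data, &output_data)) {
    return 1;
  }
  return 0;
}
```

For serving from multiple threads, the header also exposes `Clone()`, which creates a predictor that shares the model weights, so each caller thread can hold its own clone.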
69 changes: 69 additions & 0 deletions contrib/inference/paddle_inference_api.h
@@ -0,0 +1,69 @@
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include <string>
#include <vector>

namespace paddle {

class Predictor {
 public:
  struct Attr;
Contributor

Attr -> Network?

Contributor Author

It is not Network; it is attribute.

  Predictor() = default;

  // Build the network before inference.
  bool Init(const Attr& attr);

  // Predict a record.
  // Arguments:
  //   inputs: the names of the input variables.
  //   outputs: the names of the output variables.
  //   input_shapes: the shapes of the input variables.
  //   output_shapes: the shapes of the output variables.
  //   input_data: the data of the input variables.
  //   output_data: the data of the output variables.
  bool Run(const std::vector<std::string>& inputs,
           const std::vector<std::string>& outputs,
           const std::vector<std::vector<int>>& input_shapes,
           const std::vector<std::vector<int>>& output_shapes,
           const std::vector<std::vector<float>>& input_data,
           std::vector<std::vector<float>>* output_data);
Contributor

This interface no longer works for NLP use cases. Consider using LoDTensor directly in the interface.
Since users' data formats vary widely, it is more reasonable to let users convert their data into LoDTensor themselves. We can also provide some conversion tools or functions, but the Run interface should keep using LoDTensor:

    bool Run(const std::vector<LoDTensor>& input,
             std::vector<LoDTensor>* output);

The inputs and outputs names are not needed; the feed and fetch ops already contain them.

    void TestInference(const std::string& dirname,
                       const std::vector<paddle::framework::LoDTensor*>& cpu_feeds,
                       const std::vector<paddle::framework::LoDTensor*>& cpu_fetchs,
                       const int repeat = 1, const bool is_combined = false) {

The unit test already wraps this fairly cleanly.

Contributor

Multi-threaded prediction also needs to be considered here; a `const int thread_nums` parameter should be added.

Contributor Author

There is no multi-threading inside; multi-threading means external threads calling the inference library.


  // Clone a predictor that shares the model weights.
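  // Note: the predictor itself does not use multiple threads; for concurrent
  // serving, each external caller thread is expected to hold its own
  // predictor (for example, one obtained via Clone(), which shares weights)
  // and call Run() on it independently.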
  Predictor* Clone();

  // Destroy the Predictor.
  ~Predictor();

  struct Attr {
    enum class EngineKind;

    std::string model_dir;      // path to the model directory.
    bool enable_engine{false};  // Enable to execute (part of) the model on
                                // third-party engines.
    EngineKind engine_kind{Attr::EngineKind::kNone};

    enum class EngineKind {
      kNone = -1,          // Use the native Fluid facility.
      kAnakin,             // Use Anakin for inference.
      kTensorRT,           // Use TensorRT for inference.
      kAutoMixedAnakin,    // Automatically mix Fluid with Anakin.
      kAutoMixedTensorRT,  // Automatically mix Fluid with TensorRT.
Contributor

- kAutoMixedAnakin and kAutoMixedTensorRT can be removed; kAnakin should already cover kAutoMixedAnakin.
- kNone should be further split into a CPU mode and a GPU mode.
- Does MKLDNN fall under kNone, or should it be listed separately?

Contributor Author

It does not; here kTensorRT means running the whole graph with TensorRT, and the sub-graph case is the separate switch kAutoMixedTensorRT.

Contributor

For users, the sub-graph vs. whole-graph distinction is a bit complicated. Once they choose TensorRT, they just understand it as optimizing with TensorRT; whether the optimization covers a sub-graph or the whole graph (and the whole graph is just a special case of a sub-graph) should be handled internally.

Contributor Author
@Superjomn May 10, 2018

Some of the listed features are not available yet; they are kept here only so that downstream teams know we are working on them.

    };
  };
};

} // namespace paddle