Merged
Changes from 68 commits
Commits
71 commits
b1b4364
Rename PlainNet --> NetOp
reyoung Jul 26, 2017
ecf23ce
Update Backward
reyoung Jul 26, 2017
b1b13f8
Update Interface
reyoung Jul 26, 2017
00615eb
Refine OpRegistry::AddInput/AddOutput
reyoung Jul 26, 2017
a2dc961
Add fill_zeros_like op
JiayiFeng Jul 26, 2017
e32e306
Develop backward building precess of single op
JiayiFeng Jul 26, 2017
831d4e1
Refining Unittest
reyoung Jul 26, 2017
f77c63b
Merge branch 'feature/backward' of https://github.com/reyoung/Paddle …
JiayiFeng Jul 26, 2017
fa7cbfd
"backward is NetOp"
dzhwinter Jul 26, 2017
0ac79a3
Merge remote-tracking branch 'reyoung/feature/backward' into feature/…
dzhwinter Jul 26, 2017
292f2ab
"split to generic add PR"
dzhwinter Jul 26, 2017
05d9aff
Stash
reyoung Jul 27, 2017
fa6a46a
Merge branch 'feature/backward' of github.com:reyoung/Paddle into fea…
reyoung Jul 27, 2017
03f418c
Fix compile error
JiayiFeng Jul 27, 2017
5297bcb
Merge branch 'feature/backward' of https://github.com/reyoung/Paddle …
JiayiFeng Jul 27, 2017
9475972
Merge branch 'feature/backward' of github.com:reyoung/Paddle into fea…
reyoung Jul 27, 2017
f9fab14
Fix compile error
reyoung Jul 27, 2017
3d18737
Add unittest for part_of_output_are_not_need
reyoung Jul 27, 2017
70bd07a
Fix compile errors of FillZerosLikeOp
JiayiFeng Jul 27, 2017
63636d6
Stash for canpio
reyoung Jul 27, 2017
04db418
Add unitest of Backward.part_of_input_are_not_need
JiayiFeng Jul 27, 2017
28c0281
Stash
reyoung Jul 27, 2017
099bb53
Merge branch 'feature/backward' of github.com:reyoung/Paddle into fea…
reyoung Jul 27, 2017
3dd5fd0
Add unitest of Backward.intermediate_variable_not_need_in_linear_net
JiayiFeng Jul 27, 2017
84198f7
Add unittest
reyoung Jul 27, 2017
4461f3c
Merge branch 'feature/backward' of https://github.com/reyoung/Paddle …
JiayiFeng Jul 27, 2017
b1d8419
rename test
JiayiFeng Jul 27, 2017
d2583bd
InsertOp for NetOp
reyoung Jul 27, 2017
b9f2bb3
"wait add generic"
dzhwinter Jul 27, 2017
5713266
Merge remote-tracking branch 'reyoung/feature/backward' into feature/…
dzhwinter Jul 27, 2017
d4ab70a
Merge branch 'feature/backward' of github.com:reyoung/Paddle into fea…
reyoung Jul 27, 2017
a0669ea
Merge remote-tracking branch 'reyoung/feature/backward' into feature/…
dzhwinter Jul 27, 2017
7088654
"add duplicate"
dzhwinter Jul 27, 2017
404cc05
"reverse travesal"
dzhwinter Jul 27, 2017
65d2678
"add simple net test"
dzhwinter Jul 28, 2017
46d766e
Merge branch 'feature/unittest_for_inputs' into feature/backward
reyoung Jul 28, 2017
e1d1067
Merge branch 'feature/backward' of github.com:reyoung/Paddle into fea…
reyoung Jul 28, 2017
8bf0ca0
Fix unittest error
reyoung Jul 28, 2017
d0b25ac
Fix some unittest error
reyoung Jul 28, 2017
72839a7
fix conflict6
dzhwinter Jul 28, 2017
29d50ad
Refine unit-test
reyoung Jul 28, 2017
74cd9a7
"fix unittest"
dzhwinter Jul 28, 2017
7087a04
"add unittest"
dzhwinter Jul 28, 2017
b2e1c48
Merge remote-tracking branch 'reyoung/feature/backward' into feature/…
dzhwinter Jul 28, 2017
658588a
"format test case"
dzhwinter Jul 28, 2017
d6e0368
Add comment in backward.cc
reyoung Jul 28, 2017
e1cd719
Merge branch 'feature/backward' of https://github.com/reyoung/Paddle …
dzhwinter Jul 28, 2017
71bd439
Addjust Backward.linear_net_intermediate_variable_has_no_grad
JiayiFeng Jul 28, 2017
0da5cce
"fix test case"
dzhwinter Jul 28, 2017
52054af
"fix typo"
dzhwinter Jul 28, 2017
0e337be
Merge branch 'feature/backward' of https://github.com/reyoung/Paddle …
JiayiFeng Jul 28, 2017
1197420
Merge branch 'feature/backward' of https://github.com/reyoung/Paddle …
JiayiFeng Jul 28, 2017
302046a
"fix return net error"
dzhwinter Jul 28, 2017
1de465b
Change some `ASSERT_EQ` to `EXPECT_EQ`
JiayiFeng Jul 28, 2017
dc06eaa
Merge branch 'feature/backward' of https://github.com/reyoung/Paddle …
JiayiFeng Jul 28, 2017
39cd39e
Update test
JiayiFeng Jul 28, 2017
be52868
Fix net_input_of_network_not_need_grad
reyoung Jul 28, 2017
a2e2cd7
Fix bug of TEST Backwar.linear_net_intermediate_variable_has_no_grad
JiayiFeng Jul 28, 2017
2198963
Merge branch 'feature/backward' of https://github.com/reyoung/Paddle …
JiayiFeng Jul 28, 2017
42e2fa5
Fix unittest
reyoung Jul 28, 2017
48812cd
Merge branch 'feature/backward' of github.com:reyoung/Paddle into fea…
reyoung Jul 28, 2017
213fdad
adjust format
JiayiFeng Jul 28, 2017
f5636da
design doc
dzhwinter Jul 30, 2017
bd14660
"add part of design doc"
dzhwinter Jul 31, 2017
ca16c0d
Merge remote-tracking branch 'remotes/reyoung/feature/backward' into …
dzhwinter Jul 31, 2017
bc146e8
Merge branch 'develop' of github.com:baidu/Paddle into feature/backward
reyoung Aug 1, 2017
80baf86
Merge branch 'feature/backward' of github.com:reyoung/Paddle into fea…
reyoung Aug 1, 2017
e2fd2bd
Follow comments and merge develop
reyoung Aug 1, 2017
737ea05
Use static_cast, Fix unittest
reyoung Aug 1, 2017
9cc9907
Merge branch 'develop' of github.com:baidu/Paddle into feature/backward
reyoung Aug 1, 2017
051d6c8
Merge develop
reyoung Aug 1, 2017
5 changes: 4 additions & 1 deletion paddle/framework/CMakeLists.txt
@@ -30,4 +30,7 @@ add_custom_target(framework_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch
add_dependencies(framework_py_proto framework_py_proto_init)

cc_library(net SRCS net.cc DEPS op_registry)
cc_test(net_op_test SRCS net_op_test.cc DEPS net add_op mul_op sigmoid_op softmax_op fc_op)
cc_test(net_op_test SRCS net_op_test.cc DEPS net)

cc_library(backward SRCS backward.cc DEPS net)
cc_test(backward_test SRCS backward_test.cc DEPS backward)
178 changes: 178 additions & 0 deletions paddle/framework/backward.cc
@@ -0,0 +1,178 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/framework/backward.h"
#include <list>
#include "paddle/framework/net.h"
#include "paddle/framework/op_registry.h"

namespace paddle {
namespace framework {

static bool AllInSet(const std::vector<std::string>& names,
Comment (Member): Why do we need to use static?

Reply (Collaborator Author): Because we do not export them to global symbols.

const std::string& suffix,
const std::unordered_set<std::string>& set) {
for (auto& name : names) {
if (set.find(name + suffix) == set.end()) {
return false;
}
}
return true;
}

static std::shared_ptr<OperatorBase> NOP() {
auto net_op = std::make_shared<NetOp>();
net_op->type_ = "@NOP@";
net_op->CompleteAddOp();
return net_op;
}

// Get the backward operator from a forward operator (recursive implementation).
//
// no_grad_names holds the names of gradient variables that do not need to be
// calculated.
//
// uniq_id is a unique index used inside the recursive calls to
// BackwardRecursive. Use `uid = uniq_id++;` to get a unique index, and pass
// `uniq_id` through the recursive calls.
//
// Returns the backward operator. In simple situations it is a plain operator;
// in complex situations it is a NetOp.
//
// See backward.h for details.
static std::shared_ptr<OperatorBase> BackwardRecursive(
const OperatorBase& forwardOp,
std::unordered_set<std::string>& no_grad_names, size_t& uniq_id);
std::shared_ptr<OperatorBase> BackwardRecursive(
const OperatorBase& forwardOp,
std::unordered_set<std::string>& no_grad_names, size_t& uniq_id) {
// If all input gradients of forwarding operator do not need to calculate,
// just return an NOP. Not return null ptr because NOP does not take
// too much time for calculation, but it is useful for simplifying logic.
if (AllInSet(forwardOp.inputs_, OperatorBase::GRAD_VAR_SUFFIX(),
no_grad_names)) {
return NOP();
}

// If none of the output gradients of the forward operator need to be
// calculated, then none of its input gradients can be computed at all; put
// them into the `no_grad_names` set and return a NOP.
if (AllInSet(forwardOp.outputs_, OperatorBase::GRAD_VAR_SUFFIX(),
no_grad_names)) {
for (auto& name : forwardOp.inputs_) {
// Mark every input gradient as not needed
no_grad_names.insert(name + OperatorBase::GRAD_VAR_SUFFIX());
}
return NOP();
}
Comment (Contributor): I can understand that when none of the inputs need gradients, all outputs can be marked as not needing gradients. But I have not quite figured out in what situation we mark the inputs as not needing gradients based on the outputs not needing them.

Reply (Collaborator Author): During the reverse traversal, if none of the operators that consume this operator's output variables computed a gradient for them, then this operator has no way to compute its own gradients either.


// Returned gradient network
auto net = std::make_shared<NetOp>();

if (forwardOp.IsNetOp()) {
Comment (Contributor): Why can't the logic that builds a NetOp's backward op be written as a method of NetOp itself? Will other complex ops (SwitchOp? perhaps not the best example) force us to add yet another branch here?

Comment (Contributor): @qingqing01 As I understand it, this is the generation logic of Backward, which is not on the same level as NetOp's own backward pass. A gradient operator is a pluggable unit, while Backward is part of the system core.

Reply (Collaborator Author): As for whether other complex ops would require another branch: perhaps the backward of complex ops could be registered somewhere else, but such a registration mechanism feels very inconsistent. The simplest case is certainly written this way; until we come up with a better approach, let's do it like this for now.

// Because forwardOp is a NetOp, the static_cast below is safe.
auto& forwardNet = static_cast<const NetOp&>(forwardOp);

// Map from an output gradient variable name to the indices, in the backward
// net, of the operators that generate that variable.
std::unordered_map<std::string, std::vector<size_t>> dup_output_ops;
Comment (Member): What does "dup" mean?

Reply (Collaborator Author): Duplicated.


size_t local_op_id = 0;
// Traverse forwardNet in reverse order; for each sub-op, build its backward
// op and record which backward ops write each output gradient variable.
for (auto it = forwardNet.ops_.rbegin(); it != forwardNet.ops_.rend();
++it, ++local_op_id) {
auto fwd = *it;
auto bwd = BackwardRecursive(*fwd, no_grad_names, uniq_id);
net->AddOp(bwd);
for (auto& out : bwd->outputs_) {
dup_output_ops[out].emplace_back(local_op_id);
}
}
// Get a unique ID for this call.
auto uid = uniq_id++;
// TODO(dzh): more comment
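// For every gradient variable written by more than one backward op, rename
// each write to a unique "@RENAME@" alias and insert a generic "add" op right
// after the last writer to sum the aliases back into the original variable.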
using Pos = std::pair<size_t, std::shared_ptr<OperatorBase>>;
std::list<Pos> insert_position;
for (auto& dup_output_op : dup_output_ops) {
const std::string& name = dup_output_op.first;
auto& dup_op = dup_output_op.second;
if (dup_op.size() == 1) continue;
std::vector<std::string> dup_outputs;

for (size_t i = 0; i < dup_op.size(); ++i) {
auto op_offset = dup_op[i];
dup_outputs.push_back(name + "@RENAME@" + std::to_string(uid) + "@" +
std::to_string(i));
net->ops_[op_offset]->Rename(name, dup_outputs.back());
}
insert_position.push_back(
{dup_op.back(),
OpRegistry::CreateOp(
"add", {dup_outputs}, {name},
Comment (Member): This add op has not been implemented yet, right?

Reply (Collaborator Author): Right. Implementing that op does not affect the implementation or the unit tests of the Backward algorithm.

{{"input_format",
std::vector<int>{0, (int)dup_outputs.size()}}})});
Comment (Member): static_cast

Reply (Collaborator Author): done.

}

insert_position.sort(
[](const Pos& l, const Pos& r) { return l.first > r.first; });

for (auto& pos : insert_position) {
net->InsertOp(pos.first + 1, pos.second);
}

} else {
std::shared_ptr<OperatorBase> grad_op = OpRegistry::CreateGradOp(forwardOp);
for (std::string& grad_input : grad_op->inputs_) {
if (no_grad_names.count(grad_input)) {
std::string prefix = grad_input.substr(
0, grad_input.size() - OperatorBase::GRAD_VAR_SUFFIX().size());
grad_input = prefix + OperatorBase::ZERO_VAR_SUFFIX();

// If some input gradients of this operator are not calculated by any
// backward op, feed zero-filled variables for those input gradients.
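// fill_zeros_like takes the forward variable `prefix` as its input and is
// expected to produce a zero-filled variable of the same shape, so this
// gradient op still receives a well-formed tensor.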
net->AddOp(OpRegistry::CreateOp("fill_zeros_like", {prefix},
{grad_input}, {}));
Comment (Contributor): need more comments for fill_zeros_like op.

}
}

for (std::string& grad_output : grad_op->outputs_) {
if (no_grad_names.count(grad_output)) {
grad_output = OperatorBase::EMPTY_VAR_NAME();
}
}

if (net->ops_.empty()) {  // No auxiliary op was added to the network
return grad_op;
}
net->AddOp(grad_op);
}
net->type_ = "@GENERATED_BACKWARD@";
net->CompleteAddOp();
return net;
}

// See header for comments
std::shared_ptr<OperatorBase> Backward(
const OperatorBase& forwardOp,
const std::unordered_set<std::string>& no_grad_vars) {
std::unordered_set<std::string> no_grad_names;
no_grad_names.reserve(no_grad_vars.size());

for (auto& name : no_grad_vars) {
no_grad_names.insert(name + OperatorBase::GRAD_VAR_SUFFIX());
}
size_t uid = 0;
return BackwardRecursive(forwardOp, no_grad_names, uid);
}
} // namespace framework
} // namespace paddle
27 changes: 27 additions & 0 deletions paddle/framework/backward.h
@@ -0,0 +1,27 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once
#include <unordered_set>
#include "operator.h"
namespace paddle {
namespace framework {

// Create the backward operator from a forward operator.
// TODO(yuyang18): Add more API reference comment.
extern std::shared_ptr<OperatorBase> Backward(
const OperatorBase& forwardOp,
const std::unordered_set<std::string>& no_grad_vars);
} // namespace framework
} // namespace paddle
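A hedged usage sketch of the `Backward` API above; the "mul" op, its input/output names, and the surrounding setup are assumptions rather than code taken from this PR:

```cpp
#include <memory>
#include <string>
#include <unordered_set>

#include "paddle/framework/backward.h"
#include "paddle/framework/op_registry.h"

std::shared_ptr<paddle::framework::OperatorBase> BuildBackwardExample() {
  using namespace paddle::framework;
  // Assumes a "mul" operator is registered: Out = X * W.
  auto forward = OpRegistry::CreateOp("mul", {"X", "W"}, {"Out"}, {});
  // Variables listed here (without the gradient suffix) get no gradient.
  std::unordered_set<std::string> no_grad_vars = {"W"};
  // Returns a single gradient op, or a NetOp for composite networks.
  return Backward(*forward, no_grad_vars);
}
```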
38 changes: 38 additions & 0 deletions paddle/framework/backward.md
@@ -0,0 +1,38 @@
## Operator/expression's Backward

### Motivation

In a neural network, the backpropagation algorithm follows the chain rule, so we need to compose the fundamental gradient operators/expressions together according to the chain rule. Every forward network needs a backward network to construct the full computation lineage; the operator/expression's Backward feature generates the backward pass with respect to the forward pass.
Comment (Member): computation lineage ==> computation graph? I cannot find any definition of "computation lineage".


### Implementation: gradient operator registry

| | forward operator | backward operator |
| ---------------------- | ---------------- | -------------------------------- |
| **Operator::inputs_** | Inputs | Inputs, Outputs, OutputGradients |
| **Operator::outputs_** | Outputs | InputGradients |

Inputs/Outputs are the inputs/outputs of the operator, and InputGradients/OutputGradients are the gradients with respect to the forward operator. A forward operator and its backward operator are isomorphic; each saves what it needs into its member attributes.
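A minimal sketch of the mapping in the table above, using simplified stand-in structs (the names `OpIO`/`MakeGradIO` are hypothetical, and the `"@GRAD"` suffix is assumed to match `OperatorBase::GRAD_VAR_SUFFIX()`):

```cpp
#include <string>
#include <vector>

// Simplified stand-in for an operator's name lists.
struct OpIO {
  std::vector<std::string> inputs_, outputs_;
};

// Build a backward op's name lists from its forward op, following the table:
//   backward inputs  = Inputs + Outputs + OutputGradients
//   backward outputs = InputGradients
OpIO MakeGradIO(const OpIO& fwd, const std::string& grad_suffix = "@GRAD") {
  OpIO grad;
  grad.inputs_ = fwd.inputs_;
  grad.inputs_.insert(grad.inputs_.end(), fwd.outputs_.begin(),
                      fwd.outputs_.end());
  for (const auto& out : fwd.outputs_) grad.inputs_.push_back(out + grad_suffix);
  for (const auto& in : fwd.inputs_) grad.outputs_.push_back(in + grad_suffix);
  return grad;
}
```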

We use a global hash map to record all available gradient operators. Following the philosophy of a minimal core, every operator is a pluggable unit; each gradient operator is itself an ordinary operator and needs to register itself (see the sketch below).
Comment (Member): I think we need not emphasize that we use a hash map; a map is enough. A hash map is just an optimization.

Comment (Member): regist ==> register


grad_op_builder(fengjiayi)
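A minimal sketch of such a registry, with hypothetical names (the actual mechanism is the OpRegistry / grad_op_builder mentioned above):

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

struct OperatorBase {  // stand-in for the framework's operator base class
  virtual ~OperatorBase() = default;
};

using GradOpCreator =
    std::function<std::shared_ptr<OperatorBase>(const OperatorBase& fwd)>;

// Global hash map from a forward op type to the creator of its gradient op.
std::unordered_map<std::string, GradOpCreator>& GradOpRegistry() {
  static std::unordered_map<std::string, GradOpCreator> registry;
  return registry;
}

// Each gradient operator registers itself under its forward op's type.
bool RegisterGradOp(const std::string& fwd_type, GradOpCreator creator) {
  return GradOpRegistry().emplace(fwd_type, std::move(creator)).second;
}
```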

### Implementation: Backward network

Given a forward network, it generates the backward network. We only care about the gradients: `OutputGradients` and `InputGradients`.

1. bla bla bla (yuyang)

2. NetOp

When the input forward network is a NetOp, we need to call the backward functions of its sub NetOps/Operators recursively and make sure each of them is handled. During this process, we collect the `OutputGradients` names.

Variables are shared within the same scope; as a result, operators whose `OutputGradients` have duplicate names would overwrite the same variable.
Comment (Member): overwirte => overwrite, then => the. I remember that we will add an add_op if some outputs are duplicated, and rename the duplicated ones.


![duplicate_op](./images/duplicate_op.png)
Comment (jacquesqiao, Aug 1, 2017): ![./images/duplicate_op.png]()


Sharing a variable between operators, or using the same input variable in multiple operators, leads to duplicated gradient variables. As the demo above shows, we need to rename the duplicated gradient variables and insert a generic add operator to sum them up (see the sketch at the end of this section).

![duplicate_op2](./images/duplicate_op2)

Then collect the sub-graph's OutputGradients/InputGradients as the NetOp's own, and return it.
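A minimal sketch of the rename-and-add step, using simplified stand-in structs (the `"@RENAME@"` naming follows backward.cc; everything else is hypothetical):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a backward op's output name list.
struct SimpleOp {
  std::vector<std::string> outputs;
};

// If several backward ops write the gradient variable `name`, rename each
// write to a unique alias; the caller then inserts a generic "add" op that
// sums the returned aliases back into `name`.
std::vector<std::string> RenameDuplicates(std::vector<SimpleOp>& ops,
                                          const std::string& name,
                                          std::size_t uid) {
  std::vector<std::string> aliases;
  for (auto& op : ops) {
    for (auto& out : op.outputs) {
      if (out == name) {
        out = name + "@RENAME@" + std::to_string(uid) + "@" +
              std::to_string(aliases.size());
        aliases.push_back(out);
      }
    }
  }
  return aliases;
}
```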