Commit 8631e79

Merge pull request #4026 from dzhwinter/backward_graph
rewrite the document
2 parents 6d2e87f + a90274e commit 8631e79

File tree: 3 files changed (+41, −23 lines)


paddle/framework/backward.md

Lines changed: 41 additions & 23 deletions
@@ -2,11 +2,22 @@
## Motivation

In neural networks, most models are currently trained with the backpropagation algorithm (known as BP). Technically, BP calculates the gradient of the loss function and propagates it backward through the network. Since it follows the chain rule, we need a module that chains the gradient operators/expressions together to construct the backward pass. Every forward network needs a backward network to complete the full computation graph; the operator/expression's backward pass is generated with respect to its forward pass.
## Implementation
In this design doc, we expose only one API for generating the backward pass:

```c++
std::unique_ptr<OperatorBase> Backward(const OperatorBase& forwardOp,
                                       const std::unordered_set<std::string>& no_grad_vars);
```
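A minimal call-site sketch for this API, assuming `forward_op` points to a previously constructed forward network; the excluded variable name `"b"` is made up for illustration:

```c++
// Hypothetical usage; forward_op is a previously built forward network.
std::unordered_set<std::string> no_grad_vars{"b"};  // skip b's gradient

std::unique_ptr<OperatorBase> backward_op =
    Backward(*forward_op, no_grad_vars);
// backward_op is appended after the forward pass, so the full graph
// computes InputGradients from OutputGradients.
```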
The implementation behind it can be divided into two parts: **Backward Operator Creating** and **Backward Network Building**.
### Backward Operator Registry

A backward network is built up with several backward operators. Backward operators take forward operators' inputs, outputs, and output gradients, and then calculate the input gradients.
| | forward operator | backward operator |
| ---------------------- | ---------------- | ------------------------- |
@@ -25,7 +36,7 @@ REGISTER_OP(mul, MulOp, MulOpMaker, mul_grad, MulOpGrad);
`mul_grad` is the type of the backward operator, and `MulOpGrad` is its class name.

### Backward Operator Creating

Given a certain forward operator, we can get its corresponding backward operator by calling:

@@ -43,40 +54,47 @@ The function `BuildGradOp` will sequentially execute following processes:
4. Building the backward operator with `inputs`, `outputs`, and the forward operator's attributes.
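The backward operator's `inputs` and `outputs` follow the naming scheme implied by the `@GRAD` suffix that appears later in this doc (e.g. `W@GRAD`). A small illustrative sketch of that derivation, not the actual `BuildGradOp` code; all helper names here are made up:

```c++
#include <string>
#include <vector>

// Illustrative helper assuming the "@GRAD" naming convention.
std::string GradVarName(const std::string& var) { return var + "@GRAD"; }

// Sketch: a backward op reads the forward inputs, forward outputs, and
// output gradients, and writes the input gradients.
struct GradIO {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

GradIO DeriveGradIO(const std::vector<std::string>& fwd_in,
                    const std::vector<std::string>& fwd_out) {
  GradIO io;
  io.inputs = fwd_in;  // forward inputs
  io.inputs.insert(io.inputs.end(), fwd_out.begin(), fwd_out.end());  // forward outputs
  for (const auto& o : fwd_out) io.inputs.push_back(GradVarName(o));  // OutputGradients
  for (const auto& i : fwd_in) io.outputs.push_back(GradVarName(i));  // InputGradients
  return io;
}
```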
### Backward Network Building

A backward network is a series of backward operators. The main idea of building a backward network is to create the backward operators in inverted sequence and append them one by one. A few corner cases need special processing.
1. Op

   When the input forward network is an Op, return its gradient operator immediately. If all of its outputs are in the no-gradient set, then return a special `NOP`.

2. NetOp

   In our design, the network itself is also a kind of operator (**NetOp**), so the operators contained in a big network may themselves be small networks. When the input forward network is a NetOp, we call the sub-NetOps'/Operators' backward functions recursively. During the process, we need to collect the `OutputGradients` names according to the forward NetOp.
3. RnnOp

   RnnOp is a nested stepnet operator. The backward module needs to recursively call `Backward` for every stepnet.

4. Sharing Variables

   As illustrated in the pictures, two operators share the same variable name `W@GRAD`, which will overwrite their shared input variable.

   <p align="center">
   <img src="./images/duplicate_op.png" width="50%" ><br/>

   pic 1. Sharing variables in operators.

   </p>

   Sharing a variable between operators, or using the same input variable in multiple operators, leads to a duplicate gradient variable. As the demo above shows, we need to rename the gradient names recursively and add a generic add operator to replace the overwriting links (see the sketch after this list).

   <p align="center">
   <img src="images/duplicate_op2.png" width="40%" ><br/>

   pic 2. Replace sharing variables' gradients with an `Add` operator.

   </p>

   Because our framework finds variables according to their names, we need to rename the duplicated output links. We add a numeric suffix to each renamed variable to record its position.

5. Part of Gradient is Zero

   In the whole graph, there are cases where an operator's gradient is not needed, but its input's gradient is a dependency link of another operator. In that position we need to fill in a gradient matrix of the same shape. In our implementation, we insert a special `fillZeroLike` operator.
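The sketch promised in item 4: a toy illustration of the rename-and-sum transform for duplicated gradient variables. The `@0`/`@1` suffix format and the op records here are made up; only the `W@GRAD` name comes from this doc:

```c++
#include <string>
#include <vector>

// Toy op record: an op type plus its output variable names.
struct ToyOp {
  std::string type;
  std::vector<std::string> outputs;
};

// Before: both backward ops write W@GRAD; the second overwrites the first.
std::vector<ToyOp> before = {{"fc_grad", {"W@GRAD"}},
                             {"fc_grad", {"W@GRAD"}}};

// After: duplicates get a numeric suffix, and a generic add op is
// inserted to combine them back into W@GRAD.
std::vector<ToyOp> after = {{"fc_grad", {"W@GRAD@0"}},
                            {"fc_grad", {"W@GRAD@1"}},
                            {"add", {"W@GRAD"}}};  // W@GRAD = W@GRAD@0 + W@GRAD@1
```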
Following the rules above, we then collect the sub-graph's `OutputGradients`/`InputGradients` as the NetOp's and return it.
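Condensing the rules above, a simplified, self-contained sketch of the recursion; everything except the `Backward` signature is a toy stand-in for the real classes:

```c++
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for the real operator class hierarchy.
struct OperatorBase {
  std::string type;
  std::vector<std::string> outputs;
  std::vector<std::unique_ptr<OperatorBase>> sub_ops;  // non-empty => NetOp
  bool IsNetOp() const { return !sub_ops.empty(); }
};

std::unique_ptr<OperatorBase> MakeOp(const std::string& type) {
  auto op = std::make_unique<OperatorBase>();
  op->type = type;
  return op;
}

std::unique_ptr<OperatorBase> Backward(
    const OperatorBase& forward_op,
    const std::unordered_set<std::string>& no_grad_vars) {
  if (!forward_op.IsNetOp()) {
    // Rule 1: a plain Op. If every output is in the no-gradient set,
    // return a special NOP instead of a real gradient op.
    bool all_no_grad = !forward_op.outputs.empty();
    for (const auto& out : forward_op.outputs) {
      if (no_grad_vars.count(out) == 0) { all_no_grad = false; break; }
    }
    if (all_no_grad) return MakeOp("NOP");
    // Otherwise look up the registered gradient op (e.g. mul -> mul_grad).
    return MakeOp(forward_op.type + "_grad");
  }
  // Rules 2 and 3: NetOp (and nested stepnets such as RnnOp) recurse over
  // sub-operators in inverted order and append the results one by one.
  auto backward_net = MakeOp("net");
  for (auto it = forward_op.sub_ops.rbegin();
       it != forward_op.sub_ops.rend(); ++it) {
    backward_net->sub_ops.push_back(Backward(**it, no_grad_vars));
  }
  // Rule 4 (omitted): rename duplicated gradient outputs, insert add ops.
  // Rule 5 (omitted): insert fillZeroLike ops for gradients that are
  // needed as dependencies but never produced.
  return backward_net;
}
```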