Merged

35 commits:
- `ea9eed0` add block (Superjomn, Aug 27, 2017)
- `7496f8b` add scope (Superjomn, Aug 30, 2017)
- `e69aa44` add some interfaces (Superjomn, Aug 30, 2017)
- `f1da88d` Merge branch 'block_design.md' of github.com:Superjom/Paddle into blo… (Superjomn, Aug 30, 2017)
- `6448339` rename VarDescLib to VarDescScope (Superjomn, Sep 2, 2017)
- `124dd0c` add SetScope interface (Superjomn, Sep 2, 2017)
- `e55a3d8` fix most spells (Superjomn, Sep 2, 2017)
- `d84bde3` add functions (Superjomn, Sep 2, 2017)
- `8ab4951` update (Superjomn, Sep 2, 2017)
- `2253219` update (Superjomn, Sep 2, 2017)
- `da4f2a6` rewrite block design (Superjomn, Sep 4, 2017)
- `64ed5dd` Merge branch 'block_design.md' of github.com:Superjom/Paddle into blo… (Superjomn, Sep 4, 2017)
- `247f4a9` fix grammer (Superjomn, Sep 4, 2017)
- `6e62cb2` fix grammer (Superjomn, Sep 4, 2017)
- `c59e697` move ScopeInit to protected (Superjomn, Sep 4, 2017)
- `3f3de4a` update (Superjomn, Sep 4, 2017)
- `6b6679a` Merge branch 'develop' of github.com:PaddlePaddle/Paddle into block_d… (Superjomn, Sep 4, 2017)
- `5acd4bf` Merge branch 'develop' of github.com:PaddlePaddle/Paddle into block_d… (Superjomn, Sep 5, 2017)
- `49606ad` add prememory to demo (Superjomn, Sep 5, 2017)
- `6672878` rewrite block (Superjomn, Sep 6, 2017)
- `1c729e3` add IsValid (Superjomn, Sep 6, 2017)
- `2a06c47` remote python (Superjomn, Sep 6, 2017)
- `c9c0898` Merge branch 'develop' of github.com:PaddlePaddle/Paddle into block_d… (Superjomn, Sep 11, 2017)
- `b0785f6` remove scope from RuntimeTable (Superjomn, Sep 11, 2017)
- `6ab72db` clean grammer (Superjomn, Sep 11, 2017)
- `920a66f` Merge branch 'develop' of github.com:PaddlePaddle/Paddle into block_d… (Superjomn, Sep 11, 2017)
- `b09d5db` update (Superjomn, Sep 11, 2017)
- `37b285a` Merge branch 'develop' of github.com:PaddlePaddle/Paddle into block_d… (Superjomn, Sep 12, 2017)
- `f2e0a5e` delete RuntimeTable (Superjomn, Sep 12, 2017)
- `91888a9` replace father with parent (Superjomn, Sep 12, 2017)
- `69d44ca` subblock (Superjomn, Sep 12, 2017)
- `4a15597` rename pre_memory to pre (Superjomn, Sep 12, 2017)
- `f037b6e` add proto message description (Superjomn, Sep 13, 2017)
- `b9d9c2d` Merge branch 'develop' of github.com:PaddlePaddle/Paddle into block_d… (Superjomn, Sep 13, 2017)
- `84850d3` Update block deisgn (Sep 17, 2017)
doc/design/block.md (215 additions, 0 deletions):
# Design Doc: Use Block in RNNOp, WhileOp, IfElseOp

In programming languages such as C++ and Java, a block is a lexical structure of source code that groups a sequence of statements together so that they can be treated as a single unit.

RNNOp looks like the loop structure in programming languages, and similarly, WhileOp and IfElseOp are like loops and conditionals respectively.
So we want to verify whether PaddlePaddle should have a Block class that works like the pair of curly braces in the loop and conditional structures of programming languages.


Let's start with how an RNN is described in PaddlePaddle:

```python
v = some_op()
m_boot = some_op()

W = pd.Variable(shape=[20, 20])
U = pd.Variable(shape=[20, 20])

rnn0 = RNNOp()
with rnn0.stepnet() as net:
    # declare stepnet's inputs
    x = net.add_input(v)
    # declare memories
    h = net.add_memory(m_boot)

    fc_out = pd.matmul(W, x)
    hidden_out = pd.matmul(U, h)
    sum = pd.add_two(fc_out, hidden_out)
    act = pd.sigmoid(sum)

    # declare stepnet's outputs
    net.add_output(act, hidden_out)

acts, hs = rnn0()
```

Blocks not only group source code, but also narrow the lexical scope of variables so that they do not conflict with same-named variables used elsewhere in the program.

In PaddlePaddle, we need a similar concept called Block to support the following scenarios:

- defining a PaddlePaddle program by writing blocks of code, which include the definitions of variables and operators;
- `RNNOp`, `SwitchOp`, `WhileOp`, `IfElseOp`, etc., which need Block to help define their sub-blocks;
- helping to execute multiple operators: a block should group operators and run like a single operator.

> **Review (Contributor):** This statement is not reasonable. According to the design, "a block is a lexical structure of source code which is grouped as one line of code." We already differentiate the static phase from the dynamic phase, and Block as declared above is designed for the static phase, so it has nothing to do with the dynamic phase. That means a block only groups operators; it does not know their execution order, or whether they will be partitioned into two Blocks or not.

## How to use Block
In `RNNOp`, `SwitchOp`, `WhileOp`, and `IfElseOp`, a with-statement is used to help define a sub-block.

Let's start with how an `RNNOp` is described using Block:

```python
v = some_op()
m_boot = some_op()

W = pd.Variable(shape=[20, 20])
U = pd.Variable(shape=[20, 20])

rnn = create_rnn()
```

> **Review (Collaborator):** No input is specified here.

```python
with rnn.stepnet() as net:
    # declare the input variables that need to be segmented into steps
    x = net.set_inputs(v)
    # declare rnn's memory (state)
    h = net.add_memory(init=m_boot)
```

> **Review (Collaborator):** Do we need some explanation of how `add_memory` works?

> **Review (Collaborator):** Who writes this memory? How do we specify what is memorized? For example, does `h` memorize `hidden_out` or `act`?
>
> **Author:** Done.

> **Review (Collaborator):** If the current timestamp is t, how do we get the output from t-2, or more generally, t-n?
>
> **Author:** This may be supported in the future; currently only n-1 is supported. Or we can add `pre_memory(n=2)`.

```python
    fc_out = pd.matmul(W, x)
    hidden_out = pd.matmul(U, h.pre(n=1))
    sum = pd.add_two(fc_out, hidden_out)
```

> **Review (Collaborator):** `sum` is a Python built-in.

```python
    act = pd.sigmoid(sum)
    h.update(act)  # update memory

    # declare outputs that need to be merged across all the steps
    net.set_outputs(act, hidden_out)

acts, hs = rnn()
```

The with-statement above describes an `RNNOp`'s stepnet as a block. This description will be transformed into a protobuf message as follows:

```
BlockDesc RNNOp_stepnet {
  vars = {
    x {...}
    h {...}
    fc_out {...}
    hidden_out {...}
    sum {...}
    act {...}
  }

  ops = {
    matmul,
    add_two,
    sigmoid
  }
};

RNNOpDesc rnn {
  inputs = {x};
  outputs = {act, hidden_out};
  attrs { memories={h} };
  stepnet {RNNOp_stepnet};
};
```

This message is then passed to a C++ Block, which will create the corresponding Variables and Operators.
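
To make this concrete, here is a minimal sketch of how a driver might consume such a message. It assumes that `BlockDesc` is a protobuf message parseable from the text format above, that `Block` follows the interface defined later in this document, and that the function name `RunStepnet` and the choice of `CPUDeviceContext` are purely illustrative:

```c++
#include <string>
#include <google/protobuf/text_format.h>

// Hypothetical driver: parse a stepnet description from its text format
// and execute it. Block, Scope, and the device context follow the
// interfaces sketched in this document.
void RunStepnet(const std::string& stepnet_text) {
  BlockDesc stepnet_desc;
  google::protobuf::TextFormat::ParseFromString(stepnet_text, &stepnet_desc);

  Block stepnet(stepnet_desc);   // wrap the description in a runnable Block
  framework::Scope scope;        // holds the Variables the Block creates
  platform::CPUDeviceContext dev_ctx;

  stepnet.InferShape(scope);     // the first call also creates vars and ops
  stepnet.Run(scope, dev_ctx);   // run the operators sequentially
}
```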


## Block Implementation

During the generation of the Protobuf message, the Block should store VarDesc (the Protobuf message that describes a Variable) and OpDesc (the Protobuf message that describes an Operator).

The VarDescs in a block should have their own name scope, so that local variables do not affect the parent block's name scope.
A child block's name scope should inherit from its parent's, so that an OpDesc in a child block can reference a VarDesc stored in the parent block. For example:

```python
a = pd.Variable(shape=[20, 20])
b = pd.fc(a, params=["fc.w", "fc.b"])
```

> **Review (Collaborator):** I do not think we should pass the weight and bias to `fc` like that, but for a demonstration of Scope it is OK. Maybe add a TODO or FIXME mark to change that code to the real one.

```python
rnn = pd.create_rnn()
with rnn.stepnet() as net:
    x = net.set_inputs(a)
    # reuse fc's parameter
    fc_without_b = pd.get_variable("fc.w")
    net.set_outputs(fc_without_b)

out = rnn()
```
The method `pd.get_variable` retrieves a Variable by name. A Variable may be stored in a parent block but retrieved in a child block, so a block should have a variable scope that supports inheritance.

In compiler design, a symbol table is a data structure created and maintained by compilers to store information about the occurrence of various entities such as variable names, function names, classes, etc.

To store the definitions of Variables and Operators, we introduce a C++ class `SymbolTable`, analogous to a compiler's symbol table.

`SymbolTable` has the following functions:

- store the definitions (names and attributes) of variables and operators,
- verify whether a variable name has been declared,
- make it possible to implement type checking (offer Protobuf message pointers to `InferShape` handlers).


```c++
// The information in a SymbolTable is enough to trace the dependency graph,
// so maybe it is enough for the Eval() interface to take a SymbolTable.
class SymbolTable {
 public:
  SymbolTable(SymbolTable* parent) : parent_(parent) {}
```

> **Review (Collaborator):** Should a SymbolTable have a parent even in a compiler?
>
> **Author:** A Scope has a parent; a SymbolTable stores VarDescs, and a VarDesc may exist in the SymbolTable's parent.

```c++
  OpDesc* NewOp(const string& name = "");

  // TODO: determine whether the name is generated by Python or C++;
  // currently we assume that a unique name will be generated by C++ if the
  // name argument is left at its default.
  VarDesc* NewVar(const string& name = "");

  // Find a VarDesc by name; if recursive is true, search the parent's
  // SymbolTable recursively.
  // This interface is introduced to support InferShape: find the protobuf
  // messages of variables and operators, and pass pointers into InferShape.
  //
  // NOTE: maybe some C++ classes such as VarDescBuilder and OpDescBuilder
  // should be proposed and embedded into pybind to enable Python to operate
  // on C++ pointers.
  VarDesc* FindVar(const string& name, bool recursive = true);

  OpDesc* FindOp(const string& name);
```

> **Review (@helinwang, Contributor):** Why does FindOp not have a recursive parameter? It looks very similar to FindVar, and FindVar has the parameter.
>
> **Author:** Variables in a parent block may be referenced by a child block, but ops won't be. So variables need a recursive Find, but ops don't.

```c++
  BlockDesc Compile() const;

 private:
  SymbolTable* parent_;

  map<string, OpDesc> ops_;
  map<string, VarDesc> vars_;
};
```
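
For illustration, here is a minimal sketch of how the recursive lookup in `FindVar` might be implemented, assuming only the members declared above; it is not the final implementation:

```c++
// Look the name up locally first; fall back to the parent block's symbol
// table, mirroring how a child block's name scope inherits from its
// parent's. Returns nullptr when the name is not found anywhere.
VarDesc* SymbolTable::FindVar(const string& name, bool recursive) {
  auto it = vars_.find(name);
  if (it != vars_.end()) return &it->second;
  if (recursive && parent_ != nullptr) return parent_->FindVar(name, true);
  return nullptr;
}
```

With such a lookup, the `pd.get_variable("fc.w")` call in the earlier example can resolve `fc.w` from the parent block's table even though it is issued inside the RNN's sub-block.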

After all the descriptions of variables and operators have been added to the SymbolTable,
the block has enough information to run.

The `Block` class takes a `BlockDesc` as input and provides `Run` and `InferShape` functions.


```c++
class Block : public OperatorBase {
 public:
  Block(const BlockDesc& desc) : desc_(desc) {}

  void InferShape(const framework::Scope& scope) const override {
    if (!symbols_ready_) {
      CreateVariables(scope);
      CreateOperators();
    }
    // InferShape should be run before Run.
    for (auto& op : ops_) {
      op->InferShape(scope);
    }
  }
```

> **Review (Contributor):** What does InferShape do exactly? Does it just allocate temporary variables?

```c++
  void Run(const framework::Scope& scope,
           const platform::DeviceContext& dev_ctx) const override {
    PADDLE_ENFORCE(symbols_ready_,
                   "operators and variables should be created first.");
    for (auto& op : ops_) {
      op->Run(scope, dev_ctx);
    }
  }
```

> **Review (@helinwang, Contributor):** To harness multiple CPU cores, we need a scheduler to run the OPs (non-interdependent OPs should be able to be scheduled concurrently on different threads in a thread pool). However, by making Run a method of Block, we are limiting ourselves to running the OPs sequentially. It feels like we should have a scheduler that runs the block (or graph); in my opinion, Run should not be a method of Block. Block should be just a description of what to run, rather than how to run it.
>
> **Author:** We may add a DependencyEngine later, which will let a block run in parallel.

> **Review (@helinwang, Contributor):** What happens if ops from a block need to run on different device contexts (e.g., one OP can run on CPU only, while other OPs must run on GPU)?
>
> **Author:** This is a historical issue; Block inherits from OperatorBase, so it has the same Run interface.

> **Review (Member):** I think multi-device is a very important feature and we should take it into consideration in the current design. The basic elements of a Block (or Graph) are actually Nodes and Edges. A Node has a device attribute to decide which device it runs on; devices include CPU/MKL/CUDA/FPGA/etc. Some developers from inf will contribute FPGA code for refactoring Paddle, and FPGA is not suitable for all operators; it will only be optimized for specific operators. So multi-device (CPU/FPGA) is needed. Please refer to #3943. Also, what is the relationship between the two concepts, Block and Graph? If Block is Graph, then the private members of Block should be Nodes and Edges. The Graph mainly describes data dependency and control dependency, and a Graph will be run by an Executor, which will support multi-device execution.
>
> **Author:** Better to discuss this in another issue.

```c++

  void CreateVariables(const framework::Scope& scope) const;
  void CreateOperators() const;

  // some other necessary interfaces of NetOp are listed below
  // ...

 private:
  BlockDesc desc_;
  // mutable so that the const InferShape/Run path can fill them in lazily.
  mutable bool symbols_ready_{false};
  // operators instantiated from desc_ (replaces the removed RuntimeTable).
  mutable vector<std::unique_ptr<OperatorBase>> ops_;
};
```
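
As an illustration, the following sketches what `CreateVariables` and `CreateOperators` might look like. The `BlockDesc` accessors (`vars()`, `ops()`), the `Scope::NewVar` call, and the `OpRegistry::CreateOp` factory are assumptions for the sketch, not settled interfaces, and the const-ness of `Scope` is glossed over:

```c++
// Sketch only: accessor and factory names are assumptions.
void Block::CreateVariables(const framework::Scope& scope) const {
  // Create each variable declared in the block's description in the
  // given runtime scope.
  for (auto& var_desc : desc_.vars()) {
    scope.NewVar(var_desc.name());
  }
}

void Block::CreateOperators() const {
  // Instantiate each operator from its description via the registry.
  for (auto& op_desc : desc_.ops()) {
    ops_.push_back(OpRegistry::CreateOp(op_desc));
  }
  symbols_ready_ = true;
}
```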

## Run and Eval targets
Block inherits from OperatorBase, which has a Run method.
Block's Run method will run its operators sequentially.

There is another important interface called `Eval`, which takes some arguments called targets, generates a minimal graph that has the targets as end points, and creates a new Block from it.

> **Review (Contributor):** It feels like we need a function, maybe called Prune, that generates a subgraph (or maybe a block) from a graph, and then we call Run to run it.
>
> **Author:** Good idea, I will add a function called Prune.

> **Review (Contributor):** We may want to mention why Eval is needed and in what cases it is used.

After `Run` is called, `Eval` fetches the latest values of the targets and returns them.

> **Review (Collaborator):** Can we unify Eval and Run into one method?
>
> **Author:** Eval will create a new block and Run it, so keeping Eval and Run separate seems better.


The definition of Eval is as follows:

```c++
// Prune a block description down to the given targets using the
// corresponding dependency graph.
// Returns a new BlockDesc with a minimal number of operators.
// NOTE: this returns the block's description rather than a Block, so that
// the result can be distributed to a cluster.
BlockDesc Prune(const BlockDesc& desc, vector<string> targets);

void Block::Eval(const vector<string>& targets,
                 const framework::Scope& scope,
                 const platform::DeviceContext& dev_ctx) {
  BlockDesc min_desc = Prune(desc_, targets);
  Block min_block(min_desc);
  min_block.Run(scope, dev_ctx);
}
```
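
For intuition, here is a minimal sketch of how `Prune` might work: a backward pass over the block's operators starting from the targets, keeping only the operators the targets transitively depend on. The `OpDesc` accessors (`outputs()`, `inputs()`) and the `add_op` helper on `BlockDesc` are assumptions for illustration:

```c++
#include <set>
#include <string>
#include <vector>

// Walk the ops in reverse order; keep an op if it produces a needed name,
// and mark that op's inputs as needed in turn.
BlockDesc Prune(const BlockDesc& desc, vector<string> targets) {
  std::set<std::string> needed(targets.begin(), targets.end());
  std::vector<const OpDesc*> kept;
  for (auto it = desc.ops().rbegin(); it != desc.ops().rend(); ++it) {
    bool produces_needed = false;
    for (auto& out : it->outputs()) {
      if (needed.count(out)) produces_needed = true;
    }
    if (!produces_needed) continue;
    kept.push_back(&*it);
    for (auto& in : it->inputs()) needed.insert(in);
  }

  BlockDesc pruned;
  // Re-emit the kept ops in their original, forward order.
  for (auto op = kept.rbegin(); op != kept.rend(); ++op) {
    pruned.add_op(**op);  // assumed helper that copies an OpDesc
  }
  return pruned;
}
```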