67 changes: 67 additions & 0 deletions doc/fluid/design/dist_train/fluid_parameter_split_strategy_cn.md
@@ -0,0 +1,67 @@
# A Detailed Explanation of the Model Parameter Splitting Strategy in Fluid Distributed Training
Contributor: => "Distributed Training Parameter Splitting Design"

Collaborator (Author): That is indeed more concise, thanks. I'll change it.

This article explains the design of the model parameter splitting scheme used when running Parameter Server based distributed training with PaddlePaddle Fluid, and gives a simple example of how to apply the scheme.
Contributor: This sentence does not really add anything; a sufficiently short and clear title is enough.

Collaborator (Author): I think this sentence tells readers what they will get from the article, for example that they can copy the example code directly, or that they can understand how the design decisions behind it were made. That saves readers time; a title alone cannot achieve this.


## Model Parameter Splitting Strategy Design
### Reasons for Splitting
Contributor: "Reasons for Splitting" should not sit under an "xxx Design" heading. The reason is background, i.e. why we do this, not how we do it. Add a second-level heading "Background" to explain why splitting is needed.

Collaborator (Author): OK, I'll change it.


When designing a model, we usually do not limit the size of the parameters used by each layer. Suppose we now have 3 parameter servers and want to train the following network:

![fluid_3_layer_network](src/fluid_3_layers_network.png)
Contributor: The image is broken.

Collaborator (Author): Done.

Contributor: This figure has several problems:

  1. There are no `fluid.input` or `fluid.output` functions, and `fluid.fc` should be `fluid.layers.fc`.
  2. w and b belong on the fc layers.
  3. The earlier assumption "suppose we have 3 parameter servers" is not reflected in the figure, nor explained afterwards.

Collaborator (Author) @velconia, Jun 13, 2018:

  1. Thanks, I'll fix the figure.
  2. My understanding of w and b is that `w * fluid.layers.data + b` is what produces the input of `fluid.layers.fc`, so I think they should sit on the connecting edges? Compare this figure: http://www.paddlepaddle.org/docs/develop/book/02.recognize_digits/image/mlp.png
  3. The relationship between the number of servers and the number of splits is covered later, so stating it clearly here is necessary.

Collaborator: It would be best to center all the images (a screenshot showing how was attached).

The fluid.input layer is very wide, so the w1 and b1 parameters have a very large dimension of 10 * 1000 elements, while the fluid.fc layer is very narrow, so the w2 and b2 parameters have a dimension of only 1 * 10.
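
For reference, the example network can be sketched in Fluid roughly as follows. This is an illustrative reconstruction from the figure, not code from this repository; the layer sizes are one reading of the figure, chosen so the weight shapes match the dimensions quoted above:

```python
import paddle.fluid as fluid

# Wide input: 1000-dimensional features.
x = fluid.layers.data(name='x', shape=[1000], dtype='float32')
# Narrow fc layer of size 10: w1 has shape [1000, 10], i.e. 10 * 1000 elements.
hidden = fluid.layers.fc(input=x, size=10)
# Output fc layer of size 1: w2 has shape [10, 1], i.e. 1 * 10 elements.
out = fluid.layers.fc(input=hidden, size=1)
```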

If we simply assigned these parameters to the parameter servers as they are, the amount of parameter data held by each server would be uneven, and the lightly loaded servers would wait for the heavily loaded ones.
To handle such unevenly sized parameters, the Distribute Transpiler splits the model's parameters and their corresponding gradients; after splitting, each parameter or gradient becomes one or more parameter blocks.

### How Parameters Are Split

When splitting parameters, splitting too finely lowers the computational efficiency of the parameter servers, while splitting too coarsely prevents an even distribution of the parameters.
To control the granularity, for every parameter or gradient we compute two values, the maximum split count and the expected split count (a sketch of this computation follows the list below):

* Maximum split count

To avoid overly fine granularity, we fix a minimum parameter block size of 8192 elements.
We divide the parameter size by the minimum block size and round up, which gives the parameter's maximum split count.
In the example above, the maximum split count is ceil(10 * 1000 / 8192) = 2.

* Expected split count

To distribute the parameters completely evenly across the parameter servers, we take the total number of parameter servers as the expected split count.
In the example above, the expected split count is 3.

After computing these two values, we take the smaller one as the final split count, which distributes the parameters as evenly as possible while still respecting the minimum granularity.
In the example above, the parameter is finally split into min(2, 3) = 2 blocks.
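
The rule can be summarized in a few lines of Python. This is a minimal sketch of the computation described above, not the transpiler's actual code; the names `MIN_BLOCK_SIZE` and `split_count` are illustrative:

```python
import math

MIN_BLOCK_SIZE = 8192  # minimum parameter block size, in elements

def split_count(param_numel, num_pservers):
    """Number of blocks a parameter of param_numel elements is split into."""
    # Maximum split count: never produce blocks smaller than MIN_BLOCK_SIZE.
    max_splits = int(math.ceil(param_numel / float(MIN_BLOCK_SIZE)))
    # Expected split count: ideally one block per parameter server.
    expected_splits = num_pservers
    # The smaller value balances granularity against even distribution.
    return min(max_splits, expected_splits)

print(split_count(10 * 1000, 3))  # w1 from the example: prints 2
```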

### How Blocks Are Placed

After splitting the parameters and gradients into multiple parameter blocks, we still need to place these blocks evenly onto the parameter servers.

We currently support two simple and effective block placement methods: [Round Robin](https://en.wikipedia.org/wiki/Round-robin_scheduling) and [Hash](https://en.wikipedia.org/wiki/Hash_function).

In Round Robin mode, parameter blocks are assigned to the servers one by one, cycling through the servers in turn.

In Hash mode, we hash the name of each parameter block and take the result modulo the total number of parameter servers to obtain the id of the server that will hold it.
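
Both placement rules are easy to sketch. The function names below are illustrative, not the transpiler's API; note that a real implementation needs a deterministic string hash so that every process agrees on the placement:

```python
def round_robin(block_names, num_pservers):
    """Assign blocks to servers one by one, cycling through the servers."""
    return {name: i % num_pservers for i, name in enumerate(block_names)}

def hash_name(block_names, num_pservers):
    """Assign each block to the server selected by hashing its name."""
    # Python's built-in hash() is salted per process; a deterministic
    # hash (e.g. from hashlib) would be required in practice.
    return {name: hash(name) % num_pservers for name in block_names}

blocks = ['w1.block0', 'w1.block1', 'b1.block0', 'w2.block0']
print(round_robin(blocks, 3))  # -> servers 0, 1, 2, 0
```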

### Overall Splitting Process

This completes the splitting strategy for parameters and gradients. For the example above, we obtain the splitting result shown in the following figure:

![fluid_parameter_slice_up](src/fluid_parameter_slice_up.png)


## Model Parameter Splitting Example
### Distributed Implementation

For the concrete implementation of PaddlePaddle Fluid distributed training, see [Fluid Cluster Train](../../howto/cluster/fluid_cluster_train_cn.md).

### Parameter Details
The main parameter splitting strategy is implemented in the [Distribute Transpiler](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/transpiler/distribute_transpiler.py). Passing `slice_var_up=True` to the `transpile` method enables model parameter splitting, and `split_method=RoundRobin` selects how the parameter blocks are placed. Sample code:

```python
# RoundRobin is assumed to be exported next to DistributeTranspiler;
# HashName would select hash-based placement instead.
from paddle.fluid.transpiler import DistributeTranspiler, RoundRobin

transpiler = DistributeTranspiler()
transpiler.transpile(
    trainer_id=trainer_id,    # index of this trainer, 0..trainers-1
    slice_var_up=True,        # split parameters into blocks
    split_method=RoundRobin,  # place blocks on pservers in turn
    pservers=pservers,        # comma-separated "ip:port" endpoints
    trainers=trainers)        # total number of trainers
```
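
If hash-based placement is preferred, `split_method=HashName` can be passed instead (assuming `HashName` is exported from the same transpiler module as `RoundRobin`).
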
67 changes: 67 additions & 0 deletions doc/fluid/design/dist_train/fluid_parameter_split_strategy_en.md
@@ -0,0 +1,67 @@
# Fluid Distributed Parameter Splitting Strategy
This article explains the design of parameter splitting for Parameter Server based distributed training with PaddlePaddle Fluid, and gives an example of how the splitting scheme is used in Python code.

## Model Parameter Splitting Strategy Design
### Reasons for Splitting

When designing a model, we usually do not limit the size of the parameters used by each layer. Suppose we have 3 parameter servers and want to train the following network:

![fluid_3_layer_network](src/fluid_3_layers_network.png)

The fluid.input layer is very wide, so the w1 and b1 parameters have a very large dimension of 10 * 1000 elements, while the fluid.fc layer is very narrow, so the w2 and b2 parameters have a dimension of only 1 * 10.
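
For reference, the example network can be sketched in Fluid roughly as follows. This is an illustrative reconstruction from the figure, not code from this repository; the layer sizes are one reading of the figure, chosen so the weight shapes match the dimensions quoted above:

```python
import paddle.fluid as fluid

# Wide input: 1000-dimensional features.
x = fluid.layers.data(name='x', shape=[1000], dtype='float32')
# Narrow fc layer of size 10: w1 has shape [1000, 10], i.e. 10 * 1000 elements.
hidden = fluid.layers.fc(input=x, size=10)
# Output fc layer of size 1: w2 has shape [10, 1], i.e. 1 * 10 elements.
out = fluid.layers.fc(input=hidden, size=1)
```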

If we simply assigned these parameters to the parameter servers as they are, the amount of parameter data held by each server would be uneven, and the lightly loaded servers would wait for the heavily loaded ones.
Therefore, to handle unevenly sized parameters, the Distribute Transpiler splits the model's parameters and their corresponding gradients into one or more parameter blocks.

### How Parameters Are Split

If the splitting is too fine-grained, the computational efficiency of the parameter servers drops; if it is too coarse-grained, the parameters cannot be distributed evenly.
To control the granularity, for every parameter or gradient we compute two values, the maximum split count and the expected split count (a sketch of this computation follows the list below):

* Maximum split count

In order to avoid overly fine granularity, we fix a minimum parameter block size of 8192 elements.
We divide the parameter size by the minimum block size and round up, which gives the parameter's maximum split count.
In the above example, the maximum split count is ceil(10 * 1000 / 8192) = 2.

* Expected split count

In order to distribute the parameters evenly across the parameter servers, we take the total number of parameter servers as the expected split count.
In the above example, the expected split count is 3.

After computing these two values, we take the smaller one as the final split count, which distributes the parameters as evenly as possible while still respecting the minimum granularity.
So in the above example, the parameter is finally split into min(2, 3) = 2 blocks.
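
The rule can be summarized in a few lines of Python. This is a minimal sketch of the computation described above, not the transpiler's actual code; the names `MIN_BLOCK_SIZE` and `split_count` are illustrative:

```python
import math

MIN_BLOCK_SIZE = 8192  # minimum parameter block size, in elements

def split_count(param_numel, num_pservers):
    """Number of blocks a parameter of param_numel elements is split into."""
    # Maximum split count: never produce blocks smaller than MIN_BLOCK_SIZE.
    max_splits = int(math.ceil(param_numel / float(MIN_BLOCK_SIZE)))
    # Expected split count: ideally one block per parameter server.
    expected_splits = num_pservers
    # The smaller value balances granularity against even distribution.
    return min(max_splits, expected_splits)

print(split_count(10 * 1000, 3))  # w1 from the example: prints 2
```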

### How Blocks Are Placed

After splitting the parameters and gradients into multiple parameter blocks, we still need to place these blocks evenly onto the parameter servers.

We currently support two simple and effective block placement methods: [Round Robin](https://en.wikipedia.org/wiki/Round-robin_scheduling) and [Hash](https://en.wikipedia.org/wiki/Hash_function).

In Round Robin mode, parameter blocks are assigned to the servers one by one, cycling through the servers in turn.

In Hash mode, we hash the name of each parameter block and take the result modulo the total number of parameter servers to obtain the id of the server that will hold it.
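
Both placement rules are easy to sketch. The function names below are illustrative, not the transpiler's API; note that a real implementation needs a deterministic string hash so that every process agrees on the placement:

```python
def round_robin(block_names, num_pservers):
    """Assign blocks to servers one by one, cycling through the servers."""
    return {name: i % num_pservers for i, name in enumerate(block_names)}

def hash_name(block_names, num_pservers):
    """Assign each block to the server selected by hashing its name."""
    # Python's built-in hash() is salted per process; a deterministic
    # hash (e.g. from hashlib) would be required in practice.
    return {name: hash(name) % num_pservers for name in block_names}

blocks = ['w1.block0', 'w1.block1', 'b1.block0', 'w2.block0']
print(round_robin(blocks, 3))  # -> servers 0, 1, 2, 0
```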

### Overall Splitting Process

This completes the splitting strategy for parameters and gradients. For the above example, we obtain the splitting result shown in the following figure:

![fluid_parameter_slice_up](src/fluid_parameter_slice_up.png)


## Model Parameter Splitting Use Case
### Distributed Implementation

For the concrete implementation of PaddlePaddle Fluid distributed training, see [Fluid Cluster Train](../../howto/cluster/fluid_cluster_train_en.md).

### Parameter Details
The main parameter splitting strategy is implemented in the [Distribute Transpiler](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/transpiler/distribute_transpiler.py). Passing `slice_var_up=True` to the `transpile` method enables model parameter splitting, and `split_method=RoundRobin` selects how the parameter blocks are placed. Sample code:

```python
# RoundRobin is assumed to be exported next to DistributeTranspiler;
# HashName would select hash-based placement instead.
from paddle.fluid.transpiler import DistributeTranspiler, RoundRobin

transpiler = DistributeTranspiler()
transpiler.transpile(
    trainer_id=trainer_id,    # index of this trainer, 0..trainers-1
    slice_var_up=True,        # split parameters into blocks
    split_method=RoundRobin,  # place blocks on pservers in turn
    pservers=pservers,        # comma-separated "ip:port" endpoints
    trainers=trainers)        # total number of trainers
```
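
If hash-based placement is preferred, `split_method=HashName` can be passed instead (assuming `HashName` is exported from the same transpiler module as `RoundRobin`).
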
1 change: 1 addition & 0 deletions doc/fluid/design/dist_train/index_cn.rst
@@ -7,3 +7,4 @@
distributed_architecture.md
distributed_lookup_table_design.md
parameter_server.md
fluid_parameter_split_strategy_cn.md
1 change: 1 addition & 0 deletions doc/fluid/design/dist_train/index_en.rst
@@ -7,3 +7,4 @@ Distributed Training
distributed_architecture.md
distributed_lookup_table_design.md
parameter_server.md
fluid_parameter_split_strategy_en.md
(The two binary image files added under src/, fluid_3_layers_network.png and fluid_parameter_slice_up.png, cannot be displayed in the diff view.)
4 changes: 2 additions & 2 deletions doc/v2/dev/write_docs_cn.rst
@@ -76,8 +76,8 @@ The PaddlePaddle.org tool can be used together with Docker; Docker must first be installed on the system
docker build -t paddle:dev .
docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=OFF" -e "WITH_DOC=ON" paddle:dev /bin/bash

# After entering the Docker container, use the build.sh script to build the PaddlePaddle documentation
bash -x /paddle/paddle/scripts/docker/build.sh
# After entering the Docker container, use the paddle_build.sh script to build the PaddlePaddle documentation
bash -x /paddle/paddle/scripts/paddle_build.sh build

Note: the commands above map the current directory (the source root directory) to the :code:`/paddle` directory inside the container.

6 changes: 3 additions & 3 deletions doc/v2/dev/write_docs_en.rst
@@ -68,7 +68,7 @@ Please `click here <https://github.com/PaddlePaddle/PaddlePaddle.org/blob/develo
Manually Building the Documentation
-------------------------------------

To build PaddlePaddle's documentation with Docker, you need to install Docker first. Please refer to `Docker's official website <https://docs.docker.com/>`_ on how to install Docker. This method is quite similar to `Build From Sources <http://paddlepaddle.org/docs/develop/documentation/en/build_and_install/build_from_source_en.html>`_ , by constructing, from source code, a docker image that can be used to build PaddlePaddle documentation. Enter the Docker container and use the script ``build.sh`` in the source directory to build the PaddlePaddle documentation. The specific steps are as follows:
To build PaddlePaddle's documentation with Docker, you need to install Docker first. Please refer to `Docker's official website <https://docs.docker.com/>`_ on how to install Docker. This method is quite similar to `Build From Sources <http://paddlepaddle.org/docs/develop/documentation/en/build_and_install/build_from_source_en.html>`_ , by constructing, from source code, a docker image that can be used to build PaddlePaddle documentation. Enter the Docker container and use the script ``paddle_build.sh`` in the source directory to build the PaddlePaddle documentation. The specific steps are as follows:

.. code-block:: bash

@@ -79,8 +79,8 @@ Build PaddlePaddle's documentation with Docker,you need to install Docker firs
docker build -t paddle:dev .
docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_TESTING=OFF" -e "WITH_DOC=ON" paddle:dev /bin/bash

# Use build.sh to build PaddlePaddle documentation
bash -x /paddle/paddle/scripts/docker/build.sh
# Use paddle_build.sh to build PaddlePaddle documentation
bash -x /paddle/paddle/scripts/paddle_build.sh build

Note: The above commands map the current directory (source root directory) to the :code:`/paddle` directory in the container.
