bcast_params should be properly implemented and called. Currently, a hack is used to test whether ParallelExecutor is running distributed training and, if so, to call it.
The num_trainers argument of ParallelExecutor is only valid for nccl-based distributed training; it is meaningless for pserver-based distributed training.
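
One possible direction for both points is to make the training mode explicit instead of inferring it. The following is a minimal sketch in plain Python, not Paddle code: DistMode, validate_num_trainers, and should_bcast_params are hypothetical names introduced here for illustration, and the rule "broadcast whenever training is distributed" is an assumption taken from the note above rather than from the ParallelExecutor implementation.

```python
from enum import Enum


class DistMode(Enum):
    """Hypothetical training modes; this enum is not part of the Paddle API."""
    LOCAL = "local"
    NCCL = "nccl"
    PSERVER = "pserver"


def validate_num_trainers(mode: DistMode, num_trainers: int) -> int:
    """Return num_trainers if it is meaningful for the given mode, else raise."""
    if num_trainers < 1:
        raise ValueError("num_trainers must be at least 1")
    # Per the note above, num_trainers only has meaning for nccl-based training.
    if mode is not DistMode.NCCL and num_trainers != 1:
        raise ValueError(
            f"num_trainers={num_trainers} only applies to nccl-based training, "
            f"not to mode {mode.value!r}"
        )
    return num_trainers


def should_bcast_params(mode: DistMode) -> bool:
    """Decide from the explicit mode whether a parameter broadcast is wanted.

    Assumption: the broadcast is wanted for any distributed mode.
    """
    return mode is not DistMode.LOCAL
```

With an explicit mode, the caller no longer needs to guess from num_trainers or from environment state whether bcast_params should run, and passing num_trainers in pserver mode fails loudly instead of being silently ignored.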