
Conversation

@chenwhql
Contributor

@chenwhql chenwhql commented Nov 12, 2020

PR types

New features

PR changes

Others

Describe

Add basic hook classes for dygraph & implement reduce hook

Execution logic design

  1. Obtain the forward VariableWrapper from the forward VarBase, and register the LeafGradHook through the VariableWrapper interface.

    void AddGradVarLeafBackwardHook(std::unique_ptr<GradAccumulatorPostHook>&& hook) {
        auto leaf_hooks = GetGradVarLeafHooksSafely();
        leaf_hooks->add_backward_hook(std::move(hook));
    }
    
  2. When the backward Engine prepares the execution environment, associate the hook with the GradientAccumulator.

    if (var->HasLeafHooks()) {
      VLOG(3) << "Grad variable wrapper (" << var->Name() << ") has leaf grad hooks.";
      PADDLE_ENFORCE_NE(
          var->HasGradNode(), true,
          platform::errors::PermissionDenied(
              "Only leaf Tensor's gradient can append hook to GradientAccumulator."));
      accumulator->SetPostHooks(var->GetLeafHooks());
    }
    
  3. When gradient accumulation completes during backward execution, run the associated hooks (see the standalone sketch after the snippets below).

// When no gradient accumulation is needed
for (auto* accumulator : no_need_run_accumulators_) {
  if (accumulator->HasPostHooks()) {
    accumulator->CallPostHooks();
  }
}
// When gradient accumulation is needed
void IncreaseCurCnt() {
  ++cur_cnt_;
  VLOG(3) << "IncreaseCurCnt: cur_cnt " << cur_cnt_ << ", ref_cnt " << ref_cnt_;
  // After all tmp gradients have been accumulated to the grad var, run hooks
  if (AccumulateCompleted() && HasPostHooks()) {
    CallPostHooks();
  }
}
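
To make this flow easier to follow end to end, here is a small self-contained sketch, not the actual Paddle classes: apart from IncreaseCurCnt, HasPostHooks, CallPostHooks and AccumulateCompleted, every name below is invented for illustration. It shows an accumulator that counts pending gradient contributions and fires its post hooks once the count reaches the reference count.

#include <functional>
#include <iostream>
#include <utility>
#include <vector>

// Standalone sketch of the "run hooks after accumulation completes" flow.
// HookFn stands in for GradAccumulatorPostHook here; the real accumulator
// stores std::unique_ptr<GradAccumulatorPostHook> objects instead.
using HookFn = std::function<void()>;

class GradientAccumulatorSketch {
 public:
  explicit GradientAccumulatorSketch(int ref_cnt) : ref_cnt_(ref_cnt) {}

  void AddPostHook(HookFn hook) { hooks_.push_back(std::move(hook)); }
  bool HasPostHooks() const { return !hooks_.empty(); }
  void CallPostHooks() {
    for (auto& hook : hooks_) hook();
  }

  bool AccumulateCompleted() const { return cur_cnt_ == ref_cnt_; }

  // Mirrors IncreaseCurCnt above: each time one temporary gradient has been
  // added into the grad var, bump the counter; once every contribution has
  // arrived, run the registered post hooks.
  void IncreaseCurCnt() {
    ++cur_cnt_;
    if (AccumulateCompleted() && HasPostHooks()) {
      CallPostHooks();
    }
  }

 private:
  int cur_cnt_{0};
  int ref_cnt_{0};
  std::vector<HookFn> hooks_;
};

int main() {
  // The grad var receives two contributions before the hook (e.g. a reduce
  // for data-parallel training) is allowed to run.
  GradientAccumulatorSketch acc(/*ref_cnt=*/2);
  acc.AddPostHook([] { std::cout << "post hook: accumulated grad is ready\n"; });
  acc.IncreaseCurCnt();  // first contribution, hook does not run yet
  acc.IncreaseCurCnt();  // second contribution, hook runs here
  return 0;
}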

Simple hook example

// needs to be implemented in C++
// w is a parameter (std::shared_ptr<VarBase>)
auto w_shared = w->SharedVar();
w_shared->AddGradVarLeafBackwardHook(
      std::unique_ptr<LambdaGradAccumulatorPostHook>(
          new LambdaGradAccumulatorPostHook([=](VariableWrapper* grad) {
            auto* grad_tensor =
                grad->MutableVar()->GetMutable<framework::LoDTensor>();
            for (int i = 0; i < grad_tensor->numel(); ++i) {
              grad_tensor->mutable_data<float>(place)[i] *= 2.0;
            }
          })));
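
The example above assumes hook classes roughly shaped like the following; this is only a minimal sketch of the interface (the constructor signature and member names are assumptions, not the exact definitions added by this PR).

#include <functional>
#include <utility>

class VariableWrapper;  // forward declaration standing in for the real class

// Sketch of the abstract post-hook interface: a hook is a callable that
// receives the (accumulated) gradient VariableWrapper.
class GradAccumulatorPostHook {
 public:
  virtual ~GradAccumulatorPostHook() = default;
  virtual void operator()(VariableWrapper* grad) = 0;
};

// Sketch of the lambda-backed hook used in the example above: it stores a
// std::function and forwards the call to it.
class LambdaGradAccumulatorPostHook : public GradAccumulatorPostHook {
 public:
  explicit LambdaGradAccumulatorPostHook(std::function<void(VariableWrapper*)> fn)
      : fn_(std::move(fn)) {}

  void operator()(VariableWrapper* grad) override { fn_(grad); }

 private:
  std::function<void(VariableWrapper*)> fn_;
};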

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@chenwhql chenwhql requested review from hutuxian and phlrain November 13, 2020 08:42
}

private:
std::vector<std::unique_ptr<GradAccumulatorPostHook>> hooks_;
Contributor

@zhwesky2010 zhwesky2010 Nov 13, 2020

Maybe it could be called 'leaf_var_hooks_', so it can be better distinguished from 'backward_hooks_'. After all, both of them are hooks for backward. Isn't 'backward_hooks_' here for Allreduce/Reduce only?

Contributor Author

@chenwhql chenwhql Nov 16, 2020

  1. My opinion: the class name LeafVarHookPackage already holds the leaf var info, so the hooks in LeafVarHookPackage are the leaf_var_hooks_. Using a long member name causes information redundancy and also makes the interface name longer, e.g. LeafVarHookPackage.add_leaf_var_hook().
  2. backward_hooks_ means the hooks of the whole backward process; because it relies on the leaf var, we can only put it here for now. Maybe we should add an AccumulateGrad dummy OpNode and move backward_hooks_ outside. I will improve the comments here.

Contributor Author

And backward_hooks_ may not be used only for Allreduce/Reduce; we should keep this extensible.

Contributor

Ok

<< ref_cnt_;
// After all tmp gradients have been accumulated to the grad var, run hooks
if (AccumulateCompleted() && HasPostHooks()) {
CallBackwardPostHooks();
Contributor

Here backward_hooks_ are called. How about: when AccumulateCompleted, first call hooks_, then do gradient accumulation between batches, and finally call backward_hooks_? So we need two functions: CallPostHooks and CallBackwardPostHooks. This can be changed after this PR is merged.

Contributor Author

Yes, I agree.
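
For illustration only, the ordering proposed in this thread could look roughly like the sketch below; apart from CallPostHooks and CallBackwardPostHooks, every name is hypothetical and not part of this PR.

// Hypothetical standalone sketch of the proposed ordering; the bodies are
// intentionally empty placeholders.
struct AccumulatorOrderingSketch {
  void CallPostHooks() {}            // 1. per-variable hooks_ run first
  void AccumulateAcrossBatches() {}  // 2. gradient accumulation between batches
  void CallBackwardPostHooks() {}    // 3. backward-wide hooks (e.g. Allreduce/Reduce)

  void OnAccumulateCompleted() {
    CallPostHooks();
    AccumulateAcrossBatches();
    CallBackwardPostHooks();
  }
};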

}

private:
std::vector<std::unique_ptr<GradAccumulatorPostHook>> hooks_;
Contributor

Ok

Contributor

@zhwesky2010 zhwesky2010 left a comment

LGTM

@hutuxian
Contributor

LGTM for reduce hook part

@chenwhql chenwhql merged commit 7eeb99f into PaddlePaddle:develop Nov 18, 2020
