Add basic hook classes for dygraph & implement reduce hook #28584

chenwhql · 2020-11-12T11:31:31Z

PR types

New features

PR changes

Others

Describe

Add basic hook classes for dygraph & implement reduce hook

执行逻辑设计

由前向VarBase拿到前向VariableWrapper, 通过VariableWrapper的接口注册LeafGradHook

void AddGradVarLeafBackwardHook(std::unique_ptr<GradAccumulatorPostHook>&& hook) {
    auto leaf_hooks = GetGradVarLeafHooksSafely();
    leaf_hooks->add_backward_hook(std::move(hook));
}

反向执行Engine准备执行环境时将hook关联到GradientAccumulator

if (var->HasLeafHooks()) {
  VLOG(3) << "Grad variable wrapper (" << var->Name() <<") has leaf grad hooks.";
    PADDLE_ENFORCE_NE(
        var->HasGradNode(), true,
        platform::errors::PermissionDenied(
            "Only leaf Tensor's gradient can append hook to Gradientaccumulator."));
    accumulator->SetPostHooks(var->GetLeafHooks());
}

当反向执行梯度累加完成时，执行关联的hook

// 无需梯度累加时
for (auto* accumulator : no_need_run_accumulators_) {
 if (accumulator->HasPostHooks()) {
    accumulator->CallPostHooks();
  }
}
// 需要梯度累加时
void IncreaseCurCnt() {
  ++cur_cnt_;
  VLOG(3) << "IncreaseCurCnt: cur_cnt " << cur_cnt_ << ", ref_cnt " << ref_cnt_;
  // After all tmp gradient being accumulated to grad var, run hooks
  if (AccumulateCompleted() && HasPostHooks()) {
    CallPostHooks();
  }
}

简单hook示例

// need impl in C++
// w is a parameter std::shared_ptr<VarBase>
auto w_shared = w->SharedVar();
w_shared->->AddGradVarLeafBackwardHook(
      std::unique_ptr<LambdaGradAccumulatorPostHook>(
          new LambdaGradAccumulatorPostHook([=](VariableWrapper* grad) {
            auto* grad_tensor =
                grad->MutableVar()->GetMutable<framework::LoDTensor>();
            for (int i = 0; i < grad_tensor->numel(); ++i) {
              grad_tensor->mutable_data<float>(place)[i] *= 2.0;
            }
          })));

paddle-bot-old · 2020-11-12T11:31:35Z

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

zhwesky2010 · 2020-11-13T14:32:38Z

paddle/fluid/imperative/hooks.h

+  }
+
+ private:
+  std::vector<std::unique_ptr<GradAccumulatorPostHook>> hooks_;


May be it can call 'leaf_var_hooks_' , and can be better distinguished from 'backward_hooks_' . After all, both of them are hooks for backward. Isn't 'backward_hooks_' here for Allreduce/Reduce only?

my opinion: the class name LeafVarHookPackage alreaady hold theleaf var info, the hooks in LeafVarHookPackage are leaf_var_hooks_ , using long member name cause information redundancy and also make the interface name longer, such as LeafVarHookPackage.add_leaf_var_hook()

backward_hooks_ mean the hooks of whole backward process, because it relay on leaf var, so we can only put it here now, may be we should add AccumulateGrad dummy OpNode and move backward_hooks_ outside, I wiil perfect the comments here

And backward_hooks_ may not only used for Allreduce/Reduce, we should keep scalability here

zhwesky2010 · 2020-11-17T03:49:59Z

paddle/fluid/imperative/gradient_accumulator.h

+            << ref_cnt_;
+    // After all tmp gradient being accumulated to grad var, run hooks
+    if (AccumulateCompleted() && HasPostHooks()) {
+      CallBackwardPostHooks();


Here call backward_hooks_, how about when AccumulateCompleted, first call_hooks_ , then gradient_accumulation between batch, last call backward_hooks_ .
So We must have two function: CallPostHooks, and CallBackwardPostHooks. And this can changed after this PR merged.

yes, I agree

zhwesky2010 · 2020-11-17T03:50:12Z

paddle/fluid/imperative/hooks.h

+  }
+
+ private:
+  std::vector<std::unique_ptr<GradAccumulatorPostHook>> hooks_;


zhwesky2010

LGTM

hutuxian · 2020-11-17T13:43:17Z

LGTM for reduce hook part

add base hook classes and reduce hook impl

44a6bed

chenwhql added 3 commits November 12, 2020 12:21

fix constructor typo

a588f28

polish comment format

7b5683e

refactor baisc hook class design

f9f44c5

chenwhql requested review from hutuxian and phlrain November 13, 2020 08:42

zhwesky2010 reviewed Nov 13, 2020

View reviewed changes

polish design details

a1cdd52

zhwesky2010 reviewed Nov 17, 2020

View reviewed changes

zhwesky2010 approved these changes Nov 17, 2020

View reviewed changes

hutuxian approved these changes Nov 17, 2020

View reviewed changes

phlrain approved these changes Nov 18, 2020

View reviewed changes

chenwhql merged commit 7eeb99f into PaddlePaddle:develop Nov 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add basic hook classes for dygraph & implement reduce hook #28584

Add basic hook classes for dygraph & implement reduce hook #28584

Uh oh!

chenwhql commented Nov 12, 2020 •

edited

Loading

Uh oh!

paddle-bot-old bot commented Nov 12, 2020

Uh oh!

zhwesky2010 Nov 13, 2020 •

edited

Loading

Uh oh!

chenwhql Nov 16, 2020 •

edited

Loading

Uh oh!

chenwhql Nov 16, 2020

Uh oh!

zhwesky2010 Nov 17, 2020

Uh oh!

zhwesky2010 Nov 17, 2020

Uh oh!

chenwhql Nov 17, 2020

Uh oh!

zhwesky2010 Nov 17, 2020

Uh oh!

zhwesky2010 left a comment

Uh oh!

hutuxian commented Nov 17, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Add basic hook classes for dygraph & implement reduce hook #28584

Add basic hook classes for dygraph & implement reduce hook #28584

Uh oh!

Conversation

chenwhql commented Nov 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR types

PR changes

Describe

Uh oh!

paddle-bot-old bot commented Nov 12, 2020

Uh oh!

zhwesky2010 Nov 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chenwhql Nov 16, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chenwhql Nov 16, 2020

Choose a reason for hiding this comment

Uh oh!

zhwesky2010 Nov 17, 2020

Choose a reason for hiding this comment

Uh oh!

zhwesky2010 Nov 17, 2020

Choose a reason for hiding this comment

Uh oh!

chenwhql Nov 17, 2020

Choose a reason for hiding this comment

Uh oh!

zhwesky2010 Nov 17, 2020

Choose a reason for hiding this comment

Uh oh!

zhwesky2010 left a comment

Choose a reason for hiding this comment

Uh oh!

hutuxian commented Nov 17, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

chenwhql commented Nov 12, 2020 •

edited

Loading

zhwesky2010 Nov 13, 2020 •

edited

Loading

chenwhql Nov 16, 2020 •

edited

Loading