Skip to content

Conversation

wanghaox
Copy link
Contributor

resolve #5788

Add CPU forward and backward kernel code
Add GPU forward and backward kernel code
Add unitest

wangkuiyi
wangkuiyi previously approved these changes Nov 22, 2017
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://google.github.io/styleguide/cppguide.html#Constant_Names, constants should be named as

constexpr int kPaddleOperatorsROIPoolNumCUDAThreads = 512;

Also, I vaguely remember (but not sure) that we can declare this variable static so to limit its usage within this file, and we can have a much shorter name for it:

static constexpr int kNumCUDAThreads = 512;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of __FLT_MAX__ is not a good idea because it is a GCC-specific predefined macro. If we are using another compiler other than GCC, it is probable that we don't have __FLT_MAX__ defined.

It seems to me that the solution could be using standard C library

#include <float.h>
printf("%f", FLT_MAX);

or the standard C++ library

#include <limits>
printf("%f", numeric_limits<float>::max());

This problem doesn't block the merge of this PR. I created an issue reminding us to fix it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that you might want to mark this function returning a constant value, but according to the Google C++ Style Guide, we should name functions something like

static inline int NumBlocks(const int N) {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that in many places of our codebase, we haven't strictly followed the English writing rule that acronyms must be capitalized, but let us start doing so from here by changing Roi into ROI. Thanks!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about add a comment about the magic number 5, or define it a constexpr value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@wangkuiyi
Copy link
Collaborator

Is this ROIPool operator being added into CMakeLists.txt? @wanghaox

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it be better to add dim ENFORCE of input before or in inferShape since the usage of in_dim.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added at ROIPoolOp::InferShape()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar with in_dims, might it be better to add dim ENFORCE before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added at ROIPoolOp::InferShape()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it be better to declare what each of [batch_id, x1, y1, x2, y2] represents for. And some spelling like Should should be correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addition of the shape and meaning(like feature maps from conv) of X might be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it be better to set shape in inferShape rather than shape resizing in here, but I am not sure whether actual shapes are necessary for inferShape.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added at ROIPoolOp::InferShape()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "The output of ROIPoolOp is a 4-D tensor with shape" is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the CPU kernel, might it be better to add dim ENFORCE before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added at ROIPoolOp::InferShape()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about deleting the return in here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about deleting the return in here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the CPU kernel, might it be better to add dim ENFORCE before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added at ROIPoolOp::InferShape()

Copy link
Contributor Author

@wanghaox wanghaox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added at ROIPoolOp::InferShape()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added at ROIPoolOp::InferShape()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added at ROIPoolOp::InferShape()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python needs the '\'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ops::CPUROIPoolOpKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(
roi_pool_grad,
ops::CPUROIPoolGradOpKernel<paddle::platform::CPUPlace, float>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to register double type. And please check whether the slice_sequence_op registered double type, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, slice_sequence_op will be fixed in another commit.

#pragma once
#include "paddle/framework/op_registry.h"
#include "paddle/operators/math/math_function.h"
#include "paddle/operators/strided_memcpy.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This header file is not used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
using LoD = framework::LoD;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments: #5826 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// Define an empty pooling region to be zero
bool is_empty = (hend <= hstart) || (wend <= wstart);
output_data[pool_index] =
is_empty ? 0 : -std::numeric_limits<float>::max();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-std::numeric_limits<T>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

int64_t* argmax_data = argmax->mutable_data<int64_t>(ctx.GetPlace());

math::SetConstant<Place, T> set_zero;
set_zero(ctx.device_context(), out, static_cast<T>(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to set out zero here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"The format of input tensor is NCHW.");
PADDLE_ENFORCE(rois_dims.size() == 2,
"ROIs should be a 2-D tensor of shape (num_rois, 5)"
"given as [[batch_id, x1, y1, x2, y2], …].");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs to check rois_dims[1] == kROISize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

set_zero(ctx.device_context(), out, static_cast<T>(0));
argmax->mutable_data<int64_t>(ctx.GetPlace());
math::SetConstant<Place, int64_t> set_init;
set_init(ctx.device_context(), argmax, static_cast<int64_t>(-1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to set out zero and set argmax -1 here. For GPU, will launch two kernels twice here. Please remove these and handle them in GPUROIPoolForward correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (x_grad) {
x_grad->mutable_data<T>(ctx.GetPlace());
math::SetConstant<Place, T> set_zero;
set_zero(ctx.device_context(), x_grad, static_cast<T>(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, there is no need to set zero here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked, bp needs to set zero.

ops::GPUROIPoolOpKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(
roi_pool_grad,
ops::GPUROIPoolGradOpKernel<paddle::platform::GPUPlace, float>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

register double type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @guoshengCS said, why the code here is not consistant with C++:

// Define an empty pooling region to be zero
bool is_empty = (hend <= hstart) || (wend <= wstart);
output_data[pool_index] = is_empty ? 0 : -std::numeric_limits<float>::max();

Copy link
Contributor Author

@wanghaox wanghaox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update code

"The format of input tensor is NCHW.");
PADDLE_ENFORCE(rois_dims.size() == 2,
"ROIs should be a 2-D tensor of shape (num_rois, 5)"
"given as [[batch_id, x1, y1, x2, y2], …].");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ops::CPUROIPoolOpKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(
roi_pool_grad,
ops::CPUROIPoolGradOpKernel<paddle::platform::CPUPlace, float>);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, slice_sequence_op will be fixed in another commit.

static inline int NumBlocks(const int N) {
return std::min((N + kNumCUDAThreads - 1) / kNumCUDAThreads,
kNumMaxinumNumBlocks);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

ops::GPUROIPoolOpKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(
roi_pool_grad,
ops::GPUROIPoolGradOpKernel<paddle::platform::GPUPlace, float>);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

#pragma once
#include "paddle/framework/op_registry.h"
#include "paddle/operators/math/math_function.h"
#include "paddle/operators/strided_memcpy.h"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

set_zero(ctx.device_context(), out, static_cast<T>(0));
argmax->mutable_data<int64_t>(ctx.GetPlace());
math::SetConstant<Place, int64_t> set_init;
set_init(ctx.device_context(), argmax, static_cast<int64_t>(-1));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (x_grad) {
x_grad->mutable_data<T>(ctx.GetPlace());
math::SetConstant<Place, T> set_zero;
set_zero(ctx.device_context(), x_grad, static_cast<T>(0));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked, bp needs to set zero.


using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;
using LoD = framework::LoD;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

int64_t* argmax_data = argmax->mutable_data<int64_t>(ctx.GetPlace());

math::SetConstant<Place, T> set_zero;
set_zero(ctx.device_context(), out, static_cast<T>(0));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@qingqing01 qingqing01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@wanghaox wanghaox merged commit 0690cca into PaddlePaddle:develop Nov 24, 2017
@wanghaox wanghaox deleted the roi_pool branch November 24, 2017 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ROI Pooling operator

4 participants