Related to #7402 and #7488
The greedy bipartite matching operator is used in object detection, especially in the SSD algorithm, to obtain the matching with maximum distance. The TensorFlow documentation describes the algorithm well. Our implementation differs from TensorFlow's, though: it needs to support batched input, which is passed as a LoDTensor.
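
For reviewers unfamiliar with the algorithm, here is a minimal pure-Python sketch of greedy bipartite matching over a batched input. It is an illustration only, not the operator code in this PR. The function names, the `offsets` list standing in for the LoD of the input tensor, the `-1` marker for unmatched columns, and the rule that only strictly positive distances are matched are all assumptions made for the example.

```python
def greedy_bipartite_match(dist):
    """Greedily match rows to columns of one distance matrix.

    dist is a list of rows, each row a list of per-column distances.
    Returns (match_indices, match_dist), both indexed by column:
    match_indices[j] is the matched row, or -1 if column j is unmatched.
    """
    n_rows = len(dist)
    n_cols = len(dist[0]) if n_rows else 0
    match_indices = [-1] * n_cols
    match_dist = [0.0] * n_cols
    row_used = [False] * n_rows

    # At most min(n_rows, n_cols) pairs can be formed.
    for _ in range(min(n_rows, n_cols)):
        best = None  # (distance, row, col) of the largest remaining entry
        for i in range(n_rows):
            if row_used[i]:
                continue
            for j in range(n_cols):
                if match_indices[j] != -1:
                    continue
                d = dist[i][j]
                # Assumption: non-positive distances are never matched.
                if d > 0 and (best is None or d > best[0]):
                    best = (d, i, j)
        if best is None:
            break
        d, i, j = best
        row_used[i] = True
        match_indices[j] = i
        match_dist[j] = d
    return match_indices, match_dist


def batched_match(dist, offsets):
    """Run the matcher once per instance of a batch.

    offsets plays the role of the LoD (hypothetical stand-in): rows
    offsets[k]..offsets[k+1]-1 of dist belong to instance k.
    Row indices in each result are local to that instance.
    """
    return [
        greedy_bipartite_match(dist[start:end])
        for start, end in zip(offsets[:-1], offsets[1:])
    ]


# Two instances: the first owns rows 0-1, the second owns row 2.
batch = [
    [0.9, 0.1, 0.3],
    [0.8, 0.7, 0.2],
    [0.2, 0.5, 0.4],
]
for indices, distances in batched_match(batch, [0, 2, 3]):
    print(indices, distances)
# [0, 1, -1] [0.9, 0.7, 0.0]
# [-1, 0, -1] [0.0, 0.5, 0.0]
```

In the first instance the largest entry (0.9) pairs row 0 with column 0, after which row 1 takes its best remaining column (column 1, 0.7); column 2 stays unmatched. Splitting the rows by offsets is what lets a single concatenated input carry instances with different numbers of rows.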