The results are known to differ considerably from those of the official ROUGE scoring script. This has been discussed here: https://github.com/google/seq2seq/issues/89