Contributor

@lugimzzz lugimzzz commented Dec 21, 2022

PR types

New features

PR changes

APIs

Description

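Adds sentence-level and character-level data augmentation strategies, and adds word-level vocabularies.

  • Sentence: paraphrase generation, back-translation, sentence summarization, sentence continuation
  • Character: substitution, deletion, insertion, swap
  • Word: adds word-embedding-based synonym and antonym vocabularies
  • Document: adds augmentation for document input.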
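For readers unfamiliar with the four character-level operations listed above, here is a minimal standard-library sketch of what each one does. It is not the code added in this PR and does not use the `paddlenlp.dataaug` API; the function names, the `vocab` argument, and the `rng` parameter are all hypothetical and chosen only for illustration.

```python
import random


def char_substitute(text, n, vocab, rng):
    """Replace n randomly chosen characters with characters drawn from vocab."""
    chars = list(text)
    n = min(n, len(chars))
    for i in rng.sample(range(len(chars)), n):
        chars[i] = rng.choice(vocab)
    return "".join(chars)


def char_delete(text, n, rng):
    """Remove n randomly chosen characters."""
    n = min(n, len(text))
    drop = set(rng.sample(range(len(text)), n))
    return "".join(ch for i, ch in enumerate(text) if i not in drop)


def char_insert(text, n, vocab, rng):
    """Insert n characters drawn from vocab at random positions."""
    chars = list(text)
    for _ in range(n):
        chars.insert(rng.randrange(len(chars) + 1), rng.choice(vocab))
    return "".join(chars)


def char_swap(text, rng):
    """Swap one randomly chosen pair of adjacent characters."""
    if len(text) < 2:
        return text
    chars = list(text)
    i = rng.randrange(len(chars) - 1)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same output
    sample = "data augmentation"
    print(char_substitute(sample, 2, "xyz", rng))
    print(char_delete(sample, 2, rng))
    print(char_insert(sample, 2, "xyz", rng))
    print(char_swap(sample, rng))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible in tests; a real augmenter would likely draw replacement characters from a task-specific vocabulary rather than a fixed string.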

paddle-bot bot commented Dec 21, 2022

Thanks for your contribution!

@lugimzzz lugimzzz changed the title from "add sentence level data augmentation api" to "add sentence & character level data augmentation api" Dec 22, 2022
@lugimzzz lugimzzz requested a review from wawltor December 22, 2022 09:38

codecov bot commented Dec 22, 2022

Codecov Report

Merging #4194 (1b106e2) into develop (0b1d706) will increase coverage by 1.44%.
The diff coverage is 67.96%.

@@             Coverage Diff             @@
##           develop    #4194      +/-   ##
===========================================
+ Coverage    40.10%   41.54%   +1.44%     
===========================================
  Files          439      438       -1     
  Lines        61568    62142     +574     
===========================================
+ Hits         24689    25816    +1127     
+ Misses       36879    36326     -553     
Impacted Files Coverage Δ
paddlenlp/dataaug/base_augment.py 63.30% <21.95%> (+35.67%) ⬆️
paddlenlp/dataaug/word.py 67.24% <67.24%> (ø)
paddlenlp/dataaug/sentence.py 68.52% <68.52%> (ø)
paddlenlp/dataaug/char.py 73.38% <73.38%> (ø)
paddlenlp/dataaug/__init__.py 100.00% <100.00%> (ø)
paddlenlp/taskflow/task.py 35.90% <0.00%> (-0.43%) ⬇️
paddlenlp/transformers/roberta/modeling.py 89.81% <0.00%> (-0.37%) ⬇️
paddlenlp/transformers/gpt/modeling.py 78.42% <0.00%> (+0.20%) ⬆️
...lenlp/ops/fast_transformer/transformer/decoding.py 7.72% <0.00%> (+0.22%) ⬆️
paddlenlp/transformers/albert/modeling.py 84.63% <0.00%> (+0.23%) ⬆️
... and 21 more
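
As a sanity check on the report above, the headline percentages can be recomputed from the Hits and Lines rows. A minimal sketch in plain Python (not part of Codecov; the variable names are illustrative):

```python
# Values copied from the Coverage Diff table; names are illustrative.
base_hits, base_lines = 24689, 61568  # develop (0b1d706)
head_hits, head_lines = 25816, 62142  # #4194 (1b106e2)

base_cov = 100 * base_hits / base_lines
head_cov = 100 * head_hits / head_lines
delta = head_cov - base_cov

print(f"{base_cov:.2f}% -> {head_cov:.2f}% ({delta:+.2f}%)")
```

Hits plus Misses also equals Lines on both sides (24689 + 36879 = 61568 and 25816 + 36326 = 62142), so the table is internally consistent.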

Contributor

@sijunhe sijunhe left a comment

tests/dataaug needs to be added to https://github.com/PaddlePaddle/PaddleNLP/blob/develop/pyproject.toml#L15 for the unit tests to run.

@@ -0,0 +1,103 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Contributor

2022->2023

Contributor Author

Fixed.

@@ -0,0 +1,560 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
Contributor

2022->2023

Contributor Author

Same as above.

Contributor Author

tests/dataaug needs to be added to https://github.com/PaddlePaddle/PaddleNLP/blob/develop/pyproject.toml#L15 for the unit tests to run.

Added.

Contributor

@wawltor wawltor left a comment

LGTM

@lugimzzz lugimzzz merged commit 58eb2be into PaddlePaddle:develop Jan 12, 2023
@lugimzzz lugimzzz deleted the dataaug branch January 12, 2023 06:03