
fix: add batchSize to semantic embedStrings #224 #225



Open
wants to merge 1 commit into
base: main
from


zzy2210 commented Apr 22, 2025

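Adds a batchSize option to the semantic splitter component.
Purpose: when the component calls embedStrings, each call sends at most batchSize strings, so requests stay within the maximum batch size that some API providers enforce.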

CLAassistant commented Apr 22, 2025

CLA assistant check
All committers have signed the CLA.

```go
if end > total {
	end = total
}

batch := combinedSentences[start:end]
```
Contributor


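This approach would hurt the results. Mainstream embedding models now capture the relationships between sentences, so embedding in batches would reduce the accuracy of the computation that follows. I would rather have users first split the input into pieces within the range the embedding model supports, using another split method, and then apply the semantic splitter.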


3 participants