Skip to content

Conversation

telesoho
Copy link
Contributor

… embedding extraction

  • Updated the EmbeddingExtractor class to check if the embedding model supports async operations.
  • If async is supported, it uses the async method aembed_text; otherwise, it falls back to the sync method embed_text.
  • This change improves flexibility and compatibility with different client implementations.

Issue Link / Problem Description

Screenshots/Examples (if applicable)

image

… embedding extraction

- Updated the EmbeddingExtractor class to check if the embedding model supports async operations.
- If async is supported, it uses the async method `aembed_text`; otherwise, it falls back to the sync method `embed_text`.
- This change improves flexibility and compatibility with different client implementations.
@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Sep 16, 2025
@jjmachan
Copy link
Member

hey @telesoho thank you so much for raising the fix ❤️

would you fix the CI issue if you get a chance, we'll love to merge it in after that.

also I was wondering if I could learn more about how you are using testset generation - we are planning to do some improvements to it and would love your feedback on it. can you share a few timeslots that work or a calendar I can use? alternatively this is mine if it helps https://cal.com/jjmachan/huddle

- Updated the EmbeddingExtractor class to enhance code readability by standardizing string quotes and formatting.
- Adjusted the conditional checks for async support to improve clarity and maintainability.
- These changes contribute to a cleaner codebase while preserving existing functionality.
@telesoho
Copy link
Contributor Author

Hi @jjmachan, thanks for the quick feedback!
I’ve fixed the CI issue and pushed the update—please let me know if anything else is needed.

Regarding testset generation, I’m using Ragas to automatically create test sets from an Odoo model and business-documentation knowledge base to evaluate retrieval + generation accuracy.
I’m still in the early stage of using it, so I don’t have specific feedback or suggestions yet, but I’ll be happy to share thoughts later as I gain more experience.

Thanks again for reviewing and maintaining this project!

- Updated the conditional check for async support in the EmbeddingExtractor class to improve readability.
- The check now uses `getattr` to safely access the `is_async` attribute, ensuring it defaults to False if not present.
- These changes enhance code maintainability while preserving existing functionality.
Copy link
Contributor

@anistark anistark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should add a warning when using sync clients in async contexts.

- Added a warning when using a synchronous embedding model in an asynchronous context to inform users about potential performance impacts.
- Wrapped the synchronous embedding method in a thread executor to prevent blocking, improving compatibility with async workflows.
- These changes enhance the flexibility of the EmbeddingExtractor while maintaining existing functionality.
@anistark anistark merged commit 6db09a2 into explodinggradients:main Sep 22, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to create Test data with TestsetGenerator.

3 participants