
IanHoang
Collaborator

@IanHoang IanHoang commented Aug 11, 2025

Description

This PR adds pydantic to OSB's SDG feature, which will debut with OSB 2.0. Pydantic validates that input data conforms to the format OSB expects before OSB uses it to generate data. It enforces type hints at runtime and reports validation failures with detailed errors. Overall, this makes SDG more robust, prevents bugs, and makes it easier to add new configuration options that users can toggle.
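To illustrate the kind of runtime validation described above, here is a minimal sketch. It is not the code in this PR: the `GenerationSettings` model, its field names, and the `load_settings` helper are hypothetical and exist only for the example. It uses only `BaseModel` and `ValidationError`, which are available in both pydantic 1.x and 2.x.

```python
from pydantic import BaseModel, ValidationError


class GenerationSettings(BaseModel):
    # Hypothetical fields and defaults, chosen only for illustration.
    workers: int = 4
    chunk_size: int = 1000


def load_settings(raw: dict) -> GenerationSettings:
    # Pydantic checks every field against its type hint at construction time.
    return GenerationSettings(**raw)


# Valid input: the missing field falls back to its default.
ok = load_settings({"workers": 8})

# Invalid input: "many" cannot be interpreted as an int, so pydantic raises.
try:
    load_settings({"workers": "many"})
    error_count = 0
except ValidationError as exc:
    error_count = len(exc.errors())
```

The bad value is rejected when the settings are loaded, before any data generation starts, instead of surfacing later as an obscure failure.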

What's changed?

  • Not much has changed; this is essentially a drop-in replacement. I've renamed types.py to models.py. The module originally contained just a dictionary of default values for generation settings and a dataclass that tracks SDG metadata. The dataclass has been converted to a pydantic model, and there are new models for the sdg-config.yml file that users provide. This limits the chance of users introducing subtle bugs.
  • Updated pylint from 2.6.0 to 2.9.0 to address compatibility issues with newer libraries like pydantic. While doing this, I updated .pylintrc to ignore additional lint IDs that are not necessary; these surfaced with the upgrade from 2.6.0 to 2.9.0.
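As a rough sketch of why converting the dataclass matters, the example below contrasts a plain dataclass with a pydantic model and shows nested models for a parsed config file. All class and field names here (`MetadataAsDataclass`, `MetadataAsModel`, `SettingsSection`, `SDGConfig`, `index_name`, `total_size_gb`, `workers`) are assumptions made up for the example; they are not the names used in models.py or the real sdg-config.yml schema.

```python
from dataclasses import dataclass

from pydantic import BaseModel, Field, ValidationError


@dataclass
class MetadataAsDataclass:
    # Type hints on a dataclass are not enforced at runtime.
    index_name: str
    total_size_gb: int


class MetadataAsModel(BaseModel):
    # The same (hypothetical) fields, now validated by pydantic.
    index_name: str
    total_size_gb: int


class SettingsSection(BaseModel):
    workers: int = 4


class SDGConfig(BaseModel):
    # Nested model standing in for a section of a user-provided config.
    settings: SettingsSection = Field(default_factory=SettingsSection)


# The dataclass silently accepts a string where an int is expected.
loose = MetadataAsDataclass(index_name="logs", total_size_gb="ten")

# The pydantic model rejects the same input.
try:
    MetadataAsModel(index_name="logs", total_size_gb="ten")
    model_rejected = False
except ValidationError:
    model_rejected = True

# A dict such as one produced by parsing YAML is validated recursively,
# and the error location points at the offending nested key.
good = SDGConfig(**{"settings": {"workers": 2}})
try:
    SDGConfig(**{"settings": {"workers": "two"}})
    bad_location = None
except ValidationError as exc:
    bad_location = exc.errors()[0]["loc"]
```

The error location (`settings`, then `workers`) is what makes a mistake in a user's config easy to trace back to the exact key.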

Ran several E2E tests across different scenarios; SDG performance remains the same. This is an interim PR that lays a strong foundation for subsequent PRs related to user sdg-config changes.

Issues Resolved

#930

Testing

  • Tested with custom sdg with config
  • Tested with custom sdg without config
  • Tested with basic mappings without config
  • Tested with basic mappings with config
  • Tested with complex mappings without config
  • Tested with complex mappings with config
  • Tested rollovers

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…tadat from dataclass to Pydantic cmodel for robustness

Signed-off-by: Ian Hoang <[email protected]>
…tness, and extensibility. Tested E2E and works for all case scenarios

Signed-off-by: Ian Hoang <[email protected]>
Signed-off-by: Ian Hoang <[email protected]>
@IanHoang IanHoang changed the title from "Add Pydantic to SDG for stronger validation and extensibility" to "Add Pydantic to SDG for stronger validation, error handling, and extensibility" Aug 12, 2025
@IanHoang IanHoang added the 2.0 label Aug 12, 2025
@IanHoang IanHoang merged commit f94e5aa into opensearch-project:2.0-beta Aug 13, 2025
10 checks passed
gkamat pushed a commit to gkamat/opensearch-benchmark that referenced this pull request Aug 18, 2025
gkamat pushed a commit to gkamat/opensearch-benchmark that referenced this pull request Aug 18, 2025
gkamat pushed a commit that referenced this pull request Aug 19, 2025
IanHoang added a commit that referenced this pull request Aug 20, 2025