-
Notifications
You must be signed in to change notification settings - Fork 67
feat(generator): emit temporal checks for date/datetime min/max #624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(generator): emit temporal checks for date/datetime min/max #624
Conversation
- Update dq_generate_min_max to pass through datetime.date and datetime.datetime without stringification - Emit is_in_range / is_not_less_than / is_not_greater_than based on provided bounds - Add unit tests for DateType and TimestampType - Preserve existing numeric behavior
… and fix logging capture - Update test_generate_dq_rules_warn to expect the temporal is_not_less_than check for product_launch_date when level="warn" - Make test_generate_dq_rules_logging deterministic by adding an unknown rule and capturing the generator logger at INFO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds temporal checks support to the DQ generator, enabling date and datetime min/max validation with proper type handling. The changes extend the generator to emit temporal-specific checks while preserving numeric behavior and adding comprehensive test coverage.
- Emit temporal checks for date/datetime min/max bounds using appropriate check functions
- Pass Python date/datetime objects through without stringification
- Add unit tests for temporal pass-through and integration test updates
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/databricks/labs/dqx/profiler/generator.py | Updated min/max generator to handle temporal types and emit appropriate checks |
| tests/unit/test_generator_temporal.py | Added unit tests for date/datetime temporal check generation |
| tests/integration/test_rules_generator.py | Updated integration test to expect temporal checks and improved logging test |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| def _is_num(value): | ||
| return isinstance(value, int) |
Copilot
AI
Oct 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _is_num function only checks for int type but doesn't include float or other numeric types like Decimal. This could miss valid numeric values for min/max checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i guess we should also support float and decimal here
| import logging | ||
| import datetime |
Copilot
AI
Oct 18, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The logging import is added but datetime import should come first according to PEP 8 import ordering (standard library imports should be alphabetically ordered).
| import logging | |
| import datetime | |
| import datetime | |
| import logging |
|
I closed it because I want to be sure all integration tests are running successfully. |
It's ok, you can also leave a comment that you are still working on it. No need to close. |
| } | ||
|
|
||
| @staticmethod | ||
| def dq_generate_min_max(column: str, level: str = "error", **params: dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def dq_generate_min_max(column: str, level: str = "error", **params: dict) -> list[dict]: |
please add return types to other generate methods as well
| # numeric with numeric OR temporal with temporal | ||
| if value_a is None or value_b is None: | ||
| return True | ||
| return (_is_num(value_a) and _is_num(value_b)) or (_is_temporal(value_a) and _is_temporal(value_b)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return (_is_num(value_a) and _is_num(value_b)) or (_is_temporal(value_a) and _is_temporal(value_b)) | |
| return any([ | |
| _is_num(value_a) and _is_num(value_b), | |
| _is_temporal(value_a) and _is_temporal(value_b), | |
| ]) |
to simplify and make it easier to extend
| parameters={"min": datetime.date(2020, 1, 1), "max": None}, | ||
| description="Real min/max values were used", | ||
| ), | ||
| DQProfile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these cases remove?
| "min_limit": val_maybe_to_str(min_limit, include_sql_quotes=False), | ||
| "max_limit": val_maybe_to_str(max_limit, include_sql_quotes=False), | ||
| # pass through Python ints or datetime/date without stringification | ||
| "min_limit": min_limit, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't that cause issues down the line? shouldn't the val_maybe_to_str still be used?
| def _is_num(value): | ||
| return isinstance(value, int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i guess we should also support float and decimal here
Changes
Notes:
Linked issues
Resolves databrickslabs/dqx#71
Tests
Additional details: