@bsr-the-mngrm
Contributor

bsr-the-mngrm commented Oct 18, 2025

Changes

  • Update profiler and generator to emit temporal checks when profiles provide datetime.date or datetime.datetime:
    • both bounds → is_in_range(column, min_limit, max_limit)
    • only min → is_not_less_than(column, limit)
    • only max → is_not_greater_than(column, limit)
  • Pass Python date/datetime objects through without stringification (delegating rendering to existing execution paths).
  • Update integration test expectations to include a temporal check.
  • Add unit tests for date/datetime pass-through in the generator.
  • Preserve numeric behavior; no changes to DLT generator or check funcs.
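
The bound-to-check mapping in the first bullet can be sketched as follows. This is a minimal illustration, not the code in generator.py: the helper name `sketch_min_max_check` and the shape of the returned dict are assumptions, and only the three check function names and their argument names are taken from the list above.

```python
import datetime
from typing import Any, Optional


def sketch_min_max_check(column: str, min_limit: Any = None, max_limit: Any = None) -> Optional[dict]:
    """Hypothetical helper: pick a check from whichever bounds are present."""
    if min_limit is not None and max_limit is not None:
        return {
            "function": "is_in_range",
            "arguments": {"column": column, "min_limit": min_limit, "max_limit": max_limit},
        }
    if min_limit is not None:
        return {"function": "is_not_less_than", "arguments": {"column": column, "limit": min_limit}}
    if max_limit is not None:
        return {"function": "is_not_greater_than", "arguments": {"column": column, "limit": max_limit}}
    return None  # no bounds supplied, so there is nothing to emit


# Only a lower bound is known, and it stays a datetime.date (no stringification).
check = sketch_min_max_check("product_launch_date", min_limit=datetime.date(2020, 1, 1))
print(check["function"])
```

The limit is carried as a Python object, so any rendering to SQL or text is left to whatever consumes the check.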

Notes:

  • Serializer/round-trip for JSON and checks table remains string-only for temporal values; follow-ups planned to add typed encoding.
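
To illustrate what "string-only" means for a round trip, the sketch below uses the standard json module rather than DQX's serializer. The tagged encoding in the second half is one hypothetical form a typed encoding could take; it is not the planned design, and the `__type__` key and the `encode`/`decode` names are invented for this example.

```python
import datetime
import json

limit = datetime.date(2020, 1, 1)

# String-only round trip: the value survives, the type does not.
restored = json.loads(json.dumps({"limit": limit.isoformat()}))
print(type(restored["limit"]).__name__)


def encode(value):
    """Hypothetical typed encoder, called by json.dumps for non-JSON types."""
    if isinstance(value, datetime.datetime):  # check datetime first: it subclasses date
        return {"__type__": "datetime", "value": value.isoformat()}
    if isinstance(value, datetime.date):
        return {"__type__": "date", "value": value.isoformat()}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(obj):
    """Hypothetical decoder, called by json.loads for every JSON object."""
    if obj.get("__type__") == "datetime":
        return datetime.datetime.fromisoformat(obj["value"])
    if obj.get("__type__") == "date":
        return datetime.date.fromisoformat(obj["value"])
    return obj


typed = json.loads(json.dumps({"limit": limit}, default=encode), object_hook=decode)
print(type(typed["limit"]).__name__)
```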

Linked issues

Resolves databrickslabs/dqx#71

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • added end-to-end tests
  • added performance tests

Additional details:

  • Unit tests verify generator pass-through of datetime.date and datetime.datetime.
  • Integration test updated to expect is_not_less_than for product_launch_date.

- Update dq_generate_min_max to pass through datetime.date and datetime.datetime without stringification
- Emit is_in_range / is_not_less_than / is_not_greater_than based on provided bounds
- Add unit tests for DateType and TimestampType
- Preserve existing numeric behavior
… and fix logging capture

- Update test_generate_dq_rules_warn to expect the temporal is_not_less_than check for product_launch_date when level="warn"
- Make test_generate_dq_rules_logging deterministic by adding an unknown rule and capturing the generator logger at INFO
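
A minimal sketch of why an unknown rule plus an explicit INFO-level capture makes a logging test deterministic. Everything here is a stand-in: the logger name, the `generate_rules` function, the rule names, and the message text are assumptions for illustration, not the generator's actual code or log output.

```python
import logging

logger = logging.getLogger("sketch.generator")  # hypothetical logger name


def generate_rules(profiles: list[dict]) -> list[dict]:
    """Hypothetical stand-in for a generator that skips unknown rules."""
    known = {"min_max", "is_not_null"}  # assumed rule names, for illustration only
    rules = []
    for profile in profiles:
        if profile["name"] not in known:
            logger.info("No rule '%s' for column '%s', skipping", profile["name"], profile["column"])
            continue
        rules.append(profile)
    return rules


class ListHandler(logging.Handler):
    """Collects formatted messages so a test can assert on them."""

    def __init__(self) -> None:
        super().__init__(level=logging.INFO)
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)  # without this, INFO may be filtered by the root level
try:
    rules = generate_rules([
        {"name": "min_max", "column": "price"},
        {"name": "no_such_rule", "column": "sku"},  # guarantees exactly one log line
    ])
finally:
    logger.removeHandler(handler)

print(handler.messages)
```

Under pytest, `caplog.at_level(logging.INFO, logger=...)` serves the same purpose as the hand-written handler.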
bsr-the-mngrm requested a review from a team as a code owner October 18, 2025 20:42
bsr-the-mngrm requested review from nehamilak-db and removed request for a team October 18, 2025 20:42
mwojtyczka requested a review from Copilot October 18, 2025 21:38
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds temporal checks support to the DQ generator, enabling date and datetime min/max validation with proper type handling. The changes extend the generator to emit temporal-specific checks while preserving numeric behavior and adding comprehensive test coverage.

  • Emit temporal checks for date/datetime min/max bounds using appropriate check functions
  • Pass Python date/datetime objects through without stringification
  • Add unit tests for temporal pass-through and integration test updates

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
src/databricks/labs/dqx/profiler/generator.py | Updated min/max generator to handle temporal types and emit appropriate checks
tests/unit/test_generator_temporal.py | Added unit tests for date/datetime temporal check generation
tests/integration/test_rules_generator.py | Updated integration test to expect temporal checks and improved logging test

Comment on lines +86 to +87
def _is_num(value):
    return isinstance(value, int)

Copilot AI Oct 18, 2025

The _is_num function only checks for int type but doesn't include float or other numeric types like Decimal. This could miss valid numeric values for min/max checks.

Contributor

Yeah, I guess we should also support float and Decimal here.

Comment on lines +1 to 2
import logging
import datetime

Copilot AI Oct 18, 2025

[nitpick] The logging import is added above datetime. PEP 8 asks only that imports be grouped (standard library, third-party, local) and does not itself require alphabetical order, but sorting within the group, as isort-style tools do, would put datetime first.

Suggested change
- import logging
- import datetime
+ import datetime
+ import logging

@bsr-the-mngrm
Contributor Author

I closed it because I want to be sure all integration tests are running successfully.

@mwojtyczka
Contributor

> I closed it because I want to be sure all integration tests are running successfully.

It's ok, you can also leave a comment that you are still working on it. No need to close.

bsr-the-mngrm reopened this Oct 19, 2025
}

@staticmethod
def dq_generate_min_max(column: str, level: str = "error", **params: dict):
Contributor

Suggested change
def dq_generate_min_max(column: str, level: str = "error", **params: dict) -> list[dict]:

please add return types to other generate methods as well

# numeric with numeric OR temporal with temporal
if value_a is None or value_b is None:
    return True
return (_is_num(value_a) and _is_num(value_b)) or (_is_temporal(value_a) and _is_temporal(value_b))
Contributor

Suggested change
- return (_is_num(value_a) and _is_num(value_b)) or (_is_temporal(value_a) and _is_temporal(value_b))
+ return any([
+     _is_num(value_a) and _is_num(value_b),
+     _is_temporal(value_a) and _is_temporal(value_b),
+ ])

to simplify and make it easier to extend

parameters={"min": datetime.date(2020, 1, 1), "max": None},
description="Real min/max values were used",
),
DQProfile(
Contributor

Why are these cases removed?

"min_limit": val_maybe_to_str(min_limit, include_sql_quotes=False),
"max_limit": val_maybe_to_str(max_limit, include_sql_quotes=False),
# pass through Python ints or datetime/date without stringification
"min_limit": min_limit,
Contributor

Wouldn't that cause issues down the line? Shouldn't val_maybe_to_str still be used?


Development

Successfully merging this pull request may close these issues.

[FEATURE]: Handle timestamps and dates in the checks generator

2 participants