Load and Save checks from file in Unity Catalog Volume (#512)
## Changes
* Added support for a new storage type for checks: Unity Catalog Volume.
* Unified checks location into a single configuration field:
`checks_location`. This replaces the previous `checks_file` and
`checks_table` fields, removing ambiguity by ensuring only one storage
location can be defined per run configuration.
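With a single `checks_location` field, the value itself has to identify the storage type. The hypothetical helper below illustrates how that could work; it is not DQX code. `classify_checks_location` and its rules are assumptions made for illustration: three-part `catalog.schema.table` names for tables, a `/Volumes/` prefix for Unity Catalog Volume files, and `.yml`/`.yaml`/`.json` extensions for workspace or installation files.

```python
import re

# Hypothetical rules for illustration only; DQX may resolve locations differently.
_TABLE_NAME = re.compile(r"^[A-Za-z0-9_]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")
_FILE_SUFFIXES = (".yml", ".yaml", ".json")


def classify_checks_location(location: str) -> str:
    """Guess the storage type referenced by a single checks_location value."""
    is_file = location.lower().endswith(_FILE_SUFFIXES)
    if location.startswith("/Volumes/"):
        if not is_file:
            raise ValueError(f"Volume location must point to a yaml or json file: {location}")
        return "volume_file"
    if is_file:
        # Per the docs, relative file paths live in the workspace installation folder.
        return "workspace_file"
    if _TABLE_NAME.match(location):
        return "table"
    raise ValueError(f"Cannot infer storage type from checks_location: {location!r}")


print(classify_checks_location("iot_checks.yml"))   # workspace_file
print(classify_checks_location("main.iot.checks"))  # table
```

Under these assumed rules, a location such as `/Volumes/main/iot/checks/iot_checks.yml` is classified as `volume_file`.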
BREAKING CHANGES!
* The `checks_file` and `checks_table` fields have been removed from the
installation run configuration. They are now consolidated into the
single `checks_location` field. This change simplifies the configuration
and clearly defines where checks are stored.
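Existing run configurations that still use the removed fields have to be updated. The sketch below shows one way to migrate a run configuration loaded as a Python dictionary. `migrate_run_config` is a hypothetical helper, not part of DQX. Because the new field holds only one location, the helper refuses to guess when both legacy fields are set.

```python
from typing import Any, Dict


def migrate_run_config(run_config: Dict[str, Any]) -> Dict[str, Any]:
    """Fold legacy checks_file / checks_table into checks_location (hypothetical helper)."""
    migrated = dict(run_config)  # shallow copy; the caller's dict is left untouched
    checks_file = migrated.pop("checks_file", None)
    checks_table = migrated.pop("checks_table", None)
    if checks_file is not None and checks_table is not None:
        # checks_location holds exactly one location, so the user must pick one.
        raise ValueError("Both checks_file and checks_table are set; choose one for checks_location")
    legacy = checks_file if checks_file is not None else checks_table
    if legacy is not None:
        existing = migrated.get("checks_location")
        if existing is not None and existing != legacy:
            raise ValueError(f"checks_location {existing!r} conflicts with legacy value {legacy!r}")
        migrated["checks_location"] = legacy
    return migrated


old = {"name": "default", "checks_file": "iot_checks.yml"}
print(migrate_run_config(old))  # {'name': 'default', 'checks_location': 'iot_checks.yml'}
```

The previous installation example set both `checks_file` and `checks_table`. With this sketch, that configuration raises an error until one of the two fields is removed.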
### Linked issues
This PR addresses the [FEATURE]: Load and Save quality checks from/in UC
Volume #386
### Tests
- [X] manually tested
- [x] added unit tests
- [X] added integration tests
---------
Co-authored-by: Marcin Wojtyczka <[email protected]>
Co-authored-by: Copilot <[email protected]>
`docs/dqx/docs/guide/data_profiling.mdx`: 10 additions, 11 deletions
@@ -219,9 +219,9 @@ You can use `options` parameter to pass a dictionary with custom options when pr
 
 The following DQX configuration from 'config.yml' is used by the profiler workflow:
 - 'input_config': configuration for the input data.
-- 'checks_file': relative location of the generated quality rule candidates as `yaml` or `json` file inside the installation folder (default: `checks.yml`).
+- 'checks_location': relative location of the generated quality rule candidates as `yaml` or `json` file inside the workspace installation folder (default: `checks.yml`).
 - 'profiler_config': configuration for the profiler containing:
-- 'summary_stats_file': relative location of the summary statistics (default: `profile_summary.yml`) inside the installation folder
+- 'summary_stats_file': relative location of the summary statistics (default: `profile_summary.yml`) inside the workspace installation folder
 - 'sample_fraction': fraction of data to sample for profiling.
 - 'sample_seed': seed for reproducible sampling.
 - 'limit': maximum number of records to analyze.
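The profiler file settings are relative to the workspace installation folder and have documented defaults (`checks.yml` and `profile_summary.yml`). The following is a minimal sketch of how these settings could be resolved to full paths, assuming the run configuration is available as a dictionary. `resolve_profiler_outputs` and the installation folder path used below are hypothetical.

```python
import posixpath
from typing import Any, Dict

# Defaults as documented for the profiler workflow.
DEFAULT_CHECKS_LOCATION = "checks.yml"
DEFAULT_SUMMARY_STATS_FILE = "profile_summary.yml"


def resolve_profiler_outputs(install_folder: str, run_config: Dict[str, Any]) -> Dict[str, str]:
    """Join relative profiler output locations onto the installation folder (hypothetical)."""
    checks = run_config.get("checks_location") or DEFAULT_CHECKS_LOCATION
    profiler_config = run_config.get("profiler_config") or {}
    stats = profiler_config.get("summary_stats_file") or DEFAULT_SUMMARY_STATS_FILE
    return {
        # posixpath.join returns the second argument unchanged if it is already absolute.
        "checks_location": posixpath.join(install_folder, checks),
        "summary_stats_file": posixpath.join(install_folder, stats),
    }


paths = resolve_profiler_outputs("/Workspace/Users/someone/.dqx", {"profiler_config": {"sample_fraction": 0.3}})
print(paths["checks_location"])  # /Workspace/Users/someone/.dqx/checks.yml
```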
@@ -348,7 +348,7 @@ The DLT generator creates Lakeflow Pipelines expectation statements from profile
 
 ## Storing Quality Checks
 
-You can save checks defined in code or generated by the profiler to a table or file as `yaml` or `json` in the local path, workspace or installation folder.
+You can save checks defined in code or generated by the profiler to a table or file as `yaml` or `json` in the local path, workspace, installation folder or Unity Catalog Volume file.
 
 <Tabs>
 <TabItem value="Python" label="Python" default>
@@ -358,7 +358,8 @@ You can save checks defined in code or generated by the profiler to a table or f
 FileChecksStorageConfig,
 WorkspaceFileChecksStorageConfig,
 InstallationChecksStorageConfig,
-TableChecksStorageConfig
+TableChecksStorageConfig,
+VolumeFileChecksStorageConfig
 )
 from databricks.sdk import WorkspaceClient
 
@@ -383,11 +384,6 @@ You can save checks defined in code or generated by the profiler to a table or f
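Unity Catalog Volume files use paths of the form `/Volumes/<catalog>/<schema>/<volume>/<path>`. Before passing such a location to `VolumeFileChecksStorageConfig`, it can help to confirm that the path names a `yaml` or `json` file inside a volume. The helper below is a hypothetical sketch. This diff does not show what validation, if any, `VolumeFileChecksStorageConfig` performs on its own.

```python
from typing import NamedTuple

_CHECK_FILE_SUFFIXES = (".yml", ".yaml", ".json")


class VolumePath(NamedTuple):
    catalog: str
    schema: str
    volume: str
    relative_path: str


def parse_volume_path(path: str) -> VolumePath:
    """Split /Volumes/<catalog>/<schema>/<volume>/<path> into its parts (hypothetical helper)."""
    parts = path.split("/")
    # A leading "/" produces an empty first element: ["", "Volumes", catalog, schema, volume, ...]
    if len(parts) < 6 or parts[0] != "" or parts[1] != "Volumes" or not all(parts[2:6]):
        raise ValueError(f"Not a Unity Catalog Volume file path: {path!r}")
    relative_path = "/".join(parts[5:])
    if not relative_path.lower().endswith(_CHECK_FILE_SUFFIXES):
        raise ValueError(f"Checks file must be yaml or json: {path!r}")
    return VolumePath(parts[2], parts[3], parts[4], relative_path)


print(parse_volume_path("/Volumes/main/iot/dq/checks/iot_checks.yml"))
# VolumePath(catalog='main', schema='iot', volume='dq', relative_path='checks/iot_checks.yml')
```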
`docs/dqx/docs/guide/quality_checks.mdx`: 44 additions, 4 deletions
@@ -21,7 +21,7 @@ There are several ways to define and apply quality checks in DQX:
 
 ## Quality Rules defined in a File
 
-Quality rules can be defined declaratively as part of `yaml` or `json` file and stored in the installation folder, workspace, or local file system.
+Quality rules can be defined declaratively as part of `yaml` or `json` file and stored in the installation folder, workspace, local file system or Unity Catalog Volume file.
 
 Below is an example `yaml` file ('checks.yml') defining several checks:
 ```yaml
@@ -175,7 +175,7 @@ In addition, you can also perform a standalone syntax validation of the checks a
 
 dq_engine = DQEngine(WorkspaceClient())
 
-# Load check from the installation (from file defined in 'checks_file' in the run config)
+# Load check from the installation (from file or table defined in 'checks_location' in the run config)
Quality rules can be stored in a Delta table in Unity Catalog. Each row represents a check with column values for the `name`, `check`, `criticality`, `filter`, and `run_config_name`.
@@ -452,7 +492,7 @@ In addition, you can also perform a standalone syntax validation of the checks a
@@ -1025,7 +1065,7 @@ The validation cannot be used for checks defined programmatically using [DQX cla
 ```
 
 The following DQX configuration from 'config.yml' will be used by default:
-- 'checks_file': relative location of the quality rules defined declaratively as `yaml` or `json` inside the installation folder (default: `checks.yml`).
+- 'checks_location': relative location of the quality rules defined declaratively as `yaml` or `json` inside the workspace installation folder (default: `checks.yml`).
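When checks are stored in a Delta table instead of a file, each row holds the `name`, `check`, `criticality`, `filter`, and `run_config_name` columns. The following is a minimal sketch that flattens file-style check definitions into rows of that shape, assuming the checks are already loaded as Python dictionaries. `checks_to_rows` and the sample check body are hypothetical, and this diff does not show the actual column types of the DQX table.

```python
from typing import Any, Dict, List, Optional

TABLE_COLUMNS = ("name", "check", "criticality", "filter", "run_config_name")


def checks_to_rows(checks: List[Dict[str, Any]], run_config_name: str) -> List[Dict[str, Optional[Any]]]:
    """Map each check definition to one row with the documented table columns (hypothetical)."""
    rows = []
    for definition in checks:
        if "check" not in definition:
            raise ValueError(f"Check definition has no 'check' entry: {definition!r}")
        rows.append(
            {
                "name": definition.get("name"),
                "check": definition["check"],
                "criticality": definition.get("criticality"),
                "filter": definition.get("filter"),  # None when no filter is defined
                "run_config_name": run_config_name,
            }
        )
    return rows


example = [{"name": "id_is_null", "criticality": "error", "check": {"function": "is_not_null", "arguments": {"column": "id"}}}]
print(checks_to_rows(example, "default")[0]["run_config_name"])  # default
```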
`docs/dqx/docs/installation.mdx`: 5 additions, 6 deletions
@@ -140,18 +140,17 @@ run_configs:
 trigger: # <- streaming trigger, only applicable if input_config.is_streaming is enabled
 availableNow: true
 
-quarantine_config: # <- quarantine data configuration, if specified, bad data is written to quarantine table
+quarantine_config: # <- optional quarantine data configuration, if specified, bad data is written to quarantine table
 location: main.iot.silver_quarantine # <- quarantine location (table), used as input for quality dashboard
-format: delta # <- format of the quarantine table
-mode: append # <- write mode for the quarantine table (append or overwrite)
+format: delta # <- format of the quarantine table (default: delta)
+mode: append # <- write mode for the quarantine table (append or overwrite, default: append)
 options: # <- additional options for writing to the quarantine table (optional)
 mergeSchema: 'true'
 #checkpointLocation: /Volumes/catalog1/schema1/checkpoint # <- only applicable if input_config.is_streaming is enabled
-trigger: # <- streaming trigger, only applicable if input_config.is_streaming is enabled
+trigger: # <- optional streaming trigger, only applicable if input_config.is_streaming is enabled
 availableNow: true
 
-checks_file: iot_checks.yml # <- relative location of the quality rules (checks) defined in json or yaml file
-checks_table: main.iot.checks # <- table storing the quality rules (checks)
+checks_location: iot_checks.yml # <- Quality rules (checks) can be stored in a table or defined in JSON or YAML files, located at a relative workspace file path or volume file path.
 
 profiler_config: # <- profiler configuration
 summary_stats_file: iot_summary_stats.yml # <- relative location of profiling summary stats
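The updated comments state that `quarantine_config` is optional and that `format` and `mode` default to `delta` and `append`. The following is a minimal sketch that applies these defaults to a parsed run configuration. `apply_quarantine_defaults` is hypothetical. Treating `location` as required is an assumption based on the comment that it names the quarantine table.

```python
from typing import Any, Dict, Optional

# Defaults as stated in the installation example comments.
QUARANTINE_DEFAULTS = {"format": "delta", "mode": "append"}
ALLOWED_MODES = {"append", "overwrite"}


def apply_quarantine_defaults(quarantine_config: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Fill in documented defaults for an optional quarantine_config (hypothetical helper)."""
    if quarantine_config is None:
        return None  # no quarantine section: nothing is written to a quarantine table
    if not quarantine_config.get("location"):
        raise ValueError("quarantine_config needs a location (target table)")
    merged = {**QUARANTINE_DEFAULTS, **quarantine_config}  # explicit values override defaults
    if merged["mode"] not in ALLOWED_MODES:
        raise ValueError(f"Unsupported write mode: {merged['mode']!r}")
    merged.setdefault("options", {})
    return merged


print(apply_quarantine_defaults({"location": "main.iot.silver_quarantine"}))
```

Merging the defaults first and the user's values second keeps explicit settings such as `mode: overwrite` intact while filling in anything left out.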