
Commit f3f23f5

feat: Validation script rewrite with more checks. (#237)
### Summary

Adds a new script (replacing the previously existing one) that customers can run after applying the CFT, to validate the setup and help troubleshoot any issues. Checks included:

* Cross-account role policy simulation (applies to Rift and EMR/DBX)
* Rift compute roles policy simulation
* Rift VPC (subnets, security group) validation
* EMR VPC (subnets) validation

Included a readme with instructions. The script is most easily run with `uv`, but it can also be run in a virtualenv (dependencies are listed in the readme, directly in [cli.py](https://app.graphite.dev/github/pr/tecton-ai/tecton-terraform-setup/237/feat-Validation-script-rewrite-with-more-checks.?org=tecton-ai#file-scripts/validate-tecton.py), and in a requirements file). Running the script looks like:

```sh
uv run scripts/validate-tecton.py \
  --compute-engine rift \
  --terraform-outputs outputs.json
```

Also added a [readme within the script directory](https://github.com/tecton-ai/tecton-terraform-setup/blob/validate-v2/scripts/tecton_validate/README.md) describing the implementation and how to add more checks.

### Testing

![image.png](https://graphite-user-uploaded-assets-prod.s3.amazonaws.com/oEOVnVDr6Oa3V9nHrGBV/b16660a5-3cfd-4993-a381-eaa5db03c139.png)

![image.png](https://graphite-user-uploaded-assets-prod.s3.amazonaws.com/oEOVnVDr6Oa3V9nHrGBV/1c1d8281-a562-4699-a28c-2a6c9d7d6afe.png)
1 parent 70b7713

21 files changed: +1312 −426 lines

.gitignore

Lines changed: 2 additions & 0 deletions
````diff
@@ -2,3 +2,5 @@
 *.iml
 **/.terraform
 
+scripts/.venv/
+*.pyc
````

README.md

Lines changed: 4 additions & 0 deletions
````diff
@@ -62,3 +62,7 @@ module "tecton" {
 ```
 
 Please refer to the specific `README.md` within each module's directory for detailed instructions and the full list of variables for that module.
+
+### Validation Script
+
+There is a validation script ([details here](./scripts/README.md)) that can be run after applying one of the above modules, to check that the expected resources are in place.
````

modules/dataplane_rift_with_emr/README.md

Lines changed: 0 additions & 1 deletion
````diff
@@ -38,7 +38,6 @@ module "tecton" {
   tecton_control_plane_account_id = "987654321098" # Replace with Tecton's Control Plane Account ID
   cross_account_external_id = "your-external-id" # Replace with the External ID from Tecton
   tecton_control_plane_role_name = "TectonControlPlaneRole" # Role name from Tecton
-  include_crossaccount_bucket_access = false
 
   # Get outputs destination URL from Tecton
   outputs_location_config = {
````

modules/tecton_outputs/README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -104,7 +104,7 @@ Refer to `variables.tf` for full documentation of new parameters.
 |------|-------------|------|---------|:--------:|
 | <a name="input_control_plane_account_id"></a> [control\_plane\_account\_id](#input\_control\_plane\_account\_id) | The AWS account ID of the Tecton control plane | `string` | n/a | yes |
 | <a name="input_deployment_name"></a> [deployment\_name](#input\_deployment\_name) | The name of the Tecton deployment | `string` | n/a | yes |
-| <a name="input_outputs_data"></a> [outputs\_data](#input\_outputs\_data) | Tecton deployment outputs data to store in S3. Different deployment types (controlplane\_rift, dataplane\_rift, emr, databricks, etc.) will provide different subsets of these fields. | <pre>object({<br/> # Core fields - present in all deployment types<br/> deployment_name = string<br/> region = string <br/> cross_account_role_arn = string<br/> cross_account_external_id = string<br/> kms_key_arn = optional(string)<br/><br/> # Rift compute fields - present in dataplane_rift and dataplane_rift_with_emr<br/> compute_manager_arn = optional(string)<br/> compute_instance_profile_arn = optional(string) <br/> compute_arn = optional(string)<br/> vm_workload_subnet_ids = optional(string)<br/> anyscale_docker_target_repo = optional(string)<br/> nat_gateway_public_ips = optional(list(string))<br/> rift_compute_security_group_id = optional(string)<br/><br/> # EMR/Spark fields - present in emr and dataplane_rift_with_emr<br/> spark_role_arn = optional(string)<br/> spark_instance_profile_arn = optional(string)<br/> emr_master_role_arn = optional(string)<br/> notebook_cluster_id = optional(string)<br/> vpc_id = optional(string)<br/> emr_subnet_id = optional(string)<br/> emr_subnet_route_table_ids = optional(list(string))<br/> emr_security_group_id = optional(string)<br/> emr_service_security_group_id = optional(string)<br/><br/> # Databricks-specific fields - present in databricks module<br/> spark_role_name = optional(string)<br/> spark_instance_profile_name = optional(string)<br/> databricks_workspace_url = optional(string)<br/> })</pre> | n/a | yes |
+| <a name="input_outputs_data"></a> [outputs\_data](#input\_outputs\_data) | Tecton deployment outputs data to store in S3. Different deployment types (controlplane\_rift, dataplane\_rift, emr, databricks, etc.) will provide different subsets of these fields. | <pre>object({<br/> # Core fields - present in all deployment types<br/> deployment_name = string<br/> region = string <br/> cross_account_role_arn = string<br/> cross_account_external_id = string<br/> kms_key_arn = optional(string)<br/> dataplane_account_id = optional(string)<br/><br/> # Rift compute fields - present in dataplane_rift and dataplane_rift_with_emr<br/> compute_manager_arn = optional(string)<br/> compute_instance_profile_arn = optional(string) <br/> compute_arn = optional(string)<br/> vm_workload_subnet_ids = optional(string)<br/> anyscale_docker_target_repo = optional(string)<br/> nat_gateway_public_ips = optional(list(string))<br/> rift_compute_security_group_id = optional(string)<br/><br/> # EMR/Spark fields - present in emr and dataplane_rift_with_emr<br/> spark_role_arn = optional(string)<br/> spark_instance_profile_arn = optional(string)<br/> emr_master_role_arn = optional(string)<br/> notebook_cluster_id = optional(string)<br/> vpc_id = optional(string)<br/> emr_subnet_id = optional(string)<br/> emr_subnet_route_table_ids = optional(list(string))<br/> emr_security_group_id = optional(string)<br/> emr_service_security_group_id = optional(string)<br/><br/> # Databricks-specific fields - present in databricks module<br/> spark_role_name = optional(string)<br/> spark_instance_profile_name = optional(string)<br/> databricks_workspace_url = optional(string)<br/> })</pre> | n/a | yes |
 | <a name="input_outputs_location_config"></a> [outputs\_location\_config](#input\_outputs\_location\_config) | Configuration for where to store the outputs. | <pre>object({<br/> type = string # "new_bucket", "offline_store_bucket_path", or "tecton_hosted_presigned"<br/> <br/> # For offline_store_bucket_path<br/> offline_store_bucket_name = optional(string)<br/> offline_store_bucket_path_prefix = optional(string, "internal/tecton-outputs/")<br/> <br/> # For tecton_hosted_presigned<br/> tecton_presigned_write_url = optional(string)<br/> trigger_upload = optional(bool, false)<br/> })</pre> | <pre>{<br/> "tecton_presigned_write_url": "",<br/> "trigger_upload": false,<br/> "type": "tecton_hosted_presigned"<br/>}</pre> | no |
 | <a name="input_tags"></a> [tags](#input\_tags) | A map of tags to assign to resources | `map(string)` | `{}` | no |
 ## Outputs
````

modules/tecton_outputs/variables.tf

Lines changed: 1 addition & 0 deletions
````diff
@@ -23,6 +23,7 @@ variable "outputs_data" {
     cross_account_role_arn = string
     cross_account_external_id = string
     kms_key_arn = optional(string)
+    dataplane_account_id = optional(string)
 
     # Rift compute fields - present in dataplane_rift and dataplane_rift_with_emr
     compute_manager_arn = optional(string)
````

scripts/README.md

Lines changed: 54 additions & 13 deletions
````diff
@@ -1,20 +1,61 @@
-# Tecton Terraform Setup Scripts
+# Tecton Terraform Setup Validation
 
 ## Validate
 
-Databricks example:
+The `validate-tecton.py` script validates your Tecton AWS setup based on the compute engine you're using.
 
-- This script should be run as a role which:
-  - can assume the cross-account role passed in as `--ca-role`
-  - has the permission: `iam:SimulateCustomPolicy` on `*`
-- If necessary, use of `aws-vault exec <some_other_role> -- ...` may be done as well for role chaining
+### Prerequisites
 
+- Python 3.9+
+- [uv](https://docs.astral.sh/uv/) (recommended) or a Python environment with the following dependencies installed: `boto3`, `rich`, `jinja2`, `requests` ([requirements.txt](./requirements.txt))
+- AWS credentials configured (via CLI, environment variables, or IAM role) with [permissions required for simulating policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_testing-policies.html#permissions-required_policy-simulator), along with permissions to view S3 and VPC resources. Ideally, use the same role you used to run the terraform modules.
+
+### Usage
+
+After _applying_ the terraform module (e.g. `dataplane_rift`, `emr`, etc.), write the outputs to a JSON file with `terraform output -json > outputs.json`.
+
+#### Quick Start with uv (Recommended)
+
+The easiest way to run the validation script is with `uv`, which automatically handles all dependencies:
+
+**Rift Compute Engine:**
 ```shell
-python3 validate.py \
-  --region us-west-2 \
-  --external-id 'abd123' \
-  --account-id '1234567890' \
-  --ca-role 'my-tecton-deployment-ca-role' \
-  --deployment-name 'my-tecton-deployment' \
-  --spark-role 'my-tecton-deployment-spark-role'
+uv run scripts/validate-tecton.py \
+  --compute-engine rift \
+  --terraform-outputs outputs.json
 ```
+
+**Databricks Compute Engine:**
+```shell
+uv run scripts/validate-tecton.py \
+  --compute-engine databricks \
+  --terraform-outputs outputs.json
+```
+
+**Spark/EMR Compute Engine:**
+```shell
+uv run scripts/validate-tecton.py \
+  --compute-engine emr \
+  --terraform-outputs outputs.json
+```
+
+#### Alternative: Traditional Python
+
+If you prefer not to use uv, you can install the dependencies manually and run the script with Python:
+
+You can find the [requirements.txt](./requirements.txt) file in this repo.
+
+```shell
+# (In a virtual env) Install dependencies
+pip install -r requirements.txt
+
+# Run validation
+python3 scripts/validate-tecton.py \
+  --compute-engine rift \
+  --terraform-outputs outputs.json
+```
+
+### Contributing
+
+See [./tecton_validate/README.md](./tecton_validate/README.md) for details on how to add new checks.
````
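The prerequisites above call out IAM policy-simulation permissions, which several of the new checks rely on. For context, a generic boto3 sketch of what a simulation call looks like; this is not the script's actual code, and the role ARN, actions, and resources are hypothetical placeholders:

```python
from typing import List

import boto3

def simulate(role_arn: str, actions: List[str], resources: List[str]) -> List[str]:
    """Return the subset of `actions` the role is NOT allowed to perform."""
    iam = boto3.Session().client("iam")
    resp = iam.simulate_principal_policy(
        PolicySourceArn=role_arn,  # the role whose attached policies are evaluated
        ActionNames=actions,       # e.g. ["s3:GetObject", "s3:PutObject"]
        ResourceArns=resources,    # e.g. ["arn:aws:s3:::my-bucket/*"]
    )
    return [
        r["EvalActionName"]
        for r in resp["EvaluationResults"]
        if r["EvalDecision"] != "allowed"  # "implicitDeny" or "explicitDeny"
    ]

# Hypothetical usage (placeholder ARNs):
# denied = simulate(
#     "arn:aws:iam::123456789012:role/my-tecton-ca-role",
#     ["s3:GetObject", "s3:PutObject"],
#     ["arn:aws:s3:::my-offline-store/*"],
# )
```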

scripts/requirements.txt

Lines changed: 18 additions & 3 deletions
````diff
@@ -1,3 +1,18 @@
-boto3==1.25.5
-Jinja2==3.0.3
-Requests==2.31.0
+boto3==1.38.40
+botocore==1.38.40
+certifi==2025.6.15
+charset-normalizer==3.4.2
+idna==3.10
+Jinja2==3.1.6
+jmespath==1.0.1
+markdown-it-py==3.0.0
+MarkupSafe==3.0.2
+mdurl==0.1.2
+Pygments==2.19.1
+python-dateutil==2.9.0.post0
+requests==2.32.4
+rich==14.0.0
+s3transfer==0.13.0
+six==1.17.0
+typing_extensions==4.14.0
+urllib3==1.26.20
````

scripts/tecton_validate/README.md

Lines changed: 126 additions & 0 deletions
````diff
@@ -0,0 +1,126 @@
+
+## Checks
+
+The validation system uses a modular check architecture that automatically discovers and runs validation checks based on the compute engine.
+
+### Core Components
+
+- **ValidationResult**: Represents the outcome of a single check with `name`, `success` (bool), `details`, and optional `remediation`
+- **ValidationCheck**: Couples a validation function with metadata (`name`, `run`, `remediation`, `only_for`)
+- **Check modules**: Python files in `tecton_validate/checks/` that define validation logic
+
+### How Checks Work
+
+1. **Auto-discovery**: All `.py` files in `tecton_validate/checks/` are automatically imported
+2. **Aggregation**: Each module's `CHECKS` list is collected into a master list
+3. **Filtering**: Only checks applicable to the specified `--compute-engine` are executed
+4. **Execution**: Each check receives CLI args, boto3 session, and Rich console for output
+
+### Adding a New Check
+
+Create a new check by adding a function and registering it in any check module:
+
+```python
+# In tecton_validate/checks/my_new_checks.py
+from tecton_validate.validation_types import ValidationCheck, ValidationResult
+import argparse
+import boto3
+from rich.console import Console
+
+def _check_my_feature(args: argparse.Namespace, session: boto3.Session, console: Console) -> ValidationResult:
+    """Check some aspect of the infrastructure."""
+    try:
+        # Your validation logic here
+        if everything_looks_good:
+            return ValidationResult(
+                name="My Feature Check",
+                success=True,
+                details="Feature is properly configured."
+            )
+        else:
+            return ValidationResult(
+                name="My Feature Check",
+                success=False,
+                details="Feature misconfiguration detected.",
+                remediation="Run 'terraform apply' to fix the configuration."
+            )
+    except Exception as e:
+        return ValidationResult(
+            name="My Feature Check",
+            success=False,
+            details=f"Error during validation: {e}",
+            remediation="Check AWS permissions and network connectivity."
+        )
+
+# Register the check
+CHECKS = [
+    ValidationCheck(
+        name="My Feature Check",
+        run=_check_my_feature,
+        remediation="Ensure feature is enabled in your Terraform configuration.",
+    )
+]
+```
+
+### Restricting Checks to Specific Compute Engines
+
+To make a check run only for certain compute engines, add an `only_for` attribute to the ValidationCheck object:
+
+```python
+# Check runs only for EMR
+MyCheck.only_for = ["emr"]
+
+# Check runs for multiple engines
+MyCheck.only_for = ["emr", "databricks"]
+
+# No only_for attribute = runs for all engines (default)
+```
+
+Available compute engines: `"rift"`, `"emr"`, `"databricks"`
+
+### Expected Function Signature
+
+All check functions must follow this signature:
+
+```python
+def check_function(
+    args: argparse.Namespace,  # CLI arguments
+    session: boto3.Session,    # Configured AWS session
+    console: Console           # Rich console for output
+) -> ValidationResult:
+    pass
+```
+
+### Common Patterns
+
+**AWS Resource Checks:**
+```python
+def _check_s3_bucket(args, session, console):
+    s3 = session.client("s3")
+    bucket_name = f"tecton-{args.cluster_name}"
+    try:
+        s3.head_bucket(Bucket=bucket_name)
+        return ValidationResult("S3 Bucket", True, f"Bucket {bucket_name} exists")
+    except ClientError:
+        return ValidationResult("S3 Bucket", False, f"Bucket {bucket_name} not found")
+```
+
+**IAM Policy Validation:**
+```python
+from tecton_validate.policy_test import test_policy
+
+def _check_iam_permissions(args, session, console):
+    result = test_policy(session, role_arn, policy_document, actions_to_test)
+    return ValidationResult("IAM Permissions", result.success, result.details)
+```
+
+**Terraform Output Integration:**
+```python
+from tecton_validate.terraform import load_terraform_outputs
+
+def _check_terraform_resource(args, session, console):
+    if args.terraform_outputs:
+        outputs = load_terraform_outputs(args.terraform_outputs)
+        resource_id = outputs.get("my_resource_id")
+        # Validate the resource exists...
+```
````
Lines changed: 13 additions & 0 deletions
````diff
@@ -0,0 +1,13 @@
+from importlib import import_module
+from pkgutil import iter_modules
+from pathlib import Path
+
+# Public types re-exported for convenience
+from .validation_types import ValidationResult, ValidationCheck  # noqa: F401
+
+# Build aggregated CHECKS list dynamically so cli can pull everything automatically.
+CHECKS = []
+_checks_path = Path(__file__).with_suffix("").parent / "checks"
+for _m in iter_modules([str(_checks_path)]):
+    mod = import_module(f".{_m.name}", package="tecton_validate.checks")
+    CHECKS.extend(getattr(mod, "CHECKS", []))
````
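The `validation_types` module re-exported above is not included in this commit's diff. Based on the fields the checks README describes, the two types are presumably shaped roughly like this; an assumption, not the committed code:

```python
import argparse
from dataclasses import dataclass
from typing import Callable, List, Optional

import boto3
from rich.console import Console

@dataclass
class ValidationResult:
    name: str                           # human-readable check name
    success: bool                       # did the check pass?
    details: str = ""                   # what was observed
    remediation: Optional[str] = None   # how to fix a failure

# Signature every check function follows (per the checks README)
CheckFn = Callable[[argparse.Namespace, boto3.Session, Console], ValidationResult]

@dataclass
class ValidationCheck:
    name: str
    run: CheckFn                          # the check function itself
    remediation: Optional[str] = None     # default remediation hint
    only_for: Optional[List[str]] = None  # e.g. ["emr"]; None = all engines
```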
Lines changed: 1 addition & 0 deletions
````diff
@@ -0,0 +1 @@
+CHECKS = []  # populated by sub-modules
````
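Taken together, the two `__init__` files above give the CLI a single aggregated `CHECKS` list. A hypothetical sketch of the filter-and-run step the checks README describes; the real loop lives in `cli.py` and may differ:

```python
import argparse

import boto3
from rich.console import Console

from tecton_validate import CHECKS  # aggregated by the package __init__ above

def run_checks(args: argparse.Namespace) -> bool:
    """Run every check applicable to args.compute_engine; return overall pass/fail."""
    session = boto3.Session(region_name=getattr(args, "region", None))
    console = Console()
    ok = True
    for check in CHECKS:
        # only_for=None (or absent) means the check applies to every engine
        engines = getattr(check, "only_for", None)
        if engines and args.compute_engine not in engines:
            continue
        result = check.run(args, session, console)
        status = "[green]PASS[/green]" if result.success else "[red]FAIL[/red]"
        console.print(f"{status} {result.name}: {result.details}")
        if not result.success:
            ok = False
            hint = result.remediation or check.remediation
            if hint:
                console.print(f"  remediation: {hint}")
    return ok
```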
