Contributor

@kaikaila kaikaila commented Aug 20, 2025

Introduce an internal LargeText model type and replace all gorm:"type:longtext" usages. LargeText implements GormDBDataType() so that the column maps to LONGTEXT on MySQL and TEXT on Postgres. This unblocks Postgres AutoMigrate while keeping MySQL schemas unchanged.

Motivation

Postgres AutoMigrate fails with: type "longtext" does not exist (SQLSTATE 42704). Making the column type dialect-aware removes this blocker.

Scope

Although this PR is “models-first”, some call sites needed type conversions and wiring so that the code compiles and the tests run. No query semantics or business logic were changed.

Backward compatibility

  • MySQL: No schema drift (still LONGTEXT); existing data unaffected.
  • Postgres: Fresh installs migrate cleanly (no longtext errors).
  • External API / user-visible behavior: No change (internal refactor only).

Testing

  • MySQL backend integration tests remain green (no expected diffs).
  • Postgres (pgx) CI job is currently marked as allowed to fail; this PR should remove the early longtext migration failure and let the tests proceed further.
  • Local reproduction (example):
# MySQL
DB_TYPE=mysql go test -v ./backend/test/integration/...

# Postgres (pgx)
DB_TYPE=postgres DB_DRIVER=pgx go test -v ./backend/test/integration/... \
  -args -runIntegrationTests=true -isDevMode=true -runPostgreSQLTests=true -localTest=true

Notes for reviewers

  • This PR is intentionally models-only to keep review surface small.
  • No query/dialect logic changes here; follow-ups will migrate stores to the injected sqBuilder gradually.

Changelog

Internal refactor; no user-facing change. No release note needed.

Part of #12063.

Reference

This PR is part of RFC discussion in #12047.

Technical Appendix

🔍 Background & Rationale

Kubeflow Pipelines previously relied on gorm:"size:65535" to represent large string fields (e.g., descriptions, manifests, parameters).
In GORM v1, this produced compatible behavior.

However, in GORM v2, the internal dialect logic has changed:

  • MySQL now interprets size:65535 as MEDIUMTEXT, which may be too small for our payloads
  • PostgreSQL still uses TEXT, but will fail if type:longtext is explicitly used (since Postgres doesn’t recognize it)
  • Using VARCHAR(65535) in MySQL can exceed row size limits and lead to errors

Thus, the old approach is no longer reliable for cross-database compatibility.

✅ Chosen Solution: Define a LargeText Custom Type

Following GORM's official recommendation, we introduce a dialect-aware LargeText type:

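import (
	"gorm.io/gorm"
	"gorm.io/gorm/schema"
)

// LargeText marks a string field that must be stored in a large text column.
type LargeText string

// GormDBDataType returns the dialect-specific column type used during migration.
func (LargeText) GormDBDataType(db *gorm.DB, field *schema.Field) string {
	switch db.Dialector.Name() {
	case "mysql":
		return "LONGTEXT"
	case "postgres", "pgx":
		return "TEXT"
	default:
		// Any other dialect falls back to TEXT.
		return "TEXT"
	}
}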

This approach:
• Ensures correct behavior for both MySQL and PostgreSQL
• Avoids manual ALTER COLUMN migrations
• Makes the semantic intent (this is a large string field) explicit in code
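
The dialect mapping can be exercised without a database. The following is a minimal sketch, not the PR's code: columnTypeFor is a hypothetical helper that takes the dialect name as a plain string instead of reading db.Dialector.Name(), so it runs with the standard library only.

```go
package main

import "fmt"

// columnTypeFor is a hypothetical, GORM-free stand-in for the switch inside
// GormDBDataType. The dialect name is passed in directly (an assumption made
// for this sketch) rather than obtained from db.Dialector.Name().
func columnTypeFor(dialect string) string {
	switch dialect {
	case "mysql":
		return "LONGTEXT"
	default:
		// Postgres and every other dialect use TEXT.
		return "TEXT"
	}
}

func main() {
	// Print the column type chosen for a few dialect names.
	for _, d := range []string{"mysql", "postgres", "sqlite"} {
		fmt.Println(d, "->", columnTypeFor(d))
	}
}
```

Keeping the decision in one place means a model only declares its intent (a large string) and leaves the column type to the dialect in use.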

🛠 Refactor Scope

We’ve audited existing fields and begun migrating high-risk ones like:
• Description
• Parameters
• PipelineSpecURI

This surfaced ~13 compile-time errors due to type mismatches, primarily in:
• Struct literal initializations
• Function parameter type checks

Most of these are resolved with explicit conversions like LargeText(x) or string(x).
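
The two kinds of call-site fixes mentioned above can be sketched as follows. The Pipeline struct, newPipeline, and describe are hypothetical names used only for illustration; they are not the actual KFP models or functions.

```go
package main

import "fmt"

// LargeText mirrors the named string type introduced in this PR.
type LargeText string

// Pipeline is a hypothetical model with one field migrated to LargeText.
type Pipeline struct {
	Name        string
	Description LargeText
}

// describe is a hypothetical helper that still accepts a plain string.
func describe(s string) string {
	return "description: " + s
}

// newPipeline shows the struct-literal case: a string variable must be
// converted explicitly before it can be assigned to a LargeText field.
func newPipeline(name, description string) Pipeline {
	return Pipeline{Name: name, Description: LargeText(description)}
}

func main() {
	p := newPipeline("demo", "a long description")

	// Function-parameter case: convert back to string at the call site.
	fmt.Println(describe(string(p.Description)))

	// Untyped string constants need no conversion.
	p.Description = "set from a literal"
	fmt.Println(p.Description)
}
```

Both conversions are compile-time only and copy no semantics: LargeText and string share the same underlying representation.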

🚨 Fallback Plan

If this refactor proves too invasive, we will fall back to a simpler (but less elegant) approach, which is currently used in the GORM v1 version of KFP:
• Keep gorm:"type:longtext" for MySQL-only fields
• Use db.Migrator().AlterColumn(...) during DB init to patch schema post-facto
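
A rough sketch of how such a post-hoc patch could be gated by dialect is shown below. It builds a raw MySQL statement rather than calling db.Migrator().AlterColumn(...), purely so that it runs without GORM; alterToLongText and the table and column names are hypothetical.

```go
package main

import "fmt"

// alterToLongText is a hypothetical helper for the fallback plan. It returns
// the statement that widens a column to LONGTEXT, and false when the dialect
// is not MySQL and no patch should be applied.
func alterToLongText(dialect, table, column string) (string, bool) {
	if dialect != "mysql" {
		// Other dialects (e.g. Postgres) are left untouched in this sketch.
		return "", false
	}
	stmt := fmt.Sprintf("ALTER TABLE `%s` MODIFY COLUMN `%s` LONGTEXT", table, column)
	return stmt, true
}

func main() {
	// "pipelines" and "Parameters" are illustrative names only.
	for _, d := range []string{"mysql", "postgres"} {
		if stmt, ok := alterToLongText(d, "pipelines", "Parameters"); ok {
			fmt.Println(d, "would run:", stmt)
		} else {
			fmt.Println(d, "needs no patch")
		}
	}
}
```

Compared with LargeText, this keeps the Go field types as plain strings but moves the schema knowledge out of the model and into DB initialization code.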

@kaikaila kaikaila changed the title [WIP] refactor(backend): introduce dialect-aware LargeText to replace longtext (unblocks Postgres AutoMigrate). [WIP] refactor(backend): introduce dialect-aware LargeText and replace longtext (+call-site type adjustments). Aug 20, 2025
@kaikaila kaikaila changed the title [WIP] refactor(backend): introduce dialect-aware LargeText and replace longtext (+call-site type adjustments). [WIP] refactor(backend): introduce dialect-aware LargeText and replace longtext (+call-site type adjustments). Part of #12063 Aug 20, 2025
@mprahl
Collaborator

mprahl commented Aug 21, 2025

/ok-to-test

@mprahl
Collaborator

mprahl commented Aug 21, 2025

@kaikaila is this PR still WIP?

@kaikaila kaikaila changed the title [WIP] refactor(backend): introduce dialect-aware LargeText and replace longtext (+call-site type adjustments). Part of #12063 refactor(backend): introduce dialect-aware LargeText and replace longtext (+call-site type adjustments). Part of #12063 Aug 21, 2025
@kaikaila
Contributor Author

Hi @mprahl , thanks for checking in!
I’ve just removed the WIP from the title—this PR is now ready for review.

Collaborator

@mprahl mprahl left a comment

/approve
/lgtm

It's unfortunate that we have to change Go types but it seems there's no other way. Thanks for digging into this!

@mprahl
Collaborator

mprahl commented Aug 28, 2025

@kaikaila it looks like there are some merge conflicts. Let me know when they are resolved.

Collaborator

@mprahl mprahl left a comment

/approve
/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Aug 28, 2025
@kaikaila
Contributor Author

Thanks a lot @mprahl ! I really appreciate your review and the LGTM.
There was a conflict after master moved forward, so I rebased and resolved it. The CI is re-running now — once it finishes, if everything is good, I'll ping you.


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mprahl

@mprahl
Collaborator

mprahl commented Aug 28, 2025

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Aug 28, 2025
@google-oss-prow google-oss-prow bot merged commit 645aef5 into kubeflow:master Aug 28, 2025
86 checks passed
@kaikaila kaikaila deleted the pr/largetext branch August 28, 2025 23:07
VaniHaripriya pushed a commit to VaniHaripriya/data-science-pipelines that referenced this pull request Aug 29, 2025