
Conversation

juliusvonkohout
Member

@juliusvonkohout commented Jun 10, 2025

Description of your changes:

closes kubeflow/manifests#3119

Thank you everyone!
I have been pushing this for around 4 years and even had Google and Red Hat employees involved, and back then even Amazon. It is fundamental for CVEs, maintainability (MinIO is now stuck for around 5 years) and hard multi-tenancy as a basic requirement for an enterprise platform. We also had approaches there with MinIO for several years. It all started in 2020 with #4649 and went via #7725 (2022) and kubeflow/manifests#2826 (October 2024) to kubeflow/manifests#3051 (2025). Without the experimental and extended tests it would have been very hard to pull off and coordinate. I want to especially highlight @pschoen-itsc, who spent his effort here for the public health sector in Germany, where many insurers need hard multi-tenancy to process data.

We evaluated many alternatives, and now we have something S3- and IAM-policy-compatible, scalable, and with hard multi-tenancy.

@akagami-harsh you can create branches against this PR.

  • Use SeaweedFS
  • Replace the old sync.py etc.
  • Refactor our tests to be usable within KFP
  • Remove MinIO (including /env/azure and other legacy stuff)
  • Explain how to use the SeaweedFS gateway for AWS/GCP/Azure S3


Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@juliusvonkohout
Member Author

closes #7725

@HumairAK added this to the KFP 2.6.0 milestone Jun 17, 2025
@HumairAK moved this to In Review in KFP Project Tracker Jun 17, 2025
@juliusvonkohout marked this pull request as ready for review July 3, 2025 14:27
@juliusvonkohout
Member Author

juliusvonkohout commented Jul 3, 2025

We are still missing:

  • Explain how to use the SeaweedFS gateway for AWS/GCP/Azure S3 (Harshvir)
  • Add an architectural diagram here for MinIO and in general for kubeflow/manifests, as in my KubeCon presentations and blogs (Julius)

but it is at least ready for a first review @HumairAK

@juliusvonkohout
Member Author

/retest

@droctothorpe
Collaborator

This is incredibly comprehensive and impressive, @juliusvonkohout and @akagami-harsh. What obstacles do you anticipate for end users upgrading from MinIO to SeaweedFS? Does it make sense to provide migration documentation or automation?

@juliusvonkohout
Member Author

This is incredibly comprehensive and impressive, @juliusvonkohout and @akagami-harsh. What obstacles do you anticipate for end users upgrading from MinIO to SeaweedFS? Does it make sense to provide migration documentation or automation?

If users really need the old data then the cluster administrator needs to handle it anyway by copying via boto3 from MinIO to SeaweedFS, so that is something we could deal with in follow-up PRs. We could provide a Job/CronJob that does this automatically; I think even LLMs can write this.
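
For illustration, a minimal sketch of such a copy job with boto3 (the endpoints, credentials, and bucket name below are placeholders for whatever the in-cluster MinIO and SeaweedFS services actually expose; this is not the official migration tooling):

```python
"""Hedged sketch: copy all objects from the legacy MinIO bucket to SeaweedFS.

Endpoints, credentials, and the bucket name are placeholders and must be
replaced with the values of the actual in-cluster services.
"""
import boto3

BUCKET = "mlpipeline"  # default KFP artifact bucket, assumed to exist on both sides

# Source: the legacy MinIO service (placeholder endpoint and credentials).
src = boto3.resource(
    "s3",
    endpoint_url="http://minio-service.kubeflow:9000",
    aws_access_key_id="<minio-access-key>",
    aws_secret_access_key="<minio-secret-key>",
)

# Destination: the SeaweedFS S3 endpoint (placeholder endpoint and credentials).
dst = boto3.client(
    "s3",
    endpoint_url="http://seaweedfs.kubeflow:8333",
    aws_access_key_id="<seaweedfs-access-key>",
    aws_secret_access_key="<seaweedfs-secret-key>",
)


def migrate(bucket: str) -> None:
    """Stream every object from the source bucket to the destination bucket."""
    for obj in src.Bucket(bucket).objects.all():
        body = obj.get()["Body"]  # StreamingBody behaves like a file object
        dst.upload_fileobj(body, bucket, obj.key)
        print(f"copied {obj.key}")


if __name__ == "__main__":
    migrate(BUCKET)
```

Wrapped in a Kubernetes Job or CronJob, this is roughly the automation mentioned above.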

More interesting is probably the SeaweedFS gateway documentation link that shows how users can connect SeaweedFS to AWS/GCP/Azure/S3-compatible object storage. But also there I prefer a follow-up PR. Let's /approve and merge what we have @HumairAK @hbelmiro @droctothorpe and continue in follow-up PRs.

@HumairAK removed the request for review from rimolive July 9, 2025 16:57
@juliusvonkohout
Member Author

/lgtm
@HumairAK for approval

@kubeflow deleted a comment from google-oss-prow bot Aug 18, 2025
@akagami-harsh
Contributor

akagami-harsh commented Aug 18, 2025

/lgtm
@HumairAK for approval

@akagami-harsh
Contributor

akagami-harsh commented Aug 18, 2025

@akagami-harsh I noticed this flakiness occur in this CI failure https://github.com/kubeflow/pipelines/actions/runs/16993993913/job/48180413365?pr=11965

I noticed the SeaweedFS pod was filled with:

2025-08-15T16:40:38.8534567Z I0815 16:39:12.797939 master_grpc_server_volume.go:141 volume grow &{Option:{"collection":"mlpipeline","replication":{},"ttl":{"Count":0,"Unit":0},"preallocate":1073741824,"version":3} Count:0 Force:false Reason:grpc assign}
2025-08-15T16:40:38.8535444Z E0815 16:39:12.801327 volume_grpc_admin.go:59 assign volume volume_id:82  collection:"mlpipeline"  preallocate:1073741824  replication:"000"  version:3: No more free space left
2025-08-15T16:40:38.8536338Z W0815 16:39:12.802934 volume_growth.go:273 Failed to assign volume 82 on topo:DefaultDataCenter:DefaultRack:10.244.0.26:8080: rpc error: code = Unknown desc = No more free space left
2025-08-15T16:40:38.8537464Z I0815 16:39:12.803731 volume_growth.go:120 create 7 volume, created 0: failed to assign volume 82 on topo:DefaultDataCenter:DefaultRack:10.244.0.26:8080: rpc error: code = Unknown desc = No more free space left
2025-08-15T16:40:38.8538277Z I0815 16:39:12.815178 master_grpc_server_volume.go:141 volume grow &{Option:{"collection":"mlpipeline","replication":{},"ttl":{"Count":0,"Unit":0},"preallocate":1073741824,"version":3} Count:0 Force:false Reason:grpc assign}

I also noticed that the "Free up Disk Space" step took 8 min 39 s to complete, which is odd given that the others complete much faster. Still, the final disk usage reports the same, so I'm not sure it's related, but it was the only thing that stood out.

Not sure if this is SeaweedFS-specific, but looking at the history of this workflow, I couldn't find us encountering this behavior with MinIO in recent history. Thoughts?

I think lots of files were deleted in the free-disk-space step, creating fragmented free space, and SeaweedFS has a preallocation feature. Preallocation needs large contiguous blocks, not just total free space. The logs show preallocate:1073741824 (1 GB), so it needs 1 GB of contiguous space:

2025-08-15T16:40:38.8534567Z I0815 16:39:12.797939 master_grpc_server_volume.go:141 volume grow &{Option:{"collection":"mlpipeline","replication":{},"ttl":{"Count":0,"Unit":0},"preallocate":1073741824,"version":3} Count:0 Force:false Reason:grpc assign}
2025-08-15T16:40:38.8535444Z E0815 16:39:12.801327 volume_grpc_admin.go:59 assign volume volume_id:82  collection:"mlpipeline"  preallocate:1073741824  replication:"000"  version:3: No more free space left
2025-08-15T16:40:38.8536338Z W0815 16:39:12.802934 volume_growth.go:273 Failed to assign volume 82 on topo:DefaultDataCenter:DefaultRack:10.244.0.26:8080: rpc error: code = Unknown desc = No more free space left
2025-08-15T16:40:38.8537464Z I0815 16:39:12.803731 volume_growth.go:120 create 7 volume, created 0: failed to assign volume 82 on topo:DefaultDataCenter:DefaultRack:10.244.0.26:8080: rpc error: code = Unknown desc = No more free space left
2025-08-15T16:40:38.8538277Z I0815 16:39:12.815178 master_grpc_server_volume.go:141 volume grow &{Option:{"collection":"mlpipeline","replication":{},"ttl":{"Count":0,"Unit":0},"preallocate":1073741824,"version":3} Count:0 Force:false Reason:grpc assign}

After disk cleanup, the disk likely has plenty of total free space, but it's fragmented; I think this would explain why MinIO doesn't hit this issue.
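
If we want to rule preallocation in or out, the SeaweedFS master has knobs for it; below is a hedged sketch, with flag names as I recall them from `weed master -h` (please verify against the deployed version). The preallocate:1073741824 in the logs lines up with a 1024 MB per-volume size limit, and since the growth step tries to create 7 volumes at once, that is roughly 7 GB requested in one go.

```sh
# Hedged sketch, not the manifests' actual configuration: disable up-front
# disk preallocation so a newly grown volume no longer requests ~1 GB at
# creation time, while keeping the per-volume size limit explicit.
# Double-check both flag names against `weed master -h`.
weed master -volumePreallocate=false -volumeSizeLimitMB=1024
```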

@HumairAK
Collaborator

@akagami-harsh I'm skeptical that this is due to fragmentation, and it's likely due to the storage size we are setting in the PVC for SeaweedFS, see here. Can we set it to the same size as the MinIO PVC at 20Gi here?
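
For concreteness, a hedged sketch of the suggested change (resource name and namespace are illustrative placeholders, not copied from the actual manifest):

```yaml
# Hedged sketch: size the SeaweedFS PVC like the existing MinIO one (20Gi).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: seaweedfs-pvc   # illustrative name
  namespace: kubeflow
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi      # match the MinIO PVC size
```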

@akagami-harsh
Contributor

@akagami-harsh I'm skeptical that this is due to fragmentation, and it's likely due to the storage size we are setting in the PVC for SeaweedFS, see here. Can we set it to the same size as the MinIO PVC at 20Gi here?

@HumairAK, opened a PR to update the PVC volume size: #12156

@google-oss-prow bot removed the lgtm label Aug 19, 2025

New changes are detected. LGTM label has been removed.

@HumairAK
Collaborator

HumairAK commented Aug 20, 2025

thanks @akagami-harsh, this lgtm, can you squash your commits and add a meaningful clean commit message?

@juliusvonkohout
Member Author

juliusvonkohout commented Aug 20, 2025

thanks @akagami-harsh, this lgtm, can you squash your commits and add a meaningful clean commit message?

The commits will be automatically squashed on merge with the PR title as the commit message, so I would like to avoid any further changes.


Also, given the multiple authors, it is a bit more complicated.

@HumairAK merged commit 25af89c into master Aug 20, 2025
70 of 71 checks passed
@HumairAK deleted the seaweedfs branch August 20, 2025 14:47
@github-project-automation bot moved this from In Review to Done in KFP Project Tracker Aug 20, 2025
@HumairAK
Collaborator

HumairAK commented Aug 20, 2025

I updated the commit message to something more meaningful - in general I prefer to have the PR authors curate their message, as they have more domain knowledge.

Thank you everyone for all your hard work on this - a much-needed change, well done all around!

@droctothorpe
Collaborator

Congrats, everyone! Phenomenal (and overdue) enhancement! You should consider submitting a talk on this to the next Kubeflow Summit in Europe.

aniketpati1121 pushed a commit to aniketpati1121/Kubeflow-pipelines that referenced this pull request Aug 23, 2025
add seaweedFS to KFP as default object store

This change switches KFP's default objectstore deployment to Seaweedfs
instead of Minio. Minio is still kept as an optional deployment to 
help users with migrating. CI is updated to accommodate testing for 
Minio and SeaweedFS. Multi-User testing is also introduced, which also
includes namespace authorization testing in Seaweedfs.

Some more work is needed to completely rid the KFP backend and frontend
code of Minio specific code and labeling, which we accrue as tech debt
as part of this change. 


Signed-off-by: juliusvonkohout <[email protected]>
Signed-off-by: Julius von Kohout <[email protected]>
Signed-off-by: Harshvir Potpose <[email protected]>
Co-authored-by: Harshvir Potpose <[email protected]>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: pschoen-itsc <[email protected]>
aniketpati1121 pushed a commit to aniketpati1121/Kubeflow-pipelines that referenced this pull request Aug 27, 2025

Successfully merging this pull request may close these issues.

Finish and upstream the minio replacement
8 participants