Skip to content

Conversation

deepanker13
Copy link
Contributor

What this PR does / why we need it:

  1. This pr contains the train api function which will be called by the user to run the training job.
  2. Constants have been added to access them at multiple places.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Partially Fixes #1945

Checklist:

  • Docs included if any changes are user facing

@coveralls
Copy link

coveralls commented Dec 11, 2023

Pull Request Test Coverage Report for Build 7477774579

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.01%) to 42.896%

Totals Coverage Status
Change from base Build 7477136161: 0.01%
Covered Lines: 3756
Relevant Lines: 8756

💛 - Coveralls

@deepanker13
Copy link
Contributor Author

/hold depends on #1959 and #1958

@deepanker13 deepanker13 force-pushed the train_api branch 2 times, most recently from ff86df6 to 8cbe61e Compare December 13, 2023 11:28
@deepanker13
Copy link
Contributor Author

/hold cancel

@andreyvelich andreyvelich changed the title Train api [SDK] Train API Dec 14, 2023
@andreyvelich
Copy link
Member

/assign @andreyvelich
@deepanker13 Please can you rebase this PR ?

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this @deepanker13 !

@deepanker13
Copy link
Contributor Author

/assign @andreyvelich
@deepanker13 Please can you rebase this PR ?
done

@deepanker13 deepanker13 force-pushed the train_api branch 2 times, most recently from a64b243 to e4df199 Compare December 21, 2023 18:59
@deepanker13 deepanker13 force-pushed the train_api branch 2 times, most recently from 8493b38 to cd811cd Compare January 4, 2024 21:35
@deepanker13
Copy link
Contributor Author

@andreyvelich I have a reason to keep the download dir field, as there will be a single place where we define the default value and that same value will be passed through the code flow, else we will have to verify the values are same or not through the entire code flow.

@@ -1,40 +1,51 @@
from abstract_model_provider import modelProvider
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's name the directory storage_initiailizer rather than storage_init_container to be consistent with init container name ?
WDYT @deepanker13 ?

@andreyvelich
Copy link
Member

@andreyvelich I have a reason to keep the download dir field, as there will be a single place where we define the default value and that same value will be passed through the code flow, else we will have to verify the values are same or not through the entire code flow.

@deepanker13 Can you just have 3 constant variables in storage_initializer/constants.py:

INIT_CONTAINER_MOUNT_PATH = "/workspace"
VOLUME_PATH_DATASET = INIT_CONTAINER_MOUNT_PATH + "/dataset"
VOLUME_PATH_MODEL = INIT_CONTAINER_MOUNT_PATH + "/model"

And then just re-use these constants in SDK and storage_initializer.
Does it work for you @deepanker13 ?

@google-oss-prow google-oss-prow bot added size/XL and removed size/L labels Jan 10, 2024
Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we are ready to merge this PR.
Thanks again @deepanker13 for all of this work!
/lgtm
/assign @johnugeorge @tenzen-y for the final review.

@google-oss-prow google-oss-prow bot added lgtm and removed lgtm labels Jan 10, 2024
@deepanker13
Copy link
Contributor Author

I think, we are ready to merge this PR.
Thanks again @deepanker13 for all of this work!
/lgtm
/assign @johnugeorge @tenzen-y for the final review.

@andreyvelich @johnugeorge @tenzen-y thanks for all the help

@johnugeorge
Copy link
Member

Awesome work @deepanker13
/lgtm
/approve

Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deepanker13, johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 006dda4 into kubeflow:master Jan 10, 2024
szaher pushed a commit to szaher/sdk that referenced this pull request Jun 4, 2025
* adding constant for init container name

* github workflow fixes

* removing constants file changes from this pr

* code review changes

* initial skeleton of train api

* train api updated

* fixes

* code review changes

* code review changes

* code review changes

* code review changes

* fixing python library requirements

* adding hugging face dataset download class

* code review changes

* fixing github workflow

* code review comments

* import fixes

* integration test fix for python3.7

* torch version fix for python3.7

* removing unused variable

* fixing library versions for python3.7

* removing alpine distribution

* removing torch ad dependency

* removing literal usage as python 3.7 doesn't support it

* adding types.py

* ci fix

* storage init container changes, fixing imports

* adding extra requires in setup.py, fixinf ci

* adding commit to retrigger go test

* renaming folder to storage initalizer

* bug fix

* removing extra gpu check as discussed with johnu

* retriggering ci
szaher pushed a commit to szaher/sdk that referenced this pull request Jun 4, 2025
* adding constant for init container name

* github workflow fixes

* removing constants file changes from this pr

* code review changes

* initial skeleton of train api

* train api updated

* fixes

* code review changes

* code review changes

* code review changes

* code review changes

* fixing python library requirements

* adding hugging face dataset download class

* code review changes

* fixing github workflow

* code review comments

* import fixes

* integration test fix for python3.7

* torch version fix for python3.7

* removing unused variable

* fixing library versions for python3.7

* removing alpine distribution

* removing torch ad dependency

* removing literal usage as python 3.7 doesn't support it

* adding types.py

* ci fix

* storage init container changes, fixing imports

* adding extra requires in setup.py, fixinf ci

* adding commit to retrigger go test

* renaming folder to storage initalizer

* bug fix

* removing extra gpu check as discussed with johnu

* retriggering ci
szaher pushed a commit to szaher/sdk that referenced this pull request Jun 4, 2025
* adding constant for init container name

* github workflow fixes

* removing constants file changes from this pr

* code review changes

* initial skeleton of train api

* train api updated

* fixes

* code review changes

* code review changes

* code review changes

* code review changes

* fixing python library requirements

* adding hugging face dataset download class

* code review changes

* fixing github workflow

* code review comments

* import fixes

* integration test fix for python3.7

* torch version fix for python3.7

* removing unused variable

* fixing library versions for python3.7

* removing alpine distribution

* removing torch ad dependency

* removing literal usage as python 3.7 doesn't support it

* adding types.py

* ci fix

* storage init container changes, fixing imports

* adding extra requires in setup.py, fixinf ci

* adding commit to retrigger go test

* renaming folder to storage initalizer

* bug fix

* removing extra gpu check as discussed with johnu

* retriggering ci
szaher pushed a commit to szaher/sdk that referenced this pull request Jun 5, 2025
* adding constant for init container name

* github workflow fixes

* removing constants file changes from this pr

* code review changes

* initial skeleton of train api

* train api updated

* fixes

* code review changes

* code review changes

* code review changes

* code review changes

* fixing python library requirements

* adding hugging face dataset download class

* code review changes

* fixing github workflow

* code review comments

* import fixes

* integration test fix for python3.7

* torch version fix for python3.7

* removing unused variable

* fixing library versions for python3.7

* removing alpine distribution

* removing torch ad dependency

* removing literal usage as python 3.7 doesn't support it

* adding types.py

* ci fix

* storage init container changes, fixing imports

* adding extra requires in setup.py, fixinf ci

* adding commit to retrigger go test

* renaming folder to storage initalizer

* bug fix

* removing extra gpu check as discussed with johnu

* retriggering ci
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants