set up for new gpu class #19
base: main
55c27ca to 45a2bde
This is a good start! Left some inline comments. In addition, could you rename the |
gpu-class/resource.yaml
Outdated
annotations:
notebooks.opendatahub.io/inject-oauth: 'true'
notebooks.opendatahub.io/last-image-selection: ${IMAGE_NAME}
notebooks.opendatahub.io/last-size-selection: X Small
I think it should be Small
gpu-class/resource.yaml
Outdated
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: nvidia.com/gpu.present
I don't think we need a GPU in the dev notebook.
gpu-class/resource.yaml
Outdated
- Tesla-V100-PCIE-32GB
weight: 1
containers:
- resources:
The resources for the Small container are: Limits: 2 CPU, 8GiB Memory; Requests: 1 CPU, 8GiB Memory
gpu-class/rb.yaml
Outdated
name: ${SERVICE_ACCOUNT}
namespace: ${NAMESPACE}

# CREATE CLUSTERQUEUE ROLE
The ClusterRole is a static role. You only need to create it once, rather than creating a ClusterRole per namespace.
gpu-class/gpu-class-setup.sh
Outdated
namespace=$1

#get student sername from namespace
username=$(echo "$ns" | awk -F'-' '{print $2}')
TAs and instructors might also be added to a student namespace.
Should it be $namespace instead of $ns?
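For reference, a minimal sketch of the extraction that reads the variable the script actually assigns. The function name `username_from_namespace` and the `<prefix>-<username>` namespace layout are assumptions for illustration, not taken from the script. Since a namespace may also contain TAs and instructors, the derived name would at best describe the namespace, not every user in it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gpu-class-setup.sh).
# Assumes namespaces look like "<prefix>-<username>"; prints the second
# '-'-separated field of the namespace passed as the first argument.
username_from_namespace() {
    local namespace="$1"
    # Read from "$namespace" (the variable that was assigned), not "$ns".
    printf '%s\n' "$namespace" | awk -F'-' '{print $2}'
}
```

Calling `username_from_namespace "class-alice"` prints `alice`. If the username itself can contain a `-`, only its first segment is returned, so the layout assumption is worth checking against the real namespace names.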
d3009e9 to 9f7fcff
gpu-class/rb.yaml
Outdated
- apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: edit-jobs
It might be better to name the RB ${SERVICE_ACCOUNT_NB}-edit-jobs. Otherwise, creating a new RB will raise an error indicating that the RB name already exists.
gpu-class/rb.yaml
Outdated
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: default-kueue-localqueue-reader
I'd rename it to ${SERVICE_ACCOUNT_NB}-kueue-localqueue-reader
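A small sketch of the naming scheme suggested in the two comments above, assuming the names are built in the setup script rather than only in the template. The helper `prefixed_name` and the example service account `alice-notebook` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a per-service-account resource name so that
# each notebook service account gets its own RoleBinding/Role name.
prefixed_name() {
    local service_account="$1"
    local suffix="$2"
    printf '%s-%s\n' "$service_account" "$suffix"
}

# Example values only; "alice-notebook" is an assumed service account name.
prefixed_name "alice-notebook" "edit-jobs"
prefixed_name "alice-notebook" "kueue-localqueue-reader"
```

With names derived this way, a second service account in the same namespace would get distinct RoleBinding and Role names instead of colliding on `edit-jobs`.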
…int to clusterqueues, and observability for jobs through rolebinding
Sets up a new GPU class by creating notebooks for students to use, setting up a LocalQueue so they can submit GPU jobs, and setting up RoleBindings so students can see which nodes are using GPUs, which jobs are running, etc.