

memalhot
Contributor

Setting up for the new GPU class: creating notebooks for students to use, setting up a LocalQueue so they can submit GPU jobs, and setting up RoleBindings so students can see which nodes are using GPUs, which jobs are running, etc.
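A minimal sketch of the per-namespace LocalQueue described above, rendered from a shell function rather than the PR's template parameters. It assumes Kueue serves the kueue.x-k8s.io/v1beta1 API; the function name render_localqueue, the queue name gpu-queue, and the ClusterQueue name passed in are hypothetical placeholders, not names from this PR.

```shell
# Print a Kueue LocalQueue manifest for one namespace.
# Assumption: Kueue serves kueue.x-k8s.io/v1beta1; "gpu-queue" and the
# ClusterQueue name are hypothetical placeholders.
render_localqueue() {
  local namespace="$1" cluster_queue="$2"
  cat <<EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: gpu-queue
  namespace: ${namespace}
spec:
  clusterQueue: ${cluster_queue}
EOF
}

# Hypothetical usage:
#   render_localqueue gpuclass-jdoe gpu-cluster-queue | oc apply -f -
# A job then selects this queue with the label
#   kueue.x-k8s.io/queue-name: gpu-queue
```

Because a LocalQueue is namespaced, each student namespace would get its own copy pointing at one shared ClusterQueue, which is where the GPU quota would be defined.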

memalhot force-pushed the gpu_class branch 4 times, most recently from 55c27ca to 45a2bde (August 20, 2025, 13:47)
@DanNiESh
Collaborator

This is a good start! I left some inline comments. In addition, could you rename resource.yaml to something more specific, such as notebook_resource.yaml?
I think you can also create a separate file for the ClusterRole kueue-clusterqueue-reader, because it's a static role that only needs to be created once, and all the RoleBindings can point to that same role.
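A sketch of the standalone ClusterRole suggested here, assuming "read" means the get, list, and watch verbs on clusterqueues in the kueue.x-k8s.io API group. The function name and the clusterqueue_reader.yaml filename are hypothetical; only the role name kueue-clusterqueue-reader comes from the PR.

```shell
# Print the shared, cluster-wide ClusterRole that lets its subjects read
# Kueue ClusterQueues. It is static, so it is created once rather than
# once per namespace.
render_clusterqueue_reader_role() {
  cat <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kueue-clusterqueue-reader
rules:
- apiGroups: ["kueue.x-k8s.io"]
  resources: ["clusterqueues"]
  verbs: ["get", "list", "watch"]
EOF
}

# Hypothetical usage: write it to its own file once and apply it.
#   render_clusterqueue_reader_role > clusterqueue_reader.yaml
#   oc apply -f clusterqueue_reader.yaml
```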

annotations:
  notebooks.opendatahub.io/inject-oauth: 'true'
  notebooks.opendatahub.io/last-image-selection: ${IMAGE_NAME}
  notebooks.opendatahub.io/last-size-selection: X Small
Collaborator


I think it should be Small

preferredDuringSchedulingIgnoredDuringExecution:
- preference:
    matchExpressions:
    - key: nvidia.com/gpu.present
Collaborator


I don't think we need a GPU in the dev notebook.

- Tesla-V100-PCIE-32GB
weight: 1
containers:
- resources:
Collaborator


The resources for the Small container size are: limits of 2 CPU and 8 GiB memory; requests of 1 CPU and 8 GiB memory.
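For reference, the Small values quoted in this comment map to a container resources block like the following. This is a sketch: the function name is hypothetical, and the indentation assumes the block is nested under the container entry in the notebook template.

```shell
# Print the resources block for the "Small" notebook size:
# limits 2 CPU / 8Gi memory, requests 1 CPU / 8Gi memory.
render_small_resources() {
  cat <<'EOF'
resources:
  limits:
    cpu: "2"
    memory: 8Gi
  requests:
    cpu: "1"
    memory: 8Gi
EOF
}
```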

name: ${SERVICE_ACCOUNT}
namespace: ${NAMESPACE}

# CREATE CLUSTERQUEUE ROLE
Collaborator


The ClusterRole is a static role. You only need to create it once, rather than creating a ClusterRole per namespace.
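A sketch of the create-once pattern: the kueue-clusterqueue-reader ClusterRole exists once, and each namespace only adds a binding to it. Because ClusterQueues are cluster-scoped, that binding likely needs to be a ClusterRoleBinding; a namespaced RoleBinding that references a ClusterRole only grants access to resources inside its own namespace. The binding name pattern and the function name are hypothetical.

```shell
# Bind the shared kueue-clusterqueue-reader ClusterRole to one namespace's
# service account. ClusterQueues are cluster-scoped, so this uses a
# ClusterRoleBinding, named per namespace so names do not collide.
render_clusterqueue_reader_binding() {
  local namespace="$1" service_account="$2"
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ${namespace}-kueue-clusterqueue-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kueue-clusterqueue-reader
subjects:
- kind: ServiceAccount
  name: ${service_account}
  namespace: ${namespace}
EOF
}

# Hypothetical usage, once per student namespace:
#   render_clusterqueue_reader_binding gpuclass-jdoe jdoe-nb | oc apply -f -
```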

namespace=$1

# get student username from namespace
username=$(echo "$ns" | awk -F'-' '{print $2}')
Collaborator


TAs and instructors might also be added to the student namespaces.
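One hedged way to cover TAs and instructors is to pass every username explicitly and emit one User subject per name, instead of deriving a single username from the namespace name. The function, the binding name pattern, the role argument, and the usernames in the usage line are all hypothetical.

```shell
# Print a RoleBinding that grants one namespaced Role to several users,
# e.g. the student plus any TAs or instructors in the same namespace.
# Usage: render_namespace_user_binding <namespace> <role> <user>...
render_namespace_user_binding() {
  local namespace="$1" role="$2" user
  shift 2
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${role}-users
  namespace: ${namespace}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ${role}
subjects:
EOF
  # One User subject per remaining argument.
  for user in "$@"; do
    cat <<EOF
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: ${user}
EOF
  done
}

# Hypothetical usage:
#   render_namespace_user_binding gpuclass-jdoe edit-jobs jdoe ta1 prof1 | oc apply -f -
```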

Collaborator


Should it be $namespace instead of $ns?
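A sketch of the fix this question points at: read $namespace, which the script sets from $1, instead of $ns. It keeps the original awk split, so it still assumes namespaces follow a <prefix>-<username> pattern and that usernames contain no '-'. The helper name username_from_namespace is hypothetical.

```shell
# Return the second '-'-separated field of a namespace name, which the
# original script treats as the student username.
# Assumes <prefix>-<username> naming with no '-' inside the username.
username_from_namespace() {
  local namespace="$1"
  echo "$namespace" | awk -F'-' '{print $2}'
}

# Hypothetical use inside the setup script:
#   namespace=$1
#   username=$(username_from_namespace "$namespace")
```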

memalhot force-pushed the gpu_class branch 3 times, most recently from d3009e9 to 9f7fcff (August 26, 2025, 13:05)
- apiVersion: rbac.authorization.k8s.io/v1
  kind: RoleBinding
  metadata:
    name: edit-jobs
Collaborator


It might be better to name the RoleBinding ${SERVICE_ACCOUNT_NB}-edit-jobs; otherwise, when a new RoleBinding is created, it will raise an error saying that a RoleBinding with that name already exists.
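A sketch of the suggested naming, assuming the notebook service account name is available as a parameter (SERVICE_ACCOUNT_NB in the template). The roleRef target is a hypothetical parameter because the excerpt does not show which Role edit-jobs binds, and the sketch assumes it is a namespaced Role.

```shell
# Print the edit-jobs RoleBinding with its name prefixed by the notebook
# service account, so each service account's binding gets a distinct name.
# edit_role is a hypothetical parameter: the target Role is not shown.
render_edit_jobs_binding() {
  local namespace="$1" service_account_nb="$2" edit_role="$3"
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${service_account_nb}-edit-jobs
  namespace: ${namespace}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ${edit_role}
subjects:
- kind: ServiceAccount
  name: ${service_account_nb}
  namespace: ${namespace}
EOF
}
```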

roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: default-kueue-localqueue-reader
Collaborator


I'd rename it to ${SERVICE_ACCOUNT_NB}-kueue-localqueue-reader
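A sketch of the renamed Role together with a RoleBinding that references it, assuming read access means get, list, and watch on localqueues in the kueue.x-k8s.io API group. The function name is hypothetical; the name pattern follows this suggestion.

```shell
# Print a namespaced Role for reading Kueue LocalQueues, named
# <service-account>-kueue-localqueue-reader, plus a RoleBinding that
# grants it to that service account.
render_localqueue_reader() {
  local namespace="$1" service_account_nb="$2"
  local role="${service_account_nb}-kueue-localqueue-reader"
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ${role}
  namespace: ${namespace}
rules:
- apiGroups: ["kueue.x-k8s.io"]
  resources: ["localqueues"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${role}
  namespace: ${namespace}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ${role}
subjects:
- kind: ServiceAccount
  name: ${service_account_nb}
  namespace: ${namespace}
EOF
}
```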

…int to clusterqueues, and observability for jobs through rolebinding

2 participants