falconlee236 (Contributor)

Add complete guide for vLLM Production Stack deployment on GKE with Terraform

This PR adds comprehensive documentation for deploying a GPU-accelerated vLLM Production Stack on Google Kubernetes Engine (GKE) using Terraform. The implementation creates a production-ready infrastructure with specialized node pools for ML workloads and management services.

Key Features

  • Complete GKE Infrastructure: Configures a GKE cluster with regular release channel, comprehensive logging/monitoring, VPC-native networking, and managed Prometheus
  • Specialized Node Pools:
    • GPU-accelerated nodes with NVIDIA L4 GPUs (g2-standard-8 instances)
    • Cost-effective management nodes (e2-standard-4 instances)
  • vLLM Stack Deployment: Includes NVIDIA Device Plugin and vLLM with OpenAI-compatible API endpoints
  • Automated Deployment: Makefile with commands for easy deployment and cleanup
  • Comprehensive Testing: Includes instructions for testing model availability and running inference
  • Troubleshooting Guide: Detailed troubleshooting section with helpful kubectl commands
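As a rough illustration of the kind of GPU node pool that gke-infrastructure/node_pools.tf would define, a sketch follows; the resource names, cluster reference, and autoscaling bounds are assumptions for illustration, not the PR's actual code.

```hcl
# Hypothetical sketch of a GPU node pool; names and values are illustrative.
resource "google_container_node_pool" "gpu_pool" {
  name     = "gpu-pool"
  cluster  = google_container_cluster.primary.name
  location = var.region

  node_config {
    machine_type = "g2-standard-8" # G2 machine family pairs with NVIDIA L4

    guest_accelerator {
      type  = "nvidia-l4"
      count = 1
    }

    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }

  autoscaling {
    min_node_count = 0
    max_node_count = 2
  }
}
```

Scaling the pool down to zero nodes when idle is one of the main levers for the cost-management considerations mentioned below.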

Project Structure

gke/
├── credentials.json
├── gke-infrastructure/
│   ├── backend.tf
│   ├── cluster.tf
│   ├── node_pools.tf
│   ├── outputs.tf
│   ├── providers.tf
│   └── variables.tf
├── Makefile
├── production-stack/
│   ├── backend.tf
│   ├── helm.tf
│   ├── production_stack_specification.yaml
│   ├── providers.tf
│   └── variables.tf
└── README.md
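Each of the two stacks keeps its own backend.tf. A minimal sketch of a GCS remote-state backend, assuming a hypothetical bucket name:

```hcl
# Hypothetical GCS backend for Terraform state; bucket name is a placeholder.
terraform {
  backend "gcs" {
    bucket = "my-tf-state-bucket"   # replace with your own bucket
    prefix = "gke-infrastructure"
  }
}
```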

Testing Done

  • Successfully deployed the complete stack on GKE
  • Verified GPU detection and utilization
  • Tested model inference with the facebook/opt-125m model
  • Validated automatic scaling of the infrastructure
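A smoke test against the OpenAI-compatible endpoint can be sketched as follows; the service address is a placeholder for whatever `kubectl get svc` reports in your deployment, and the `curl` calls are commented out because they require the live stack:

```shell
# Placeholder endpoint; substitute the router Service's external IP and port.
VLLM_ENDPOINT="http://localhost:30080"

# Request body for the OpenAI-compatible completions API.
PAYLOAD='{"model": "facebook/opt-125m", "prompt": "The capital of France is", "max_tokens": 16}'
echo "$PAYLOAD"

# Run against a live deployment:
# curl -s "$VLLM_ENDPOINT/v1/models"
# curl -s "$VLLM_ENDPOINT/v1/completions" -H "Content-Type: application/json" -d "$PAYLOAD"
```

The `/v1/models` call verifies model availability; the `/v1/completions` call exercises inference end to end.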

Considerations for Reviewers

  • This implementation requires an increased GPU quota on GCP (an explicit quota-increase request must be submitted)
  • Cost-management guidance is included to help users minimize expenses when the infrastructure is not actively in use

FIX #172

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commit by using -s with git commit.
  • Classify the PR so the type of change is easy to understand, e.g. [Bugfix], [Feat], or [CI].
Detailed Checklist

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

The PR title should be prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation instructions.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of @YuhanLiu11, @Shaoting-Feng, or @ApostaC.

@falconlee236 (Contributor Author)

Hi @Hanchenli @YuhanLiu11

The production-stack GKE Terraform integration is finished just now. Feel free to leave feedback on this PR.

I am curious about this project and about vLLM's projects in general, so I am going to add the Observability stack features in a follow-up after this PR is merged.

I have one more question about this project:

Q: How is the progress on this content going?

> we plan to write a blog on Terraform in the upcoming month so a rough estimate would be very helpful!
> we haven't decided what the specific blog content is. Do you want to get involved in the blog writing as well?

#172 (comment)
#172 (comment)

@YuhanLiu11 (Collaborator)

Hey @falconlee236, thanks so much for your contribution! This is awesome! Will take a look soon.
@Hanchenli, can you update on the plans for the blog on Terraform?

Signed-off-by: falconlee236 <[email protected]>
@falconlee236 (Contributor Author) commented Mar 9, 2025

@YuhanLiu11
I added the observability stack code in production-stack/helm.tf just now, so this earlier comment is resolved:

> I am curious about this project and all about vLLM's projects.
> So I am going to add the Observability stack features after this PR is merged.

@Hanchenli (Collaborator) left a comment
Hi @falconlee236, thank you for submitting this PR. This is AWESOME!!! Could you add to the README some ways to customize the cluster (number of GPUs, GPU type, model type, ...)?

# This command provisions the GKE cluster, node pools, and deploys the vLLM stack in one step

# Deploy just the GKE infrastructure
make create-gke-infra

Could you also add to the README how to choose, at the beginning, the GPUs to use or the model to deploy?

@falconlee236 (Contributor Author) commented Mar 10, 2025

@Hanchenli
I added a <🎮 GPU and Model Selection> section in README.md, and added GPU variables for custom settings in gke-infrastructure/variables.tf.
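For reference, GPU-selection variables of the kind described might look like the following in gke-infrastructure/variables.tf; the variable names and defaults here are illustrative, not the PR's actual definitions:

```hcl
# Illustrative only; the PR's actual variables.tf may differ.
variable "gpu_type" {
  description = "GPU accelerator type to attach to the GPU node pool"
  type        = string
  default     = "nvidia-l4"
}

variable "gpu_count" {
  description = "Number of GPUs per node"
  type        = number
  default     = 1
}

variable "gpu_machine_type" {
  description = "Machine type for GPU nodes"
  type        = string
  default     = "g2-standard-8"
}
```

Changing these lets users trade cost against capacity without editing the node pool resource itself.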

@Hanchenli (Collaborator)

Hi @falconlee236, we are waiting on the AWS Terraform tutorial to be finished. Do you want to talk about the blog some day? Are you in the LMCache or vLLM channel? If you send me your username in the channel, I will message you to chat about it.

@falconlee236 (Contributor Author) commented Mar 9, 2025

> Hi @falconlee236, we are waiting on the AWS terraform tutorial to be finished. Do you want to talk about the blog some day? Are you in LMCache or vLLM channel? Can you send me your user name in the channel and I will send you a message to chat about it.

Hi @Hanchenli
What's the difference between the vLLM and LMCache channels? I just sent a request to join the vLLM channel.

My username in the requested channel is Sangyun Lee. Thanks!

WIP: I will add the settings for GPUs and GCP machine type later today.

@falconlee236 falconlee236 changed the title [FEAT]: Terraform Quickstart Tutorials for Google GKE [FEAT] Terraform Quickstart Tutorials for Google GKE Mar 10, 2025
@YuhanLiu11 (Collaborator) left a comment

LGTM! Thanks a lot for this great contribution!

@YuhanLiu11 YuhanLiu11 merged commit 1cbf92f into vllm-project:main Mar 12, 2025
5 checks passed
Successfully merging this pull request may close these issues.

feature: Terraform Quickstart Tutorials for Google GKE
3 participants