How to Dynamically Adjust Resource Allocations for Suspended Kubernetes Jobs (v1.36 Beta)

Introduction

Kubernetes v1.36 introduces a powerful enhancement for batch and machine learning workloads: the ability to modify container resource requests and limits in the pod template of a suspended Job. Now in beta (first introduced as alpha in v1.35), this feature lets queue controllers and administrators fine-tune CPU, memory, GPU, and extended resource specifications on a Job while it's suspended, before it starts or resumes running. This means you can adapt resource allocations without deleting and recreating the Job, preserving all metadata and status.

In this step-by-step guide, you'll learn how to leverage this feature to dynamically adjust resources for suspended Jobs, ensuring efficient cluster utilization and smoother operation of resource‑intensive workloads.

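What You Need

A Kubernetes cluster running v1.36 (or v1.35 with the feature gate enabled), kubectl configured to talk to it, and permission to create and patch Jobs in your namespace.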

Step-by-Step Guide

Step 1: Verify the Feature is Enabled

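In Kubernetes v1.36 this feature is beta and enabled by default, so the main thing to confirm is that the API server runs v1.36 or later. Listing batch/v1 with kubectl api-versions is not enough, because that API group is served on every supported release. Check the server version instead:

kubectl version

Look at the Server Version line in the output. If you're on v1.35, where the feature is alpha, you need to enable the JobMutablePodTemplate feature gate on the API server. In v1.36, no manual action is required unless the gate has been explicitly disabled.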
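
To look at the gate itself rather than the release number, one option is the kubernetes_feature_enabled metric that the API server exposes on /metrics. The sketch below assumes you are allowed to read that endpoint. gate_enabled is a hypothetical helper name, and the gate name in the usage comment is the one given in this guide.

```shell
# gate_enabled NAME
# Reads Prometheus-format metrics on stdin and succeeds (exit 0) only if the
# kubernetes_feature_enabled series for NAME reports the value 1.
# Hypothetical helper; not part of kubectl.
gate_enabled() {
  awk -v gate="$1" '
    index($0, "kubernetes_feature_enabled{name=\"" gate "\"") == 1 && $NF == "1" { ok = 1 }
    END { exit !ok }
  '
}

# Usage (requires access to the API server /metrics endpoint):
#   kubectl get --raw /metrics | gate_enabled JobMutablePodTemplate && echo "gate is on"
```

The helper only parses text, so you can also run it against a saved copy of the metrics output.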

Step 2: Create a Suspended Job

Define a Job manifest with spec.suspend set to true. The Job is then created in a suspended state and the Job controller creates no Pods until you resume it, which gives you a window to modify its resources. Below is an example of a machine learning training Job requesting 4 GPUs:

apiVersion: batch/v1
kind: Job
metadata:
  name: ml-training-suspended
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:latest
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

Save the manifest as job-suspended.yaml and apply it with kubectl apply -f job-suspended.yaml.
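
Before editing anything, it can help to confirm that the Job really is suspended and has no active Pods. The following is a minimal sketch. job_is_suspended and job_active_pods are hypothetical helper names; they read the spec.suspend and status.active fields of the Job.

```shell
# job_is_suspended NAME
# Succeeds (exit 0) if the Job's spec.suspend field is true.
job_is_suspended() {
  [ "$(kubectl get job "$1" -o jsonpath='{.spec.suspend}')" = "true" ]
}

# job_active_pods NAME
# Prints the active Pod count from the Job status, or 0 if the field is unset.
job_active_pods() {
  n="$(kubectl get job "$1" -o jsonpath='{.status.active}')"
  echo "${n:-0}"
}

# Usage:
#   job_is_suspended ml-training-suspended && job_active_pods ml-training-suspended
```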

Step 3: Modify Resource Requests/Limits While Suspended

Once the Job is created and in a suspended state, you can update its pod template's resources. Use kubectl edit or kubectl patch. For example, to reduce GPU count from 4 to 2 and adjust CPU/memory:

kubectl patch job ml-training-suspended --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/example-hardware-vendor.com~1gpu", "value": "2"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/example-hardware-vendor.com~1gpu", "value": "2"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "4"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "16Gi"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "4"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "16Gi"}
]'

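Note: ~1 is the JSON Pointer (RFC 6901) escape for the slash in the extended resource name. The API server validates the values themselves (for example, quantities must be non-negative, and for extended resources such as GPUs the request must equal the limit), but it does not check them against cluster capacity. If the Pods ask for more than any node can offer, the Job will resume and its Pods will stay Pending.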
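
Hand-escaping resource names is easy to get wrong, so the escaping rule can be sketched as a small helper. Both function names below are hypothetical. json_pointer_escape applies the two RFC 6901 substitutions in the required order, and resource_op builds one replace operation for the first container.

```shell
# json_pointer_escape STRING
# Escapes one JSON Pointer reference token per RFC 6901:
# "~" becomes "~0" first, then "/" becomes "~1".
json_pointer_escape() {
  printf '%s' "$1" | sed -e 's/~/~0/g' -e 's#/#~1#g'
}

# resource_op SECTION NAME VALUE
# Prints one JSON Patch "replace" operation for container 0.
# SECTION is "requests" or "limits".
resource_op() {
  printf '{"op":"replace","path":"/spec/template/spec/containers/0/resources/%s/%s","value":"%s"}' \
    "$1" "$(json_pointer_escape "$2")" "$3"
}

# Usage: validate the patch on the server without persisting it.
#   gpu=example-hardware-vendor.com/gpu
#   kubectl patch job ml-training-suspended --type=json --dry-run=server \
#     -p "[$(resource_op requests "$gpu" 2),$(resource_op limits "$gpu" 2)]"
```

Running the patch with --dry-run=server first lets the API server reject an invalid value before anything changes; drop the flag to apply it.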

Step 4: Resume the Job

After adjusting resources, unsuspend the Job by setting spec.suspend to false:

kubectl patch job ml-training-suspended -p '{"spec":{"suspend":false}}'

The Job will start creating Pods with the updated resource specifications. You can monitor progress with kubectl get pods -w.
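
In a busy namespace, an unfiltered watch shows every Pod. A sketch of the resume step with a filtered watch follows, assuming the batch.kubernetes.io/job-name label that the Job controller sets on the Pods it creates. resume_job and watch_job_pods are hypothetical helper names.

```shell
# resume_job NAME
# Sets spec.suspend to false with a merge patch.
resume_job() {
  kubectl patch job "$1" --type=merge -p '{"spec":{"suspend":false}}'
}

# watch_job_pods NAME
# Streams Pod changes for one Job only, selected by its job-name label.
watch_job_pods() {
  kubectl get pods -l "batch.kubernetes.io/job-name=$1" -w
}

# Usage:
#   resume_job ml-training-suspended && watch_job_pods ml-training-suspended
```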

Step 5: Verify Resource Allocation

Check that the running Pods reflect the new resources (replace xxxxx with the suffix of a Pod created by the Job):

kubectl get pod ml-training-suspended-xxxxx -o jsonpath='{.spec.containers[0].resources}'

You should see the adjusted requests and limits. If a queue controller is managing the Job, it can also perform these updates automatically.
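
If you would rather not look up the generated Pod name, the same check can be sketched with a label selector. job_pod_resources is a hypothetical helper name, and the selector assumes the batch.kubernetes.io/job-name label set by the Job controller.

```shell
# job_pod_resources NAME
# Prints one line per Pod of the Job: the Pod name, a tab, then the
# resources block of its first container.
job_pod_resources() {
  kubectl get pods -l "batch.kubernetes.io/job-name=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources}{"\n"}{end}'
}

# Usage:
#   job_pod_resources ml-training-suspended
```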

Tips and Best Practices

This feature lets you adapt a queued batch or ML Job to changing cluster conditions without deleting and recreating it, so the Job's metadata and status survive the resize. It is most useful alongside a queue controller that decides, at admission time, how much CPU, memory, or GPU capacity a suspended Job should receive.
