Kubernetes v1.36 Beta: Dynamically Adjusting Pod Resources for Suspended Jobs

Introduction

Kubernetes v1.36 promotes to beta a feature that allows modifying container resource requests and limits in the pod template of a suspended Job. First released as alpha in v1.35, it lets queue controllers and cluster administrators adjust CPU, memory, GPU, and extended resource specifications on a Job while it remains suspended, before it starts or resumes execution. Because resource adjustments no longer require recreating the Job, operators gain flexibility in clusters where available capacity changes over time.

Why Mutable Pod Resources Matter

Batch and machine learning workloads often have uncertain resource requirements at Job creation time. The optimal allocation depends on real-time cluster capacity, queue priorities, and the availability of specialized hardware such as GPUs. Previously, once a Job's pod template resource fields were set, they were immutable: any change required deleting and recreating the entire Job, which lost its metadata, status, and history. For queue controllers like Kueue, this was a significant limitation.

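With the new beta feature, queue controllers can now adjust a suspended Job's resource requests and limits in place, based on the capacity that is actually available, and then resume the same Job.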

Consider a machine learning training Job initially requesting 4 GPUs:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

A queue controller evaluating cluster capacity might discover only 2 GPUs available. With this feature, it can update the Job’s resource requests before resuming:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
      restartPolicy: Never

Once updated, the controller resumes the Job by setting spec.suspend to false, and new Pods are created with the adjusted resource specifications.
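The two steps above (adjust resources, then resume) can be sketched as a pair of patch bodies. This is a minimal illustration, not a controller implementation: the helper names `build_resource_patch` and `build_resume_patch` are hypothetical, and the sketch assumes the bodies are sent as strategic merge patches (for example with `kubectl patch` or a client library), so that the container entry is matched by its `name`. It also assumes requests and limits are set to the same values, as in the manifests above.

```python
import json


def build_resource_patch(container_name, resources):
    """Build a patch body that sets requests and limits for one container
    in a Job's pod template. Hypothetical helper for illustration only."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # With a strategic merge patch, containers are
                            # matched by name, so other fields stay untouched.
                            "name": container_name,
                            "resources": {
                                "requests": dict(resources),
                                "limits": dict(resources),
                            },
                        }
                    ]
                }
            }
        }
    }


def build_resume_patch():
    """Build the patch body that resumes a suspended Job."""
    return {"spec": {"suspend": False}}


# Step 1: shrink the suspended Job to what the cluster can offer.
resize = build_resource_patch(
    "trainer",
    {"cpu": "4", "memory": "16Gi", "example-hardware-vendor.com/gpu": "2"},
)
# Step 2: resume the Job in a separate, later patch.
resume = build_resume_patch()

print(json.dumps(resize, indent=2))
print(json.dumps(resume))  # {"spec": {"suspend": false}}
```

Keeping the resize and the resume as separate patches mirrors the order described above: the resource change is applied while the Job is still suspended, and only then is `spec.suspend` set to false.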

How It Works

Under the hood, the Kubernetes API server relaxes the immutability constraint on pod template resource fields, but only for suspended Jobs. No new API types were introduced; the existing Job and pod template structures accommodate this change through a targeted relaxation of validation logic.

Implementation Details

When a Job is suspended (spec.suspend: true), the API server now allows updates to spec.template.spec.containers[*].resources.requests and spec.template.spec.containers[*].resources.limits. These modifications are applied before the Job resumes, so newly created Pods use the updated resource profile. The feature is enabled by default in v1.36 because it is beta, so it is available without explicitly turning on a feature gate.
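The rule described here can be modelled in a few lines. The sketch below is a simplified illustration of the idea, not the API server's actual validation code: the function names are hypothetical, Job specs are represented as plain dictionaries, and the only conditions checked are the ones stated in the text (the Job is suspended, and nothing in the pod template changes except container requests and limits).

```python
import copy


def strip_resources(template):
    """Return a deep copy of a pod template with container requests and
    limits removed, so that two templates can be compared on everything else."""
    stripped = copy.deepcopy(template)
    for container in stripped["spec"].get("containers", []):
        resources = container.get("resources", {})
        resources.pop("requests", None)
        resources.pop("limits", None)
        if not resources:
            # Treat an empty resources block the same as a missing one.
            container.pop("resources", None)
    return stripped


def template_update_allowed(old_spec, new_spec):
    """Simplified model: a template change is accepted only if the existing
    Job is suspended and the change is limited to requests and limits."""
    if old_spec["template"] == new_spec["template"]:
        return True  # nothing in the template changed
    if not old_spec.get("suspend", False):
        return False  # the template stays immutable for non-suspended Jobs
    return strip_resources(old_spec["template"]) == strip_resources(new_spec["template"])


suspended = {
    "suspend": True,
    "template": {"spec": {"containers": [{
        "name": "trainer",
        "image": "example-registry.example.com/training:2026-04-23T150405.678",
        "resources": {"requests": {"cpu": "8"}, "limits": {"cpu": "8"}},
    }]}},
}

resized = copy.deepcopy(suspended)
resized["template"]["spec"]["containers"][0]["resources"] = {
    "requests": {"cpu": "4"}, "limits": {"cpu": "4"},
}

running = copy.deepcopy(suspended)
running["suspend"] = False

print(template_update_allowed(suspended, resized))  # True
print(template_update_allowed(running, resized))    # False
```

Comparing the templates with resources stripped out keeps every other pod template field immutable, which matches the "targeted relaxation" framing: only the resource fields gain mutability, and only while the Job is suspended.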

Practical Benefits

This enhancement is particularly valuable for batch processing, ML training pipelines, and any environment where resource demands fluctuate. For more details, refer to the Kubernetes Job documentation.

Conclusion

The mutable pod resources feature for suspended Jobs in Kubernetes v1.36 (beta) marks a significant improvement in workload management. By enabling dynamic resource adjustments without Job recreation, it reduces operational overhead and increases cluster efficiency. Operators and developers using batch or ML workloads should evaluate this capability to simplify their resource orchestration strategies.