
Conversation


@huww98 huww98 commented Oct 30, 2025

Description

Blog post for the mutable PV nodeAffinity alpha feature.

Issue

Closes: #

@k8s-ci-robot k8s-ci-robot added this to the 1.35 milestone Oct 30, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 30, 2025

netlify bot commented Oct 30, 2025

👷 Deploy Preview for kubernetes-io-vnext-staging processing.

Name Link
🔨 Latest commit 1ba9c20
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-io-vnext-staging/deploys/6902de3e86fce80008306fdf

@k8s-ci-robot k8s-ci-robot requested a review from graz-dev October 30, 2025 03:40
@k8s-ci-robot k8s-ci-robot added the area/blog Issues or PRs related to the Kubernetes Blog subproject label Oct 30, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 30, 2025

netlify bot commented Oct 30, 2025

Pull request preview available for checking

Built without sensitive environment variables

Name Link
🔨 Latest commit 965f2d4
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-io-main-staging/deploys/6925330d82f623000812023b
😎 Deploy Preview https://deploy-preview-53006--kubernetes-io-main-staging.netlify.app

@graz-dev
Contributor

@huww98 thank you for opening this feature blog PR.
Feature blog PRs should be opened against the main branch, could you fix it please?

Thank you.

@huww98 huww98 changed the base branch from dev-1.35 to main October 30, 2025 08:06
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 30, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nate-double-u for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 30, 2025
@graz-dev
Contributor

Hi @huww98 👋 v1.35 Communications team here,

@yuanwang04 as author of #52895, I'd like you to be a writing buddy for @huww98 on this PR.

Please:

  • Review this PR, paying attention to the guidelines and review hints
  • Update your own PR based on any best practices you identify that should be applied
  • Remember to be compassionate with your fellow article author

@graz-dev
Contributor

Hi @huww98 👋 -- this is Graziano (@graz-dev) from the v1.35 Communications Team!

Just a friendly reminder that we are approaching the feature blog "ready for review" deadline: Friday 21st November. We ask you to have the blog PR in a non-draft state and the write-up complete, so that the SIG Docs Blog team can start the blog review.

If you have any questions or need help, please don't hesitate to reach out to me or any of the Communications Team members. We are here to help you!

@graz-dev
Contributor

Sorry @huww98 the correct deadline for "Feature Blog Ready for Review" is Monday 24 November.
So you still have some days to finish the content and change the status of the PR.

Sorry, my bad :(

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 24, 2025
@huww98 huww98 marked this pull request as ready for review November 24, 2025 13:49
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 24, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 24, 2025
As another example, providers sometimes offer new generations of disks.
New disks cannot always be attached to older nodes in the cluster.
While this accessibility can also be expressed through PV node affinity and ensures the Pods can be scheduled to the right nodes,
this can also prevent online disk upgrade.
Contributor

How can this prevent online disk upgrade?

Author

In fact, you can upgrade, but the scheduler will not pick up that upgrade automatically. I'll update this to:

This accessibility can also be expressed through PV node affinity and ensures the Pods can be scheduled to the right nodes.
But when the disk is upgraded, new Pods using this disk can still be scheduled to older nodes.
To prevent this, you may want to change the PV node affinity from:
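(The exact affinity isn't spelled out in this thread. Purely for illustration, assuming a hypothetical node label `example.com/disk-generation` that records which disk generations a node can attach, the change could go from:)

```yaml
# Hypothetical "before": the gen1 disk is attachable by both older and newer nodes
nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: example.com/disk-generation
        operator: In
        values: ["gen1", "gen2"]
```

to something like:

```yaml
# Hypothetical "after": the disk was upgraded to gen2, so only newer nodes qualify
nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: example.com/disk-generation
        operator: In
        values: ["gen2"]
```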


As another example, providers sometimes offer new generations of disks.
New disks cannot always be attached to older nodes in the cluster.
While this accessibility can also be expressed through PV node affinity and ensures the Pods can be scheduled to the right nodes,
Contributor

So this involves detach and re-attach which will disrupt the workload.

Author

It's not necessary if the disk is already attached to a node that supports both gen1 and gen2.

Typically only administrators can edit PVs, please make sure you have the right RBAC permissions.

Note that changing PV node affinity alone will not actually change the accessibility of the underlying volume.
You must also update the underlying volume in the storage provider, and keep the node affinity in sync.
Contributor

I think before asking people to try it out and edit PV nodeAffinity, you should explain what a storage vendor needs to do to support this feature.
Otherwise, some admin may try it manually and cause problems.

Author

The storage provider needs to offer online updates that affect the accessibility of the volume.
If admins want to utilize those online update capabilities, they should use this feature.

I expanded the "Try it out" section and hope this makes it clearer.

One mitigation under discussion is to have the kubelet fail Pod startup if the PV’s node affinity is violated.
This has not landed yet.
So if you are trying this out now, please watch subsequent Pods that use the updated PV,
and make sure they are scheduled onto nodes that can access the volume.
Contributor

So these are for storage vendors who are interested in having their drivers support this feature. I think all of these should be under a heading that clarifies who the intended audience is.

Author

This is intended for admins who are willing to try this feature, to inform them of the race condition. If someone tries to update the PV and then start new Pods in a script, it may not work as intended.
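For illustration, one way to do that check by hand (the Pod, PV, and node names here are placeholders):

```shell
# See which node the new Pod landed on
kubectl get pod my-app-0 -o wide

# Compare against the PV's updated node affinity and that node's labels
kubectl get pv my-pv -o jsonpath='{.spec.nodeAffinity}'
kubectl get node <node-name> --show-labels
```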


## Future Integration with CSI (Container Storage Interface)

Currently, it is up to the cluster administrator to modify both PV's node affinity and the underlying volume in the storage provider.
Contributor

What does a cluster admin need to do before making node affinity changes so that he/she won't run into problems?

Author

I think this should be explained in the "Try it out" section.


As noted earlier, this is only a first step.

If you are a Kubernetes user,
Contributor

What kind of user has access to PV node affinity? It should be cluster admin, not any user.

Author

Yes, currently. But after integration with CSI, unprivileged users should be able to trigger an update with VAC. So I'd like to hear from all users.
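For illustration, a sketch of how that could eventually look from the user's side; the VolumeAttributesClass name and the automatic node affinity update are assumptions about the future integration, not something that works today:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data                          # hypothetical PVC
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
  # Switching to a hypothetical "gen2-disk" class would ask the CSI driver to
  # upgrade the volume; the driver would then keep PV node affinity in sync.
  volumeAttributesClassName: gen2-disk
```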

Contributor

@yuanwang04 yuanwang04 left a comment

Thanks for introducing this useful feature and the blog; overall LGTM, left some clarification comments.

It is fine to allow more nodes to access the volume by relaxing node affinity.
But there is a race condition when you try to tighten node affinity:
We don't know how scheduler will see our modified PV in its cache,
so there is a small window where the scheduler may place a Pod on an old node that can no longer access the volume.
Contributor

What would happen in that case? Would the PV fail to bind to the node / Pod?

This has not landed yet.
So if you are trying this out now, please watch subsequent Pods that use the updated PV,
and make sure they are scheduled onto nodes that can access the volume.
If you update PV then immediately start new Pods in a script, it may not work as intended.
Contributor

Is there an estimated time window to wait before Pod can be scheduled correctly?

dates back to Kubernetes v1.10.
It is widely used to express that volumes may not be equally accessible by all nodes in the cluster.
This field was previously immutable,
we are now making it mutable in Kubernetes v1.35 (alpha), Opening a door to more flexible online volume management.
Contributor

nit: avoid we

Suggested change
we are now making it mutable in Kubernetes v1.35 (alpha), Opening a door to more flexible online volume management.
and it is now mutable in Kubernetes v1.35 (alpha). This change opens a door to more flexible online volume management.

- available
```

So, we are making it mutable now, a first step towards a more flexible online volume management.
Contributor

nit: avoid we

Suggested change
So, we are making it mutable now, a first step towards a more flexible online volume management.
So, it is mutable now, a first step towards a more flexible online volume management.

There are only a few things out of Pod that can affects the scheduling decision. PV node affinity is one of them.
It is fine to allow more nodes to access the volume by relaxing node affinity.
But there is a race condition when you try to tighten node affinity:
We don't know how scheduler will see our modified PV in its cache,
Contributor

nit: avoid we

Suggested change
We don't know how scheduler will see our modified PV in its cache,
It is unclear how the Scheduler will see the modified PV in its cache,


Currently, it is up to the cluster administrator to modify both PV's node affinity and the underlying volume in the storage provider.
But manual operations are error-prone and time-consuming.
We would like to eventually integrate this with VolumeAttributesClass,
Contributor

nit: avoid we

Suggested change
We would like to eventually integrate this with VolumeAttributesClass,
It is preferred to eventually integrate this with VolumeAttributesClass,

But manual operations are error-prone and time-consuming.
We would like to eventually integrate this with VolumeAttributesClass,
so that an unprivileged user can modify their PersistentVolumeClaim (PVC) to trigger storage-side updates,
and PV node affinity is updated automatically when approprate, without the need for cluster admin's intervention.
Contributor

nit: typo

Suggested change
and PV node affinity is updated automatically when approprate, without the need for cluster admin's intervention.
and PV node affinity is updated automatically when appropriate, without the need for cluster admin's intervention.

Contributor

@Serenity611 Serenity611 left a comment

Reviewed for general proofreading, made a few suggestions for readability and alignment with the style guide :) Thanks!

draft: true
slug: kubernetes-v1-35-mutable-pv-nodeaffinity
author: >
Weiwen Hu (Alibaba Cloud)
Contributor

Suggested change
Weiwen Hu (Alibaba Cloud)
Weiwen Hu (Alibaba Cloud),

This field was previously immutable,
we are now making it mutable in Kubernetes v1.35 (alpha), Opening a door to more flexible online volume management.

## Why Making Node Affinity Mutable?
Contributor

Suggested change
## Why Making Node Affinity Mutable?
## Why make node affinity mutable?

Comment on lines +95 to +99
Note that changing PV node affinity alone will not actually change the accessibility of the underlying volume.
So before using this feature,
You must update the underlying volume in the storage provider first,
and understand which nodes can access the volume after the update.
Then you can enable this feature and keep the PV node affinity in sync.
Contributor

Suggested change
Note that changing PV node affinity alone will not actually change the accessibility of the underlying volume.
So before using this feature,
You must update the underlying volume in the storage provider first,
and understand which nodes can access the volume after the update.
Then you can enable this feature and keep the PV node affinity in sync.
Note that changing PV node affinity alone will not actually change the accessibility of the underlying volume.
Before using this feature,
you must first update the underlying volume in the storage provider
and understand which nodes can access the volume after the update.
You can then enable this feature and keep the PV node affinity in sync.


Currently, this feature is in alpha state.
It is disabled by default, and may subject to change.
To try it out, enable `MutablePVNodeAffinity` feature gate on APIServer, then you can edit PV spec.nodeAffinity field.
Contributor

Suggested change
To try it out, enable `MutablePVNodeAffinity` feature gate on APIServer, then you can edit PV spec.nodeAffinity field.
To try it out, enable the `MutablePVNodeAffinity` feature gate on APIServer, then you can edit the PV `spec.nodeAffinity` field.
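For illustration, a minimal sketch of trying this out; the PV name is a placeholder, and how you pass API server flags depends on how your cluster is deployed:

```shell
# Enable the alpha feature gate on the API server, e.g.:
#   kube-apiserver --feature-gates=MutablePVNodeAffinity=true

# Check that you are allowed to update PVs
kubectl auth can-i update persistentvolumes

# Then edit spec.nodeAffinity on the PV
kubectl edit pv pv-example
```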

To try it out, enable `MutablePVNodeAffinity` feature gate on APIServer, then you can edit PV spec.nodeAffinity field.
Typically only administrators can edit PVs, please make sure you have the right RBAC permissions.

### Race Condition between Updating and Scheduling
Contributor

Suggested change
### Race Condition between Updating and Scheduling
### Race condition between updating and scheduling


### Race Condition between Updating and Scheduling

There are only a few things out of Pod that can affects the scheduling decision. PV node affinity is one of them.
Contributor

Suggested change
There are only a few things out of Pod that can affects the scheduling decision. PV node affinity is one of them.
There are only a few things out of Pod that can affect the scheduling decision, and PV node affinity is one of them.

Comment on lines +109 to +110
It is fine to allow more nodes to access the volume by relaxing node affinity.
But there is a race condition when you try to tighten node affinity:
Contributor

Suggested change
It is fine to allow more nodes to access the volume by relaxing node affinity.
But there is a race condition when you try to tighten node affinity:
It is fine to allow more nodes to access the volume by relaxing node affinity,
but there is a race condition when you try to tighten node affinity:

We don't know how scheduler will see our modified PV in its cache,
so there is a small window where the scheduler may place a Pod on an old node that can no longer access the volume.

One mitigation under discussion is to have the kubelet fail Pod startup if the PV’s node affinity is violated.
Contributor

Suggested change
One mitigation under discussion is to have the kubelet fail Pod startup if the PV’s node affinity is violated.
One mitigation under discussion is to have the `kubelet` fail Pod startup if the PV’s node affinity is violated.

and make sure they are scheduled onto nodes that can access the volume.
If you update PV then immediately start new Pods in a script, it may not work as intended.

## Future Integration with CSI (Container Storage Interface)
Contributor

Suggested change
## Future Integration with CSI (Container Storage Interface)
## Future integration with CSI (Container Storage Interface)

@gnufied
Member

gnufied commented Dec 3, 2025

/assign


Labels

  • area/blog — Issues or PRs related to the Kubernetes Blog subproject
  • cncf-cla: yes — Indicates the PR's author has signed the CNCF CLA.
  • language/en — Issues or PRs related to English language
  • size/L — Denotes a PR that changes 100-499 lines, ignoring generated files.
