xds: Implement proactive connection in RingHashLoadBalancer per gRFC A61 #12596
Implement proactive connection logic in RingHashLoadBalancer as outlined in gRFC A61. This addresses the missing logic where the balancer should initialize the first IDLE child when a child balancer reports TRANSIENT_FAILURE and no other children are connecting. This behavior, which was present before grpc#10610, ensures that a backup subchannel starts connecting immediately outside of the picker flow, reducing failover latency. Fixes grpc#12024
Update existing unit tests and add new test cases to validate the proactive connection behavior. Verification counts in several tests have been adjusted to reflect that the balancer now initiates connections immediately upon subchannel failure.
Thank you for the detailed review.
Please let me know if there are any further adjustments needed.
ce9ea98 to f5a4934
I merged the latest master branch expecting the CI tests to pass, but unfortunately, they are still failing.
Thank you!
Thank you for the detailed explanation regarding the determinism of the Ring Hash test. I now clearly understand how voidHash() ensures the test remains reliable despite the underlying complexity. I also appreciate you re-triggering the CI and merging the PR. Thanks for your support and for the great learning opportunity.
Description
This PR implements the proactive connection logic in `RingHashLoadBalancer` as outlined in gRFC A61.

Previously, the Java implementation only initialized child balancers when a ring-chosen endpoint was in `TRANSIENT_FAILURE` during a picker's `pickSubchannel` call. This PR adds the missing logic: when a child balancer reports `TRANSIENT_FAILURE`, the LoadBalancer now proactively initializes the first available `IDLE` child if no other children are currently connecting or ready.

This ensures a backup subchannel starts warming up immediately outside the RPC flow, reducing failover latency and improving overall resilience. This behavior was previously present but was inadvertently lost after #10610.
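The rule described above can be illustrated with a small standalone sketch. It is not the code from this PR and does not use the grpc-java API: `State`, `Child`, and `onChildStateChange` are hypothetical stand-ins, and the body of `maybeTriggerIdleChildConnection` is an assumption that only mirrors the behavior stated in the description (connect the first `IDLE` child when a child reports `TRANSIENT_FAILURE` and no child is connecting or ready).

```java
import java.util.ArrayList;
import java.util.List;

final class ProactiveConnectSketch {
  /** Connectivity states, named after the gRPC ones (local enum, not io.grpc). */
  enum State { IDLE, CONNECTING, READY, TRANSIENT_FAILURE }

  /** Hypothetical stand-in for a child balancer; counts connection requests. */
  static final class Child {
    final String name;
    State state;
    int connectionRequests = 0;

    Child(String name, State state) {
      this.name = name;
      this.state = state;
    }

    void requestConnection() {
      connectionRequests++;
      state = State.CONNECTING;
    }
  }

  /**
   * Assumed logic: if any child is CONNECTING or READY, do nothing.
   * Otherwise ask the first IDLE child (in list order) to connect.
   * Returns the child that was asked to connect, or null if none was.
   */
  static Child maybeTriggerIdleChildConnection(List<Child> children) {
    Child firstIdle = null;
    for (Child c : children) {
      if (c.state == State.CONNECTING || c.state == State.READY) {
        return null; // a connection attempt or a usable child already exists
      }
      if (firstIdle == null && c.state == State.IDLE) {
        firstIdle = c;
      }
    }
    if (firstIdle != null) {
      firstIdle.requestConnection();
    }
    return firstIdle;
  }

  /** Records a child's new state and applies the proactive rule on failure. */
  static void onChildStateChange(List<Child> children, Child child, State newState) {
    child.state = newState;
    if (newState == State.TRANSIENT_FAILURE) {
      maybeTriggerIdleChildConnection(children);
    }
  }

  public static void main(String[] args) {
    List<Child> children = new ArrayList<>();
    Child a = new Child("a", State.CONNECTING);
    Child b = new Child("b", State.IDLE);
    Child c = new Child("c", State.IDLE);
    children.add(a);
    children.add(b);
    children.add(c);

    // "a" fails while nothing else is connecting or ready, so "b" is started.
    onChildStateChange(children, a, State.TRANSIENT_FAILURE);
    System.out.println("b=" + b.state + " c=" + c.state); // prints "b=CONNECTING c=IDLE"
  }
}
```

In this sketch the connection request happens in the state-change path rather than in a picker, which is the point of the change: the backup starts connecting even if no RPC arrives to trigger a pick.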
Changes
- Added `maybeTriggerIdleChildConnection()` to `RingHashLoadBalancer` to trigger proactive connections.
- … `requestConnection` verification counts).
- … `READY` subchannel already exists.

Fixes #12024