fix: close response body in ParseErr to prevent goroutine leak#804
Merged
Conversation
checkRespWithErrType returns (nil, ParseErr(bodyType, resp, errType)) on every 4xx / 5xx response, which means the caller never gets the response and therefore cannot close its Body. ParseErr passes the response on to decodeBody, which io.ReadAll's the body but does not close it either. As a result every failed VCD API call leaves one net/http.setRequestCancel.func4 goroutine alive forever, holding the underlying TCP connection. Observed in a Kubernetes operator that retries failed VCD calls in a reconcile loop: the controller pod accumulated ~28 goroutines per second of error rate (12 651 in net/http.setRequestCancel.func4 in one snapshot, ~200 MB RSS) until it OOMed. Fix it at ParseErr — the last point that has access to the response on the error path — by deferring resp.Body.Close() before decodeBody is invoked. Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
executeJsonRequest reads the body with io.ReadAll on every non-2xx
response and returns (resp, err) to the caller. All callers follow the
pattern
resp, err := client.executeJsonRequest(...)
if err != nil {
return nil, err
}
defer closeBody(resp)
so on the error path the deferred close never runs and resp.Body stays
open, leaking one net/http.setRequestCancel.func4 goroutine per failed
call. This is the same class of leak as vmware#803/vmware#804 in ParseErr but on a
different code path (OpenAPI / JSON requests).
Close the body explicitly after the read.
Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
rewrapRespBodyNoopCloser reads the response body into a byte slice and then replaces resp.Body with an io.NopCloser. After that any subsequent resp.Body.Close() in the caller is a no-op, so the underlying TCP connection is never released and the matching net/http.setRequestCancel goroutine waits forever. Three call-sites are affected (api_vcd.go, entity_client.go, openapi.go), so under any sustained workload that goes through OpenAPI or the entity client the operator on top of this library leaks one goroutine per request. Same class of leak as vmware#803/vmware#804 in ParseErr and in executeJsonRequest, just on a third code path. Close the original Body before replacing it with the NopCloser. Signed-off-by: Kirill Ilin <stitch14@yandex.ru>
frodenas
approved these changes
May 5, 2026
Member
|
Thanks @sircthulhu |
Merged
frodenas
added a commit
that referenced
this pull request
May 5, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Related issue: #803
Three response-body leaks in govcd that all show up under sustained 4xx / 5xx loads:
ParseErr(XML / JSON error path) —checkRespWithErrTypereturns(nil, ParseErr(bodyType, resp, errType)), so the caller has no handle on the response and cannot close it.ParseErritself only forwards the response todecodeBody, whichio.ReadAlls the body but never closes it.executeJsonRequest(OpenAPI / JSON path) — on every non-2xx the function reads the body withio.ReadAlland returns(resp, err). All callers followif err != nil { return nil, err }and onlydefer closeBody(resp)afterwards, so the deferred close never runs.rewrapRespBodyNoopCloser— reads the body and replacesresp.Bodywithio.NopCloser(...). After that any subsequentresp.Body.Close()is a no-op, so the underlying TCP connection is never released. Used fromapi_vcd.go,entity_client.go, andopenapi.go— every request that goes through these paths leaks a goroutine.Same class of leak (
net/http.setRequestCancel.func4goroutine pinned to a never-closed body), three different code paths.Verification
A real-world Kubernetes operator built on top of go-vcloud-director was leaking about 28 goroutines per second of error rate; one snapshot showed 12 651 leaked
setRequestCancel.func4goroutines (~200 MB RSS) before the pod OOMed. With all three fixes applied the same workload's goroutine count stays flat at the controller-runtime baseline (~600) and RSS hovers under 100 MB indefinitely.