Update integration test timeout and make parallel where possible #2702
Conversation
|
Pipeline controller notification
For optional jobs, comment […]. This repository is configured in: LGTM mode |
Review Summary by Qodo
Extend integration test timeout and enable parallel execution

Description
• Increase integration test timeout from 30m to 60m
• Enable parallel test execution with -p flag
• Improve CI performance and reliability

Diagram
flowchart LR
    A["GINKGO_ARGS config"] -->|"timeout: 30m"| B["Original settings"]
    A -->|"timeout: 60m + -p flag"| C["Updated settings"]
    C -->|"Benefits"| D["Longer timeout + Parallel runs"]

File Changes
1. tests/hack/test.sh
|
|
Hello @JoelSpeed! Some important instructions when contributing to openshift/api: |
|
Caution: Review failed. Failed to post review comments.

📝 Walkthrough
Updated test behavior and timing across test files: tests/hack/test.sh changes the default GINKGO_ARGS from a 30-minute timeout to a 60-minute timeout and adds the -p flag.

🚥 Pre-merge checks
✅ Passed checks (3 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.5.0): Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented |
Code Review by Qodo
1. Parallel envtest explosion
|
| ARTIFACT_DIR=${ARTIFACT_DIR:-""}
| GINKGO=${GINKGO:-"go run ${REPO_ROOT}/vendor/github.com/onsi/ginkgo/v2/ginkgo"}
| - GINKGO_ARGS=${GINKGO_ARGS:-"-r -v --randomize-all --randomize-suites --keep-going --timeout=30m"}
| + GINKGO_ARGS=${GINKGO_ARGS:-"-r -v --randomize-all --randomize-suites --keep-going --timeout=60m -p"}
1. Parallel envtest explosion 🐞 Bug ⛯ Reliability
• The added -p flag enables Ginkgo parallel mode with an auto-detected process count, which scales with CPU cores.
• In parallel mode, each process executes BeforeSuite, and this suite's BeforeSuite starts a full envtest.Environment (kube-apiserver + etcd), so -p can start many control planes simultaneously (see the sketch below).
• This can overwhelm CI resources (CPU/memory/ports) and make tests flaky or slower, defeating the intent of enabling parallelism by default.
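To make the concern concrete, here is a minimal hypothetical sketch of the kind of suite setup the comment describes, assuming controller-runtime's envtest package and Ginkgo v2. The package name, suite name, and variable names are illustrative and not taken from this repository.

```go
// Illustrative sketch only, not this repo's actual suite bootstrap.
// With `ginkgo -p`, every parallel worker process runs BeforeSuite, so each
// worker starts its own envtest control plane (kube-apiserver + etcd).
package integration

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

var (
	testEnv   *envtest.Environment
	cfg       *rest.Config
	k8sClient client.Client
)

func TestIntegration(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Integration Suite")
}

var _ = BeforeSuite(func() {
	testEnv = &envtest.Environment{} // one control plane per worker process
	var err error
	cfg, err = testEnv.Start() // launches a local kube-apiserver and etcd
	Expect(err).NotTo(HaveOccurred())

	k8sClient, err = client.New(cfg, client.Options{})
	Expect(err).NotTo(HaveOccurred())
})

var _ = AfterSuite(func() {
	Expect(testEnv.Stop()).To(Succeed())
})
```

Under -p, Ginkgo picks the worker count from the CPU count, and each worker runs the BeforeSuite above, so an N-core CI node would try to boot N apiserver/etcd pairs at once; that is the resource concern raised here.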
Agent Prompt
### Issue description
`tests/hack/test.sh` now always runs `ginkgo` with `-p`, which makes Ginkgo auto-select a parallel process count based on CPU cores. This suite starts an `envtest.Environment` in `BeforeSuite`, so parallel mode can start multiple kube-apiserver/etcd control planes concurrently, causing CI resource exhaustion/flakes.
### Issue Context
Ginkgo v2 auto-computes `ComputedProcs()` from CPU count when `Parallel` is enabled. In parallel mode, each process runs `BeforeSuite`, and this repo’s `BeforeSuite` starts envtest.
### Fix Focus Areas
- tests/hack/test.sh[10-12]
### Suggested approach
- Remove `-p` from the default `GINKGO_ARGS`, OR
- Replace `-p` with a bounded `--procs=${GINKGO_PROCS:-1}` (and optionally allow users/CI to set `GINKGO_PROCS` to a small safe number like 2), OR
- Enable `-p` only in specific environments where resources are known to be sufficient (and still cap `--procs`).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/generator.go`:
- Around line 258-261: Fix the typo in the comment above the Eventually block:
change "sential" to "sentinel" in the comment that explains the retry logic
around Eventually(func() error { return k8sClient.Create(ctx, initialObj) },
timeout).Should(Succeed(), ...), leaving the code (Eventually, k8sClient.Create,
initialObj, GenerateName behavior) unchanged.
🧹 Nitpick comments (1)
tests/generator.go (1)
284-289: Consider fetching a fresh object before each Update attempt.

The current pattern uses initialObj.DeepCopy(), which may have a stale ResourceVersion, potentially causing conflict errors on each retry. While Eventually will handle retries, fetching the latest object first would be more robust:

```go
Eventually(func() error {
	fresh := initialObj.DeepCopy()
	if err := k8sClient.Get(ctx, objectKey(fresh), fresh); err != nil {
		return err
	}
	fresh.Object["sentinel"] = initialObj.GetUID() + "+restored"
	return k8sClient.Update(ctx, fresh)
}, timeout).Should(Succeed(), "Sentinel should be persisted")
```

This is a minor optimization for test reliability; the current approach should still work in practice.
| // Use an eventually here, so that we retry until the sential correctly applies.
| Eventually(func() error {
|     return k8sClient.Create(ctx, initialObj)
| - }).Should(Succeed(), "initial object should create successfully")
| + }, timeout).Should(Succeed(), "initial object should create successfully")
Minor typo in comment: "sential" → "sentinel".
The retry logic is correct—retrying Create with GenerateName until the CRD schema update propagates is safe.
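For illustration, here is a hypothetical version of that pattern; the GVK, names, and the surrounding suite-level ctx, k8sClient, and timeout variables are assumptions, not the actual code in tests/generator.go. The point it demonstrates: because the apiserver assigns a fresh suffix for GenerateName on every Create attempt, a retried Create never collides with an earlier attempt.

```go
// Hypothetical illustration (not the repo's actual generator code).
// Assumes suite-level ctx, k8sClient, and timeout, and
// import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured".
initialObj := &unstructured.Unstructured{}
initialObj.SetAPIVersion("example.openshift.io/v1") // illustrative GVK, not the real one
initialObj.SetKind("Example")
initialObj.SetNamespace("default")
initialObj.SetGenerateName("integration-") // fresh server-assigned name on every attempt

// Retry until the updated CRD schema has propagated and the create succeeds.
Eventually(func() error {
	return k8sClient.Create(ctx, initialObj)
}, timeout).Should(Succeed(), "initial object should create successfully")
```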
📝 Proposed fix
- // Use an eventually here, so that we retry until the sential correctly applies.
+ // Use an eventually here, so that we retry until the sentinel correctly applies.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Use an eventually here, so that we retry until the sentinel correctly applies.
Eventually(func() error {
	return k8sClient.Create(ctx, initialObj)
}, timeout).Should(Succeed(), "initial object should create successfully")
```
🤖 Prompt for AI Agents
In `@tests/generator.go` around lines 258-261, fix the typo in the comment above
the Eventually block: change "sential" to "sentinel" in the comment that
explains the retry logic around Eventually(func() error { return
k8sClient.Create(ctx, initialObj) }, timeout).Should(Succeed(), ...), leaving
the code (Eventually, k8sClient.Create, initialObj, GenerateName behavior)
unchanged.
|
Caution: Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. |
|
/test verify |
|
PR-Agent: could not find a component named |
everettraven left a comment
/lgtm
|
Pipeline controller notification
No second-stage tests were triggered for this PR. This can happen when:
Use |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: everettraven
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/verified by CI |
|
@JoelSpeed: This PR has been marked as verified by CI.
Details
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@JoelSpeed: all tests passed! Full PR test history. Your PR dashboard.
Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/tide refresh |
|
PR needs rebase.
Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Merged commit 01412ff into openshift:master
Integration tests are failing on the 30 minute timeout; this extends the timeout to give us more time.
Also added -p, which in theory parallelises the runs to save us time. I believe this should be safe on this suite; interested to see whether it works in CI at all. Locally I ran the suite in 2m27s with this flag, compared to 26m34s without.