Skip to content

Conversation

@yagikota
Copy link
Contributor

@yagikota yagikota commented Sep 27, 2025

What

  • Add E2E_TEST_MINIMAL environment variable to enable a reduced e2e test matrix for faster CI runtimes

Why

To allow us to minimize the E2E test matrix in some cases(e.g., presumits)

Context: #18983

Next Step

Update https://github.com/kubernetes/test-infra/blob/master/config/jobs/etcd/etcd-presubmits.yaml by adding E2E_TEST_MINIMAL=true

Review points

minimal test candidates

Do these candidates make sense?

	minimalTestCases := []testCase{
		{
			name:   "NoTLS",
			config: config.ClusterConfig{ClusterSize: 1},
		},
		{
			name:   "PeerTLS",
			config: config.ClusterConfig{ClusterSize: 3, PeerTLS: config.ManualTLS},
		},
		{
			name:   "ClientTLS",
			config: config.ClusterConfig{ClusterSize: 1, ClientTLS: config.ManualTLS},
		},
	}

Minimize test cases globally or per E2E test?

This PR introduces E2E_TEST_MINIMAL environment variable, which is supposed to be passed when running E2E tests in test-infra jobs.

For example:

TIMEOUT=60m VERBOSE=1 GOOS=linux GOARCH=amd64 CPU=4 EXPECT_DEBUG=true E2E_TEST_MINIMAL=true make test-e2e-release

This will affect all E2E tests inside tests/common/. (globally)

If we want flexible control over whether or not to minimize test cases depending on the E2E test, we need to consider an alternative approach instead of using an environment variable. (per E2E test)

@k8s-ci-robot
Copy link

Hi @yagikota. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@yagikota
Copy link
Contributor Author

/assign

@yagikota yagikota changed the title tests: add minimal E2E test configuration for faster CI runs [WIP]tests: add minimal E2E test configuration for faster CI runs Sep 27, 2025
@yagikota yagikota changed the title [WIP]tests: add minimal E2E test configuration for faster CI runs tests: add minimal E2E test configuration for faster CI runs Sep 27, 2025
@yagikota yagikota marked this pull request as ready for review September 27, 2025 08:23
@yagikota
Copy link
Contributor Author

@ahrtr @serathius @ivanvc

Would you review this PR?

@ahrtr
Copy link
Member

ahrtr commented Sep 27, 2025

/ok-to-test

@codecov
Copy link

codecov bot commented Sep 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.26%. Comparing base (d3f136a) to head (d11bc94).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files

see 27 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #20733      +/-   ##
==========================================
+ Coverage   69.16%   69.26%   +0.10%     
==========================================
  Files         420      422       +2     
  Lines       34817    34817              
==========================================
+ Hits        24081    24116      +35     
+ Misses       9338     9310      -28     
+ Partials     1398     1391       -7     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d3f136a...d11bc94. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ahrtr
Copy link
Member

ahrtr commented Sep 27, 2025

/retest

@serathius
Copy link
Member

serathius commented Sep 27, 2025

It saved around 10 minutes, reducing runtime from 57m to 46m.

image

@yagikota
Copy link
Contributor Author

yagikota commented Sep 27, 2025

@serathius

Do we need to reduce more time?

According to #18983, the target duration is 30 minutes

@ahrtr
Copy link
Member

ahrtr commented Sep 27, 2025

Please try the following case,

	minimalTestCases := []testCase{
		{
			name:   "NoTLS",
			config: config.ClusterConfig{ClusterSize: 1},
		},
		{
			name:   "PeerTLS and ClientTLS",
			config: config.ClusterConfig{ClusterSize: 3, PeerTLS: config.ManualTLS, ClientTLS: config.ManualTLS},
		},
	}

@yagikota
Copy link
Contributor Author

/retest

@yagikota yagikota force-pushed the resuce-e2e-test-time branch from bee786c to 9938a64 Compare September 28, 2025 01:50
@yagikota
Copy link
Contributor Author

yagikota commented Sep 28, 2025

	minimalTestCases := []testCase{
		{
			name:   "NoTLS",
			config: config.ClusterConfig{ClusterSize: 1},
		},
		{
			name:   "PeerTLS and ClientTLS",
			config: config.ClusterConfig{ClusterSize: 3, PeerTLS: config.ManualTLS, ClientTLS: config.ManualTLS},
		},
	}

https://prow.k8s.io/job-history/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-e2e-amd64
image

--> saved around 14 minutes:

  • before: around 57 minutes
  • after 43 minutes

-> Combining PeerTLS and ClientTLS is effective in saving runtime

Duration of each test: https://docs.google.com/spreadsheets/d/1kaHBg0OOBeQ51oX_kdto8b26_42xDGJi5xbJlwbNCLY/edit?gid=0#gid=0

@yagikota
Copy link
Contributor Author

yagikota commented Sep 28, 2025

	minimalTestCases := []testCase{
		{
			name:   "NoTLS",
			config: config.ClusterConfig{ClusterSize: 1},
		}
	}

https://prow.k8s.io/job-history/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-e2e-amd64
image

--> saved around 17 minutes:

before: around 57 minutes
after 40 minutes

Even if we test only NoTLS with 1 etcd member, it takes 40 minutes.

Duration of each test: https://docs.google.com/spreadsheets/d/1kaHBg0OOBeQ51oX_kdto8b26_42xDGJi5xbJlwbNCLY/edit?gid=46431453#gid=46431453

@yagikota yagikota force-pushed the resuce-e2e-test-time branch from b78e1bf to 02410fe Compare September 28, 2025 04:02
@yagikota
Copy link
Contributor Author

Currently, most of the e2e tests have not yet been migrated to the common testing framework, so even though we use E2E_TEST_MINIMAL, it does not effectively reduce CI runtime. It just reduces the e2e test matrix in tests/common, not in tests/e2e. After completing #13637, E2E_TEST_MINIMAL will maximize its effect.

@ahrtr
Copy link
Member

ahrtr commented Sep 28, 2025

--> saved around 14 minutes:

  • before: around 57 minutes
  • after 43 minutes

-> Combining PeerTLS and ClientTLS is effective in saving runtime

Duration of each test: https://docs.google.com/spreadsheets/d/1kaHBg0OOBeQ51oX_kdto8b26_42xDGJi5xbJlwbNCLY/edit?gid=0#gid=0

Based on comment above #20733 (comment), where there were 3 cases in minimalTestCases, it reduced runtime from 57m to 46m.

Now there were only 2 cases in minimalTestCases, it reduced runtime from 57m to 43m. It seems that we only saved additional 3m?

@yagikota
Copy link
Contributor Author

yagikota commented Sep 28, 2025

@ahrtr

Now there were only 2 cases in minimalTestCases, it reduced runtime from 57m to 43m. It seems that we only saved additional 3m?

Yes. Here is the number of tests of tests/e2e and tests/common in each case.
We can save 231 - 195 = 36 common tests, which reduces the runtime by only 3 minutes. (option 1 vs 2)

w/o minimalTestCases

  • go.etcd.io/etcd/tests/v3/e2e: 363
  • go.etcd.io/etcd/tests/v3/common: 402

Duration: 57 minutes

ref. https://docs.google.com/spreadsheets/d/1kaHBg0OOBeQ51oX_kdto8b26_42xDGJi5xbJlwbNCLY/edit?gid=1063825652#gid=1063825652

w/ minimalTestCases

option1

	minimalTestCases := []testCase{
		{
			name:   "NoTLS",
			config: config.ClusterConfig{ClusterSize: 1},
		},
		{
			name:   "PeerTLS",
			config: config.ClusterConfig{ClusterSize: 3, PeerTLS: config.ManualTLS},
		},
		{
			name:   "ClientTLS",
			config: config.ClusterConfig{ClusterSize: 1, ClientTLS: config.ManualTLS},
		},
	}
  • go.etcd.io/etcd/tests/v3/e2e: 363
  • go.etcd.io/etcd/tests/v3/common: 231

Duration: 46 minutes

ref. https://docs.google.com/spreadsheets/d/1kaHBg0OOBeQ51oX_kdto8b26_42xDGJi5xbJlwbNCLY/edit?gid=1996906343#gid=1996906343

option2

	minimalTestCases := []testCase{
		{
			name:   "NoTLS",
			config: config.ClusterConfig{ClusterSize: 1},
		},
		{
			name:   "PeerTLS and ClientTLS",
			config: config.ClusterConfig{ClusterSize: 3, PeerTLS: config.ManualTLS, ClientTLS: config.ManualTLS},
		},
	}
  • go.etcd.io/etcd/tests/v3/e2e: 363
  • go.etcd.io/etcd/tests/v3/common: 195

Duration: 43 minutes

ref. https://docs.google.com/spreadsheets/d/1kaHBg0OOBeQ51oX_kdto8b26_42xDGJi5xbJlwbNCLY/edit?gid=0#gid=0

option3

	minimalTestCases := []testCase{
		{
			name:   "NoTLS",
			config: config.ClusterConfig{ClusterSize: 1},
		}
	}
  • go.etcd.io/etcd/tests/v3/e2e: 363
  • go.etcd.io/etcd/tests/v3/common: 150

Duration: 40 minutes

ref. https://docs.google.com/spreadsheets/d/1kaHBg0OOBeQ51oX_kdto8b26_42xDGJi5xbJlwbNCLY/edit?gid=46431453#gid=46431453

@yagikota
Copy link
Contributor Author

If we choose option 2, we can reduce the number of common tests by nearly half, which will reduce the CI runtime by 14 minutes. As I mentioned in #20733 (comment), it will not be effective for tests/e2e, so the impact is limited for now. That said, it’s still worth adopting.

What do you think?

@ahrtr
Copy link
Member

ahrtr commented Sep 28, 2025

Thanks for the statistic data. I think we can apply for option 2 for now. We are open to any further enhancement in future if any.

	minimalTestCases := []testCase{
		{
			name:   "NoTLS",
			config: config.ClusterConfig{ClusterSize: 1},
		},
		{
			name:   "PeerTLS and ClientTLS",
			config: config.ClusterConfig{ClusterSize: 3, PeerTLS: config.ManualTLS, ClientTLS: config.ManualTLS},
		},
	}

@yagikota
Copy link
Contributor Author

@ahrtr

Can we combine PeerAutoTLS and ClientAutoTLS like this?:

-		{
-			name:   "PeerAutoTLS",
-			config: config.ClusterConfig{ClusterSize: 3, PeerTLS: config.AutoTLS},
-		},
-		{
-			name:   "ClientAutoTLS",
-			config: config.ClusterConfig{ClusterSize: 1, ClientTLS: config.AutoTLS},
-		},
+		{
+			name:   "PeerAutoTLS and ClientAutoTLS",
+			config: config.ClusterConfig{ClusterSize: 3, PeerTLS: config.AutoTLS, ClientTLS: config.AutoTLS},
+}

@ahrtr
Copy link
Member

ahrtr commented Sep 28, 2025

@ahrtr

Can we combine PeerAutoTLS and ClientAutoTLS like this?:

-		{
-			name:   "PeerAutoTLS",
-			config: config.ClusterConfig{ClusterSize: 3, PeerTLS: config.AutoTLS},
-		},
-		{
-			name:   "ClientAutoTLS",
-			config: config.ClusterConfig{ClusterSize: 1, ClientTLS: config.AutoTLS},
-		},
+		{
+			name:   "PeerAutoTLS and ClientAutoTLS",
+			config: config.ClusterConfig{ClusterSize: 3, PeerTLS: config.AutoTLS, ClientTLS: config.AutoTLS},
+}

I was proposing to add PeerTLS and ClientTLS, and keep the original PeerTLS and ClientTLS as they are.

But combining them works for me. The same to PeerAutoTLS and ClientAutoTLS

@yagikota
Copy link
Contributor Author

@ahrtr

Let me clarify my idea.
I think we can remove original PeerTLS/ClientTLS and PeerAutoTLS/ClientAutoTLS, then combine them like this:

@ahrtr
Copy link
Member

ahrtr commented Sep 28, 2025

pls squash the commits.

@yagikota
Copy link
Contributor Author

yagikota commented Sep 28, 2025

Hmm, probably this cannot cover all combinations 🤔

In my understanding, there are 3x3=9 combinations in terms of TLS:

Client \ Peer NoTLS TLS AutoTLS
NoTLS (1) (2) (3)
TLS (4) (5) (6)
AutoTLS (7) (8) (9)
// cover (1)
{
	name:   "NoTLS",
	config: config.ClusterConfig{ClusterSize: 1},
},

// cover (5)
{
	name:   "PeerTLS and ClientTLS",
	config: config.ClusterConfig{ClusterSize: 3, PeerTLS: config.ManualTLS, ClientTLS: config.ManualTLS},
},
// cover (9)
{
	name:   "PeerAutoTLS and ClientAutoTLS",
	config: config.ClusterConfig{ClusterSize: 3, PeerTLS: config.AutoTLS, ClientTLS: config.AutoTLS},
},

I think these three combinations are good starting points, but we need to cover the rest of the combinations as well.

@yagikota yagikota force-pushed the resuce-e2e-test-time branch from f982814 to df78447 Compare September 28, 2025 14:07
@ahrtr
Copy link
Member

ahrtr commented Sep 28, 2025

/test pull-etcd-e2e-386

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks

Next step is to update the workflow to use the env variable.

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, serathius, yagikota

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@serathius
Copy link
Member

/retest

@serathius
Copy link
Member

go.etcd.io/etcd/tests/v3/common: TestKVGet/PeerAutoTLS_and_ClientAutoTLS test fails.

execute.go:47: ---> Test failed: test timed out after 6.944716365s, err: context deadline exceeded
    execute.go:47: goroutine 8724 [running]:

@yagikota
Copy link
Contributor Author

/test pull-etcd-coverage-report

@yagikota
Copy link
Contributor Author

/retest

@k8s-ci-robot
Copy link

@yagikota: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-etcd-coverage-report d11bc94 link true /test pull-etcd-coverage-report

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ahrtr ahrtr merged commit 86c4841 into etcd-io:main Sep 29, 2025
31 of 32 checks passed
@yagikota
Copy link
Contributor Author

@ahrtr

Is it no problem to ignore pull-etcd-coverage-report fail?

@ivanvc
Copy link
Member

ivanvc commented Sep 30, 2025

Update https://github.com/kubernetes/test-infra/blob/master/config/jobs/etcd/etcd-presubmits.yaml by adding E2E_TEST_MINIMAL=true

@yagikota, do you want to help with this, too?

@yagikota
Copy link
Contributor Author

@ivanvc
Yes, I'll create a PR for it.

@ahrtr
Copy link
Member

ahrtr commented Sep 30, 2025

@ahrtr

Is it no problem to ignore pull-etcd-coverage-report fail?

It isn't caused by this PR.

But I agree that it's annoying to frequently see the failure. It'd great if anyone can take a look. thx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants