
DAOS-18710 test: Update provider tests to use dc_x for UCX. #17746

Merged
phender merged 1 commit into daily-testing from jgm/DAOS-18710
Apr 3, 2026

Conversation

@jgmoore-or
Contributor

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Joseph Moore <joseph.moore@hpe.com>
@github-actions

Ticket title is 'Change to using ucx+dc_x as the provider string for daily test run with UCX and for the weekly provider tests.'
Status is 'Open'
Labels: 'request_for_2.8'
https://daosio.atlassian.net/browse/DAOS-18710

@daosbuild3
Collaborator

Test stage Functional on Leap 15.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-17746/1/display/redirect

@jgmoore-or jgmoore-or requested a review from phender March 23, 2026 17:49
    '--provider argument; i.e. "ucx+dc_x", "ofi+verbs", "ofi+tcp")')
    string(name: 'TestProviderUCX',
    -  defaultValue: 'ucx+ud_x',
    +  defaultValue: 'ucx+dc_x',
Contributor

Note: for master we'll also want to update https://github.com/daos-stack/daos/blob/provider-testing/Jenkinsfile#L100 in a separate PR.

@daosbuild3
Collaborator

Test stage Functional Hardware Medium UCX Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17746/2/execution/node/322/log

@daosbuild3
Collaborator

Test stage Functional Hardware Medium UCX Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17746/3/execution/node/494/log

@daosbuild3
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17746/4/execution/node/520/log

@jgmoore-or jgmoore-or requested a review from JohnMalmberg March 27, 2026 16:07
@daosbuild3
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17746/5/execution/node/409/log

@johannlombardi
Contributor

@jgmoore-or please review the failures and let us know whether there are either existing issues or new ones introduced by the switch from UD to DC.

@phender
Contributor

phender commented Apr 1, 2026

Analysis of https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-17746/5/#showFailuresLink failures:

  • Known issue https://daosio.atlassian.net/browse/DAOS-15518
    • [HW Medium] 1-./control/dmg_network_scan.py:DmgNetworkScanTest.test_dmg_network_scan_basic
    • [HW Medium MD on SSD] 1-./control/dmg_network_scan.py:DmgNetworkScanTest.test_dmg_network_scan_basic
  • Known issue https://daosio.atlassian.net/browse/DAOS-16553
    • [HW Medium] 1-./dfuse/pil4dfs_fio.py:Pil4dfsFio.test_pil4dfs_vs_dfs
    • [HW Medium] 3-./dfuse/pil4dfs_fio.py:Pil4dfsFio.test_pil4dfs_vs_dfs
    • [HW Medium] 4-./dfuse/pil4dfs_fio.py:Pil4dfsFio.test_pil4dfs_vs_dfs
    • [HW Medium] 5-./dfuse/pil4dfs_fio.py:Pil4dfsFio.test_pil4dfs_vs_dfs
    • [HW Medium MD on SSD] 1-./dfuse/pil4dfs_fio.py:Pil4dfsFio.test_pil4dfs_vs_dfs
    • [HW Medium MD on SSD] 3-./dfuse/pil4dfs_fio.py:Pil4dfsFio.test_pil4dfs_vs_dfs
    • [HW Medium MD on SSD] 4-./dfuse/pil4dfs_fio.py:Pil4dfsFio.test_pil4dfs_vs_dfs
    • [HW Medium MD on SSD] 5-./dfuse/pil4dfs_fio.py:Pil4dfsFio.test_pil4dfs_vs_dfs
  • NEW issue
    • [HW Medium MD on SSD] 3-./pool/create.py:PoolCreateTests.test_create_no_space_loop
      • This last test case times out on loop 81 of 100 while attempting to create pools that are expected to fail due to DER_NOSPACE.
      • I do not see any open or recently closed tickets on this issue.
  • Known issue https://daosio.atlassian.net/browse/DAOS-18276
    • [HW Medium MD on SSD] 05-./pool/create_all_hw.py:PoolCreateAllHwTests.test_recycle_pools_hw
    • [HW Medium MD on SSD] 07-./pool/create_all_hw.py:PoolCreateAllHwTests.test_recycle_pools_hw
    • [HW Medium MD on SSD] 08-./pool/create_all_hw.py:PoolCreateAllHwTests.test_recycle_pools_hw
  • Known issue https://daosio.atlassian.net/browse/DAOS-17616
    • [HW Medium MD on SSD] 1-./server/daos_server_restart.py:DaosServerTest.test_daos_server_reformat

@jgmoore-or
Contributor Author

jgmoore-or commented Apr 1, 2026

The daily testing does not run the HW Medium MD on SSD test for UCX, so the new failure noted by @phender is not something new introduced by the switch to dc_x from ud_x. This test is also not run in the weekly provider tests, which will be switched to dc_x in a separate PR.

I agree that we need a ticket for this failure, but it is not pertinent to this ticket (which just changes from ud_x to dc_x for the Functional Hardware Medium UCX Provider test). Note that none of the tests incurring failures for this PR are run as part of the Functional Hardware Medium UCX Provider test.

The dmg network scan failure is fixed by #17636 (currently under review). The pil4dfs_fio test failures are due to a limitation in UCX that requires a change to the UCX code to resolve. We have such a fix (developed by Lei Huang) and plan to push it to UCX.

The other MD-on-SSD failures listed by Phil do not appear to be specific to UCX.

@phender
Contributor

phender commented Apr 3, 2026

> The daily testing does not run the HW Medium MD on SSD test for UCX, so the new failure noted by @phender is not something new introduced by the switch to dc_x from ud_x. This test is also not run in the weekly provider tests, which will be switched to dc_x in a separate PR.

It should be noted that build #5 ran all the Functional HW stages with ucx+dc_x because both the TestProvider and TestProviderUCX parameters were set to ucx+dc_x. The code change in this PR should only set the default TestProviderUCX value to ucx+dc_x. Normally TestProvider would be left empty (its default value), resulting in verbs being used for the non-Provider Functional HW stages.
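The parameter interaction described above can be modeled with a small, hypothetical Python function. It is a sketch under stated assumptions, not the Jenkinsfile's actual logic: the function name `select_provider`, its arguments, and the `ofi+verbs` fallback string are assumptions (the comment only says "verbs"); only the parameter names TestProvider and TestProviderUCX and the `ucx+dc_x` default come from this PR. Other provider-specific stages, such as the Verbs Provider stage, are not modeled.

```python
# Hypothetical model of how a Functional HW stage picks its provider string.
# Assumption: an empty TestProvider falls back to "ofi+verbs" (the PR only says "verbs").
DEFAULT_PROVIDER = "ofi+verbs"


def select_provider(is_ucx_provider_stage: bool,
                    test_provider: str = "",
                    test_provider_ucx: str = "ucx+dc_x") -> str:
    """Return the provider a stage would use (illustrative sketch only)."""
    if is_ucx_provider_stage:
        # The UCX Provider stage takes its provider from TestProviderUCX,
        # whose default this PR changes from ucx+ud_x to ucx+dc_x.
        return test_provider_ucx
    # Non-Provider Functional HW stages use TestProvider when it is set,
    # otherwise they fall back to verbs.
    return test_provider or DEFAULT_PROVIDER


# Normal daily run: TestProvider left empty.
print(select_provider(False))  # non-Provider stage
print(select_provider(True))   # UCX Provider stage

# Build #5: TestProvider was also set to ucx+dc_x, so every stage used UCX.
print(select_provider(False, test_provider="ucx+dc_x"))
```

This makes the point of the comment explicit: with TestProvider left empty, only the UCX Provider stage picks up the new ucx+dc_x default, so the failures in non-Provider stages of build #5 came from the overridden TestProvider parameter rather than from this PR's change.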

@phender phender requested a review from a team April 3, 2026 18:10
@phender phender merged commit a2e501b into daily-testing Apr 3, 2026
6 of 11 checks passed
@phender phender deleted the jgm/DAOS-18710 branch April 3, 2026 18:11

Labels

None yet

5 participants