use intrinsics::simd for aarch64 deinterleaving loads #2025

Open
folkertdev wants to merge 5 commits into rust-lang:main from folkertdev:arm-deinterleave-load
@folkertdev (Contributor) commented on Feb 14, 2026:

Hitting llvm/llvm-project#181514 for some ld2 cases.

@folkertdev force-pushed the arm-deinterleave-load branch 6 times, most recently from 2f748fb to 70ca2a6 (February 14, 2026 22:58)

@folkertdev (Contributor, Author) commented:

There is also an issue with ld2 on aarch64_be, where the expected instruction is not emitted:

---- core_arch::aarch64::neon::generated::assert_vld2q_p64_ld2 stdout ----
disassembly for stdarch_test_shim_vld2q_p64_ld2: 
	 0: add x9, x0, #0x10
	 1: ld1 {v0.2d}, [x0]
	 2: ld1 {v1.2d}, [x9]
	 3: add x9, x8, #0x10
	 4: zip1 v2.2d, v1.2d, v0.2d
	 5: zip2 v0.2d, v1.2d, v0.2d
	 6: st1 {v2.2d}, [x8]
	 7: st1 {v0.2d}, [x9]
	 8: ret

thread 'core_arch::aarch64::neon::generated::assert_vld2q_p64_ld2' (1746) panicked at crates/stdarch-test/src/lib.rs:204:9:
failed to find instruction `ld2` in the disassembly
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- core_arch::aarch64::neon::generated::assert_vld2q_u64_ld2 stdout ----
disassembly for stdarch_test_shim_vld2q_u64_ld2: 
	 0: add x9, x0, #0x10
	 1: ld1 {v0.2d}, [x0]
	 2: ld1 {v1.2d}, [x9]
	 3: add x9, x8, #0x10
	 4: zip1 v2.2d, v1.2d, v0.2d
	 5: zip2 v0.2d, v1.2d, v0.2d
	 6: st1 {v2.2d}, [x8]
	 7: st1 {v0.2d}, [x9]
	 8: ret

thread 'core_arch::aarch64::neon::generated::assert_vld2q_u64_ld2' (1748) panicked at crates/stdarch-test/src/lib.rs:204:9:
failed to find instruction `ld2` in the disassembly


failures:
    core_arch::aarch64::neon::generated::assert_vld2q_p64_ld2
    core_arch::aarch64::neon::generated::assert_vld2q_u64_ld2

Probably another missed optimization?
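
For context, here is a minimal scalar sketch of why the `ld1` + `zip1`/`zip2` sequence in the listings above can stand in for `ld2` when each vector has only two 64-bit lanes. This is not code from this PR: the function names `deinterleave2`, `zip1`, `zip2` and `load_then_zip` are hypothetical, plain arrays stand in for NEON registers, and the model ignores lane-ordering details on big-endian targets.

```rust
/// Scalar model of a two-way deinterleaving load with two 64-bit lanes:
/// memory holds [a0, b0, a1, b1]; the result is ([a0, a1], [b0, b1]).
fn deinterleave2(mem: [u64; 4]) -> ([u64; 2], [u64; 2]) {
    ([mem[0], mem[2]], [mem[1], mem[3]])
}

/// Scalar model of `zip1` on two-lane vectors: the low lanes of both inputs.
fn zip1(a: [u64; 2], b: [u64; 2]) -> [u64; 2] {
    [a[0], b[0]]
}

/// Scalar model of `zip2` on two-lane vectors: the high lanes of both inputs.
fn zip2(a: [u64; 2], b: [u64; 2]) -> [u64; 2] {
    [a[1], b[1]]
}

/// Two contiguous loads followed by zip1/zip2, mirroring the shape of the
/// disassembly (hypothetical helper, for illustration only).
fn load_then_zip(mem: [u64; 4]) -> ([u64; 2], [u64; 2]) {
    let lo = [mem[0], mem[1]];
    let hi = [mem[2], mem[3]];
    (zip1(lo, hi), zip2(lo, hi))
}

fn main() {
    let mem = [10, 20, 30, 40];
    // Both formulations produce the same pair of vectors.
    assert_eq!(deinterleave2(mem), load_then_zip(mem));
    println!("{:?}", deinterleave2(mem)); // prints ([10, 30], [20, 40])
}
```

With only two lanes per vector, both forms compute the same values, so the test failure above concerns which instruction is selected rather than the loaded data in this simplified model.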

@folkertdev force-pushed the arm-deinterleave-load branch from cad2cac to ab65f8a (February 14, 2026 23:47)
@folkertdev force-pushed the arm-deinterleave-load branch from f7f53ec to ca268d2 (February 14, 2026 23:57)

@folkertdev (Contributor, Author) commented:

So I'm skipping ld2 for NEON for now: it runs into weird issues that I can't even reproduce locally. I'd expect ld4 to be by far the most common anyway.

r? sayantn
cc @adamgemmell

@folkertdev marked this pull request as ready for review (February 15, 2026 00:12)