
Add PMU cycle counting for Armv8.1-M #1514

Merged
hanno-becker merged 3 commits into main from armv8-bench on Jan 21, 2026

@mkannwischer
Contributor

Add PMU-based cycle counting support for Armv8.1-M Cortex-M processors.
This uses the CMSIS PMU APIs for portable cycle counter access.
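A minimal sketch of reading the Armv8.1-M PMU cycle counter through CMSIS-Core follows. It is for orientation only and is not the code merged in this PR. It assumes a CMSIS device header that enables the PMU (the generic `ARMCM55.h` is used here as an example), a CMSIS version that exposes the `DCB` debug block, and a hypothetical opt-in macro `BENCH_USE_ARMV81M_PMU`. The helper names `enable_cyclecounter`, `get_cyclecounter` and `elapsed_cycles` are also hypothetical. Without the macro, a `clock()`-based stand-in keeps the sketch compilable on a host, but that stand-in does not count cycles.

```c
#include <stdint.h>

#if defined(BENCH_USE_ARMV81M_PMU)
/* Target build (assumption): a CMSIS device header for an Armv8.1-M core
 * that sets __PMU_PRESENT to 1, which pulls in CMSIS-Core's pmu_armv8.h. */
#include "ARMCM55.h"

static void enable_cyclecounter(void)
{
  /* The PMU does not count unless trace is enabled in DEMCR.
   * Older CMSIS versions name this register CoreDebug->DEMCR. */
  DCB->DEMCR |= DCB_DEMCR_TRCENA_Msk;
  ARM_PMU_Enable();                                  /* PMU_CTRL.E = 1 */
  ARM_PMU_CNTR_Enable(PMU_CNTENSET_CCNTR_ENABLE_Msk); /* cycle counter on */
  ARM_PMU_CYCCNT_Reset();                            /* start from zero */
}

static uint32_t get_cyclecounter(void) { return ARM_PMU_Get_CCNTR(); }

#else
/* Host build (hypothetical stand-in): lets the sketch compile and run
 * anywhere. It reports processor time ticks, NOT CPU cycles. */
#include <time.h>

static void enable_cyclecounter(void) {}
static uint32_t get_cyclecounter(void) { return (uint32_t)clock(); }
#endif

/* Elapsed count between two 32-bit counter readings. Unsigned
 * subtraction wraps modulo 2^32, so the result stays correct even if
 * the counter overflowed once between the two readings. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
  return end - start;
}
```

Taking the difference as `uint32_t` rather than widening first is what makes a single wraparound harmless, since CMSIS exposes the cycle count as a 32-bit value.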

@mkannwischer force-pushed the armv8-bench branch 2 times, most recently from e46f5da to 8ab59cf, on January 21, 2026 02:54
@mkannwischer marked this pull request as ready for review on January 21, 2026 03:12
@mkannwischer requested a review from a team as a code owner on January 21, 2026 03:12
@mkannwischer requested a review from bremoran on January 21, 2026 03:12
@oqs-bot
Contributor

oqs-bot commented Jan 21, 2026

CBMC Results (ML-KEM-512)

Full Results (139 proofs)
Proof Current Previous Change
**TOTAL** 1236s 1109s +11.5%
mlk_indcpa_enc 190s 174s +9%
mlk_indcpa_keypair_derand 186s 172s +8%
mlk_keccak_squeezeblocks_x4 140s 121s +16%
mlk_rej_uniform_c 95s 71s +34%
mlk_polyvec_basemul_acc_montgomery_cached_c 54s 41s +32%
mlk_poly_rej_uniform 38s 33s +15%
poly_ntt_native 32s 24s +33%
mlk_ntt_layer 25s 20s +25%
polyvec_basemul_acc_montgomery_cached_native 24s 22s +9%
keccakf1600x4_permute_native_x4 21s 21s +0%
mlk_poly_reduce_native 16s 14s +14%
mlk_poly_sub 12s 8s +50%
mlk_polyvec_add 11s 8s +38%
mlk_keccak_absorb_once_x4 10s 11s -9%
mlk_ntt_butterfly_block 10s 8s +25%
mlk_indcpa_dec 9s 10s -10%
mlk_keccak_squeezeblocks 9s 8s +12%
mlk_poly_frombytes_native 9s 8s +12%
mlk_poly_rej_uniform_x4 8s 9s -11%
keccakf1600_permute_native 7s 4s +75%
mlk_fqmul 7s 6s +17%
kem_dec 6s 3s +100%
mlk_keccak_squeeze_once 6s 6s +0%
mlk_poly_frommsg 6s 8s -25%
mlk_keccak_absorb_once 5s 4s +25%
mlk_keccakf1600_extract_bytes (big endian) 5s 2s +150%
mlk_polymat_permute_bitrev_to_custom 5s 4s +25%
mlk_polyvec_tobytes 5s 4s +25%
mlk_scalar_compress_d1 5s 2s +150%
poly_getnoise_eta1122_4x_native 5s 2s +150%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 2s +100%
kem_enc_derand 4s 4s +0%
mlk_gen_matrix_serial 4s 4s +0%
mlk_keccakf1600_permute 4s 1s +300%
mlk_poly_cbd_eta1 4s 2s +100%
mlk_poly_compress_du 4s 3s +33%
mlk_poly_frombytes 4s 2s +100%
mlk_poly_invntt_tomont_c 4s 4s +0%
mlk_poly_ntt_c 4s 1s +300%
mlk_polyvec_basemul_acc_montgomery_cached 4s 3s +33%
mlk_polyvec_invntt_tomont 4s 4s +0%
mlk_polyvec_permute_bitrev_to_custom_native 4s 1s +300%
mlk_scalar_decompress_d11 4s 2s +100%
mlk_scalar_signed_to_unsigned_q 4s 2s +100%
mlk_shake256x4 4s 3s +33%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 4s 1s +300%
sys_check_capability 4s 4s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 2s +50%
kem_check_sk 3s 3s +0%
kem_enc 3s 1s +200%
kem_keypair_derand 3s 4s -25%
mlk_ct_cmask_nonzero_u16 3s 3s +0%
mlk_ct_cmov_zero 3s 2s +50%
mlk_ct_get_optblocker_i32 3s 3s +0%
mlk_ct_get_optblocker_u32 3s 1s +200%
mlk_ct_get_optblocker_u8 3s 2s +50%
mlk_invntt_layer 3s 6s -50%
mlk_keccakf1600_extract_bytes 3s 3s +0%
mlk_keccakf1600x4_xor_bytes 3s 2s +50%
mlk_matvec_mul 3s 2s +50%
mlk_poly_add 3s 1s +200%
mlk_poly_compress_dv 3s 3s +0%
mlk_poly_getnoise_eta1122_4x 3s 2s +50%
mlk_poly_getnoise_eta2 3s 4s -25%
mlk_poly_invntt_tomont 3s 2s +50%
mlk_poly_mulcache_compute_c 3s 3s +0%
mlk_poly_mulcache_compute_native 3s 6s -50%
mlk_poly_reduce 3s 2s +50%
mlk_poly_tobytes 3s 3s +0%
mlk_poly_tobytes_c 3s 3s +0%
mlk_poly_tobytes_native 3s 4s -25%
mlk_poly_tomsg 3s 3s +0%
mlk_polyvec_compress_du 3s 4s -25%
mlk_polyvec_frombytes 3s 2s +50%
mlk_polyvec_mulcache_compute 3s 2s +50%
mlk_polyvec_ntt 3s 4s -25%
mlk_scalar_compress_d10 3s 4s -25%
mlk_scalar_compress_d4 3s 2s +50%
mlk_scalar_compress_d5 3s 4s -25%
mlk_scalar_decompress_d4 3s 3s +0%
mlk_scalar_decompress_d5 3s 4s -25%
mlk_sha3_256 3s 2s +50%
poly_reduce_native_aarch64 3s 1s +200%
poly_tobytes_native_aarch64 3s 2s +50%
keccak_f1600_x1_native_aarch64 2s 1s +100%
keccak_f1600_x4_native_aarch64_v84a 2s 1s +100%
kem_check_pk 2s 4s -50%
kem_keypair 2s 3s -33%
mlk_barrett_reduce 2s 1s +100%
mlk_ct_cmask_neg_i16 2s 1s +100%
mlk_ct_memcmp 2s 3s -33%
mlk_ct_sel_int16 2s 3s -33%
mlk_gen_matrix 2s 5s -60%
mlk_keccakf1600_xor_bytes 2s 1s +100%
mlk_keccakf1600x4_extract_bytes 2s 2s +0%
mlk_keccakf1600x4_permute 2s 2s +0%
mlk_montgomery_reduce 2s 2s +0%
mlk_poly_cbd_eta2 2s 2s +0%
mlk_poly_decompress_du 2s 2s +0%
mlk_poly_decompress_dv 2s 2s +0%
mlk_poly_getnoise_eta1_4x 2s 4s -50%
mlk_poly_getnoise_eta1_4x_native 2s 3s -33%
mlk_poly_mulcache_compute 2s 2s +0%
mlk_poly_ntt 2s 1s +100%
mlk_poly_reduce_c 2s 5s -60%
mlk_poly_tomont 2s 2s +0%
mlk_poly_tomont_c 2s 3s -33%
mlk_polyvec_decompress_du 2s 2s +0%
mlk_polyvec_permute_bitrev_to_custom 2s 1s +100%
mlk_polyvec_reduce 2s 4s -50%
mlk_polyvec_tomont 2s 2s +0%
mlk_rej_uniform 2s 3s -33%
mlk_scalar_compress_d11 2s 1s +100%
mlk_sha3_512 2s 1s +100%
mlk_shake128_absorb_once 2s 2s +0%
mlk_shake128_squeezeblocks 2s 2s +0%
mlk_shake128x4_squeezeblocks 2s 4s -50%
mlk_shake256 2s 1s +100%
mlk_value_barrier_u32 2s 3s -33%
mlk_value_barrier_u8 2s 3s -33%
ntt_native_aarch64 2s 6s -67%
poly_invntt_tomont_native 2s 1s +100%
poly_mulcache_compute_native_aarch64 2s 1s +100%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 2s 3s -33%
rej_uniform_native_aarch64 2s 3s -33%
intt_native_aarch64 1s 2s -50%
keccak_f1600_x1_native_aarch64_v84a 1s 2s -50%
mlk_check_pct 1s 4s -75%
mlk_ct_cmask_nonzero_u8 1s 2s -50%
mlk_ct_sel_uint8 1s 3s -67%
mlk_keccakf1600_xor_bytes (big endian) 1s 2s -50%
mlk_poly_frombytes_c 1s 2s -50%
mlk_poly_tomont_native 1s 3s -67%
mlk_scalar_decompress_d10 1s 2s -50%
mlk_shake128x4_absorb_once 1s 2s -50%
mlk_value_barrier_i32 1s 6s -83%
poly_tomont_native_aarch64 1s 1s +0%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 1s 3s -67%
rej_uniform_native 1s 2s -50%
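The Change column appears to be the relative difference of Current against Previous, rounded to a whole percent for individual proofs and to one decimal for the total; for example, the TOTAL row gives (1236 - 1109) / 1109, about +11.5%. A minimal sketch of that computation follows, using hypothetical helper names `pct_change` and `print_row`; the bot's own implementation is not part of this page.

```c
#include <stdio.h>

/* Relative change of the current proof time against the previous one,
 * in percent. Positive values mean the proof got slower.
 * previous_s must be non-zero. */
static double pct_change(double current_s, double previous_s)
{
  return (current_s - previous_s) / previous_s * 100.0;
}

/* Print one row in the same layout as the table above:
 * proof name, current time, previous time, signed whole-percent change. */
static void print_row(const char *proof, double current_s, double previous_s)
{
  printf("%s %.0fs %.0fs %+.0f%%\n", proof, current_s, previous_s,
         pct_change(current_s, previous_s));
}
```

For example, `print_row("mlk_rej_uniform_c", 95, 71)` prints `mlk_rej_uniform_c 95s 71s +34%`, matching the corresponding row above.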

@oqs-bot
Contributor

oqs-bot commented Jan 21, 2026

CBMC Results (ML-KEM-768)

Full Results (139 proofs)
Proof Current Previous Change
**TOTAL** 1448s 1701s -14.9%
mlk_indcpa_keypair_derand 357s 420s -15%
mlk_indcpa_enc 218s 263s -17%
mlk_keccak_squeezeblocks_x4 126s 142s -11%
mlk_rej_uniform_c 71s 100s -29%
mlk_polyvec_basemul_acc_montgomery_cached_c 68s 104s -35%
polyvec_basemul_acc_montgomery_cached_native 57s 61s -7%
poly_ntt_native 46s 59s -22%
mlk_poly_rej_uniform 40s 45s -11%
mlk_ntt_layer 24s 32s -25%
keccakf1600x4_permute_native_x4 17s 22s -23%
mlk_indcpa_dec 17s 21s -19%
mlk_poly_reduce_native 12s 19s -37%
mlk_polyvec_add 12s 13s -8%
mlk_keccak_absorb_once_x4 11s 10s +10%
mlk_ntt_butterfly_block 9s 11s -18%
mlk_poly_frombytes_native 9s 9s +0%
mlk_poly_sub 9s 14s -36%
mlk_poly_frommsg 8s 7s +14%
mlk_invntt_layer 7s 6s +17%
mlk_keccak_squeezeblocks 7s 7s +0%
mlk_poly_rej_uniform_x4 7s 10s -30%
keccakf1600_permute_native 6s 5s +20%
kem_dec 6s 5s +20%
mlk_fqmul 6s 6s +0%
mlk_keccak_squeeze_once 6s 6s +0%
mlk_poly_getnoise_eta1_4x_native 6s 3s +100%
mlk_poly_mulcache_compute 5s 2s +150%
mlk_polymat_permute_bitrev_to_custom 5s 6s -17%
mlk_polyvec_mulcache_compute 5s 2s +150%
mlk_scalar_compress_d5 5s 1s +400%
poly_tobytes_native_aarch64 5s 2s +150%
kem_check_pk 4s 4s +0%
kem_check_sk 4s 3s +33%
kem_enc_derand 4s 4s +0%
mlk_ct_get_optblocker_u8 4s 1s +300%
mlk_gen_matrix_serial 4s 4s +0%
mlk_keccak_absorb_once 4s 4s +0%
mlk_keccakf1600_permute 4s 4s +0%
mlk_matvec_mul 4s 3s +33%
mlk_poly_getnoise_eta1_4x 4s 6s -33%
mlk_poly_reduce_c 4s 4s +0%
mlk_poly_tobytes_native 4s 5s -20%
mlk_value_barrier_i32 4s 3s +33%
poly_getnoise_eta1122_4x_native 4s 3s +33%
poly_mulcache_compute_native_aarch64 4s 4s +0%
intt_native_aarch64 3s 3s +0%
kem_enc 3s 3s +0%
kem_keypair 3s 2s +50%
kem_keypair_derand 3s 4s -25%
mlk_check_pct 3s 5s -40%
mlk_ct_sel_uint8 3s 2s +50%
mlk_gen_matrix 3s 4s -25%
mlk_keccakf1600x4_xor_bytes 3s 1s +200%
mlk_poly_cbd_eta2 3s 2s +50%
mlk_poly_compress_du 3s 4s -25%
mlk_poly_compress_dv 3s 3s +0%
mlk_poly_frombytes 3s 1s +200%
mlk_poly_frombytes_c 3s 3s +0%
mlk_poly_invntt_tomont_c 3s 1s +200%
mlk_poly_ntt 3s 3s +0%
mlk_poly_tobytes 3s 1s +200%
mlk_poly_tobytes_c 3s 2s +50%
mlk_poly_tomont_native 3s 2s +50%
mlk_poly_tomsg 3s 3s +0%
mlk_polyvec_decompress_du 3s 2s +50%
mlk_polyvec_invntt_tomont 3s 2s +50%
mlk_polyvec_ntt 3s 4s -25%
mlk_polyvec_permute_bitrev_to_custom 3s 1s +200%
mlk_polyvec_tomont 3s 2s +50%
mlk_scalar_compress_d1 3s 1s +200%
mlk_scalar_compress_d10 3s 2s +50%
mlk_scalar_compress_d11 3s 3s +0%
mlk_scalar_decompress_d10 3s 6s -50%
mlk_scalar_decompress_d5 3s 3s +0%
mlk_shake256 3s 2s +50%
mlk_shake256x4 3s 4s -25%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 3s 2s +50%
rej_uniform_native 3s 5s -40%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
mlk_barrett_reduce 2s 2s +0%
mlk_ct_cmask_neg_i16 2s 4s -50%
mlk_ct_cmask_nonzero_u16 2s 2s +0%
mlk_ct_cmask_nonzero_u8 2s 1s +100%
mlk_ct_cmov_zero 2s 2s +0%
mlk_ct_get_optblocker_u32 2s 2s +0%
mlk_ct_sel_int16 2s 3s -33%
mlk_keccakf1600_extract_bytes 2s 2s +0%
mlk_keccakf1600_extract_bytes (big endian) 2s 1s +100%
mlk_keccakf1600_xor_bytes 2s 2s +0%
mlk_keccakf1600_xor_bytes (big endian) 2s 2s +0%
mlk_keccakf1600x4_extract_bytes 2s 1s +100%
mlk_keccakf1600x4_permute 2s 2s +0%
mlk_montgomery_reduce 2s 2s +0%
mlk_poly_add 2s 3s -33%
mlk_poly_cbd_eta1 2s 4s -50%
mlk_poly_decompress_du 2s 3s -33%
mlk_poly_decompress_dv 2s 2s +0%
mlk_poly_getnoise_eta1122_4x 2s 4s -50%
mlk_poly_getnoise_eta2 2s 2s +0%
mlk_poly_invntt_tomont 2s 2s +0%
mlk_poly_mulcache_compute_c 2s 1s +100%
mlk_poly_mulcache_compute_native 2s 3s -33%
mlk_poly_ntt_c 2s 3s -33%
mlk_poly_reduce 2s 5s -60%
mlk_polyvec_basemul_acc_montgomery_cached 2s 2s +0%
mlk_polyvec_compress_du 2s 3s -33%
mlk_polyvec_frombytes 2s 3s -33%
mlk_polyvec_permute_bitrev_to_custom_native 2s 4s -50%
mlk_rej_uniform 2s 1s +100%
mlk_scalar_compress_d4 2s 1s +100%
mlk_scalar_signed_to_unsigned_q 2s 3s -33%
mlk_shake128x4_absorb_once 2s 4s -50%
mlk_shake128x4_squeezeblocks 2s 3s -33%
mlk_value_barrier_u32 2s 4s -50%
poly_invntt_tomont_native 2s 2s +0%
poly_reduce_native_aarch64 2s 2s +0%
poly_tomont_native_aarch64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 2s 3s -33%
rej_uniform_native_aarch64 2s 3s -33%
sys_check_capability 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 1s 2s -50%
mlk_ct_get_optblocker_i32 1s 2s -50%
mlk_ct_memcmp 1s 2s -50%
mlk_poly_tomont 1s 4s -75%
mlk_poly_tomont_c 1s 2s -50%
mlk_polyvec_reduce 1s 2s -50%
mlk_polyvec_tobytes 1s 2s -50%
mlk_scalar_decompress_d11 1s 1s +0%
mlk_scalar_decompress_d4 1s 5s -80%
mlk_sha3_256 1s 2s -50%
mlk_sha3_512 1s 3s -67%
mlk_shake128_absorb_once 1s 2s -50%
mlk_shake128_squeezeblocks 1s 1s +0%
mlk_value_barrier_u8 1s 2s -50%
ntt_native_aarch64 1s 2s -50%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 1s 2s -50%

@oqs-bot
Contributor

oqs-bot commented Jan 21, 2026

CBMC Results (ML-KEM-1024)

Full Results (139 proofs)
Proof Current Previous Change
**TOTAL** 1826s 1903s -4.0%
mlk_indcpa_enc 379s 407s -7%
mlk_indcpa_keypair_derand 280s 294s -5%
mlk_keccak_squeezeblocks_x4 141s 145s -3%
mlk_polyvec_add 132s 142s -7%
polyvec_basemul_acc_montgomery_cached_native 124s 130s -5%
mlk_rej_uniform_c 92s 94s -2%
mlk_polyvec_basemul_acc_montgomery_cached_c 57s 59s -3%
poly_ntt_native 45s 51s -12%
mlk_poly_rej_uniform 40s 45s -11%
mlk_poly_decompress_dv 37s 36s +3%
mlk_ntt_layer 33s 32s +3%
keccakf1600x4_permute_native_x4 20s 20s +0%
mlk_indcpa_dec 20s 21s -5%
mlk_poly_reduce_native 17s 15s +13%
mlk_keccak_absorb_once_x4 12s 12s +0%
mlk_ntt_butterfly_block 12s 11s +9%
mlk_poly_frombytes_native 11s 9s +22%
mlk_poly_sub 11s 10s +10%
mlk_keccak_squeezeblocks 9s 7s +29%
keccakf1600_permute_native 8s 7s +14%
kem_dec 8s 7s +14%
mlk_gen_matrix 8s 8s +0%
mlk_poly_frommsg 8s 10s -20%
mlk_poly_rej_uniform_x4 8s 8s +0%
mlk_keccak_squeeze_once 7s 6s +17%
mlk_shake256x4 7s 7s +0%
mlk_fqmul 6s 6s +0%
mlk_gen_matrix_serial 6s 6s +0%
mlk_invntt_layer 6s 6s +0%
mlk_poly_compress_du 6s 6s +0%
mlk_ct_cmask_nonzero_u8 5s 2s +150%
mlk_keccak_absorb_once 5s 5s +0%
mlk_montgomery_reduce 5s 1s +400%
mlk_polymat_permute_bitrev_to_custom 5s 5s +0%
kem_keypair 4s 3s +33%
mlk_keccakf1600_permute 4s 4s +0%
mlk_poly_getnoise_eta1_4x_native 4s 4s +0%
mlk_poly_getnoise_eta2 4s 2s +100%
mlk_poly_reduce_c 4s 3s +33%
mlk_poly_tobytes 4s 2s +100%
mlk_poly_tobytes_c 4s 4s +0%
mlk_polyvec_permute_bitrev_to_custom_native 4s 4s +0%
mlk_sha3_256 4s 2s +100%
poly_tobytes_native_aarch64 4s 3s +33%
rej_uniform_native 4s 3s +33%
intt_native_aarch64 3s 3s +0%
keccak_f1600_x4_native_aarch64_v84a 3s 1s +200%
kem_check_pk 3s 6s -50%
kem_enc_derand 3s 5s -40%
kem_keypair_derand 3s 3s +0%
mlk_ct_get_optblocker_i32 3s 1s +200%
mlk_keccakf1600_extract_bytes 3s 3s +0%
mlk_keccakf1600_extract_bytes (big endian) 3s 2s +50%
mlk_keccakf1600_xor_bytes (big endian) 3s 1s +200%
mlk_keccakf1600x4_xor_bytes 3s 2s +50%
mlk_matvec_mul 3s 1s +200%
mlk_poly_add 3s 2s +50%
mlk_poly_cbd_eta1 3s 2s +50%
mlk_poly_cbd_eta2 3s 2s +50%
mlk_poly_decompress_du 3s 3s +0%
mlk_poly_frombytes 3s 1s +200%
mlk_poly_frombytes_c 3s 2s +50%
mlk_poly_getnoise_eta1_4x 3s 4s -25%
mlk_poly_invntt_tomont 3s 5s -40%
mlk_poly_mulcache_compute_c 3s 3s +0%
mlk_poly_ntt 3s 4s -25%
mlk_poly_ntt_c 3s 2s +50%
mlk_poly_tomont 3s 2s +50%
mlk_poly_tomsg 3s 4s -25%
mlk_polyvec_invntt_tomont 3s 5s -40%
mlk_polyvec_reduce 3s 3s +0%
mlk_scalar_compress_d10 3s 4s -25%
mlk_scalar_decompress_d10 3s 3s +0%
mlk_scalar_decompress_d11 3s 4s -25%
mlk_scalar_decompress_d4 3s 2s +50%
mlk_scalar_decompress_d5 3s 2s +50%
mlk_scalar_signed_to_unsigned_q 3s 3s +0%
mlk_sha3_512 3s 3s +0%
mlk_shake128_absorb_once 3s 2s +50%
mlk_shake128x4_squeezeblocks 3s 2s +50%
mlk_value_barrier_i32 3s 2s +50%
mlk_value_barrier_u32 3s 3s +0%
poly_getnoise_eta1122_4x_native 3s 1s +200%
poly_reduce_native_aarch64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 3s 3s +0%
keccak_f1600_x1_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
kem_check_sk 2s 2s +0%
mlk_ct_cmov_zero 2s 3s -33%
mlk_ct_get_optblocker_u32 2s 1s +100%
mlk_ct_get_optblocker_u8 2s 2s +0%
mlk_keccakf1600_xor_bytes 2s 2s +0%
mlk_keccakf1600x4_permute 2s 2s +0%
mlk_poly_compress_dv 2s 2s +0%
mlk_poly_getnoise_eta1122_4x 2s 3s -33%
mlk_poly_invntt_tomont_c 2s 3s -33%
mlk_poly_mulcache_compute 2s 2s +0%
mlk_poly_mulcache_compute_native 2s 3s -33%
mlk_poly_reduce 2s 1s +100%
mlk_poly_tobytes_native 2s 1s +100%
mlk_poly_tomont_c 2s 3s -33%
mlk_poly_tomont_native 2s 2s +0%
mlk_polyvec_basemul_acc_montgomery_cached 2s 1s +100%
mlk_polyvec_compress_du 2s 1s +100%
mlk_polyvec_decompress_du 2s 3s -33%
mlk_polyvec_frombytes 2s 2s +0%
mlk_polyvec_mulcache_compute 2s 2s +0%
mlk_polyvec_ntt 2s 5s -60%
mlk_polyvec_permute_bitrev_to_custom 2s 2s +0%
mlk_polyvec_tobytes 2s 2s +0%
mlk_rej_uniform 2s 2s +0%
mlk_scalar_compress_d1 2s 3s -33%
mlk_scalar_compress_d11 2s 3s -33%
mlk_scalar_compress_d4 2s 2s +0%
mlk_shake128_squeezeblocks 2s 2s +0%
mlk_shake256 2s 2s +0%
mlk_value_barrier_u8 2s 3s -33%
ntt_native_aarch64 2s 3s -33%
poly_invntt_tomont_native 2s 3s -33%
poly_mulcache_compute_native_aarch64 2s 3s -33%
poly_tomont_native_aarch64 2s 1s +100%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 2s 2s +0%
sys_check_capability 2s 2s +0%
keccak_f1600_x1_native_aarch64 1s 3s -67%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 1s 2s -50%
kem_enc 1s 4s -75%
mlk_barrett_reduce 1s 2s -50%
mlk_check_pct 1s 3s -67%
mlk_ct_cmask_neg_i16 1s 3s -67%
mlk_ct_cmask_nonzero_u16 1s 1s +0%
mlk_ct_memcmp 1s 4s -75%
mlk_ct_sel_int16 1s 4s -75%
mlk_ct_sel_uint8 1s 3s -67%
mlk_keccakf1600x4_extract_bytes 1s 2s -50%
mlk_polyvec_tomont 1s 2s -50%
mlk_scalar_compress_d5 1s 2s -50%
mlk_shake128x4_absorb_once 1s 3s -67%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 1s 2s -50%
rej_uniform_native_aarch64 1s 4s -75%

Contributor

@bremoran left a comment


LGTM

Comment thread .github/workflows/baremetal.yml
Contributor

@hanno-becker left a comment


I cannot test that the PMU code is correct on HW, but the CI and a local check confirm that it at least builds and runs in QEMU, which is good enough for now.

mkannwischer and others added 3 commits January 21, 2026 15:07
Make the benchmark parameters (NWARMUP, NITERATIONS, NTESTS)
configurable via CFLAGS by wrapping them in #ifndef guards and
renaming to MLK_BENCHMARK_NWARMUP, MLK_BENCHMARK_NITERATIONS,
and MLK_BENCHMARK_NTESTS.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
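The `#ifndef` pattern described in the first commit can be sketched as follows. The default values and the warm-up/measurement loop are placeholders chosen for illustration, not taken from the mlkem-native benchmark, and `op_under_test` and `run_benchmark` are hypothetical. Any default can be overridden at build time by passing, for instance, `-DMLK_BENCHMARK_NTESTS=5` in CFLAGS.

```c
/* Placeholder defaults (assumed values); each one is only defined if the
 * build has not already supplied it, e.g. via -DMLK_BENCHMARK_NWARMUP=10. */
#ifndef MLK_BENCHMARK_NWARMUP
#define MLK_BENCHMARK_NWARMUP 50
#endif

#ifndef MLK_BENCHMARK_NITERATIONS
#define MLK_BENCHMARK_NITERATIONS 300
#endif

#ifndef MLK_BENCHMARK_NTESTS
#define MLK_BENCHMARK_NTESTS 20
#endif

/* Hypothetical operation under test; it only counts its invocations. */
static unsigned long op_calls = 0;
static void op_under_test(void) { op_calls++; }

/* Hypothetical harness: NWARMUP untimed calls, then NTESTS measurements
 * of NITERATIONS calls each. Returns the number of timed calls. */
static unsigned long run_benchmark(void (*op)(void))
{
  unsigned long timed_calls = 0;

  for (int i = 0; i < MLK_BENCHMARK_NWARMUP; i++)
  {
    op(); /* warm caches and branch predictors; not measured */
  }

  for (int t = 0; t < MLK_BENCHMARK_NTESTS; t++)
  {
    /* A real harness would read the cycle counter before this loop, */
    for (int i = 0; i < MLK_BENCHMARK_NITERATIONS; i++)
    {
      op();
      timed_calls++;
    }
    /* read it again here, and store (end - start) / NITERATIONS. */
  }
  return timed_calls;
}
```

Because every parameter sits behind its own guard, a slow simulator target can shrink only the iteration counts while the benchmark source stays unchanged.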

Add PMU-based cycle counting support for Armv8.1-M Cortex-M processors.
This uses the CMSIS PMU APIs for portable cycle counter access.

 - Resolves #1502

Co-Authored-By: Brendan Moran <brendan.moran@arm.com>
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

The cycle counts will be zero, but this still tests that the PMU code builds.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
@hanno-becker merged commit 51d942a into main on Jan 21, 2026
780 of 781 checks passed
@hanno-becker deleted the armv8-bench branch on January 21, 2026 19:27

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Armv8.1-M: Add PMU benchmarking code

4 participants