
VM Specification Validation Test Suite#4408

Open
SRIKKANTH wants to merge 10 commits into `main` from `smyakam/26_04_06/sku_checker`

Conversation

@SRIKKANTH (Collaborator) commented Apr 6, 2026

Description

This module validates that Azure VMs match their declared hardware specifications.

A sample runbook and CSV file are included in the examples.

The CSV file defines the expected properties for each VM size (CPU count, memory, NIC count, maximum disks, local NVMe disks, IOPS, etc.), and the LISA CSV combinator in the runbook iterates over each row, provisioning a VM of that size and running the validation tests against it.
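The data flow above can be sketched end to end; the column names below are illustrative stand-ins, not the exact `vm_specs.csv` schema:

```python
import csv
import io

# Illustrative CSV in the spirit of vm_specs.csv; the real column
# names in the PR may differ.
SAMPLE_CSV = """vm_size,expected_cpu_count,expected_memory_gb,expected_nic_count
Standard_D32s_v5,32,128,8
"""

def load_vm_specs(text: str) -> list:
    """Parse each row into the expected_* variables that the CSV
    combinator would feed into one test iteration."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_vm_specs(SAMPLE_CSV)
print(rows[0]["vm_size"])                  # Standard_D32s_v5
print(int(rows[0]["expected_cpu_count"]))  # 32
```

Each row becomes one iteration: the combinator sets the row's columns as runbook variables, and each test case reads the `expected_*` values it needs.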

Related Issue

None

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Refactoring
  • Documentation update

Checklist

  • Description is filled in above
  • No credentials, secrets, or internal details are included
  • Peer review requested (if not, add required peer reviewers after raising PR)
  • Tests executed and results posted below

Test Validation

Key Test Cases:

  • verify_vm_cpu_count
  • verify_vm_memory
  • verify_vm_gpu_count
  • verify_vm_sriov_nic_count
  • verify_vm_synthetic_nic_count
  • verify_vm_max_premium_ssd_disk_count
  • verify_vm_nvme_disk_count
  • verify_vm_nvme_disk_iops
  • verify_vm_premium_ssd_iops
  • verify_vm_premium_ssd_throughput
  • verify_vm_network_bandwidth

Impacted LISA Features:
None
Tested Azure Marketplace Images:

  • Canonical:0001-com-ubuntu-server-jammy:22_04-lts-gen2:latest

Test Results

| Image | VM Size | Result |
| --- | --- | --- |
| Canonical:0001-com-ubuntu-server-jammy:22_04-lts-gen2:latest | Standard_D32s_v5 | PASSED |

Copilot AI review requested due to automatic review settings April 6, 2026 11:33

Copilot AI left a comment


Pull request overview

Adds a new LISA test suite and sample runbook/CSV to validate that provisioned Azure VMs match declared hardware specifications (CPU, memory, NICs, disks, NVMe, and optional perf thresholds), driven by a CSV combinator.

Changes:

  • Introduces VmSpecValidation test suite with checks for CPU/memory, NIC counts, disk/NVMe counts, and basic fio/ntttcp perf validation.
  • Adds an example runbook using the CSV combinator to iterate VM sizes and pass expected_* variables into each test iteration.
  • Adds a sample vm_specs.csv with one VM size row to demonstrate the format.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `lisa/microsoft/testsuites/vm_spec_validation/vm_spec_validation.py` | New VM spec validation test suite (functional + light perf checks). |
| `lisa/microsoft/runbook/examples/vm_specs.csv` | Sample expected-spec CSV schema + example row. |
| `lisa/microsoft/runbook/examples/vm_spec_validation.yml` | Example runbook wiring CSV combinator variables into the new suite. |

Copilot AI review requested due to automatic review settings April 6, 2026 13:01

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

Comment on lines +172 to +186

```python
@TestCaseMetadata(
    description="""
    Verify that the VM exposes the expected number of GPU devices.

    This test is skipped when the ``expected_gpu_count`` variable
    is empty or zero.

    Steps:
    1. Read expected_gpu_count from the runbook variables.
    2. Query GPU PCI devices via ``lspci``.
    3. Assert the GPU device count matches expected_gpu_count.
    """,
    priority=5,
    requirement=simple_requirement(min_gpu_count=1),
)
```

Copilot AI Apr 6, 2026


This test case claims it is skipped when expected_gpu_count is empty/zero, but the requirement includes simple_requirement(min_gpu_count=1), which will prevent the case from running on non-GPU SKUs (so it won't actually exercise the skip logic or validate expected_gpu_count=0). Consider removing the min_gpu_count requirement and letting the test skip based on the variable, or alternatively update the description/variables to align with the enforced GPU requirement.

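The variable-driven skip the review describes can be sketched as follows; `SkippedException` here is a local stand-in for LISA's exception, and `check_gpu_count` is a hypothetical helper, not the suite's actual method:

```python
class SkippedException(Exception):
    """Local stand-in for LISA's SkippedException."""

def check_gpu_count(expected_gpu_count: int, actual_gpu_count: int) -> None:
    # Skip (rather than fail) when the spec declares no GPUs, so the
    # case can run on non-GPU SKUs without a min_gpu_count requirement.
    if expected_gpu_count <= 0:
        raise SkippedException("expected_gpu_count is empty/zero")
    assert actual_gpu_count == expected_gpu_count, (
        f"expected {expected_gpu_count} GPUs, found {actual_gpu_count}"
    )
```

On a non-GPU SKU this skips instead of failing, which is what the docstring promises; with `min_gpu_count=1` in the requirement, the case would never be scheduled there at all.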
Comment on lines +691 to +735

```python
# Start ntttcp receiver (server) in the background
server_process = server_ntttcp.run_as_server_async(
    server_nic_name,
    run_time_seconds=60,  # 60s for a stable throughput measurement
    ports_count=ports_count,
)

try:
    # Run ntttcp sender (client)
    client_ntttcp.run_as_client(
        client_nic_name,
        server_ip=server_node.internal_address,
        threads_count=1,
        run_time_seconds=60,  # match the server duration
        ports_count=ports_count,
    )
finally:
    # Stop the server and collect its result
    server_node.tools[Kill].by_name(server_ntttcp.command)

server_executable_result = server_process.wait_result()
server_result = server_ntttcp.create_ntttcp_result(
    server_executable_result
)

# ntttcp reports throughput in Gbps — convert to Mbps to match
# the runbook variable unit.
measured_bw_mbps = int(server_result.throughput_in_gbps * 1000)

# Allow tolerance below the declared max
bw_floor_mbps = int(
    expected_bw_mbps * (100 - _PERF_TOLERANCE_PERCENT) / 100
)
log.info(
    f"VM size: {vm_size} — ntttcp network bandwidth: "
    f"measured {measured_bw_mbps} Mbps, "
    f"expected >= {bw_floor_mbps} Mbps "
    f"(declared: {expected_bw_mbps} Mbps, "
    f"tolerance: {_PERF_TOLERANCE_PERCENT}%)"
)
assert_that(measured_bw_mbps).described_as(
    f"VM size {vm_size}: expected network throughput "
    f">= {bw_floor_mbps} Mbps (declared {expected_bw_mbps} Mbps "
    f"with {_PERF_TOLERANCE_PERCENT}% tolerance) but measured "
    f"only {measured_bw_mbps} Mbps"
```

Copilot AI Apr 6, 2026


For bandwidth validation, this uses ntttcp directly without the usual Ntttcp.setup_system(...)/restore_system(...) tuning and without any retry/cleanup logic. Existing ntttcp-based perf tests (e.g. lisa/microsoft/testsuites/performance/common.py) explicitly tune sysctls/firewall and include retries to reduce flakiness. Consider following the same pattern here to improve reliability of the measured throughput vs. declared expectations.

Suggested change

Before:

```python
# Start ntttcp receiver (server) in the background
server_process = server_ntttcp.run_as_server_async(
    server_nic_name,
    run_time_seconds=60,  # 60s for a stable throughput measurement
    ports_count=ports_count,
)
try:
    # Run ntttcp sender (client)
    client_ntttcp.run_as_client(
        client_nic_name,
        server_ip=server_node.internal_address,
        threads_count=1,
        run_time_seconds=60,  # match the server duration
        ports_count=ports_count,
    )
finally:
    # Stop the server and collect its result
    server_node.tools[Kill].by_name(server_ntttcp.command)
server_executable_result = server_process.wait_result()
server_result = server_ntttcp.create_ntttcp_result(
    server_executable_result
)
# ntttcp reports throughput in Gbps — convert to Mbps to match
# the runbook variable unit.
measured_bw_mbps = int(server_result.throughput_in_gbps * 1000)
# Allow tolerance below the declared max
bw_floor_mbps = int(
    expected_bw_mbps * (100 - _PERF_TOLERANCE_PERCENT) / 100
)
log.info(
    f"VM size: {vm_size} — ntttcp network bandwidth: "
    f"measured {measured_bw_mbps} Mbps, "
    f"expected >= {bw_floor_mbps} Mbps "
    f"(declared: {expected_bw_mbps} Mbps, "
    f"tolerance: {_PERF_TOLERANCE_PERCENT}%)"
)
assert_that(measured_bw_mbps).described_as(
    f"VM size {vm_size}: expected network throughput "
    f">= {bw_floor_mbps} Mbps (declared {expected_bw_mbps} Mbps "
    f"with {_PERF_TOLERANCE_PERCENT}% tolerance) but measured "
    f"only {measured_bw_mbps} Mbps"
```

After:

```python
# Allow tolerance below the declared max
bw_floor_mbps = int(
    expected_bw_mbps * (100 - _PERF_TOLERANCE_PERCENT) / 100
)
# Retry up to 3 times to reduce transient bandwidth variance.
max_attempts = 3
best_measured_bw_mbps = 0
server_ntttcp.setup_system()
client_ntttcp.setup_system()
try:
    for attempt in range(1, max_attempts + 1):
        # Start ntttcp receiver (server) in the background
        server_process = server_ntttcp.run_as_server_async(
            server_nic_name,
            run_time_seconds=60,  # 60s for a stable throughput measurement
            ports_count=ports_count,
        )
        try:
            # Run ntttcp sender (client)
            client_ntttcp.run_as_client(
                client_nic_name,
                server_ip=server_node.internal_address,
                threads_count=1,
                run_time_seconds=60,  # match the server duration
                ports_count=ports_count,
            )
        finally:
            # Stop the server and collect its result
            server_node.tools[Kill].by_name(server_ntttcp.command)
        server_executable_result = server_process.wait_result()
        server_result = server_ntttcp.create_ntttcp_result(
            server_executable_result
        )
        # ntttcp reports throughput in Gbps — convert to Mbps to match
        # the runbook variable unit.
        measured_bw_mbps = int(server_result.throughput_in_gbps * 1000)
        best_measured_bw_mbps = max(
            best_measured_bw_mbps, measured_bw_mbps
        )
        log.info(
            f"VM size: {vm_size} — ntttcp network bandwidth attempt "
            f"{attempt}/{max_attempts}: measured {measured_bw_mbps} Mbps, "
            f"expected >= {bw_floor_mbps} Mbps "
            f"(declared: {expected_bw_mbps} Mbps, "
            f"tolerance: {_PERF_TOLERANCE_PERCENT}%)"
        )
        if measured_bw_mbps >= bw_floor_mbps:
            break
finally:
    server_node.tools[Kill].by_name(server_ntttcp.command)
    server_ntttcp.restore_system()
    client_ntttcp.restore_system()
log.info(
    f"VM size: {vm_size} — best ntttcp network bandwidth: "
    f"measured {best_measured_bw_mbps} Mbps, "
    f"expected >= {bw_floor_mbps} Mbps "
    f"(declared: {expected_bw_mbps} Mbps, "
    f"tolerance: {_PERF_TOLERANCE_PERCENT}%)"
)
assert_that(best_measured_bw_mbps).described_as(
    f"VM size {vm_size}: expected network throughput "
    f">= {bw_floor_mbps} Mbps (declared {expected_bw_mbps} Mbps "
    f"with {_PERF_TOLERANCE_PERCENT}% tolerance) but best measured "
    f"value across {max_attempts} attempt(s) was only "
    f"{best_measured_bw_mbps} Mbps"
```

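Both variants above reduce the pass criterion to a single floor computation; a minimal sketch (the 20% value below is an illustrative stand-in for `_PERF_TOLERANCE_PERCENT`, whose actual value is defined in the suite):

```python
def bandwidth_floor_mbps(declared_mbps: int, tolerance_percent: int) -> int:
    """Lowest acceptable measurement: declared value minus tolerance."""
    return declared_mbps * (100 - tolerance_percent) // 100

# e.g. a SKU declared at 16000 Mbps with a 20% tolerance must
# measure at least 12800 Mbps to pass.
print(bandwidth_floor_mbps(16000, 20))  # 12800
```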
Comment on lines +10 to +31

```yaml
# Usage:
#   lisa -r lisa\microsoft\runbook\examples\vm_spec_validation.yml \
#     --variable "subscription_id:<your_sub_id>" \
#     --variable "csv_file:lisa\microsoft\runbook\examples\vm_specs.csv"
#
# You can also override location, marketplace_image, etc. from the CLI.

test_pass: "adhoc"

name: vm_spec_validation

include:
  - path: ../azure.yml

extension:
  - ../../testsuites

variable:
  # Path to the CSV file with VM specifications (override via CLI if needed)
  - name: csv_file
    value: 'lisa\microsoft\runbook\examples\vm_specs.csv'
```

Copilot AI Apr 6, 2026


This runbook example uses Windows-style backslashes in paths (both in the CLI example and the default csv_file value). Other runbook examples in this repo use forward slashes (e.g. lisa/microsoft/runbook/examples/git_bisect.yml), which are more portable across OSes/shells. Consider switching to forward slashes in the example paths to make the sample runbook work out-of-the-box on Linux runners.

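The portability point can be demonstrated with Python's `pathlib`: forward-slash paths split correctly on both OS families, while a backslash path is a single literal file name on POSIX:

```python
from pathlib import PurePosixPath, PureWindowsPath

# A forward-slash path is understood on both families; Windows path
# handling accepts '/' as a separator.
p = "lisa/microsoft/runbook/examples/vm_specs.csv"
print(PureWindowsPath(p).parts[-1])  # vm_specs.csv
print(PurePosixPath(p).parts[-1])    # vm_specs.csv

# The backslash form only splits correctly on Windows; POSIX treats
# '\' as an ordinary character in one long file name.
w = r"lisa\microsoft\runbook\examples\vm_specs.csv"
print(len(PurePosixPath(w).parts))   # 1
```

This is why the forward-slash form works out of the box on Linux runners while the backslash form typically yields "file not found".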
Copilot AI review requested due to automatic review settings April 6, 2026 13:33

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

```python
) -> None:
    expected_memory_gb = _get_int_var(variables, VAR_EXPECTED_MEMORY_GB)
    expected_memory_mb = expected_memory_gb * 1024
    actual_memory_mb = node.tools[Free].get_free_memory_mb()
```

Copilot AI Apr 6, 2026


verify_vm_memory is comparing the expected total RAM against Free.get_free_memory_mb(), which returns free/unused memory and will vary with workload/OS caches. This will cause false failures even when the VM has the correct total RAM. Use total memory (e.g., Free.get_total_memory_gb() / total in MiB) for the comparison instead of free memory.

Suggested change

```python
# before
actual_memory_mb = node.tools[Free].get_free_memory_mb()
# after
actual_memory_mb = int(node.tools[Free].get_total_memory_gb() * 1024)
```

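The total-vs-free distinction the reviewer raises is visible directly in `/proc/meminfo`; a sketch parsing a canned fragment (the values below are made up) rather than LISA's `Free` tool:

```python
# Illustrative /proc/meminfo fragment; real values vary per VM.
SAMPLE_MEMINFO = """MemTotal:       131072000 kB
MemFree:         2048000 kB
MemAvailable:   98304000 kB
"""

def meminfo_mb(text: str, field: str) -> int:
    """Return a /proc/meminfo field converted from kB to MiB."""
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key == field:
            return int(rest.split()[0]) // 1024
    raise KeyError(field)

total_mb = meminfo_mb(SAMPLE_MEMINFO, "MemTotal")  # stable hardware property
free_mb = meminfo_mb(SAMPLE_MEMINFO, "MemFree")    # varies with workload/caches
print(total_mb, free_mb)  # 128000 2000
```

`MemTotal` is the stable quantity to compare against a declared spec; `MemFree` shrinks as the page cache fills, which is the false-failure mode described above.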
@SRIKKANTH (Collaborator, Author) replied:

`get_free_memory_mb` gives better granularity than `get_total_memory_gb() * 1024`

Comment on lines +658 to +666

```python
bw_ceiling = int(
    expected_bw * (100 + _PERF_TOLERANCE_PERCENT) / 100
)
log.info(
    f"VM size: {vm_size} - fio seq read across "
    f"{len(data_disks)} disk(s): "
    f"measured {measured_bw} MiB/s, "
    f"expected {bw_floor}-{bw_ceiling} MiB/s "
    f"(declared: {expected_bw} MBps)"
)
```

Copilot AI Apr 6, 2026


The throughput check mixes units: the docstring/CSV variable is labeled as "MBps" but the measurement derives from fio IOPS with block_size=1024K, which is effectively MiB/s. Either define the runbook variable as MiB/s, or convert MB/s to MiB/s (1000/1024) before applying the tolerance and reporting results to avoid confusion and inconsistent expectations.

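The unit mismatch described above is a systematic ~4.6% bias; a minimal conversion sketch:

```python
def mib_s_from_mb_s(mb_per_s: float) -> float:
    """Convert decimal MB/s (Azure's published unit) to MiB/s (what a
    1024K-block fio run effectively reports)."""
    return mb_per_s * 1_000_000 / (1024 * 1024)

# A disk declared at 1000 MB/s corresponds to ~953.7 MiB/s, so an
# uncorrected comparison shifts the target by ~4.6%.
print(round(mib_s_from_mb_s(1000), 1))  # 953.7
```

Converting the declared value before applying the tolerance keeps the expectation and the measurement in the same unit.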
```python
iodepth=64,
numjob=4,
block_size="4K",
size_gb=8192,
```

Copilot AI Apr 6, 2026


fio.launch(..., size_gb=8192, ...) is potentially misleading: in LISA's Fio tool, size_gb is passed as --size=<value>M (MiB), so 8192 means ~8 GiB per target, not 8192 GiB. Add an inline comment clarifying the intended unit/size (or adjust the value) so future readers don't assume this is a multi-terabyte workload.

Suggested change

```python
# before
size_gb=8192,
# after
size_gb=8192,  # LISA Fio passes this as MiB, so this is ~8 GiB per target.
```

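Assuming the wrapper behavior the review describes (the value is rendered with an `M` suffix), the unit confusion can be shown directly; `fio_size_flag` is a hypothetical illustration, not LISA's actual code:

```python
def fio_size_flag(size_gb: int) -> str:
    # Per the review, the wrapper renders this argument with an 'M'
    # suffix, so fio interprets the number as MiB despite the name.
    return f"--size={size_gb}M"

SIZE_MIB = 8 * 1024  # ~8 GiB per target, despite the parameter's "gb" name
print(fio_size_flag(SIZE_MIB))  # --size=8192M
```

A named constant like `SIZE_MIB` makes the intended unit explicit at the call site.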
Comment on lines +541 to +548

```python
filename=filename,
mode="randread",
iodepth=64,
numjob=thread_count,
block_size="4K",
size_gb=8192,
time=120,
overwrite=True,
```

Copilot AI Apr 6, 2026


The fio workload parameters here (e.g., iodepth=64, numjob=thread_count, time=120) materially change test duration and load but are currently undocumented magic numbers. Add short inline comments (or named module-level constants) for these parameters so it’s clear why these values were chosen and how to tune them safely.

Comment on lines +98 to +106

```python
class VmSpecValidation(TestSuite):
    """Validate VM hardware against runbook-declared specifications."""

    # ------------------------------------------------------------------
    # CPU validation
    # ------------------------------------------------------------------
    @TestCaseMetadata(
        description="""
        Verify that the VM's vCPU count matches the expected value from
```

Copilot AI Apr 6, 2026


The PR description’s listed "Key Test Cases" includes names that don’t match the implemented methods in this suite (e.g., it mentions verify_vm_max_premium_data_disks / verify_vm_premium_ssd_storage_throughput / verify_vm_spec_summary, but the code defines verify_vm_max_premium_ssd_disk_count / verify_vm_premium_ssd_throughput and has no verify_vm_spec_summary). Please update the PR description (or align method names/implement the missing case) so readers can reliably run the referenced tests.

Copilot AI review requested due to automatic review settings April 7, 2026 05:52

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

Comment on lines +74 to +84

```python
# Strip any non-digit characters (e.g. commas, units, spaces) before parsing.
digits_only = re.sub(r"[^\d]", "", str(raw))
if not digits_only:
    raise SkippedException(
        f"Variable '{name}' contains no digits - skipping check."
    )
value = int(digits_only)
if value <= 0:
    raise SkippedException(
        f"Variable '{name}' is {value} (zero or negative) - skipping check."
    )
```

Copilot AI Apr 7, 2026


_get_int_var() removes all non-digit characters before parsing. This drops the sign, so values like "-1" become "1" and will not be skipped as intended. Consider parsing with a regex that preserves an optional leading +/- (or explicitly rejecting any value containing '-' before stripping), so negative values reliably trigger the "zero or negative" skip path.

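A sign-preserving variant of the parser could look like this sketch (the helper name and messages are illustrative, and `SkippedException` is a local stand-in for LISA's exception):

```python
import re

class SkippedException(Exception):
    """Local stand-in for LISA's SkippedException."""

def parse_expected_int(name: str, raw: object) -> int:
    """Parse a runbook variable, keeping an optional leading sign so
    '-1' is rejected instead of silently becoming 1."""
    match = re.search(r"[+-]?\d[\d,]*", str(raw))
    if not match:
        raise SkippedException(f"Variable '{name}' contains no digits")
    value = int(match.group().replace(",", ""))
    if value <= 0:
        raise SkippedException(f"Variable '{name}' is {value}")
    return value

print(parse_expected_int("expected_cpu_count", "1,024"))  # 1024
```

With this shape, `"-1"` parses to `-1` and reliably takes the "zero or negative" skip path, while thousands separators are still tolerated.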
```python
iodepth=64,
numjob=4,
block_size="4K",
size_gb=8192,
```

Copilot AI Apr 7, 2026


The fio wrapper's size_gb argument is passed through as --size=<value>M (MiB) in lisa/tools/fio.py, so size_gb=8192 actually means ~8 GiB. This is easy to misread and accidentally change. Please add an inline comment (or define a named constant) clarifying the unit and intent for 8192 here (and consider using an expression like 8 * 1024 for readability).

Suggested change

```python
# before
size_gb=8192,
# after
# Fio.size_gb is passed through as MiB (--size=<value>M),
# so 8 * 1024 here intentionally means an 8 GiB fio size.
size_gb=8 * 1024,
```
Copilot uses AI. Check for mistakes.
Comment on lines +455 to +461

```python
mode="randread",
iodepth=64,
numjob=4,
block_size="4K",
size_gb=8192,
time=120,
overwrite=True,
```

Copilot AI Apr 7, 2026


These fio parameters include several hard-coded values (iodepth=64, numjob=4, runtime=120, size_gb=8192) that materially control test behavior and result stability. Per repo guidelines, please either introduce named constants for them or add brief inline comments explaining why these specific values were chosen.

Comment on lines +640 to +648

```python
filename=filename,
mode="read",
iodepth=64,
numjob=thread_count,
block_size="1024K",
size_gb=8192,
time=120,
overwrite=True,
)
```

Copilot AI Apr 7, 2026


This throughput test also uses hard-coded fio parameters (iodepth=64, numjob=thread_count, block_size=1024K, runtime=120, size_gb=8192). Please add inline comments or pull them into named constants so future maintainers can safely tune them and understand the tradeoffs.

Comment on lines +339 to +343

```python
def verify_vm_max_premium_ssd_disk_count(
    self, node: Node, log: Logger, variables: Dict[str, Any]
) -> None:
    expected_max_disks = _get_int_var(variables, VAR_EXPECTED_MAX_DISKS)
    vm_size = variables.get(VAR_VM_SIZE, "unknown")
```

Copilot AI Apr 7, 2026


PR description/test list mentions verify_vm_max_premium_data_disks, but this suite defines verify_vm_max_premium_ssd_disk_count. To avoid confusion for runbook authors and reviewers, please align the method name (or update the PR description/runbook docs) so the documented test case names match the actual ones.

Copilot AI review requested due to automatic review settings April 7, 2026 16:05

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

Comment on lines +535 to +542

```python
best_iops = 0
for i in range(_PERF_ITERATIONS):
    result = fio.launch(
        name=f"iops_all_disks_{i}",
        filename=filename,
        mode="randread",
        iodepth=64,
        numjob=thread_count,
```

Copilot AI Apr 7, 2026


numjob=thread_count with iodepth=64 can create an extremely high total queue depth on large core-count VMs and trigger libaio limits (EAGAIN) or severe perf drop. Consider capping numjob (the repo’s perf helpers cap to 256 for libaio) and/or bounding total qdepth to reduce flakiness.

Suggested change

Before:

```python
best_iops = 0
for i in range(_PERF_ITERATIONS):
    result = fio.launch(
        name=f"iops_all_disks_{i}",
        filename=filename,
        mode="randread",
        iodepth=64,
        numjob=thread_count,
```

After:

```python
fio_numjob = min(
    thread_count, 256
)  # Cap libaio jobs to avoid EAGAIN/perf collapse on large-core VMs.
max_total_queue_depth = (
    256 * 64
)  # Preserve the prior effective upper bound of numjob * iodepth.
fio_iodepth = max(
    1, max_total_queue_depth // fio_numjob
)  # Ensure total queue depth stays bounded while keeping iodepth >= 1.
log.info(
    f"VM size: {vm_size} - fio settings for aggregate IOPS: "
    f"thread_count={thread_count}, numjob={fio_numjob}, "
    f"iodepth={fio_iodepth}, total_queue_depth={fio_numjob * fio_iodepth}"
)
best_iops = 0
for i in range(_PERF_ITERATIONS):
    result = fio.launch(
        name=f"iops_all_disks_{i}",
        filename=filename,
        mode="randread",
        iodepth=fio_iodepth,
        numjob=fio_numjob,
```
Comment on lines +629 to +644

```python
fio = node.tools[Fio]
cpu = node.tools[Lscpu]
thread_count = cpu.get_thread_count()
best_bw_mbps = 0
for i in range(_PERF_ITERATIONS):
    result = fio.launch(
        name=f"storage_throughput_all_disks_{i}",
        filename=filename,
        mode="read",
        iodepth=64,
        numjob=thread_count,
        block_size="1024K",
        size_gb=8192,
        time=120,
        overwrite=True,
    )
```

Copilot AI Apr 7, 2026


Same fio concurrency concern here: numjob=thread_count (with iodepth=64) can exceed async I/O limits on large VMs. Consider capping numjob similarly to other perf code in the repo to avoid intermittent failures.

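The capping pattern these reviews refer to can be sketched as a small helper; the 256-job cap and 64 default iodepth follow values mentioned in the review, and the helper name is hypothetical:

```python
# Assumed caps, following the review's reference to the repo's perf
# helpers (libaio job cap of 256).
MAX_JOBS = 256
MAX_TOTAL_QDEPTH = 256 * 64

def cap_fio_concurrency(thread_count: int, iodepth: int = 64) -> tuple:
    """Bound numjob and total queue depth so large-core VMs don't hit
    libaio EAGAIN limits or a perf collapse."""
    numjob = min(thread_count, MAX_JOBS)
    capped_iodepth = max(1, min(iodepth, MAX_TOTAL_QDEPTH // numjob))
    return numjob, capped_iodepth

print(cap_fio_concurrency(416))  # large SKU: jobs capped at 256
print(cap_fio_concurrency(8))    # small SKU: unchanged
```

The capped pair can then be passed as `numjob`/`iodepth` instead of the raw thread count.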
Comment on lines +659 to +676

```python
bw_ceiling = int(
    expected_bw * (100 + _PERF_TOLERANCE_PERCENT) / 100
)
log.info(
    f"VM size: {vm_size} - fio seq read across "
    f"{len(data_disks)} disk(s): "
    f"measured {measured_bw} MBps, "
    f"expected {bw_floor}-{bw_ceiling} MBps "
    f"(declared: {expected_bw} MBps)"
)
assert_that(measured_bw).described_as(
    f"VM size {vm_size}: expected storage throughput "
    f"between {bw_floor} and {bw_ceiling} MBps "
    f"(declared {expected_bw} MBps with "
    f"{_PERF_TOLERANCE_PERCENT}% tolerance) across "
    f"{len(data_disks)} disk(s) but measured "
    f"{measured_bw} MBps"
).is_between(bw_floor, bw_ceiling)
```

Copilot AI Apr 7, 2026


The test description says it validates throughput is “at least” the expected value, but the assertion enforces an upper bound (is_between(bw_floor, bw_ceiling)). If the measured throughput exceeds the declared value by > tolerance, this will fail even though it meets the “at least” requirement. Either update the docstring to match a bounded-range check, or change the assertion to enforce only a minimum.

Suggested change

Before:

```python
bw_ceiling = int(
    expected_bw * (100 + _PERF_TOLERANCE_PERCENT) / 100
)
log.info(
    f"VM size: {vm_size} - fio seq read across "
    f"{len(data_disks)} disk(s): "
    f"measured {measured_bw} MBps, "
    f"expected {bw_floor}-{bw_ceiling} MBps "
    f"(declared: {expected_bw} MBps)"
)
assert_that(measured_bw).described_as(
    f"VM size {vm_size}: expected storage throughput "
    f"between {bw_floor} and {bw_ceiling} MBps "
    f"(declared {expected_bw} MBps with "
    f"{_PERF_TOLERANCE_PERCENT}% tolerance) across "
    f"{len(data_disks)} disk(s) but measured "
    f"{measured_bw} MBps"
).is_between(bw_floor, bw_ceiling)
```

After:

```python
log.info(
    f"VM size: {vm_size} - fio seq read across "
    f"{len(data_disks)} disk(s): "
    f"measured {measured_bw} MBps, "
    f"expected at least {bw_floor} MBps "
    f"(declared: {expected_bw} MBps)"
)
assert_that(measured_bw).described_as(
    f"VM size {vm_size}: expected storage throughput "
    f"at least {bw_floor} MBps "
    f"(declared {expected_bw} MBps with "
    f"{_PERF_TOLERANCE_PERCENT}% tolerance) across "
    f"{len(data_disks)} disk(s) but measured "
    f"{measured_bw} MBps"
).is_greater_than_or_equal_to(bw_floor)
```

Comment on lines +781 to +805

```python
# Allow tolerance around the declared max
bw_floor_mbps = int(
    expected_bw_mbps
    * (100 - _PERF_TOLERANCE_PERCENT)
    / 100
)
bw_ceiling_mbps = int(
    expected_bw_mbps
    * (100 + _PERF_TOLERANCE_PERCENT)
    / 100
)
log.info(
    f"VM size: {vm_size} — ntttcp network bandwidth: "
    f"measured {measured_bw_mbps} Mbps, "
    f"expected {bw_floor_mbps}-{bw_ceiling_mbps} Mbps "
    f"(declared: {expected_bw_mbps} Mbps, "
    f"tolerance: {_PERF_TOLERANCE_PERCENT}%)"
)
assert_that(measured_bw_mbps).described_as(
    f"VM size {vm_size}: expected network throughput "
    f"between {bw_floor_mbps} and {bw_ceiling_mbps} Mbps "
    f"(declared {expected_bw_mbps} Mbps with "
    f"{_PERF_TOLERANCE_PERCENT}% tolerance) but measured "
    f"{measured_bw_mbps} Mbps"
).is_between(bw_floor_mbps, bw_ceiling_mbps)
```

Copilot AI Apr 7, 2026


The network bandwidth docstring says “at least” expected throughput, but the assertion enforces an upper bound (is_between). This can fail if the VM measures higher than the declared spec. Either document that this is a “within tolerance” validation, or assert only a lower bound.

Suggested change

Before:

```python
# Allow tolerance around the declared max
bw_floor_mbps = int(
    expected_bw_mbps
    * (100 - _PERF_TOLERANCE_PERCENT)
    / 100
)
bw_ceiling_mbps = int(
    expected_bw_mbps
    * (100 + _PERF_TOLERANCE_PERCENT)
    / 100
)
log.info(
    f"VM size: {vm_size} — ntttcp network bandwidth: "
    f"measured {measured_bw_mbps} Mbps, "
    f"expected {bw_floor_mbps}-{bw_ceiling_mbps} Mbps "
    f"(declared: {expected_bw_mbps} Mbps, "
    f"tolerance: {_PERF_TOLERANCE_PERCENT}%)"
)
assert_that(measured_bw_mbps).described_as(
    f"VM size {vm_size}: expected network throughput "
    f"between {bw_floor_mbps} and {bw_ceiling_mbps} Mbps "
    f"(declared {expected_bw_mbps} Mbps with "
    f"{_PERF_TOLERANCE_PERCENT}% tolerance) but measured "
    f"{measured_bw_mbps} Mbps"
).is_between(bw_floor_mbps, bw_ceiling_mbps)
```

After:

```python
# Allow tolerance below the declared throughput to account for
# measurement variance while still enforcing a minimum expectation.
bw_floor_mbps = int(
    expected_bw_mbps
    * (100 - _PERF_TOLERANCE_PERCENT)
    / 100
)
log.info(
    f"VM size: {vm_size} — ntttcp network bandwidth: "
    f"measured {measured_bw_mbps} Mbps, "
    f"expected at least {bw_floor_mbps} Mbps "
    f"(declared: {expected_bw_mbps} Mbps, "
    f"tolerance: {_PERF_TOLERANCE_PERCENT}%)"
)
assert_that(measured_bw_mbps).described_as(
    f"VM size {vm_size}: expected network throughput "
    f"at least {bw_floor_mbps} Mbps "
    f"(declared {expected_bw_mbps} Mbps with "
    f"{_PERF_TOLERANCE_PERCENT}% tolerance) but measured "
    f"{measured_bw_mbps} Mbps"
).is_greater_than_or_equal_to(bw_floor_mbps)
```

Comment on lines +448 to +457

```python
result = fio.launch(
    name=f"nvme_iops_all_disks_{i}",
    filename=filename,
    mode="randread",
    iodepth=64,
    numjob=4,
    block_size="4K",
    size_gb=8192,
    time=120,
    overwrite=True,
```

Copilot AI Apr 7, 2026


Fio.launch(size_gb=...) is passed to fio as --size=<value>M (MiB). Using size_gb=8192 is likely intended as 8192 MiB (~8 GiB), but the code reads like 8192 GB. Consider introducing a clearly named local constant (e.g., size_mb = 8192) and passing that, or add an inline note about units.

Comment on lines +511 to +513

```python
def verify_vm_premium_ssd_iops(
    self, node: Node, log: Logger, variables: Dict[str, Any]
) -> None:
```

Copilot AI Apr 7, 2026


PR description’s “Key Test Cases” list doesn’t match the implemented method names (e.g., it lists verify_vm_max_premium_data_disks/verify_vm_premium_disk_iops/verify_vm_premium_ssd_storage_throughput, but this suite defines verify_vm_max_premium_ssd_disk_count/verify_vm_premium_ssd_iops/verify_vm_premium_ssd_throughput). Please align the PR description (and any run instructions) with the actual test method names to avoid confusion when selecting cases.

Comment on lines +10 to +31

```yaml
# Usage:
#   lisa -r lisa\microsoft\runbook\examples\vm_spec_validation.yml \
#     --variable "subscription_id:<your_sub_id>" \
#     --variable "csv_file:lisa\microsoft\runbook\examples\vm_specs.csv"
#
# You can also override location, marketplace_image, etc. from the CLI.

test_pass: "adhoc"

name: vm_spec_validation

include:
  - path: ../azure.yml

extension:
  - ../../testsuites

variable:
  # Path to the CSV file with VM specifications (override via CLI if needed)
  - name: csv_file
    value: 'lisa\microsoft\runbook\examples\vm_specs.csv'
```

Copilot AI Apr 7, 2026


This example runbook uses Windows-style backslashes in paths (e.g., lisa\microsoft\...). On Linux/macOS runners this is usually treated as a literal backslash and can cause “file not found”. Consider switching example paths to forward slashes for cross-platform usability.
