Skip to content

<refactor>[compute]: split VmInstanceManagerImpl 2808L into 6 SubManagers#3400

Open
zstack-robot-1 wants to merge 1 commit into5.5.12from
sync/ye.zou/refactor/vm-instance-manager-split
Open

<refactor>[compute]: split VmInstanceManagerImpl 2808L into 6 SubManagers#3400
zstack-robot-1 wants to merge 1 commit into5.5.12from
sync/ye.zou/refactor/vm-instance-manager-split

Conversation

@zstack-robot-1
Copy link
Collaborator

背景

VmInstanceManagerImpl.java 累计达到 2808 行,严重违反 God Object 规范(硬上限 1500 行),违反 docs/rules/anti-degradation.md Rule 1。

Resolves: ZSTAC-82456

变更内容

将主类拆分为 6 个职责单一的 SubManager:

新文件 职责 行数
VmCreationSubManager.java VM 创建流程 + 候选机资源查询消息处理 ~1174
VmNicApiSubManager.java NIC API 消息(创建/删除/查网络服务) ~347
VmQuotaSubManager.java 配额上报(ReportQuotaExtensionPoint) ~221
VmExtensionPointAdapter.java L3 删除 / Host 状态 / Owner 变更等扩展点 ~255
VmExpungeSubManager.java VM 过期清理周期任务 + GlobalConfig 监听 ~202
VmFlowChainRegistry.java 12 个 FlowChain Builder 注册与 getter ~159

效果

  • VmInstanceManagerImpl.java: 2808 → 809 行(缩减 71%)
  • 主类保留薄门面层:消息路由 + Spring 生命周期
  • 所有新 Bean 已注册到 VmInstanceManager.xml

验证

  • mvn compile -pl compute -am -Dmaven.test.skipexit 0
  • Change-Id 已生成(Gerrit 兼容)

sync from gitlab !9248

…gers

Resolves: ZSTAC-82456

Change-Id: Ifecc0b1a31fb5fb2c37c8e9f3893bf2096c51ce3
@coderabbitai
Copy link

coderabbitai bot commented Feb 28, 2026

概览

本次变更引入了六个新的虚拟机(VM)管理子模块类,用于将虚拟机创建、过期清理、扩展点处理、API接口、工作流链管理和配额计算等功能进行模块化分离,提升代码组织性和可维护性。总计新增约2,358行代码。

变更内容

模块/文件(s) 摘要
虚拟机创建逻辑
compute/src/main/java/org/zstack/compute/vm/VmCreationSubManager.java
新增VmCreationSubManager类,封装VM创建和候选资源查询逻辑。包含doCreateVmInstance编排多步创建流程(预创建钩子、镜像/VO初始化、FlowChain流程、系统标签、SSH密钥对、instant-start),以及APIGetVmsCapabilitiesMsg、APIGetCandidateZonesClustersHostsForCreatingVmMsg等多个API消息处理器;提供独立的辅助方法计算备份存储、镜像、L3网络间的相互依赖关系。
虚拟机过期管理
compute/src/main/java/org/zstack/compute/vm/VmExpungeSubManager.java
新增VmExpungeSubManager类,实现定期后台任务清理已销毁的虚拟机。按配置的保留期限检查VmInstanceVO的lastOpDate,若超期则根据删除策略决定是否发送ExpungeVmMsg;支持全局配置更新器动态重启定时任务。
扩展点适配器
compute/src/main/java/org/zstack/compute/vm/VmExtensionPointAdapter.java
新增VmExtensionPointAdapter类,实现L3NetworkDeleteExtensionPoint、ResourceOwnerAfterChangeExtensionPoint、AfterChangeHostStatusExtensionPoint、VmInstanceMigrateExtensionPoint、VmInstanceBeforeStartExtensionPoint等五个扩展点接口,处理L3网络删除、资源所有权变更、主机状态变化、VM迁移、VM启动前的系统标签等各类VM生命周期事件。
工作流链管理
compute/src/main/java/org/zstack/compute/vm/VmFlowChainRegistry.java
新增VmFlowChainRegistry类,集中管理VM各操作阶段的FlowChainBuilder实例,包括创建、启动、停止、重启、销毁、迁移、附加卷、附加/卸载ISO、过期清理、暂停、恢复等13种工作流链;提供buildFlowChains初始化所有构建器,并为每种操作提供对应的getter方法。
虚拟机NIC API处理
compute/src/main/java/org/zstack/compute/vm/VmNicApiSubManager.java
新增VmNicApiSubManager类,封装VM网卡API处理逻辑。实现APICreateVmNicMsg、APIGetVmNicAttachedNetworkServiceMsg、APIDeleteVmNicMsg处理,包括NIC创建(MAC分配、IP分配、扩展点支持)、网络服务查询、NIC删除(区分已挂载/未挂载VM两种场景,含FlowChain流程和IP回收)。
虚拟机配额管理
compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java
新增VmQuotaSubManager类,实现ReportQuotaExtensionPoint接口,定义和注册VM相关配额检查(VM总数、运行数、运行CPU、运行内存、数据卷数、卷大小)。为APICreateVmInstanceMsg、APIStartVmInstanceMsg、APICreateDataVolumeMsg、APIChangeResourceOwnerMsg等API消息提供配额验证逻辑,包括镜像根卷大小、磁盘配置检查等。

代码审查工作量评估

🎯 4 (复杂) | ⏱️ ~60 分钟

🐰 兔兔跳跳来贺春,
六个新模块理纷纷,
创建、清理、链流转,
配额网卡齐管理,
虚机世界井井然! 🌈✨


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Title check ❌ Error PR title exceeds 72-character limit at 73 characters. Reduce title length to 72 characters or less, e.g.: '[compute]: split VmInstanceManagerImpl into 6 SubManagers'
Docstring Coverage ⚠️ Warning Docstring coverage is 7.09% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed PR description clearly relates to the changeset, detailing the refactoring of VmInstanceManagerImpl into 6 SubManagers with context, file mapping, and validation results.
✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch sync/ye.zou/refactor/vm-instance-manager-split

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (1)
compute/src/main/java/org/zstack/compute/vm/VmExpungeSubManager.java (1)

167-169: 回调中用 indexOf 反查消息是 O(n²),建议按下标遍历

这里可以直接按索引遍历 replies,避免重复查找并提升可读性。

♻️ 建议重构
-                        for (MessageReply r : replies) {
-                            ExpungeVmMsg msg = msgs.get(replies.indexOf(r));
+                        for (int i = 0; i < replies.size(); i++) {
+                            MessageReply r = replies.get(i);
+                            ExpungeVmMsg msg = msgs.get(i);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/VmExpungeSubManager.java` around
lines 167 - 169, The loop in VmExpungeSubManager currently finds the matching
ExpungeVmMsg by calling msgs.get(replies.indexOf(r)) inside a for-each over
replies, causing O(n²); change the loop to iterate by index (for i from 0 to
replies.size()-1) or use an indexed for loop so you can directly access
replies.get(i) and msgs.get(i), then use MessageReply r = replies.get(i) and
ExpungeVmMsg msg = msgs.get(i) to handle the success/failure logic; update any
references inside the loop accordingly to avoid repeated indexOf calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmCreationSubManager.java`:
- Around line 1152-1156: The code in VmCreationSubManager initializes result
with Collections.emptyList(), then calls result.add(...) in defensive branches
which throws UnsupportedOperationException; change the initialization in the
affected helper methods (the methods that declare List<ErrorCode> result =
Collections.emptyList()) to return a mutable list (e.g., new ArrayList<>()) so
result.add(...) is safe, and update both occurrences so all subsequent adds to
result succeed; keep the rest of the logic in the same methods (refer to the
result variable and the methods in VmCreationSubManager that perform the
null/msg checks).
- Around line 577-600: Inside VmCreationSubManager (the method containing the
loop over pss) you are causing an N+1 by querying BackupStorageVO inside the for
(PrimaryStorageVO ps : pss) loop; instead, collect all bsUuids from every
PrimaryStorageVO first (use
PrimaryStorageType.valueOf(ps.getType()).findBackupStorage(ps.getUuid()) for
each ps), deduplicate into one Set, run a single query
TypedQuery<BackupStorageVO> with "where bs.uuid in (:uuids)" to fetch all
BackupStorageVOs at once, then build a Map of backupUuid->BackupStorageVO and
aggregate/backfill the original per-primary-storage lists (or create a
Map<PrimaryStorageVO, List<BackupStorageVO>>) so you can replace the per-ps
bq.getResultList() calls with lookups into that map and remove the loop-in-query
pattern.

In `@compute/src/main/java/org/zstack/compute/vm/VmExpungeSubManager.java`:
- Around line 63-65: VmExpungeSubManager.stop() currently returns true without
cancelling the scheduled periodic task, causing a lifecycle leak; update the
stop() method in class VmExpungeSubManager to cancel the expungeVmTask (check
expungeVmTask != null), call its cancel(...) method on the
ScheduledFuture/TimerTask instance, optionally call shutdown on the scheduler if
owned by this class, and null out the expungeVmTask reference so the background
task is terminated when the component stops.

In `@compute/src/main/java/org/zstack/compute/vm/VmExtensionPointAdapter.java`:
- Around line 89-108: The method resourceOwnerAfterChange prematurely returns
when rootVolumeUuid is null or when vmnics is empty, which prevents
changeVmCdRomsOwner(ref.getResourceUuid(), newOwnerUuid) from running; modify
resourceOwnerAfterChange so it does not return early: if rootVolumeUuid is null
skip acntMgr.changeResourceOwner(rootVolumeUuid, ...) but continue, and when
vmnics is empty skip the loop that calls acntMgr.changeResourceOwner(vmnicUuid,
...) but continue to invoke changeVmCdRomsOwner(ref.getResourceUuid(),
newOwnerUuid); update the guards around acntMgr.changeResourceOwner and the
VmNicVO query (dbf.createQuery / VmNicVO_, VmNicVO_.uuid,
VmNicVO_.vmInstanceUuid) to conditionally execute without exiting the method.
- Around line 148-150: Fix the logging typos in VmExtensionPointAdapter: replace
the misspelled "unknowngc" message in the logger.warn call that mentions
creating an unknowngc job for vm[uuid:%s] to a clear English phrase such as
"create an UnknownVmGC job for vm[uuid:%s]" (or "create an unknown GC job for
vm[uuid:%s]") and also correct any possessive misuse in the earlier message
("it's" -> "its") so both logger.warn statements use proper spelling and
grammar.

In `@compute/src/main/java/org/zstack/compute/vm/VmNicApiSubManager.java`:
- Around line 84-108: The retry loop that only updates nicVO.mac leaves nic (the
in-memory representation) stale and still calls trigger.next() even if all
retries fail; fix by (1) after a successful SQLBatch.persist() inside the while
loop, copy the persisted mac back to the in-memory nic (e.g., set
nic.setMac(nicVO.getMac())) so callers see the new MAC, and (2) if the loop
exhausts all attempts (no successful persist), do not call trigger.next() —
instead throw an exception (or rethrow the last PersistenceException) to abort
further processing; update references around nicVO, nic, the SQLBatch block and
the tries loop, and keep using MacOperator.generateMacWithDeviceId(...) when
regenerating macs on Duplicate entry.

In `@compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java`:
- Around line 124-140: The loop in VmQuotaSubManager that iterates
diskOfferingCountMap.keySet() issues a DB query per entry (TypedQuery in the
loop) causing N+1 queries; replace it with a single Q API batch query to fetch
diskSize for all UUIDs in diskOfferingCountMap (use
Q.New(DiskOfferingVO.class).in(DiskOfferingVO_.uuid,
diskOfferingCountMap.keySet()).select(DiskOfferingVO_.diskSize).listValues()),
then locally multiply each returned diskSize by the corresponding count from
diskOfferingCountMap and sum into allVolumeSizeAsked, removing the per-UUID
TypedQuery and looped DB calls.
- Line 144: The current addCheckCondition in VmQuotaSubManager uses
msg.getStrategy().equals(VmCreationStrategy.JustCreate.toString()) which can NPE
if strategy is null; change the condition to a null-safe comparison (e.g., use
Objects.equals or call the constant's equals with msg.getStrategy()) so the
check does not throw when msg.getStrategy() is null, and keep the surrounding
logic in addCheckCondition((msg) -> ...) unchanged.

---

Nitpick comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmExpungeSubManager.java`:
- Around line 167-169: The loop in VmExpungeSubManager currently finds the
matching ExpungeVmMsg by calling msgs.get(replies.indexOf(r)) inside a for-each
over replies, causing O(n²); change the loop to iterate by index (for i from 0
to replies.size()-1) or use an indexed for loop so you can directly access
replies.get(i) and msgs.get(i), then use MessageReply r = replies.get(i) and
ExpungeVmMsg msg = msgs.get(i) to handle the success/failure logic; update any
references inside the loop accordingly to avoid repeated indexOf calls.

ℹ️ Review info

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6b8f88 and ed90fc8.

⛔ Files ignored due to path filters (1)
  • conf/springConfigXml/VmInstanceManager.xml is excluded by !**/*.xml
📒 Files selected for processing (7)
  • compute/src/main/java/org/zstack/compute/vm/VmCreationSubManager.java
  • compute/src/main/java/org/zstack/compute/vm/VmExpungeSubManager.java
  • compute/src/main/java/org/zstack/compute/vm/VmExtensionPointAdapter.java
  • compute/src/main/java/org/zstack/compute/vm/VmFlowChainRegistry.java
  • compute/src/main/java/org/zstack/compute/vm/VmInstanceManagerImpl.java
  • compute/src/main/java/org/zstack/compute/vm/VmNicApiSubManager.java
  • compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java

Comment on lines +577 to +600
for (String l3uuid : l3s) {
String sql = "select ps" +
" from PrimaryStorageVO ps, L2NetworkClusterRefVO l2ref," +
" L3NetworkVO l3, PrimaryStorageClusterRefVO psref" +
" where ps.uuid = psref.primaryStorageUuid" +
" and psref.clusterUuid = l2ref.clusterUuid" +
" and l2ref.l2NetworkUuid = l3.l2NetworkUuid" +
" and l3.uuid = :l3uuid";
TypedQuery<PrimaryStorageVO> psq = dbf.getEntityManager().createQuery(sql, PrimaryStorageVO.class);
psq.setParameter("l3uuid", l3uuid);
List<PrimaryStorageVO> pss = psq.getResultList();

List<BackupStorageVO> lst = new ArrayList<>();
for (PrimaryStorageVO ps : pss) {
PrimaryStorageType psType = PrimaryStorageType.valueOf(ps.getType());
List<String> bsUuids = psType.findBackupStorage(ps.getUuid());

if (!bsUuids.isEmpty()) {
// the primary storage has bound backup storage, e.g. ceph
sql = "select bs from BackupStorageVO bs where bs.uuid in (:uuids)";
TypedQuery<BackupStorageVO> bq = dbf.getEntityManager().createQuery(sql, BackupStorageVO.class);
bq.setParameter("uuids", bsUuids);
lst.addAll(bq.getResultList());
} else {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n compute/src/main/java/org/zstack/compute/vm/VmCreationSubManager.java | sed -n '570,610p'

Repository: MatheMatrix/zstack

Length of output: 2397


🏁 Script executed:

# 搜索 TypedQuery 在整个文件中的使用
rg -n 'TypedQuery' compute/src/main/java/org/zstack/compute/vm/VmCreationSubManager.java -B2 -A2

Repository: MatheMatrix/zstack

Length of output: 3169


🏁 Script executed:

# 获取文件总行数和更多上下文
wc -l compute/src/main/java/org/zstack/compute/vm/VmCreationSubManager.java

Repository: MatheMatrix/zstack

Length of output: 136


循环内嵌套查询不符合性能规范

第 590-604 行的 for (PrimaryStorageVO ps : pss) 循环内部在第 597-599 行对每一条 PrimaryStorageVO 记录都执行了一次 BackupStorageVO 的查询,造成 N+1 问题。应改为先批量收集所有 bsUuids,然后一次性查询所有 BackupStorageVO 并按 PrimaryStorage 聚合,符合编码指南"禁止循环里套查询"的要求。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/VmCreationSubManager.java` around
lines 577 - 600, Inside VmCreationSubManager (the method containing the loop
over pss) you are causing an N+1 by querying BackupStorageVO inside the for
(PrimaryStorageVO ps : pss) loop; instead, collect all bsUuids from every
PrimaryStorageVO first (use
PrimaryStorageType.valueOf(ps.getType()).findBackupStorage(ps.getUuid()) for
each ps), deduplicate into one Set, run a single query
TypedQuery<BackupStorageVO> with "where bs.uuid in (:uuids)" to fetch all
BackupStorageVOs at once, then build a Map of backupUuid->BackupStorageVO and
aggregate/backfill the original per-primary-storage lists (or create a
Map<PrimaryStorageVO, List<BackupStorageVO>>) so you can replace the per-ps
bq.getResultList() calls with lookups into that map and remove the loop-in-query
pattern.

Comment on lines +1152 to +1156
List<ErrorCode> result = Collections.emptyList();
if (msg == null) {
result.add(operr(ORG_ZSTACK_COMPUTE_VM_10238, "CreateVmInstanceMsg cannot be null"));
return result;
} else if (cmsg != null && cmsg.getSystemTags() != null && !cmsg.getSystemTags().isEmpty()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# 确认文件大小和相关行的内容
wc -l compute/src/main/java/org/zstack/compute/vm/VmCreationSubManager.java

Repository: MatheMatrix/zstack

Length of output: 136


🏁 Script executed:

# 查看 1152-1156 和 1165-1168 周围的代码
sed -n '1150,1170p' compute/src/main/java/org/zstack/compute/vm/VmCreationSubManager.java | cat -n

Repository: MatheMatrix/zstack

Length of output: 1513


🏁 Script executed:

# 搜索所有 Collections.emptyList() 的使用位置
rg -n 'Collections\.emptyList\(\)' compute/src/main/java/org/zstack/compute/vm/VmCreationSubManager.java -B2 -A5

Repository: MatheMatrix/zstack

Length of output: 1662


Collections.emptyList() 后调用 add() 会直接抛异常

两处 helper 方法都将 result 初始化为不可变列表,但在防御分支执行了 result.add(...)Collections.emptyList() 返回的是不可变集合,直接调用 add() 会抛出 UnsupportedOperationException。虽然正常调用链通常传入非空的 msg,但这个防御分支本身的实现是错误的。

修复建议
         private List<ErrorCode> extEmitterHandleSystemTag(final CreateVmInstanceMsg msg, final APICreateMessage cmsg, VmInstanceVO finalVo) {
-            List<ErrorCode> result = Collections.emptyList();
+            List<ErrorCode> result = new ArrayList<>();
             if (msg == null) {
                 result.add(operr(ORG_ZSTACK_COMPUTE_VM_10238, "CreateVmInstanceMsg cannot be null"));
                 return result;
         private List<ErrorCode> extEmitterHandleSshKeyPair(final CreateVmInstanceMsg msg, final APICreateMessage cmsg, VmInstanceVO finalVo) {
-            List<ErrorCode> result = Collections.emptyList();
+            List<ErrorCode> result = new ArrayList<>();
             if (msg == null) {
                 result.add(operr(ORG_ZSTACK_COMPUTE_VM_10239, "CreateVmInstanceMsg cannot be null"));
                 return result;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
List<ErrorCode> result = Collections.emptyList();
if (msg == null) {
result.add(operr(ORG_ZSTACK_COMPUTE_VM_10238, "CreateVmInstanceMsg cannot be null"));
return result;
} else if (cmsg != null && cmsg.getSystemTags() != null && !cmsg.getSystemTags().isEmpty()) {
List<ErrorCode> result = new ArrayList<>();
if (msg == null) {
result.add(operr(ORG_ZSTACK_COMPUTE_VM_10238, "CreateVmInstanceMsg cannot be null"));
return result;
} else if (cmsg != null && cmsg.getSystemTags() != null && !cmsg.getSystemTags().isEmpty()) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/VmCreationSubManager.java` around
lines 1152 - 1156, The code in VmCreationSubManager initializes result with
Collections.emptyList(), then calls result.add(...) in defensive branches which
throws UnsupportedOperationException; change the initialization in the affected
helper methods (the methods that declare List<ErrorCode> result =
Collections.emptyList()) to return a mutable list (e.g., new ArrayList<>()) so
result.add(...) is safe, and update both occurrences so all subsequent adds to
result succeed; keep the rest of the logic in the same methods (refer to the
result variable and the methods in VmCreationSubManager that perform the
null/msg checks).

Comment on lines +63 to +65
public boolean stop() {
return true;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

stop() 未取消周期任务,存在生命周期泄漏风险

组件停止时应主动取消 expungeVmTask,避免后台任务继续运行。

🔧 建议修复
     `@Override`
     public boolean stop() {
+        if (expungeVmTask != null) {
+            expungeVmTask.cancel(true);
+            expungeVmTask = null;
+        }
         return true;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public boolean stop() {
return true;
}
`@Override`
public boolean stop() {
if (expungeVmTask != null) {
expungeVmTask.cancel(true);
expungeVmTask = null;
}
return true;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/VmExpungeSubManager.java` around
lines 63 - 65, VmExpungeSubManager.stop() currently returns true without
cancelling the scheduled periodic task, causing a lifecycle leak; update the
stop() method in class VmExpungeSubManager to cancel the expungeVmTask (check
expungeVmTask != null), call its cancel(...) method on the
ScheduledFuture/TimerTask instance, optionally call shutdown on the scheduler if
owned by this class, and null out the expungeVmTask reference so the background
task is terminated when the component stops.

Comment on lines +89 to +108
if (rootVolumeUuid == null) {
return;
}

acntMgr.changeResourceOwner(rootVolumeUuid, newOwnerUuid);

// change vmnic(s)
SimpleQuery<VmNicVO> sq = dbf.createQuery(VmNicVO.class);
sq.select(VmNicVO_.uuid);
sq.add(VmNicVO_.vmInstanceUuid, Op.EQ, ref.getResourceUuid());
List<String> vmnics = sq.listValue();
if (vmnics.isEmpty()) {
return;
}
for (String vmnicUuid : vmnics) {
acntMgr.changeResourceOwner(vmnicUuid, newOwnerUuid);
}

changeVmCdRomsOwner(ref.getResourceUuid(), newOwnerUuid);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

资源 owner 迁移被过早 return 截断

resourceOwnerAfterChange() 在 root volume 或 NIC 为空时直接 return,会导致 changeVmCdRomsOwner() 不执行,CD-ROM 的 owner 可能未同步更新。这里应只跳过缺失资源,不应提前结束整个流程。

🔧 建议修复
-        String rootVolumeUuid = q.findValue();
-        if (rootVolumeUuid == null) {
-            return;
-        }
-
-        acntMgr.changeResourceOwner(rootVolumeUuid, newOwnerUuid);
+        String rootVolumeUuid = q.findValue();
+        if (rootVolumeUuid != null) {
+            acntMgr.changeResourceOwner(rootVolumeUuid, newOwnerUuid);
+        }

         // change vmnic(s)
         SimpleQuery<VmNicVO> sq = dbf.createQuery(VmNicVO.class);
         sq.select(VmNicVO_.uuid);
         sq.add(VmNicVO_.vmInstanceUuid, Op.EQ, ref.getResourceUuid());
         List<String> vmnics = sq.listValue();
-        if (vmnics.isEmpty()) {
-            return;
-        }
         for (String vmnicUuid : vmnics) {
             acntMgr.changeResourceOwner(vmnicUuid, newOwnerUuid);
         }

         changeVmCdRomsOwner(ref.getResourceUuid(), newOwnerUuid);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/VmExtensionPointAdapter.java`
around lines 89 - 108, The method resourceOwnerAfterChange prematurely returns
when rootVolumeUuid is null or when vmnics is empty, which prevents
changeVmCdRomsOwner(ref.getResourceUuid(), newOwnerUuid) from running; modify
resourceOwnerAfterChange so it does not return early: if rootVolumeUuid is null
skip acntMgr.changeResourceOwner(rootVolumeUuid, ...) but continue, and when
vmnics is empty skip the loop that calls acntMgr.changeResourceOwner(vmnicUuid,
...) but continue to invoke changeVmCdRomsOwner(ref.getResourceUuid(),
newOwnerUuid); update the guards around acntMgr.changeResourceOwner and the
VmNicVO query (dbf.createQuery / VmNicVO_, VmNicVO_.uuid,
VmNicVO_.vmInstanceUuid) to conditionally execute without exiting the method.

Comment on lines +148 to +150
logger.warn(String.format("the host[uuid:%s] disconnected, but the vm[uuid:%s] fails to " +
"change it's state to Unknown, %s", hostUuid, vmUuid, reply.getError()));
logger.warn(String.format("create an unknowngc job for vm[uuid:%s]", vmUuid));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

修正日志拼写错误,避免告警检索噪音

unknowngc 建议改为 UnknownVmGCunknown GC

As per coding guidelines: 代码里不应当有有中文,包括报错、注释等都应当使用正确的、无拼写错误的英文来写

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/VmExtensionPointAdapter.java`
around lines 148 - 150, Fix the logging typos in VmExtensionPointAdapter:
replace the misspelled "unknowngc" message in the logger.warn call that mentions
creating an unknowngc job for vm[uuid:%s] to a clear English phrase such as
"create an UnknownVmGC job for vm[uuid:%s]" (or "create an unknown GC job for
vm[uuid:%s]") and also correct any possessive misuse in the earlier message
("it's" -> "its") so both logger.warn statements use proper spelling and
grammar.

Comment on lines +84 to +108
int tries = 5;
while (tries-- > 0) {
try {
new SQLBatch() {
@Override
protected void scripts() {
persist(nicVO);
}
}.execute();
break;
} catch (PersistenceException e) {
if (ExceptionDSL.isCausedBy(e, SQLIntegrityConstraintViolationException.class, "Duplicate entry")) {
logger.debug(String.format("Concurrent mac allocation. Mac[%s] has been allocated, try allocating another one. " +
"The error[Duplicate entry] printed by jdbc.spi.SqlExceptionHelper is no harm, " +
"we will try finding another mac", nicVO.getMac()));
logger.trace("", e);
nicVO.setMac(MacOperator.generateMacWithDeviceId((short) nicVO.getDeviceId()));
} else {
throw e;
}
}
}

trigger.next();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

MAC 重试分支会留下脏状态,且重试耗尽后仍继续执行

当前重试时只更新了 nicVO.macnic.mac 可能保持旧值;并且 5 次都失败时仍会 trigger.next(),会把“未成功落库”的 NIC 带入后续 IP 分配流程。

🔧 建议修复
                 int tries = 5;
+                boolean persisted = false;
                 while (tries-- > 0) {
                     try {
                         new SQLBatch() {
                             `@Override`
                             protected void scripts() {
                                 persist(nicVO);
                             }
                         }.execute();
+                        persisted = true;
                         break;
                     } catch (PersistenceException e) {
                         if (ExceptionDSL.isCausedBy(e, SQLIntegrityConstraintViolationException.class, "Duplicate entry")) {
                             logger.debug(String.format("Concurrent mac allocation. Mac[%s] has been allocated, try allocating another one. " +
                                     "The error[Duplicate entry] printed by jdbc.spi.SqlExceptionHelper is no harm, " +
                                     "we will try finding another mac", nicVO.getMac()));
                             logger.trace("", e);
                             nicVO.setMac(MacOperator.generateMacWithDeviceId((short) nicVO.getDeviceId()));
+                            nic.setMac(nicVO.getMac());
                         } else {
                             throw e;
                         }
                     }
                 }
 
+                if (!persisted) {
+                    trigger.fail(Platform.operr("failed to persist vm nic after mac retry exhausted"));
+                    return;
+                }
+                nic.setMac(nicVO.getMac());
                 trigger.next();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/VmNicApiSubManager.java` around
lines 84 - 108, The retry loop that only updates nicVO.mac leaves nic (the
in-memory representation) stale and still calls trigger.next() even if all
retries fail; fix by (1) after a successful SQLBatch.persist() inside the while
loop, copy the persisted mac back to the in-memory nic (e.g., set
nic.setMac(nicVO.getMac())) so callers see the new MAC, and (2) if the loop
exhausts all attempts (no successful persist), do not call trigger.next() —
instead throw an exception (or rethrow the last PersistenceException) to abort
further processing; update references around nicVO, nic, the SQLBatch block and
the tries loop, and keep using MacOperator.generateMacWithDeviceId(...) when
regenerating macs on Duplicate entry.

Comment on lines +124 to +140
HashMap<String, Long> diskOfferingCountMap = new HashMap<>();
if (!diskOfferingUuids.isEmpty()) {
for (String diskOfferingUuid : diskOfferingUuids) {
if (diskOfferingCountMap.containsKey(diskOfferingUuid)) {
diskOfferingCountMap.put(diskOfferingUuid, diskOfferingCountMap.get(diskOfferingUuid) + 1);
} else {
diskOfferingCountMap.put(diskOfferingUuid, 1L);
}
}
for (String diskOfferingUuid : diskOfferingCountMap.keySet()) {
sql = "select diskSize from DiskOfferingVO where uuid = :uuid";
TypedQuery<Long> dq = dbf.getEntityManager().createQuery(sql, Long.class);
dq.setParameter("uuid", diskOfferingUuid);
Long dsize = dq.getSingleResult();
dsize = dsize == null ? 0 : dsize;
allVolumeSizeAsked += dsize * diskOfferingCountMap.get(diskOfferingUuid);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's read the actual file to confirm the code at lines 124-140
wc -l compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java

Repository: MatheMatrix/zstack

Length of output: 132


🏁 Script executed:

# Read the specific lines mentioned in the review
sed -n '124,140p' compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java

Repository: MatheMatrix/zstack

Length of output: 1266


🏁 Script executed:

# Search for TypedQuery usage in this file
rg -n 'TypedQuery' compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java

Repository: MatheMatrix/zstack

Length of output: 411


🏁 Script executed:

# Search for batch query patterns in the codebase to understand recommended approaches
rg -l 'findByUuids\|findByUuids\|listByUuids' --type java compute/src/ storage/src/ | head -5

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# Check how other parts of the codebase handle similar batch DiskOfferingVO queries
rg -n 'DiskOfferingVO' compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java -B2 -A2

Repository: MatheMatrix/zstack

Length of output: 1112


🏁 Script executed:

# Let's check what query methods are available through dbf for batch operations
rg -n 'Q\.' compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java -A1 | head -20

Repository: MatheMatrix/zstack

Length of output: 902


🏁 Script executed:

# Check how DiskOfferingVO is queried elsewhere in the codebase using recommended methods
rg -n 'class\s+Q\b|Q\.select|Q\.findAll' compute/src/main/java/org/zstack/compute/vm/ -A2 | head -30

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# Look at the dbf usage patterns in this file to understand recommended query methods
rg -n 'dbf\.' compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java | head -20

Repository: MatheMatrix/zstack

Length of output: 371


🏁 Script executed:

# Check if there's a Q API method for batch queries by DiskOfferingVO
rg -n 'listByUuids|findByUuids|findByUuidsAndLoadingRequired' storage/src/ compute/src/ --type java -l | head -5

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

# Look at the full context around the problematic code to understand better
sed -n '100,180p' compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java

Repository: MatheMatrix/zstack

Length of output: 5623


🏁 Script executed:

# Check how to properly query multiple records by UUID using Q API
rg -n 'Q\.New.*\.in\(|listByUuids|getByUuids' compute/src/ --type java -A3 | head -40

Repository: MatheMatrix/zstack

Length of output: 4629


🏁 Script executed:

# Look for patterns of querying DiskOfferingVO by multiple UUIDs
rg -n 'DiskOfferingVO_\.' compute/src/main/java/org/zstack/compute/vm/ -B2 -A2

Repository: MatheMatrix/zstack

Length of output: 4766


🏁 Script executed:

# Check if there are any Q.New examples that query by collections/lists
rg -n 'Q\.New.*in\(' compute/src/ --type java -A2 | head -30

Repository: MatheMatrix/zstack

Length of output: 3502


使用Q API改为批量查询,避免循环内N+1查询

第133-140行的循环遍历diskOfferingCountMap.keySet()逐条查询DiskOfferingVO违反了"禁止循环里套查询"的原则。同时使用了TypedQuery,应改为推荐的Q API方式。

建议改为一次性查询所有diskOfferingUuid对应的diskSize:

List<Long> diskSizes = Q.New(DiskOfferingVO.class)
    .in(DiskOfferingVO_.uuid, diskOfferingCountMap.keySet())
    .select(DiskOfferingVO_.diskSize)
    .listValues();

然后本地聚合计算总大小,避免循环内查询。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java` around
lines 124 - 140, The loop in VmQuotaSubManager that iterates
diskOfferingCountMap.keySet() issues a DB query per entry (TypedQuery in the
loop) causing N+1 queries; replace it with a single Q API batch query to fetch
diskSize for all UUIDs in diskOfferingCountMap (use
Q.New(DiskOfferingVO.class).in(DiskOfferingVO_.uuid,
diskOfferingCountMap.keySet()).select(DiskOfferingVO_.diskSize).listValues()),
then locally multiply each returned diskSize by the corresponding count from
diskOfferingCountMap and sum into allVolumeSizeAsked, removing the per-UUID
TypedQuery and looped DB calls.

}

return allVolumeSizeAsked;
}).addCheckCondition((msg) -> !msg.getStrategy().equals(VmCreationStrategy.JustCreate.toString())));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 查找消息类定义中 strategy 字段的定义
echo "=== 查找 APICreateVmInstanceMsg 或相关消息类 ==="
fd -e java | xargs grep -l "getStrategy\|strategy" | grep -i "msg\|message" | head -20

Repository: MatheMatrix/zstack

Length of output: 1368


🏁 Script executed:

#!/bin/bash
# 查看 VmQuotaSubManager.java 的完整内容和上下文
echo "=== 查找 VmQuotaSubManager.java 文件位置 ==="
fd 'VmQuotaSubManager.java'

Repository: MatheMatrix/zstack

Length of output: 167


🏁 Script executed:

#!/bin/bash
# 查找 VmCreationStrategy 的定义
echo "=== 查找 VmCreationStrategy 定义 ==="
fd 'VmCreationStrategy.java'

Repository: MatheMatrix/zstack

Length of output: 160


🏁 Script executed:

#!/bin/bash
# 查看 VmQuotaSubManager.java 的完整内容
echo "=== VmQuotaSubManager.java 内容 ==="
cat -n compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java

Repository: MatheMatrix/zstack

Length of output: 13955


🏁 Script executed:

#!/bin/bash
# 查看 APICreateVmInstanceMsg.java 中 strategy 字段的定义
echo "=== APICreateVmInstanceMsg.java 中 strategy 相关定义 ==="
cat -n header/src/main/java/org/zstack/header/vm/APICreateVmInstanceMsg.java | grep -A 5 -B 5 -i "strategy"

Repository: MatheMatrix/zstack

Length of output: 1519


🏁 Script executed:

#!/bin/bash
# 查看 CreateVmInstanceMsg.java 中 strategy 字段的定义
echo "=== CreateVmInstanceMsg.java 中 strategy 相关定义 ==="
cat -n header/src/main/java/org/zstack/header/vm/CreateVmInstanceMsg.java | grep -A 5 -B 5 -i "strategy"

Repository: MatheMatrix/zstack

Length of output: 1634


🏁 Script executed:

#!/bin/bash
# 查看 VmCreationStrategy 的完整定义
echo "=== VmCreationStrategy.java 内容 ==="
cat -n header/src/main/java/org/zstack/header/vm/VmCreationStrategy.java

Repository: MatheMatrix/zstack

Length of output: 353


🏁 Script executed:

#!/bin/bash
# 验证 APICreateVmInstanceMsg 在消息链路中是否可能被设置为 null strategy
echo "=== 查找 setStrategy 的调用位置 ==="
rg -n "setStrategy\|\.strategy\s*=" --type java | head -30

Repository: MatheMatrix/zstack

Length of output: 90


🏁 Script executed:

#!/bin/bash
# 查看 APICreateVmInstanceMsg 的消息创建和初始化过程
echo "=== 查找 APICreateVmInstanceMsg 的消息创建 ==="
rg -n "new APICreateVmInstanceMsg\|APICreateVmInstanceMsg()" --type java | head -20

Repository: MatheMatrix/zstack

Length of output: 101


🏁 Script executed:

#!/bin/bash
# 检查 QuotaMessageHandler 的 addCheckCondition 方法以理解其使用场景
echo "=== 查找 QuotaMessageHandler 和 addCheckCondition 的定义 ==="
fd 'QuotaMessageHandler.java'

Repository: MatheMatrix/zstack

Length of output: 195


🏁 Script executed:

#!/bin/bash
# 查看 QuotaMessageHandler 的 addCheckCondition 方法定义
echo "=== QuotaMessageHandler.java addCheckCondition 相关定义 ==="
cat -n header/src/main/java/org/zstack/header/identity/quota/QuotaMessageHandler.java | head -100

Repository: MatheMatrix/zstack

Length of output: 4245


🏁 Script executed:

#!/bin/bash
# 搜索其他类似的 addCheckCondition 使用场景
echo "=== 其他 addCheckCondition 的使用示例 ==="
rg -n "addCheckCondition" --type java -A 2 | head -50

Repository: MatheMatrix/zstack

Length of output: 5660


策略比较缺少空值保护,可能导致配额校验 NPE

Line 144 直接调用 msg.getStrategy().equals() 时,若 strategy 为 null 会抛出异常中断配额检查。建议改为空安全比较:

🔧 建议修复
-                }).addCheckCondition((msg) -> !msg.getStrategy().equals(VmCreationStrategy.JustCreate.toString())));
+                }).addCheckCondition((msg) -> !VmCreationStrategy.JustCreate.toString().equals(msg.getStrategy())));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute/src/main/java/org/zstack/compute/vm/VmQuotaSubManager.java` at line
144, The current addCheckCondition in VmQuotaSubManager uses
msg.getStrategy().equals(VmCreationStrategy.JustCreate.toString()) which can NPE
if strategy is null; change the condition to a null-safe comparison (e.g., use
Objects.equals or call the constant's equals with msg.getStrategy()) so the
check does not throw when msg.getStrategy() is null, and keep the surrounding
logic in addCheckCondition((msg) -> ...) unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants