docs: add 8 new FAQ entries covering GPU virtualization, scheduling, and ecosystem integration (#416) by mesutoezdil · Pull Request #426 · Project-HAMi/website

mesutoezdil · 2026-05-29T21:02:47Z

Adds 8 new FAQ entries to docs/faq/faq.md covering the three topic areas defined in the issue. All questions were sourced from the research compiled in #415.

New entries

GPU virtualization model

How does HAMi enforce GPU memory and compute limits? Explains the libvgpu.so CUDA API interception mechanism, what it covers, and what it does not (DinD, direct driver API calls). Links to GPU Virtualization.
HAMi vGPU vs NVIDIA MIG. Side-by-side comparison table covering hardware requirements, isolation mechanism, enforcement strength, granularity, and dynamic reconfiguration. Guidance on when to use each.
Why does nvidia-smi inside a container show less memory than the host? Explains that this is intentional: libvgpu.so intercepts memory query calls and returns the container's allocated limit rather than the physical total.
Why is my gpumem limit not enforced? Covers the four root causes: CUDA_DISABLE_CONTROL, Docker-in-Docker, direct NVML/driver API calls, and misconfigured container runtime.
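As a rough illustration of the troubleshooting entry above, the sketch below flags two of the listed causes by inspecting a container spec held as a plain Python dict. The resource names `nvidia.com/gpumem` and `nvidia.com/gpumem-percentage`, and the assumption that enforcement is disabled when `CUDA_DISABLE_CONTROL` is set to `true`, are assumptions taken from HAMi's commonly documented defaults, not from this PR. `find_limit_risks` is a hypothetical helper, not part of HAMi.

```python
# Hypothetical helper: inspects a container spec (plain dict, shaped like the
# Kubernetes container object) for settings that may leave gpumem unenforced.
# Resource and env names are assumptions based on HAMi's documented defaults.

GPUMEM_RESOURCES = ("nvidia.com/gpumem", "nvidia.com/gpumem-percentage")


def find_limit_risks(container: dict) -> list:
    """Return human-readable warnings for one container spec."""
    risks = []

    # No memory limit requested at all: there is nothing to enforce.
    limits = container.get("resources", {}).get("limits", {})
    if not any(name in limits for name in GPUMEM_RESOURCES):
        risks.append("no gpumem limit requested")

    # CUDA_DISABLE_CONTROL is named in the FAQ entry as a root cause.
    for env in container.get("env", []):
        is_flag = env.get("name") == "CUDA_DISABLE_CONTROL"
        if is_flag and str(env.get("value", "")).lower() == "true":
            risks.append("CUDA_DISABLE_CONTROL is set to true")

    return risks


if __name__ == "__main__":
    container = {
        "name": "trainer",
        "resources": {"limits": {"nvidia.com/gpu": 1, "nvidia.com/gpumem": 3000}},
        "env": [{"name": "CUDA_DISABLE_CONTROL", "value": "true"}],
    }
    for risk in find_limit_risks(container):
        print("warning:", risk)
```

The other two causes in the entry (Docker-in-Docker and direct NVML/driver API calls) depend on what the workload does at runtime, so a static check of the spec like this one cannot detect them.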

Scheduling interaction

Does HAMi replace kube-scheduler or run alongside it? Explains the extender model and the MutatingWebhook schedulerName assignment, and confirms that non-HAMi pods are unaffected. Includes a note on multi-replica leader election.
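The schedulerName assignment described above can be sketched as a pure function over a pod manifest. This is a simplified model, not HAMi's webhook code: the scheduler name `hami-scheduler` and the resource names in `HAMI_RESOURCES` are assumed defaults, and `mutate_pod` is a hypothetical name. It also ignores cases a real webhook would need to handle, such as a pod that already sets its own schedulerName.

```python
import copy

# Assumed defaults; a real deployment may configure different names.
HAMI_SCHEDULER = "hami-scheduler"
HAMI_RESOURCES = ("nvidia.com/gpu", "nvidia.com/gpumem", "nvidia.com/gpucores")


def requests_hami_resources(pod: dict) -> bool:
    """True if any container lists a HAMi-managed resource in its limits."""
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if any(name in limits for name in HAMI_RESOURCES):
            return True
    return False


def mutate_pod(pod: dict) -> dict:
    """Return a copy of the pod, routed to the HAMi scheduler if it needs it."""
    mutated = copy.deepcopy(pod)
    if requests_hami_resources(mutated):
        mutated["spec"]["schedulerName"] = HAMI_SCHEDULER
    # Pods without HAMi resources are returned unchanged, so they keep
    # using whatever scheduler they would have used anyway.
    return mutated


if __name__ == "__main__":
    gpu_pod = {"spec": {"containers": [
        {"name": "app", "resources": {"limits": {"nvidia.com/gpu": 1}}},
    ]}}
    cpu_pod = {"spec": {"containers": [
        {"name": "web", "resources": {"limits": {"cpu": "500m"}}},
    ]}}
    print(mutate_pod(gpu_pod)["spec"].get("schedulerName"))
    print(mutate_pod(cpu_pod)["spec"].get("schedulerName"))
```

Returning a copy rather than mutating in place keeps the example easy to test and makes the "no impact on non-HAMi pods" property directly checkable.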

Ecosystem integration

HAMi with vLLM multi-GPU tensor parallelism. Documents the NCCL segfault issue (CUDA_DEVICE_MEMORY_SHARED_CACHE per-container, fixed in v2.7.0), single-GPU usage, and Volcano multi-pod setup. Links to issues #1764 and #1853.
HAMi with NVIDIA GPU Operator and DCGM. Explains the device plugin conflict and how to disable GPU Operator's device plugin. Notes that DCGM Exporter is unaffected.
Prometheus and Grafana monitoring. Covers the metrics endpoint, key metric names, scrape config, and importing the bundled static/grafana/gpu-dashboard.json dashboard.
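To make the monitoring entry more concrete, here is a minimal sketch of reading a metrics payload in the Prometheus text exposition format and summing one metric. The metric name `example_vgpu_memory_bytes` and its labels are hypothetical placeholders, not HAMi metric names; the real names are the ones listed in the FAQ entry. The parser is deliberately simplified: it skips lines carrying a trailing timestamp and does not handle escaped quotes in label values.

```python
import re

# metric_name{label="value",...} sample_value
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text: str) -> list:
    """Parse exposition text into (name, labels, value) tuples."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = LINE.match(line)
        if match is None:
            continue  # unsupported line shape, e.g. with a timestamp
        name, labels, value = match.groups()
        samples.append((name, dict(LABEL.findall(labels or "")), float(value)))
    return samples


def total(samples: list, metric: str) -> float:
    """Sum every sample of one metric across all label sets."""
    return sum(value for name, _, value in samples if name == metric)


if __name__ == "__main__":
    # Hypothetical payload; a real one would come from the metrics endpoint.
    payload = (
        "# HELP example_vgpu_memory_bytes Hypothetical per-pod usage\n"
        "# TYPE example_vgpu_memory_bytes gauge\n"
        'example_vgpu_memory_bytes{pod="a",gpu="0"} 1024\n'
        'example_vgpu_memory_bytes{pod="b",gpu="0"} 2048\n'
        "up 1\n"
    )
    print(total(parse_metrics(payload), "example_vgpu_memory_bytes"))
```

In practice Prometheus does this parsing itself once the scrape config points at the endpoint; a script like this is mainly useful for spot-checking the endpoint output by hand.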

Closes #416.
Refs #415.

hami-robot · 2026-05-29T21:02:53Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mesutoezdil
Once this PR has been reviewed and has the lgtm label, please assign windsonsea for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

netlify · 2026-05-29T21:02:53Z

✅ Deploy Preview for project-hami ready!

Name	Link
🔨 Latest commit	`337648d`
🔍 Latest deploy log	https://app.netlify.com/projects/project-hami/deploys/6a2135240832dd00082fe628
😎 Deploy Preview	https://deploy-preview-426--project-hami.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

mesutoezdil · 2026-06-04T07:12:27Z

done @rootsongjc

rootsongjc · 2026-06-04T07:14:01Z

I think this article might be too long for an FAQ.

rootsongjc · 2026-06-04T07:15:17Z

Also, some of the FAQs could be moved to the Concept document or to other documents, or could reference existing documents on the website, instead of putting everything in the FAQ, which would make it difficult to maintain later on.

mesutoezdil · 2026-06-04T08:11:11Z

> Also, some of the FAQs could be moved to the Concept document or to other documents, or could reference existing documents on the website, instead of putting everything in the FAQ, which would make it difficult to maintain later on.

ok now?

hami-robot Bot added the dco-signoff: yes label May 29, 2026

hami-robot Bot requested review from archlitchi and wawa0210 May 29, 2026 21:02

hami-robot Bot added the size/L label May 29, 2026

mesutoezdil force-pushed the docs/faq-entries-416 branch from 24a8fb2 to 359c2cc Compare June 4, 2026 07:13

mesutoezdil force-pushed the docs/faq-entries-416 branch from 359c2cc to c03cd3c Compare June 4, 2026 08:09

hami-robot Bot added size/M and removed size/L labels Jun 4, 2026

docs: add 8 new FAQ entries with links to concept and troubleshooting…

337648d

… pages Signed-off-by: mesutoezdil <mesudozdil@gmail.com>

mesutoezdil force-pushed the docs/faq-entries-416 branch from c03cd3c to 337648d Compare June 4, 2026 08:19

mesutoezdil wants to merge 1 commit into Project-HAMi:master from mesutoezdil:docs/faq-entries-416


2 participants
