Add Lunar Lake Xe2 iGPU compatibility report and benchmarks #342
Open
MegaStood wants to merge 3 commits into intel:main from
Conversation
Comprehensive testing of 7 LLM models on an MSI Claw 8 AI+ handheld (Intel Core Ultra 7 258V, Arc 140V Xe2 iGPU, 32GB LPDDR5x shared memory).

Models benchmarked (vLLM XPU backend):
- openai/gpt-oss-20b MXFP4: 22.5 tok/s single-user (recommended default)
- Qwen3.5-4B AutoRound INT4: 23.4 tok/s (best for multi-service)
- Qwen3-8B AutoRound INT4: 18.6 tok/s
- Qwen3.5-9B BF16/FP8/sym_int4: 5-14.7 tok/s
- Qwen3.5-35B-A3B, GLM-4.7-flash, Qwen3-30B-A3B: all OOM

Key findings:
- ~13 GiB practical model ceiling on 32GB shared memory
- Layer-by-layer weight processing (NOT 2x bulk) for AutoRound/sym_int4
- Prefix caching: 51x TTFT improvement at 16K context
- 88,576 KV cache tokens at 0.7 utilization (~2.7x concurrency at 32K)
- Cold prefill: ~807 tok/s (16K tokens in 20.3 s), decode: 15.6-22.5 tok/s (arithmetic sanity-checked in the sketch after this list)

Includes:
- Complete running recipes (gpt-oss-20b, Qwen3.5-4B, ASR, TTS)
- Tool calling + reasoning parser configuration
- Memory budget tables for single/multi-service configurations
- Standard benchmark script
- Pre-built sym_int4 quantizer .so (12KB, from BigDL-core)
- Draft comment for vLLM issue #30359
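The throughput and concurrency figures above are internally consistent; here is a minimal sanity-check sketch. It is pure arithmetic over the numbers already reported in this list, nothing new is measured:

```python
# Sanity checks on the reported figures; all inputs come from the list above.
prefill_rate = 16_384 / 20.3          # 16K-token cold prefill in 20.3 s
assert round(prefill_rate) == 807     # matches the ~807 tok/s prefill claim

concurrency = 88_576 / 32_768         # KV-cache token budget vs. one 32K context
assert round(concurrency, 1) == 2.7   # matches the ~2.7x concurrency claim
```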
Replace old split "Environment Variables" / "vLLM Launch Flags" sections (which hardcoded --quantization int4 --max-model-len 8192) with a unified "Environment Variables and Launch Flags" section showing a general pattern derived from all tested models (4B, 8B, 9B dense, 20B MoE). Required vs. optional flags are documented in tables with a "Tested With" column.
https://claude.ai/code/session_01JyMJU94Dq32vYBGMoMJM34
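As a rough illustration of what "general pattern instead of hardcoded flags" means, here is a sketch in Python. The flag names are real vLLM CLI options, but the example model ID and the default values are assumptions: only --quantization int4 and --max-model-len 8192 appear verbatim in this PR, and 0.7 corresponds to the utilization figure from the benchmarks above.

```python
# Hypothetical launcher that parameterizes the flags the old section hardcoded.
import subprocess


def launch_vllm(model: str, *, quantization: str | None = None,
                max_model_len: int = 8192,
                gpu_memory_utilization: float = 0.7) -> None:
    """Build and run a `vllm serve` command from the general flag pattern."""
    cmd = ["vllm", "serve", model,
           "--max-model-len", str(max_model_len),
           "--gpu-memory-utilization", str(gpu_memory_utilization)]
    if quantization is not None:  # e.g. "int4"; omit for MXFP4 checkpoints
        cmd += ["--quantization", quantization]
    subprocess.run(cmd, check=True)


# The old hardcoded configuration, expressed through the new pattern
# (model ID is illustrative):
launch_vllm("Qwen/Qwen3-8B", quantization="int4", max_model_len=8192)
```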
Model weights now load successfully (16.16 GiB) with disk swap. Two issues remain: (1) IPEX marlin_shuffle_weight raises DEVICE_LOST during MoE warmup; (2) no MLA KV compression: vLLM stores the full expanded KV at ~940 KB/token instead of the compressed latent at ~53 KB/token (18x inflation). With proper MLA support, GLM-4.7-flash would fit a 32K context in ~1.66 GiB of KV cache.
https://claude.ai/code/session_01JyMJU94Dq32vYBGMoMJM34
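For reference, the arithmetic behind those figures, using only the per-token sizes and the 32K context length quoted in the comment above:

```python
# KV-cache sizing from the per-token figures quoted above.
full_kb, latent_kb = 940, 53                 # KB/token: expanded vs. latent
tokens = 32 * 1024                           # 32K context

print(f"inflation: {full_kb / latent_kb:.1f}x")              # ~17.7x (~18x)
print(f"with MLA:  {tokens * latent_kb / 1024**2:.2f} GiB")  # ~1.66 GiB
print(f"without:   {tokens * full_kb / 1024**2:.1f} GiB")    # ~29.4 GiB
```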
Force-pushed from 634b09c to 8fa0c1c
Summary
Adds comprehensive Lunar Lake (Arc 140V Xe2 iGPU) compatibility documentation
and benchmark results for running LLMs on 32GB shared-memory handhelds.
Tested On