
feat(cli): cross-flavor inline image and video display via MCP and ACP#958

Open
heavygee wants to merge 5 commits into tiann:main from heavygee:feat/cross-flavor-inline-images


Conversation

heavygee commented Jun 19, 2026

Collaborator

Summary

Cross-flavor inline image and video display in the HAPI web chat (not the Cursor IDE composer):

  • CLI MCP: display_image and display_video tools in startHappyServer; the stdio bridge forwards both, and buildHapiMcpBridge auto-approves both.
  • Prompts: shared displayImagePrompt / display_video prompt exports, used by Claude, Codex, and HAPI_MCP_BRIDGE_PROMPT (first-prompt injection for Cursor, Gemini, Kimi, and OpenCode).
  • ACP: image content blocks become generated-image agent messages; generated media is stored in CLI memory and served by the hub at GET /api/sessions/:id/generated-images/:imageId (images and video share this route).
  • Web: GeneratedImageCard renders <video controls> when mimeType is video/*, and otherwise <img> (with tiny-image upscaling and a stable blob fetch).
  • Tooling: hapi-display-image.mjs routes mp4/webm files to display_video (absolute paths) and loads hapiMcpUrl from the per-session GET endpoint.
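The routing rules in the Web and Tooling bullets can be sketched as below. This is a minimal TypeScript illustration, not code from the PR: the helper names (toolForMediaPath, mediaElementFor, generatedMediaPath) and the absolute-path check are assumptions; only the mp4/webm split, the video/* check, and the route shape come from the summary.

```typescript
// Hypothetical helpers illustrating the summary's routing; not the PR's code.
type DisplayTool = 'display_image' | 'display_video';

// Per the summary, mp4 and webm go to display_video; everything else is an image.
const VIDEO_EXTENSIONS = new Set(['.mp4', '.webm']);

// Lower-cased extension of the last path segment, or '' if there is none.
function extensionOf(filePath: string): string {
  const base = filePath.split(/[\\/]/).pop() ?? '';
  const dot = base.lastIndexOf('.');
  return dot > 0 ? base.slice(dot).toLowerCase() : '';
}

// Pick the MCP tool for a local media file.
// Assumption: the script rejects relative paths (POSIX or Windows drive paths accepted).
function toolForMediaPath(filePath: string): DisplayTool {
  const isAbsolute = filePath.startsWith('/') || /^[A-Za-z]:[\\/]/.test(filePath);
  if (!isAbsolute) {
    throw new Error(`expected an absolute path, got: ${filePath}`);
  }
  return VIDEO_EXTENSIONS.has(extensionOf(filePath)) ? 'display_video' : 'display_image';
}

// Web side: <video controls> for video/* MIME types, <img> otherwise.
function mediaElementFor(mimeType: string): 'video' | 'img' {
  return mimeType.toLowerCase().startsWith('video/') ? 'video' : 'img';
}

// Hub route shared by generated images and videos.
function generatedMediaPath(sessionId: string, imageId: string): string {
  return `/api/sessions/${encodeURIComponent(sessionId)}/generated-images/${encodeURIComponent(imageId)}`;
}
```

Keeping a single route and branching only on mimeType in the card means the CLI and hub do not need a separate video path, which matches the "images and video share this route" note.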

Test plan

  • bun typecheck
  • CLI: generatedImages.test.ts, buildHapiMcpBridge.test.ts, codexMcpConfig.test.ts, AcpMessageHandler.test.ts, messageConverter.test.ts
  • Web: generatedInlineMedia.test.ts
  • Manual: display_image with a PNG and display_video with an MP4 on a Cursor session with hapiMcpUrl set; confirm the inline cards render in HAPI web

Issues

Closes #956

heavygee and others added 2 commits June 19, 2026 17:54
Share display_image prompt across MCP-bridge flavors (Cursor, Gemini,
Kimi, Codex, Claude, OpenCode), auto-approve the tool in
buildHapiMcpBridge, handle ACP image content blocks, and harden
generated-image registration with content sniffing.

Closes tiann#956

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep object URLs stable across refetch, upscale tiny inline images,
fetch generated-image bytes with cache no-store (avoid empty 304 bodies),
and load hapiMcpUrl from per-session API in hapi-display-image tooling.

Co-authored-by: Cursor <cursoragent@cursor.com>
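The three web-side behaviors in this commit (no-store fetch, stable object URLs, tiny-image upscaling) can be sketched as follows. The names, the injected fetch function, and the 64 px threshold are hypothetical; the PR's actual GeneratedImageCard code may be structured differently.

```typescript
// Hypothetical sketch of the web-side media handling; not the PR's code.

// Minimal response shape so the example does not depend on DOM typings.
interface MediaResponse {
  ok: boolean;
  status: number;
  blob(): Promise<Blob>;
}

type FetchLike = (url: string, init: { cache: 'no-store' }) => Promise<MediaResponse>;

// Fetch generated media bytes, bypassing the HTTP cache. Per the commit
// message, cache: 'no-store' avoids receiving empty 304 bodies.
async function fetchGeneratedMedia(url: string, fetchImpl: FetchLike): Promise<Blob> {
  const res = await fetchImpl(url, { cache: 'no-store' });
  if (!res.ok) {
    throw new Error(`media fetch failed: HTTP ${res.status}`);
  }
  const blob = await res.blob();
  if (blob.size === 0) {
    throw new Error('media fetch returned an empty body');
  }
  return blob;
}

// Holds one object URL per media identity, so refetching the same item
// reuses the URL instead of forcing <img>/<video> to reload.
class StableObjectUrl {
  private key: string | null = null;
  private url: string | null = null;
  private readonly create: (blob: Blob) => string;
  private readonly revoke: (url: string) => void;

  constructor(create?: (blob: Blob) => string, revoke?: (url: string) => void) {
    this.create = create ?? ((b) => URL.createObjectURL(b));
    this.revoke = revoke ?? ((u) => URL.revokeObjectURL(u));
  }

  update(key: string, blob: Blob): string {
    if (this.url !== null && this.key === key) {
      return this.url; // same media: keep the existing URL
    }
    if (this.url !== null) {
      this.revoke(this.url); // different media: release the old URL
    }
    this.key = key;
    this.url = this.create(blob);
    return this.url;
  }

  dispose(): void {
    if (this.url !== null) {
      this.revoke(this.url);
    }
    this.key = null;
    this.url = null;
  }
}

// Scale tiny images up so their longer side reaches minSide, keeping the
// aspect ratio; larger images keep their natural size. 64 is an assumed value.
function displaySize(width: number, height: number, minSide = 64): { width: number; height: number } {
  const longer = Math.max(width, height);
  if (longer === 0 || longer >= minSide) {
    return { width, height };
  }
  const scale = minSide / longer;
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}
```

In a component, the key would plausibly be the imageId from the hub route, so switching to a different item revokes the previous URL while a refetch of the same item does not.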
heavygee and others added 3 commits June 20, 2026 22:01
Add display_video alongside display_image, video MIME sniffing with avif
guard, web GeneratedImageCard video player, and hapi-display-image auto-routing.

Co-authored-by: Cursor <cursoragent@cursor.com>
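A sketch of magic-byte sniffing with an AVIF guard. MP4 and AVIF both use the ISO base media file format, so a check for the `ftyp` box alone would classify an AVIF image as video; inspecting the major brand avoids that. The function name and signature set are assumptions, not the PR's implementation, and only the major brand is checked: other ISO BMFF brands such as HEIC or QuickTime are reported as video/mp4 in this simplification.

```typescript
// Hypothetical magic-byte sniffer; not the PR's implementation.

// Decode bytes [start, end) as ASCII (shorter if the buffer is too small).
function ascii(bytes: Uint8Array, start: number, end: number): string {
  return Array.from(bytes.subarray(start, end), (c) => String.fromCharCode(c)).join('');
}

function sniffMediaMime(bytes: Uint8Array): string | null {
  const startsWith = (sig: number[], offset = 0): boolean =>
    bytes.length >= offset + sig.length && sig.every((b, i) => bytes[offset + i] === b);

  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image/png';
  if (startsWith([0xff, 0xd8, 0xff])) return 'image/jpeg';
  const head6 = ascii(bytes, 0, 6);
  if (head6 === 'GIF87a' || head6 === 'GIF89a') return 'image/gif';
  if (ascii(bytes, 0, 4) === 'RIFF' && ascii(bytes, 8, 12) === 'WEBP') return 'image/webp';
  // EBML header shared by WebM and Matroska; this sketch assumes WebM.
  if (startsWith([0x1a, 0x45, 0xdf, 0xa3])) return 'video/webm';
  // ISO BMFF: 4-byte box size, then 'ftyp', then the 4-byte major brand.
  if (ascii(bytes, 4, 8) === 'ftyp') {
    const brand = ascii(bytes, 8, 12);
    // AVIF guard: same container as MP4, but an image.
    if (brand === 'avif' || brand === 'avis') return 'image/avif';
    return 'video/mp4';
  }
  return null;
}
```

Sniffing the bytes rather than trusting the file extension lets registration reject or correctly label files whose names do not match their contents.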
Share display_video prompts across MCP-bridge flavors, auto-approve the
tool, register mp4/webm via path sniffing, render inline video in web on
the existing generated-image RPC path, and restore robust media card fetch.

Co-authored-by: Cursor <cursoragent@cursor.com>
Bun hoists @modelcontextprotocol/sdk to the repo root; importing via
cli/node_modules broke the dogfood script in worktrees.

Co-authored-by: Cursor <cursoragent@cursor.com>
heavygee changed the title feat(cli): cross-flavor inline image display via MCP and ACP feat(cli): cross-flavor inline image and video display via MCP and ACP Jun 20, 2026

github-actions bot left a comment


Findings

  • [Major] ACP inline media can be emitted before preceding assistant text. The new image branch flushes reasoning but leaves bufferedText open, while emitGeneratedImageFromAcpContent() emits asynchronously. For a normal ACP sequence like text chunk -> image block -> turn drain, the generated-image message can reach the web UI before the text that came first in the stream. Evidence: cli/src/agent/backends/acp/AcpMessageHandler.ts:559.

Questions

  • None.

Summary

  • Review mode: initial
  • One ordering regression found in the latest diff. Residual risk: media fetch/cache behavior was reviewed statically only.

Testing

  • Not run (automation). Suggested coverage: ACP handler test that sends a text agentMessageChunk, then an image agentMessageChunk, then drains, and asserts text precedes generated_image.

const content = update.content;
if (isObject(content) && content.type === 'image') {
    this.flushReasoning();
    void this.emitGeneratedImageFromAcpContent(content);
    // ... (diff excerpt; the rest of the branch is not shown)


[Major] Flush buffered assistant text before emitting ACP media. Text chunks in this handler are buffered until flushText()/drainBuffers(), but this new image branch only flushes reasoning before kicking off async media registration. For a stream that sends text, then an image block, then ends the turn, the image can be delivered before the preceding text and the chat order is wrong.

Suggested fix:

if (isObject(content) && content.type === 'image') {
    this.flushReasoning();
    this.flushText();
    void this.emitGeneratedImageFromAcpContent(content);
    return;
}
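The ordering problem and the suggested fix can be reproduced with a toy model. ToyAcpHandler and emittedOrder are hypothetical and only mirror the buffering described in the review: text chunks accumulate until flushText() or drainBuffers(), while media registration completes asynchronously. The model covers the case where registration finishes before the turn drains.

```typescript
// Toy model of the ACP ordering issue; all names here are hypothetical.
type Emitted = { kind: 'text'; text: string } | { kind: 'generated_image'; id: string };

class ToyAcpHandler {
  readonly emitted: Emitted[] = [];
  private bufferedText = '';
  private readonly flushTextBeforeMedia: boolean;

  constructor(flushTextBeforeMedia: boolean) {
    this.flushTextBeforeMedia = flushTextBeforeMedia;
  }

  // Text chunks are buffered rather than emitted immediately.
  onTextChunk(text: string): void {
    this.bufferedText += text;
  }

  // Image block: optionally flush buffered text first (the suggested fix),
  // then register the media asynchronously, standing in for
  // emitGeneratedImageFromAcpContent().
  onImageBlock(id: string): Promise<void> {
    if (this.flushTextBeforeMedia) {
      this.flushText();
    }
    return Promise.resolve().then(() => {
      this.emitted.push({ kind: 'generated_image', id });
    });
  }

  flushText(): void {
    if (this.bufferedText !== '') {
      this.emitted.push({ kind: 'text', text: this.bufferedText });
      this.bufferedText = '';
    }
  }

  // End of turn: any text still buffered is emitted now.
  drainBuffers(): void {
    this.flushText();
  }
}

// text chunk -> image block -> (registration completes) -> turn drain
async function emittedOrder(withFix: boolean): Promise<string[]> {
  const handler = new ToyAcpHandler(withFix);
  handler.onTextChunk('Here is the chart:');
  await handler.onImageBlock('img-1');
  handler.drainBuffers();
  return handler.emitted.map((e) => e.kind);
}
```

Without the fix, emittedOrder(false) yields generated_image before text; with flushText() called before the async registration, emittedOrder(true) yields text first, which is the assertion the suggested ACP handler test would make.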


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(cli+web): inline image display for all agent flavors

1 participant