feat(cli): cross-flavor inline image and video display via MCP and ACP #958
Open
heavygee wants to merge 5 commits into
Share display_image prompt across MCP-bridge flavors (Cursor, Gemini, Kimi, Codex, Claude, OpenCode), auto-approve the tool in buildHapiMcpBridge, handle ACP image content blocks, and harden generated-image registration with content sniffing. Closes tiann#956 Co-authored-by: Cursor <cursoragent@cursor.com>
Keep object URLs stable across refetch, upscale tiny inline images, fetch generated-image bytes with cache no-store (avoid empty 304 bodies), and load hapiMcpUrl from per-session API in hapi-display-image tooling. Co-authored-by: Cursor <cursoragent@cursor.com>
Add display_video alongside display_image, video MIME sniffing with avif guard, web GeneratedImageCard video player, and hapi-display-image auto-routing. Co-authored-by: Cursor <cursoragent@cursor.com>
Share display_video prompts across MCP-bridge flavors, auto-approve the tool, register mp4/webm via path sniffing, render inline video in web on the existing generated-image RPC path, and restore robust media card fetch. Co-authored-by: Cursor <cursoragent@cursor.com>
Bun hoists @modelcontextprotocol/sdk to the repo root; importing via cli/node_modules broke the dogfood script in worktrees. Co-authored-by: Cursor <cursoragent@cursor.com>
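The "content sniffing" and "video MIME sniffing with avif guard" mentioned in the commits above are not shown on this page. A minimal sketch of what such a check might look like, assuming magic-byte detection on the first bytes of the file, follows. The function name `sniffMediaMime`, the helper names, and the MP4 brand allowlist are hypothetical and not taken from the PR. The guard matters because AVIF images and MP4 videos share the same ISO BMFF `ftyp` box layout and differ only in the brand.

```typescript
// Hypothetical sketch: names and the brand allowlist are assumptions, not the PR's code.
type SniffedMime =
    | 'image/png' | 'image/jpeg' | 'image/gif' | 'image/webp' | 'image/avif'
    | 'video/mp4' | 'video/webm';

// Assumed allowlist of common MP4 major brands; a real list may differ.
const MP4_BRANDS = ['isom', 'iso2', 'mp41', 'mp42', 'avc1', 'M4V '];

// True when `bytes` contains `sig` starting at `offset`.
function hasSignature(bytes: Uint8Array, sig: number[], offset = 0): boolean {
    if (bytes.length < offset + sig.length) return false;
    return sig.every((b, i) => bytes[offset + i] === b);
}

// Decode bytes[start, end) as ASCII, clamped to the buffer length.
function ascii(bytes: Uint8Array, start: number, end: number): string {
    let out = '';
    for (let i = start; i < end && i < bytes.length; i++) {
        out += String.fromCharCode(bytes[i]);
    }
    return out;
}

function sniffMediaMime(bytes: Uint8Array): SniffedMime | null {
    if (hasSignature(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image/png';
    if (hasSignature(bytes, [0xff, 0xd8, 0xff])) return 'image/jpeg';
    if (ascii(bytes, 0, 4) === 'GIF8') return 'image/gif';
    if (ascii(bytes, 0, 4) === 'RIFF' && ascii(bytes, 8, 12) === 'WEBP') return 'image/webp';

    // ISO BMFF: 4-byte box size, then "ftyp", then the 4-byte major brand.
    if (ascii(bytes, 4, 8) === 'ftyp') {
        const brand = ascii(bytes, 8, 12);
        // AVIF guard: check image brands before treating the file as video.
        if (brand === 'avif' || brand === 'avis') return 'image/avif';
        return MP4_BRANDS.indexOf(brand) >= 0 ? 'video/mp4' : null;
    }

    // EBML header is shared by Matroska and WebM; require the "webm" DocType nearby.
    if (hasSignature(bytes, [0x1a, 0x45, 0xdf, 0xa3]) && ascii(bytes, 0, 64).indexOf('webm') >= 0) {
        return 'video/webm';
    }
    return null;
}
```

Returning `null` for unknown brands keeps the sketch conservative: an unrecognized `ftyp` file is rejected rather than served as video.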
Findings
- [Major] ACP inline media can be emitted before preceding assistant text. The new image branch flushes reasoning but leaves bufferedText open, while emitGeneratedImageFromAcpContent() emits asynchronously. For a normal ACP sequence like text chunk -> image block -> turn drain, the generated-image message can reach the web UI before the text that came first in the stream. Evidence: cli/src/agent/backends/acp/AcpMessageHandler.ts:559.
Questions
- None.
Summary
- Review mode: initial
- One ordering regression found in the latest diff. Residual risk: media fetch/cache behavior was reviewed statically only.
Testing
- Not run (automation). Suggested coverage: ACP handler test that sends a text agentMessageChunk, then an image agentMessageChunk, then drains, and asserts text precedes generated_image.
const content = update.content;
if (isObject(content) && content.type === 'image') {
    this.flushReasoning();
    void this.emitGeneratedImageFromAcpContent(content);
[Major] Flush buffered assistant text before emitting ACP media. Text chunks in this handler are buffered until flushText()/drainBuffers(), but this new image branch only flushes reasoning before kicking off async media registration. For a stream that sends text, then an image block, then ends the turn, the image can be delivered before the preceding text and the chat order is wrong.
Suggested fix:
if (isObject(content) && content.type === 'image') {
    this.flushReasoning();
    this.flushText();
    void this.emitGeneratedImageFromAcpContent(content);
    return;
}
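To illustrate why the extra flushText() call matters, here is a self-contained sketch of the ordering problem. `MiniAcpHandler`, `runTurn`, and the `flushTextBeforeMedia` switch are hypothetical stand-ins, not the real AcpMessageHandler. The sketch only mirrors the behavior described above: text is buffered until flushed, and media emission is fire-and-forget.

```typescript
// Hypothetical stand-in for the handler; only the buffering behavior mirrors the review.
type Emitted = { kind: 'text' | 'generated-image'; value: string };

class MiniAcpHandler {
    emitted: Emitted[] = [];
    private bufferedText = '';
    private flushTextBeforeMedia: boolean;

    constructor(flushTextBeforeMedia: boolean) {
        this.flushTextBeforeMedia = flushTextBeforeMedia;
    }

    onTextChunk(text: string): void {
        // Text is held back until flushText() or drainBuffers() runs.
        this.bufferedText += text;
    }

    onImageBlock(imageId: string): void {
        // The suggested fix: emit pending text synchronously before starting async media work.
        if (this.flushTextBeforeMedia) this.flushText();
        void this.emitGeneratedImage(imageId);
    }

    drainBuffers(): void {
        this.flushText();
    }

    private flushText(): void {
        if (this.bufferedText === '') return;
        this.emitted.push({ kind: 'text', value: this.bufferedText });
        this.bufferedText = '';
    }

    private async emitGeneratedImage(imageId: string): Promise<void> {
        // Stands in for asynchronous media registration.
        await Promise.resolve();
        this.emitted.push({ kind: 'generated-image', value: imageId });
    }
}

// Simulates text chunk -> image block -> turn drain, with the drain arriving
// after the async media emission has had a chance to complete.
async function runTurn(flushTextBeforeMedia: boolean): Promise<string[]> {
    const handler = new MiniAcpHandler(flushTextBeforeMedia);
    handler.onTextChunk('Here is the chart:');
    handler.onImageBlock('img-1');
    await Promise.resolve();
    await Promise.resolve();
    handler.drainBuffers();
    return handler.emitted.map((m) => m.kind);
}
```

Under these assumptions, runTurn(false) yields the image before the text, while runTurn(true) keeps stream order. A real handler test along the lines suggested under Testing would assert the same ordering against the actual message sink.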
Summary
Cross-flavor inline image and video in HAPI web chat (not Cursor IDE composer):
- display_image and display_video in startHappyServer; stdio bridge forwards both; buildHapiMcpBridge auto-approves both tools.
- displayImagePrompt / display_video exports for Claude, Codex, and HAPI_MCP_BRIDGE_PROMPT (Cursor, Gemini, Kimi, OpenCode first-prompt injection).
- generated-image agent messages; generated media stored in CLI memory and served via hub GET /api/sessions/:id/generated-images/:imageId (images and video share this route).
- GeneratedImageCard renders <img> (with tiny-image upscale + stable blob fetch) or <video controls> when mimeType is video/*.
- hapi-display-image.mjs routes mp4/webm to display_video (absolute paths); loads hapiMcpUrl from per-session GET.
Test plan
- bun typecheck
- generatedImages.test.ts, buildHapiMcpBridge.test.ts, codexMcpConfig.test.ts, AcpMessageHandler.test.ts, messageConverter.test.ts
- generatedInlineMedia.test.ts
- display_image PNG and display_video MP4 on Cursor session with hapiMcpUrl; inline cards in HAPI web
Issues
Closes #956
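For the web-side items in the Summary, the following is a rough sketch of how a card might pick its element and fetch bytes from the shared route. All helper names (`mediaTagFor`, `generatedMediaPath`, `fetchGeneratedMediaBytes`, `FetchLike`) are hypothetical. Only the route shape, the video/* rule, and the use of cache no-store come from the PR text. The fetch function is injected so the sketch does not depend on browser typings.

```typescript
// Hypothetical helpers; not the actual GeneratedImageCard code.
type MediaTag = 'img' | 'video';

// video/* renders as <video controls>; everything else renders as <img>.
function mediaTagFor(mimeType: string): MediaTag {
    return mimeType.toLowerCase().indexOf('video/') === 0 ? 'video' : 'img';
}

// Images and video share the same hub route.
function generatedMediaPath(sessionId: string, imageId: string): string {
    return '/api/sessions/' + encodeURIComponent(sessionId)
        + '/generated-images/' + encodeURIComponent(imageId);
}

// Minimal structural types so the sketch compiles without DOM typings.
type BytesResponse = { ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> };
type FetchLike = (url: string, init: { cache: 'no-store' }) => Promise<BytesResponse>;

async function fetchGeneratedMediaBytes(
    fetchImpl: FetchLike,
    sessionId: string,
    imageId: string,
): Promise<Uint8Array> {
    // 'no-store' bypasses the HTTP cache, so the request is unconditional and
    // cannot come back as a body-less 304.
    const res = await fetchImpl(generatedMediaPath(sessionId, imageId), { cache: 'no-store' });
    if (!res.ok) throw new Error('generated media fetch failed: ' + res.status);
    return new Uint8Array(await res.arrayBuffer());
}
```

In a browser, the global fetch could presumably be passed as fetchImpl. Wrapping the returned bytes in a Blob and creating the object URL once per imageId would be one way to keep the URL stable across refetches.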