Feed prior PR conversation to the reviewer as context by tarekziade · Pull Request #29 · huggingface/serge

tarekziade · 2026-06-24T07:12:27Z

Why

Today a Serge review only sees the PR title/description, the diff, and the single comment that triggered it (prepare_review → build_user_prompt). It never fetches the PR's existing discussion. On PRs where the relevant context lives in the conversation — prior reviews, back-and-forth, earlier inline threads — the model reviews effectively blind.

What

GitHubClient: add list_issue_comments, list_review_comments, list_reviews. Extracted a shared _get_paginated helper and reused it for get_pr_files.
reviewer: _fetch_pr_conversation + _format_pr_conversation build a compact chronological transcript — top-level comments, review summaries (with APPROVE/REQUEST_CHANGES/COMMENT state), and inline threads tagged path:line.
- Best-effort: a fetch failure logs and the review continues.
- Budgeted newest-first to MAX_PR_CONVERSATION_CHARS, with a note for dropped entries.
- The triggering comment is skipped (already shown separately), and empty/COMMENTED no-body reviews are dropped as noise.
prompts: rendered inside an UNTRUSTED PR CONVERSATION block, run through the existing _scrub_delimiters defang — PR comments are attacker-controllable, same trust posture as the title/body.
config: MAX_PR_CONVERSATION_CHARS (default 16000; 0 disables the fetch entirely).

Tests

New coverage for formatting/ordering, the newest-first budget + omitted note, trigger-comment skip, empty-review dropping, best-effort-on-error, and the disabled path; the prompt block + delimiter scrubbing; and the config knob. Full suite: 267 passing.

Notes

Scoped to the review path. The inline follow-up path (run_followup) still sees only its anchored thread — a candidate follow-up if useful.
Adds 3 GitHub API reads per review (paginated). Negligible vs. the LLM call.

🤖 Generated with Claude Code

Until now a review only saw the PR title/description, the diff, and the single comment that triggered it — never the existing discussion. On PRs where the relevant context lives in the conversation (prior reviews, back-and-forth, earlier inline threads) the model was effectively blind. - GitHubClient: add list_issue_comments / list_review_comments / list_reviews (extract a shared _get_paginated helper; reuse it for get_pr_files). - reviewer: _fetch_pr_conversation + _format_pr_conversation build a compact chronological transcript (top-level comments, review summaries with state, inline threads tagged path:line). Best-effort — a fetch failure logs and the review continues. Budgeted newest-first; the triggering comment is skipped (already shown separately). - prompts: render it in an UNTRUSTED, delimiter-scrubbed block, since PR comments are attacker-controllable like the title/body. - config: MAX_PR_CONVERSATION_CHARS (default 16000; 0 disables). - Tests for formatting/budget/skip/best-effort, the prompt block + scrubbing, and the config knob. Docs + .env.example updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tarekziade · 2026-06-24T07:16:33Z

@askserge review this AI generated patch

sergereview

🤗 Serge says:

This PR adds prior PR conversation as context for reviews. The implementation is clean, well-tested, and security-conscious: conversation is treated as untrusted input, scrubbed for delimiter injection, and budgeted newest-first. The GitHub API endpoints and pagination logic are correct, and best-effort error handling ensures a fetch failure won't block reviews. No issues found.

serge v0.1.0 · model: moonshotai/Kimi-K2.6 · 1 LLM turn · 0 tool calls · 24.1s · 11181 in / 4596 out tokens

sayakpaul · 2026-06-24T07:31:24Z

 | `MENTION_TRIGGER` | `mention_trigger` | `@askserge` | Phrase that triggers reviews. |
 | `REVIEW_EVENT` | `review_event` | `COMMENT` | Fallback review event when the model omits one. |
 | `MAX_DIFF_CHARS` | `max_diff_chars` | `200000` | Maximum diff characters sent to the LLM. |
+| `MAX_PR_CONVERSATION_CHARS` | `max_pr_conversation_chars` | `16000` | Characters of prior PR conversation (issue comments, review summaries, inline review threads) fed to the model as untrusted context. Newest entries are kept; `0` disables the fetch. |


16000 seems like a very small number here given the stuff it's supposed to include in the context? Also, should it be made optional? Because the this context might contain prompt injection attacks. So, one needs to be extra careful.

yeah we have some protections in place like for the PR body, but it's indeed the same risk. Notice that we are still not executing any code and it's running inside bubblewrap.

for the length it's approxiamtely the last 3k but we could also use our compressor to pack more stuff/

tarekziade requested a review from sayakpaul June 24, 2026 07:16

sergereview Bot reviewed Jun 24, 2026

View reviewed changes

sayakpaul reviewed Jun 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feed prior PR conversation to the reviewer as context#29

Feed prior PR conversation to the reviewer as context#29
tarekziade wants to merge 1 commit into
mainfrom
serge-pr-conversation-context

tarekziade commented Jun 24, 2026

Uh oh!

tarekziade commented Jun 24, 2026

Uh oh!

sergereview Bot left a comment

Uh oh!

sayakpaul Jun 24, 2026

Uh oh!

tarekziade Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tarekziade commented Jun 24, 2026

Why

What

Tests

Notes

Uh oh!

tarekziade commented Jun 24, 2026

Uh oh!

sergereview Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sayakpaul Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

tarekziade Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants