
feat(init): LLM-powered codebase analysis for AGENTS.md generation #2745

Open
punkcanyang wants to merge 3 commits into Hmbown:main from punkcanyang:feat/enhanced-init


@punkcanyang
Contributor

@punkcanyang punkcanyang commented Jun 4, 2026

Summary

Replace the template-based /init command with deep codebase analysis that generates a customized AGENTS.md tailored to the actual project.

How it works: The command gathers rich project context in Rust, then delegates content generation to the LLM agent via SendMessage — the same pattern used by /change, /relay, and /rlm.

Cost: /init is an explicit user action — users who don't need it won't incur API calls.

Changes

  • gather_project_context() — orchestrates all context gathering
  • parse_cargo_toml() — workspace members, deps, features, workspace.dependencies
  • parse_package_json() — scripts, deps, framework detection
  • gather_git_info() — remote, branch, status (with --untracked-files=no)
  • detect_ci_systems() — GitHub Actions, GitLab CI, CircleCI, Jenkins, Travis, Azure
  • detect_build_systems() — Makefile, Justfile, CMake, Meson, Bazel, scripts/
  • detect_test_frameworks() — Cargo.toml dev-deps (incl. workspace.dev-dependencies), package.json, pytest
  • build_init_prompt() — formats context for the LLM agent
  • read_existing_agents_md() — up to 100KB for in-place updates
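Helpers such as detect_ci_systems() are plausibly implemented as marker-file probes. The sketch below is hypothetical, not the PR's code: the name `detect_ci_markers` and the marker table are assumptions based on each system's conventional config location, and the PR's actual list may differ.

```rust
use std::path::Path;

/// Hypothetical marker table: (path relative to the workspace root, CI system).
const CI_MARKERS: &[(&str, &str)] = &[
    (".github/workflows", "GitHub Actions"),
    (".gitlab-ci.yml", "GitLab CI"),
    (".circleci/config.yml", "CircleCI"),
    ("Jenkinsfile", "Jenkins"),
    (".travis.yml", "Travis CI"),
    ("azure-pipelines.yml", "Azure Pipelines"),
];

/// Returns every CI system whose marker path exists under `root`, in table order.
fn detect_ci_markers(root: &Path) -> Vec<&'static str> {
    CI_MARKERS
        .iter()
        .filter(|(rel, _)| root.join(rel).exists())
        .map(|(_, name)| *name)
        .collect()
}
```

The same table-driven shape would extend to build systems (Makefile, Justfile, and so on) by swapping the marker list.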

Test plan

  • 56 unit tests pass, clippy clean, fmt clean
  • Rebased onto v0.8.53
  • Manual: run /init to verify agent generates comprehensive AGENTS.md

🤖 Generated with Claude Code

Greptile Summary

This PR replaces the static template-based /init command with a rich context-gathering pipeline that collects Rust/Node.js/git/CI/build-system information, then delegates AGENTS.md authoring to the LLM agent via AppAction::SendMessage — mirroring the /change and /relay pattern.

  • New context helpers: parse_cargo_toml, parse_package_json, gather_git_info (via subprocess), detect_ci_systems, detect_build_systems, detect_test_frameworks, and read_existing_agents_md (100 KB cap) all feed a structured Markdown prompt via build_init_prompt.
  • Behavioral change: /init no longer writes AGENTS.md itself; instead it produces a comprehensive prompt that the agent executes, reading source files and producing a project-tailored guide.
  • Test coverage: 56 unit tests are added covering every new helper, replacing the old template-assertion tests.
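A minimal sketch of the prompt-assembly step, assuming each context helper yields an optional Markdown fragment. `ProjectContext` and `build_prompt` are hypothetical names, and the instruction sentence is illustrative; the PR's build_init_prompt() may structure the prompt differently.

```rust
/// Hypothetical container for the gathered context sections.
struct ProjectContext {
    cargo: Option<String>,
    package_json: Option<String>,
    git: Option<String>,
    existing_agents_md: Option<String>,
}

/// Joins the sections into one Markdown prompt, skipping any section
/// for which nothing was gathered.
fn build_prompt(ctx: &ProjectContext) -> String {
    let mut out = String::from("Analyze this repository and write a project-specific AGENTS.md.\n");
    let sections = [
        ("Cargo", &ctx.cargo),
        ("package.json", &ctx.package_json),
        ("Git", &ctx.git),
        ("Existing AGENTS.md", &ctx.existing_agents_md),
    ];
    for (title, body) in sections {
        if let Some(body) = body {
            out.push_str(&format!("\n## {title}\n{body}\n"));
        }
    }
    out
}
```

Skipping empty sections keeps the prompt short for projects that, for example, have no package.json.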

Confidence Score: 3/5

Functionally sound rewrite, but the git remote URL is included verbatim in the LLM prompt and can carry embedded HTTP credentials, leaking them to the model provider.

The context-gathering pipeline is well-structured and thoroughly tested. The main concern is in gather_git_info: git remote get-url origin returns the raw URL, including any inline userinfo (username:token), which is appended directly to the prompt sent to the LLM. Repos cloned with an embedded token — common in CI/CD pipelines — would silently exfiltrate that credential. This needs to be addressed before the change ships.

crates/tui/src/commands/init.rs — specifically the gather_git_info function and how the remote URL is handled before being included in the prompt.

Security Review

  • Credential leakage via git remote URL (gather_git_info, line 386): git remote get-url origin can return a URL with embedded HTTP credentials (user + token/password). The raw URL is inserted into the LLM prompt and transmitted to the model provider, exfiltrating any embedded secret without user awareness. The userinfo component should be stripped before embedding the URL in the prompt.

Important Files Changed

Filename | Overview
crates/tui/src/commands/init.rs | Complete rewrite of /init from a template-based file writer to an LLM-delegated analyzer; introduces context-gathering helpers and a prompt builder. It has a credential-leakage path via the git remote URL and misses framework detection in devDependencies.


Reviews (1): Last reviewed commit: "fix(init): satisfy clippy let_chains war..."

Greptile also left 3 inline comments on this PR.

punkcanyang and others added 3 commits June 4, 2026 14:11
Replace the template-based /init command with deep codebase analysis
that delegates AGENTS.md content generation to the LLM agent via
SendMessage, producing a customized project guide tailored to the
actual codebase.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ptimization

- Increase existing AGENTS.md read limit from 8KB to 100KB to match
  other file-reading limits in the codebase
- Parse [workspace.dependencies] in Cargo.toml for monorepo support
- Check [workspace.dev-dependencies] in test framework detection
- Add --untracked-files=no to git status to avoid scanning large repos
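The raised read limit can be enforced without loading an oversized file into memory by wrapping the reader in `Read::take`. A sketch, assuming "100KB" means 100 * 1024 bytes; `read_capped` and `MAX_AGENTS_MD_BYTES` are hypothetical names, not taken from the PR.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Assumed cap for the existing AGENTS.md (100 KiB).
const MAX_AGENTS_MD_BYTES: u64 = 100 * 1024;

/// Reads at most `limit` bytes from `path`. `Ok(None)` means the file does
/// not exist. If the cut splits a multi-byte UTF-8 character, the partial
/// character becomes U+FFFD instead of causing an error.
fn read_capped(path: &Path, limit: u64) -> io::Result<Option<String>> {
    let file = match File::open(path) {
        Ok(f) => f,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e),
    };
    let mut buf = Vec::new();
    // `take` makes the reader report EOF after `limit` bytes.
    file.take(limit).read_to_end(&mut buf)?;
    Ok(Some(String::from_utf8_lossy(&buf).into_owned()))
}
```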

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 4, 2026

Thanks @punkcanyang for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the /init command in crates/tui/src/commands/init.rs to delegate the generation of AGENTS.md to the LLM agent. Instead of using a static template, it introduces a comprehensive context-gathering pipeline that extracts details about the project (such as Cargo/npm configurations, Git repository status, CI/CD systems, build systems, and test frameworks) to construct a rich prompt for the agent. Feedback on the changes highlights two main issues: first, the Git repository check fails in subdirectories or monorepos because it only looks for .git in the immediate workspace directory; second, there is a logical inconsistency in the git status parsing where untracked files are counted despite passing --untracked-files=no to the command, resulting in dead code.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +360 to +363
    // Check if we're in a git repo.
    if !workspace.join(".git").exists() {
        return None;
    }
Contributor


high

Checking for .git directly in the workspace directory fails when the workspace is a subdirectory of a git repository (e.g., in a monorepo or nested crate structure). This prevents gather_git_info from collecting any git context for sub-projects.

We can resolve this by walking up the directory tree to find the .git directory, ensuring git context is gathered correctly for subdirectories.

    // Check if we're in a git repo (handles subdirectories/monorepos).
    let mut is_git = false;
    let mut current = Some(workspace);
    while let Some(path) = current {
        if path.join(".git").exists() {
            is_git = true;
            break;
        }
        current = path.parent();
    }
    if !is_git {
        return None;
    }
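The same upward search can be written more compactly with `Path::ancestors`, which yields the path itself and then each parent in turn. A sketch; `is_inside_git_repo` is a hypothetical name.

```rust
use std::path::Path;

/// True if `workspace` or any ancestor contains a `.git` entry.
/// `exists()` also matches a `.git` file, which linked worktrees and
/// submodules use instead of a directory.
fn is_inside_git_repo(workspace: &Path) -> bool {
    workspace.ancestors().any(|dir| dir.join(".git").exists())
}
```

Both versions are heuristics: they ignore `GIT_DIR` and bare repositories. Asking git directly (`git rev-parse --is-inside-work-tree`) is the authoritative check, at the cost of one more subprocess.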

Comment on lines +419 to 432
    let untracked = status_str.lines().filter(|l| l.starts_with("??")).count();
    if staged > 0 || unstaged > 0 || untracked > 0 {
        let mut parts = Vec::new();
        if staged > 0 {
            parts.push(format!("{staged} staged"));
        }
        if unstaged > 0 {
            parts.push(format!("{unstaged} modified"));
        }
        if untracked > 0 {
            parts.push(format!("{untracked} untracked"));
        }
        lines.push(format!("- Working tree: {}", parts.join(", ")));
    }
Contributor


medium

Since git status is executed with --untracked-files=no on line 397, the output status_str will never contain untracked files (which start with ??). Therefore, untracked will always be 0, making the untracked file counting and formatting logic dead code.

If untracked files should be ignored, we should remove this dead code to simplify the implementation. If untracked files should be counted, --untracked-files=no should be removed from the command arguments.

        if staged > 0 || unstaged > 0 {
            let mut parts = Vec::new();
            if staged > 0 {
                parts.push(format!("{staged} staged"));
            }
            if unstaged > 0 {
                parts.push(format!("{unstaged} modified"));
            }
            lines.push(format!("- Working tree: {}", parts.join(", ")));
        }
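For reference, a sketch of deriving the staged and unstaged counts from porcelain v1 output. It assumes the PR parses `git status --porcelain` (or `--short`, which uses the same `XY` columns); the excerpt does not show how `staged` and `unstaged` are computed, and `count_status` is a hypothetical name.

```rust
/// Counts (staged, unstaged) entries in porcelain v1 output, where each
/// line is `XY <path>`: X is the index (staged) status and Y is the
/// worktree (unstaged) status. `?` (untracked) and `!` (ignored) count as
/// neither; a file like `MM` counts in both columns.
fn count_status(porcelain: &str) -> (usize, usize) {
    let mut staged = 0;
    let mut unstaged = 0;
    for line in porcelain.lines() {
        // Skip the branch header that `-b`/`--branch` adds.
        if line.starts_with("##") {
            continue;
        }
        // Do not trim: a leading space means "unchanged in the index".
        let mut chars = line.chars();
        let (Some(x), Some(y)) = (chars.next(), chars.next()) else {
            continue;
        };
        if !matches!(x, ' ' | '?' | '!') {
            staged += 1;
        }
        if !matches!(y, ' ' | '?' | '!') {
            unstaged += 1;
        }
    }
    (staged, unstaged)
}
```

One caveat worth checking: if the shared `run` helper trims its output (not visible in the excerpt), the leading space of the first line is lost and ` M file` is misread as a staged change.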

Comment on lines +386 to +388
    if let Some(url) = run(&["remote", "get-url", "origin"]) {
        lines.push(format!("- Remote: {url}"));
    }
Contributor


P1 security Git remote URL may leak embedded credentials to the LLM

git remote get-url origin can return a URL that contains inline HTTP credentials in the authority component (username and/or password/token). That raw string is pushed directly into the prompt and forwarded to the model provider. A user whose remote was configured with an embedded token — common in CI-cloned repos or older setups — would unknowingly exfiltrate that secret. Consider stripping the userinfo component before embedding: parse the URL, clear the password and username fields, and re-serialize, or apply a regex to mask the authority before pushing to lines.
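A dependency-free sketch of the suggested redaction, under the assumption that only `scheme://` URLs carry a userinfo component; `redact_remote_url` is a hypothetical helper, not part of the PR.

```rust
/// Strips `user[:password]@` from a `scheme://` URL, e.g.
/// `https://user:token@host/repo.git` -> `https://host/repo.git`.
/// SCP-style remotes (`git@host:owner/repo.git`) and other strings
/// without `://` are returned unchanged.
fn redact_remote_url(url: &str) -> String {
    let Some(scheme_end) = url.find("://") else {
        return url.to_string();
    };
    let authority_start = scheme_end + 3;
    let rest = &url[authority_start..];
    // The authority ends at the first '/', '?' or '#'; an '@' after that
    // point belongs to the path, not to the credentials.
    let authority_end = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    // rfind tolerates an unencoded '@' inside the password.
    match rest[..authority_end].rfind('@') {
        Some(at) => format!("{}{}", &url[..authority_start], &rest[at + 1..]),
        None => url.to_string(),
    }
}
```

This also drops the `git@` username from `ssh://` URLs, which is harmless for prompt context. If the `url` crate is already a dependency, `Url::set_username("")` and `Url::set_password(None)` achieve the same with full RFC 3986 parsing.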


Comment on lines +299 to +318
    // Dependencies.
    if let Some(deps) = doc.get("dependencies").and_then(|v| v.as_object()) {
        let dep_keys: Vec<&str> = deps.keys().map(|k| k.as_str()).collect();
        if !dep_keys.is_empty() {
            // Detect frameworks from deps.
            let frameworks = detect_js_frameworks(&dep_keys);
            if !frameworks.is_empty() {
                lines.push(format!("- Frameworks detected: {}", frameworks.join(", ")));
            }
            lines.push(format!("- Dependencies: {}", dep_keys.join(", ")));
        }
    }

    // Dev dependencies.
    if let Some(dev_deps) = doc.get("devDependencies").and_then(|v| v.as_object()) {
        let dev_keys: Vec<&str> = dev_deps.keys().map(|k| k.as_str()).collect();
        if !dev_keys.is_empty() {
            lines.push(format!("- Dev dependencies: {}", dev_keys.join(", ")));
        }
    }
Contributor


P2 Framework detection misses devDependencies — build tools like Vite, Webpack, esbuild, and Turbopack are almost always in devDependencies, not dependencies. Calling detect_js_frameworks only with the runtime dependency keys means those tools are silently omitted from the generated AGENTS.md context.

Suggested change
    // Dependencies.
    if let Some(deps) = doc.get("dependencies").and_then(|v| v.as_object()) {
        let dep_keys: Vec<&str> = deps.keys().map(|k| k.as_str()).collect();
        if !dep_keys.is_empty() {
            // Detect frameworks from runtime deps.
            let frameworks = detect_js_frameworks(&dep_keys);
            if !frameworks.is_empty() {
                lines.push(format!("- Frameworks detected: {}", frameworks.join(", ")));
            }
            lines.push(format!("- Dependencies: {}", dep_keys.join(", ")));
        }
    }
    // Dev dependencies.
    if let Some(dev_deps) = doc.get("devDependencies").and_then(|v| v.as_object()) {
        let dev_keys: Vec<&str> = dev_deps.keys().map(|k| k.as_str()).collect();
        if !dev_keys.is_empty() {
            // Also detect build-tool/framework entries from devDependencies (Vite, webpack, etc.).
            let dev_frameworks = detect_js_frameworks(&dev_keys);
            if !dev_frameworks.is_empty() {
                lines.push(format!("- Dev frameworks/tools: {}", dev_frameworks.join(", ")));
            }
            lines.push(format!("- Dev dependencies: {}", dev_keys.join(", ")));
        }
    }
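An alternative to a separate "Dev frameworks/tools" line is to run detection once over the union of both dependency lists. The table below is illustrative only, since the PR's detect_js_frameworks() list is not shown, and `detect_across` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Illustrative package-name-to-label table (not the PR's list).
const KNOWN: &[(&str, &str)] = &[
    ("next", "Next.js"),
    ("react", "React"),
    ("vite", "Vite"),
    ("webpack", "webpack"),
    ("esbuild", "esbuild"),
];

/// Detects frameworks/tools across runtime and dev dependency names at
/// once, de-duplicating packages listed in both; labels come out in
/// table order.
fn detect_across(deps: &[&str], dev_deps: &[&str]) -> Vec<&'static str> {
    let all: BTreeSet<&str> = deps.iter().chain(dev_deps).copied().collect();
    KNOWN
        .iter()
        .filter(|(pkg, _)| all.contains(*pkg))
        .map(|(_, label)| *label)
        .collect()
}
```

The reviewer's version keeps the runtime/dev distinction, which may help the agent describe build tooling separately; the merged version avoids reporting a package twice when it appears in both sections.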


Comment on lines +419 to +431
    let untracked = status_str.lines().filter(|l| l.starts_with("??")).count();
    if staged > 0 || unstaged > 0 || untracked > 0 {
        let mut parts = Vec::new();
        if staged > 0 {
            parts.push(format!("{staged} staged"));
        }
        if unstaged > 0 {
            parts.push(format!("{unstaged} modified"));
        }
        if untracked > 0 {
            parts.push(format!("{untracked} untracked"));
        }
        lines.push(format!("- Working tree: {}", parts.join(", ")));
Contributor


P2 Dead untracked counter — the command is invoked with --untracked-files=no, so ?? lines never appear in the output. The untracked variable will always be 0, and the branch that formats "{untracked} untracked" is unreachable. This creates a misleading impression that untracked files are being counted.

Suggested change
    if staged > 0 || unstaged > 0 {
        let mut parts = Vec::new();
        if staged > 0 {
            parts.push(format!("{staged} staged"));
        }
        if unstaged > 0 {
            parts.push(format!("{unstaged} modified"));
        }
        lines.push(format!("- Working tree: {}", parts.join(", ")));

