feat(init): LLM-powered codebase analysis for AGENTS.md generation #2745
punkcanyang wants to merge 3 commits into
Replace the template-based /init command with deep codebase analysis that delegates AGENTS.md content generation to the LLM agent via SendMessage, producing a customized project guide tailored to the actual codebase.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ptimization
- Increase existing AGENTS.md read limit from 8KB to 100KB to match other file-reading limits in the codebase
- Parse [workspace.dependencies] in Cargo.toml for monorepo support
- Check [workspace.dev-dependencies] in test framework detection
- Add --untracked-files=no to git status to avoid scanning large repos

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thanks @punkcanyang for taking the time to contribute. This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in Please read
Code Review
This pull request refactors the /init command in crates/tui/src/commands/init.rs to delegate the generation of AGENTS.md to the LLM agent. Instead of using a static template, it introduces a comprehensive context-gathering pipeline that extracts details about the project (such as Cargo/npm configurations, Git repository status, CI/CD systems, build systems, and test frameworks) to construct a rich prompt for the agent. Feedback on the changes highlights two main issues: first, the Git repository check fails in subdirectories or monorepos because it only looks for .git in the immediate workspace directory; second, there is a logical inconsistency in the git status parsing where untracked files are counted despite passing --untracked-files=no to the command, resulting in dead code.
    // Check if we're in a git repo.
    if !workspace.join(".git").exists() {
        return None;
    }
Checking for .git directly in the workspace directory fails when the workspace is a subdirectory of a git repository (e.g., in a monorepo or nested crate structure). This prevents gather_git_info from collecting any git context for sub-projects.
We can resolve this by walking up the directory tree to find the .git directory, ensuring git context is gathered correctly for subdirectories.
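The upward walk described here can also be written with `Path::ancestors`, which yields the path itself followed by each parent in turn. This is a minimal sketch, not the PR's code: the function name `is_inside_git_repo` is hypothetical, and it assumes `workspace` is an absolute path (a relative path such as `.` has no real ancestors to visit).

```rust
use std::path::Path;

/// Returns true if `workspace` or any of its ancestors contains a `.git` entry.
/// `.git` may be a directory (normal clone) or a file (worktree, submodule);
/// `exists()` accepts both.
/// Hypothetical helper name; assumes `workspace` is absolute.
fn is_inside_git_repo(workspace: &Path) -> bool {
    workspace.ancestors().any(|dir| dir.join(".git").exists())
}
```

A caller could then return early with `if !is_inside_git_repo(workspace) { return None; }`.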
    // Check if we're in a git repo (handles subdirectories/monorepos).
    let mut is_git = false;
    let mut current = Some(workspace);
    while let Some(path) = current {
        if path.join(".git").exists() {
            is_git = true;
            break;
        }
        current = path.parent();
    }
    if !is_git {
        return None;
    }

    let untracked = status_str.lines().filter(|l| l.starts_with("??")).count();
    if staged > 0 || unstaged > 0 || untracked > 0 {
        let mut parts = Vec::new();
        if staged > 0 {
            parts.push(format!("{staged} staged"));
        }
        if unstaged > 0 {
            parts.push(format!("{unstaged} modified"));
        }
        if untracked > 0 {
            parts.push(format!("{untracked} untracked"));
        }
        lines.push(format!("- Working tree: {}", parts.join(", ")));
    }
Since git status is executed with --untracked-files=no on line 397, the output status_str will never contain untracked files (which start with ??). Therefore, untracked will always be 0, making the untracked file counting and formatting logic dead code.
If untracked files should be ignored, we should remove this dead code to simplify the implementation. If untracked files should be counted, --untracked-files=no should be removed from the command arguments.
    if staged > 0 || unstaged > 0 {
        let mut parts = Vec::new();
        if staged > 0 {
            parts.push(format!("{staged} staged"));
        }
        if unstaged > 0 {
            parts.push(format!("{unstaged} modified"));
        }
        lines.push(format!("- Working tree: {}", parts.join(", ")));
    }

    if let Some(url) = run(&["remote", "get-url", "origin"]) {
        lines.push(format!("- Remote: {url}"));
    }
Git remote URL may leak embedded credentials to the LLM
git remote get-url origin can return a URL that contains inline HTTP credentials in the authority component (username and/or password/token). That raw string is pushed directly into the prompt and forwarded to the model provider. A user whose remote was configured with an embedded token — common in CI-cloned repos or older setups — would unknowingly exfiltrate that secret. Consider stripping the userinfo component before embedding: parse the URL, clear the password and username fields, and re-serialize, or apply a regex to mask the authority before pushing to lines.
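One way to strip the userinfo component is sketched below using only the standard library. The helper name `strip_userinfo` is hypothetical and not part of the PR; it assumes the remote is either a `scheme://authority/path` URL or an scp-style string such as `git@host:org/repo.git`, which carries no password and is returned unchanged. A dedicated URL parser would likely be more robust against unusual inputs, for example a password containing an unencoded `/`.

```rust
/// Remove any `user[:password]@` userinfo from a `scheme://authority/path` URL.
/// Strings without `://` (e.g. scp-style `git@host:org/repo.git`) are returned unchanged.
/// Hypothetical helper, not taken from the PR.
fn strip_userinfo(url: &str) -> String {
    let Some((scheme, rest)) = url.split_once("://") else {
        return url.to_string();
    };
    // The authority ends at the first '/', '?' or '#'.
    let authority_end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let (authority, tail) = rest.split_at(authority_end);
    // Userinfo, if present, is everything up to the last '@' in the authority.
    let host = match authority.rfind('@') {
        Some(at) => &authority[at + 1..],
        None => authority,
    };
    format!("{scheme}://{host}{tail}")
}
```

With a helper like this, the remote line could be pushed as `format!("- Remote: {}", strip_userinfo(&url))` so that only the host and path reach the prompt.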
    // Dependencies.
    if let Some(deps) = doc.get("dependencies").and_then(|v| v.as_object()) {
        let dep_keys: Vec<&str> = deps.keys().map(|k| k.as_str()).collect();
        if !dep_keys.is_empty() {
            // Detect frameworks from deps.
            let frameworks = detect_js_frameworks(&dep_keys);
            if !frameworks.is_empty() {
                lines.push(format!("- Frameworks detected: {}", frameworks.join(", ")));
            }
            lines.push(format!("- Dependencies: {}", dep_keys.join(", ")));
        }
    }

    // Dev dependencies.
    if let Some(dev_deps) = doc.get("devDependencies").and_then(|v| v.as_object()) {
        let dev_keys: Vec<&str> = dev_deps.keys().map(|k| k.as_str()).collect();
        if !dev_keys.is_empty() {
            lines.push(format!("- Dev dependencies: {}", dev_keys.join(", ")));
        }
    }
Framework detection misses devDependencies: build tools like Vite, Webpack, esbuild, and Turbopack are almost always in devDependencies, not dependencies. Calling detect_js_frameworks only with the runtime dependency keys means those tools are silently omitted from the generated AGENTS.md context.
    - // Dependencies.
    - if let Some(deps) = doc.get("dependencies").and_then(|v| v.as_object()) {
    -     let dep_keys: Vec<&str> = deps.keys().map(|k| k.as_str()).collect();
    -     if !dep_keys.is_empty() {
    -         // Detect frameworks from deps.
    -         let frameworks = detect_js_frameworks(&dep_keys);
    -         if !frameworks.is_empty() {
    -             lines.push(format!("- Frameworks detected: {}", frameworks.join(", ")));
    -         }
    -         lines.push(format!("- Dependencies: {}", dep_keys.join(", ")));
    -     }
    - }
    - // Dev dependencies.
    - if let Some(dev_deps) = doc.get("devDependencies").and_then(|v| v.as_object()) {
    -     let dev_keys: Vec<&str> = dev_deps.keys().map(|k| k.as_str()).collect();
    -     if !dev_keys.is_empty() {
    -         lines.push(format!("- Dev dependencies: {}", dev_keys.join(", ")));
    -     }
    - }
    + // Dependencies.
    + if let Some(deps) = doc.get("dependencies").and_then(|v| v.as_object()) {
    +     let dep_keys: Vec<&str> = deps.keys().map(|k| k.as_str()).collect();
    +     if !dep_keys.is_empty() {
    +         // Detect frameworks from runtime deps.
    +         let frameworks = detect_js_frameworks(&dep_keys);
    +         if !frameworks.is_empty() {
    +             lines.push(format!("- Frameworks detected: {}", frameworks.join(", ")));
    +         }
    +         lines.push(format!("- Dependencies: {}", dep_keys.join(", ")));
    +     }
    + }
    + // Dev dependencies.
    + if let Some(dev_deps) = doc.get("devDependencies").and_then(|v| v.as_object()) {
    +     let dev_keys: Vec<&str> = dev_deps.keys().map(|k| k.as_str()).collect();
    +     if !dev_keys.is_empty() {
    +         // Also detect build-tool/framework entries from devDependencies (Vite, webpack, etc.).
    +         let dev_frameworks = detect_js_frameworks(&dev_keys);
    +         if !dev_frameworks.is_empty() {
    +             lines.push(format!("- Dev frameworks/tools: {}", dev_frameworks.join(", ")));
    +         }
    +         lines.push(format!("- Dev dependencies: {}", dev_keys.join(", ")));
    +     }
    + }
    let untracked = status_str.lines().filter(|l| l.starts_with("??")).count();
    if staged > 0 || unstaged > 0 || untracked > 0 {
        let mut parts = Vec::new();
        if staged > 0 {
            parts.push(format!("{staged} staged"));
        }
        if unstaged > 0 {
            parts.push(format!("{unstaged} modified"));
        }
        if untracked > 0 {
            parts.push(format!("{untracked} untracked"));
        }
        lines.push(format!("- Working tree: {}", parts.join(", ")));
Dead untracked counter: the command is invoked with --untracked-files=no, so ?? lines never appear in the output. The untracked variable will always be 0, and the branch that formats "{untracked} untracked" is unreachable. This creates a misleading impression that untracked files are being counted.
    - let untracked = status_str.lines().filter(|l| l.starts_with("??")).count();
    - if staged > 0 || unstaged > 0 || untracked > 0 {
    -     let mut parts = Vec::new();
    -     if staged > 0 {
    -         parts.push(format!("{staged} staged"));
    -     }
    -     if unstaged > 0 {
    -         parts.push(format!("{unstaged} modified"));
    -     }
    -     if untracked > 0 {
    -         parts.push(format!("{untracked} untracked"));
    -     }
    -     lines.push(format!("- Working tree: {}", parts.join(", ")));
    + if staged > 0 || unstaged > 0 {
    +     let mut parts = Vec::new();
    +     if staged > 0 {
    +         parts.push(format!("{staged} staged"));
    +     }
    +     if unstaged > 0 {
    +         parts.push(format!("{unstaged} modified"));
    +     }
    +     lines.push(format!("- Working tree: {}", parts.join(", ")));
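To make the dead-counter point concrete, the sketch below counts entries in porcelain (v1) status output, where each line is `XY path`: `X` is the index status, `Y` is the working-tree status, and `??` marks an untracked file. The function `count_status` is hypothetical; how the PR actually derives `staged` and `unstaged` is not shown in the review, so the counting rule here is an assumption.

```rust
/// Counts (staged, unstaged, untracked) entries in `git status --porcelain` (v1) output.
/// Hypothetical helper; the PR's own counting logic may differ.
fn count_status(status_str: &str) -> (usize, usize, usize) {
    let (mut staged, mut unstaged, mut untracked) = (0, 0, 0);
    for line in status_str.lines() {
        let mut chars = line.chars();
        match (chars.next(), chars.next()) {
            // `??` marks an untracked path.
            (Some('?'), Some('?')) => untracked += 1,
            (Some(x), Some(y)) => {
                // A non-space X means the change is staged in the index.
                if x != ' ' {
                    staged += 1;
                }
                // A non-space Y means the working tree differs from the index.
                if y != ' ' {
                    unstaged += 1;
                }
            }
            // Lines shorter than two characters carry no status.
            _ => {}
        }
    }
    (staged, unstaged, untracked)
}
```

When the status command is run with `--untracked-files=no`, no `??` lines reach this function, so the third count is always zero, which is why the untracked branch can never fire.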
Summary
Replace the template-based /init command with deep codebase analysis that generates a customized AGENTS.md tailored to the actual project.

How it works: The command gathers rich project context in Rust, then delegates content generation to the LLM agent via SendMessage, the same pattern used by /change, /relay, and /rlm.

Cost: /init is an explicit user action; users who don't need it won't incur API calls.

Changes
- gather_project_context(): orchestrates all context gathering
- parse_cargo_toml(): workspace members, deps, features, workspace.dependencies
- parse_package_json(): scripts, deps, framework detection
- gather_git_info(): remote, branch, status (with --untracked-files=no)
- detect_ci_systems(): GitHub Actions, GitLab CI, CircleCI, Jenkins, Travis, Azure
- detect_build_systems(): Makefile, Justfile, CMake, Meson, Bazel, scripts/
- detect_test_frameworks(): Cargo.toml dev-deps (incl. workspace.dev-dependencies), package.json, pytest
- build_init_prompt(): formats context for the LLM agent
- read_existing_agents_md(): up to 100KB for in-place updates

Test plan
- /init to verify agent generates comprehensive AGENTS.md

🤖 Generated with Claude Code
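As an illustration of the detection helpers listed under Changes, a function like detect_ci_systems() might simply probe for well-known config paths. This is a sketch under assumptions: the marker table, the display names, and the return type are guesses, and the PR's actual implementation is not reproduced here.

```rust
use std::path::Path;

/// Hypothetical sketch: map well-known CI config paths to system names.
/// The marker table is an assumption, not the PR's actual list.
fn detect_ci_systems(workspace: &Path) -> Vec<&'static str> {
    const MARKERS: &[(&str, &str)] = &[
        (".github/workflows", "GitHub Actions"),
        (".gitlab-ci.yml", "GitLab CI"),
        (".circleci/config.yml", "CircleCI"),
        ("Jenkinsfile", "Jenkins"),
        (".travis.yml", "Travis CI"),
        ("azure-pipelines.yml", "Azure Pipelines"),
    ];
    MARKERS
        .iter()
        // Keep only markers that exist relative to the workspace root.
        .filter(|(rel, _)| workspace.join(rel).exists())
        .map(|(_, name)| *name)
        .collect()
}
```

Probing only the workspace root shares the limitation raised for the `.git` check: a sub-crate of a monorepo would not see CI config that lives at the repository root.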
Greptile Summary
This PR replaces the static template-based /init command with a rich context-gathering pipeline that collects Rust/Node.js/git/CI/build-system information, then delegates AGENTS.md authoring to the LLM agent via AppAction::SendMessage, mirroring the /change and /relay pattern.

- parse_cargo_toml, parse_package_json, gather_git_info (via subprocess), detect_ci_systems, detect_build_systems, detect_test_frameworks, and read_existing_agents_md (100 KB cap) all feed a structured Markdown prompt via build_init_prompt.
- /init no longer writes AGENTS.md itself; instead it produces a comprehensive prompt that the agent executes, reading source files and producing a project-tailored guide.

Confidence Score: 3/5
Functionally sound rewrite, but the git remote URL is included verbatim in the LLM prompt and can carry embedded HTTP credentials, leaking them to the model provider.

The context-gathering pipeline is well-structured and thoroughly tested. The main concern is in gather_git_info: git remote get-url origin returns the raw URL, including any inline userinfo (username:token), which is appended directly to the prompt sent to the LLM. Repos cloned with an embedded token (common in CI/CD pipelines) would silently exfiltrate that credential. This needs to be addressed before the change ships.

crates/tui/src/commands/init.rs, specifically the gather_git_info function and how the remote URL is handled before being included in the prompt.

Security Review
gather_git_info, line 386): git remote get-url origin can return a URL with embedded HTTP credentials (user + token/password). The raw URL is inserted into the LLM prompt and transmitted to the model provider, exfiltrating any embedded secret without user awareness. The userinfo component should be stripped before embedding the URL in the prompt.

Important Files Changed
/init from a template-based file writer to an LLM-delegated analyzer; introduces context-gathering helpers and a prompt builder; has a credential-leakage path via git remote URL and misses framework detection in devDependencies.

Reviews (1): Last reviewed commit: "fix(init): satisfy clippy let_chains war..."
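The 100 KB cap on the existing AGENTS.md mentioned in the summary could be enforced with `Read::take`, which stops reading after a fixed number of bytes. This is a sketch under assumptions: the constant name, the interpretation of 100 KB as 100 * 1024 bytes, the `io::Result<Option<String>>` return type, and the lossy UTF-8 conversion are all guesses rather than the PR's actual behavior.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Assumed cap; the PR states "100KB" without giving the exact byte count.
const MAX_AGENTS_MD_BYTES: u64 = 100 * 1024;

/// Read at most `MAX_AGENTS_MD_BYTES` bytes of an existing AGENTS.md.
/// Returns Ok(None) when the file does not exist.
/// Hypothetical signature; the PR's real function may differ.
fn read_existing_agents_md(workspace: &Path) -> io::Result<Option<String>> {
    let path = workspace.join("AGENTS.md");
    let file = match File::open(&path) {
        Ok(f) => f,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e),
    };
    let mut buf = Vec::new();
    // `take` limits how many bytes `read_to_end` can pull from the file.
    file.take(MAX_AGENTS_MD_BYTES).read_to_end(&mut buf)?;
    // Lossy conversion tolerates a multi-byte character cut at the cap.
    Ok(Some(String::from_utf8_lossy(&buf).into_owned()))
}
```

Capping at read time, rather than reading the whole file and truncating, keeps memory use bounded even if the file is unexpectedly large.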