feat(init): LLM-powered codebase analysis for AGENTS.md generation by punkcanyang · Pull Request #2745 · Hmbown/CodeWhale

punkcanyang · 2026-06-04T08:42:21Z

Summary

Replace the template-based /init command with deep codebase analysis that generates a customized AGENTS.md tailored to the actual project.

How it works: The command gathers rich project context in Rust, then delegates content generation to the LLM agent via SendMessage — the same pattern used by /change, /relay, and /rlm.
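The delegation pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `AppAction` enum is reduced to a single variant, `ProjectContext` and `run_init` are hypothetical names, and the signature of `build_init_prompt` is assumed.

```rust
/// Simplified stand-in for the app's action type (assumption: the real
/// `AppAction` has more variants than shown here).
#[derive(Debug, PartialEq)]
pub enum AppAction {
    /// Hand a message to the LLM agent as if the user had typed it.
    SendMessage(String),
}

/// Hypothetical container for the context gathered in Rust.
pub struct ProjectContext {
    pub sections: Vec<String>,
}

/// Format the gathered context into a prompt for the agent.
/// (Assumed signature; the wording of the prompt is illustrative.)
pub fn build_init_prompt(ctx: &ProjectContext) -> String {
    let mut prompt = String::from(
        "Analyze this project and write an AGENTS.md tailored to it.\n\n## Project context\n",
    );
    for section in &ctx.sections {
        prompt.push_str(section);
        prompt.push('\n');
    }
    prompt
}

/// The command does not write the file itself; it returns an action
/// that asks the agent to do the analysis and the writing.
pub fn run_init(ctx: &ProjectContext) -> AppAction {
    AppAction::SendMessage(build_init_prompt(ctx))
}
```

Keeping the command side free of file writes means the Rust code only collects facts, and all authoring decisions stay with the agent.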

Cost: /init is an explicit user action — users who don't need it won't incur API calls.

Changes

gather_project_context() — orchestrates all context gathering
parse_cargo_toml() — workspace members, deps, features, workspace.dependencies
parse_package_json() — scripts, deps, framework detection
gather_git_info() — remote, branch, status (with --untracked-files=no)
detect_ci_systems() — GitHub Actions, GitLab CI, CircleCI, Jenkins, Travis, Azure
detect_build_systems() — Makefile, Justfile, CMake, Meson, Bazel, scripts/
detect_test_frameworks() — Cargo.toml dev-deps (incl. workspace.dev-dependencies), package.json, pytest
build_init_prompt() — formats context for the LLM agent
read_existing_agents_md() — up to 100KB for in-place updates
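As an illustration of what one of these helpers might look like, here is a hedged sketch of CI detection by well-known config paths. The marker table and the return type are assumptions made for this example; the PR's actual `detect_ci_systems` may check different paths or return a different shape.

```rust
use std::path::Path;

/// Return the names of CI systems whose conventional config paths exist
/// under `workspace`. The marker list is an assumption for illustration.
pub fn detect_ci_systems(workspace: &Path) -> Vec<&'static str> {
    const MARKERS: &[(&str, &str)] = &[
        (".github/workflows", "GitHub Actions"),
        (".gitlab-ci.yml", "GitLab CI"),
        (".circleci/config.yml", "CircleCI"),
        ("Jenkinsfile", "Jenkins"),
        (".travis.yml", "Travis CI"),
        ("azure-pipelines.yml", "Azure Pipelines"),
    ];
    MARKERS
        .iter()
        // Keep only the markers that are present on disk.
        .filter(|(path, _)| workspace.join(path).exists())
        .map(|(_, name)| *name)
        .collect()
}
```

A table of `(path, label)` pairs keeps the check declarative, so adding another CI system is a one-line change.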

Test plan

56 unit tests pass, clippy clean, fmt clean
Rebased onto v0.8.53
Manual: run /init to verify agent generates comprehensive AGENTS.md

🤖 Generated with Claude Code

Greptile Summary

This PR replaces the static template-based /init command with a rich context-gathering pipeline that collects Rust/Node.js/git/CI/build-system information, then delegates AGENTS.md authoring to the LLM agent via AppAction::SendMessage — mirroring the /change and /relay pattern.

New context helpers: parse_cargo_toml, parse_package_json, gather_git_info (via subprocess), detect_ci_systems, detect_build_systems, detect_test_frameworks, and read_existing_agents_md (100 KB cap) all feed a structured Markdown prompt via build_init_prompt.
Behavioral change: /init no longer writes AGENTS.md itself; instead it produces a comprehensive prompt that the agent executes, reading source files and producing a project-tailored guide.
Test coverage: 56 unit tests are added covering every new helper, replacing the old template-assertion tests.
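The 100 KB cap on reading an existing AGENTS.md, mentioned above, could be enforced along these lines. This is a sketch under assumptions: the constant name, the interpretation of 100 KB as 100 * 1024 bytes, and the lossy UTF-8 handling are all choices made for this example rather than details confirmed by the PR.

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

/// Upper bound on how much of an existing AGENTS.md is read
/// (assumed to be 100 * 1024 bytes).
const MAX_AGENTS_MD_BYTES: u64 = 100 * 1024;

/// Read at most `MAX_AGENTS_MD_BYTES` of `<workspace>/AGENTS.md`.
/// Returns `None` if the file is missing or unreadable.
pub fn read_existing_agents_md(workspace: &Path) -> Option<String> {
    let file = File::open(workspace.join("AGENTS.md")).ok()?;
    let mut buf = Vec::new();
    // `take` limits the reader, so an oversized file is never fully loaded.
    file.take(MAX_AGENTS_MD_BYTES).read_to_end(&mut buf).ok()?;
    // The cut may land inside a multi-byte character, so decode lossily.
    Some(String::from_utf8_lossy(&buf).into_owned())
}
```

Capping at the reader rather than truncating after the fact bounds memory use as well as prompt size.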

Confidence Score: 3/5

Functionally sound rewrite, but the git remote URL is included verbatim in the LLM prompt and can carry embedded HTTP credentials, leaking them to the model provider.

The context-gathering pipeline is well-structured and thoroughly tested. The main concern is in gather_git_info: git remote get-url origin returns the raw URL, including any inline userinfo (username:token), which is appended directly to the prompt sent to the LLM. Repos cloned with an embedded token — common in CI/CD pipelines — would silently exfiltrate that credential. This needs to be addressed before the change ships.

crates/tui/src/commands/init.rs — specifically the gather_git_info function and how the remote URL is handled before being included in the prompt.

Security Review

Credential leakage via git remote URL (gather_git_info, line 386): git remote get-url origin can return a URL with embedded HTTP credentials (user + token/password). The raw URL is inserted into the LLM prompt and transmitted to the model provider, exfiltrating any embedded secret without user awareness. The userinfo component should be stripped before embedding the URL in the prompt.

Important Files Changed

Filename	Overview
crates/tui/src/commands/init.rs	Complete rewrite of `/init` from a template-based file writer to an LLM-delegated analyzer; introduces context-gathering helpers and a prompt builder — has a credential-leakage path via git remote URL and misses framework detection in devDependencies.

Reviews (1): Last reviewed commit: "fix(init): satisfy clippy let_chains war..."

Greptile also left 3 inline comments on this PR.

Replace the template-based /init command with deep codebase analysis that delegates AGENTS.md content generation to the LLM agent via SendMessage, producing a customized project guide tailored to the actual codebase.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ptimization
- Increase existing AGENTS.md read limit from 8KB to 100KB to match other file-reading limits in the codebase
- Parse [workspace.dependencies] in Cargo.toml for monorepo support
- Check [workspace.dev-dependencies] in test framework detection
- Add --untracked-files=no to git status to avoid scanning large repos
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-06-04T08:42:34Z

Thanks @punkcanyang for taking the time to contribute.

This repository is currently observing a maintainer-managed contribution gate in dry-run mode, so this pull request is staying open. When enforcement is enabled, pull requests from contributors who are not listed in .github/APPROVED_CONTRIBUTORS will be closed automatically.

Please read CONTRIBUTING.md for the expected contribution shape. A maintainer can grant PR access by commenting /lgtm on a pull request.

gemini-code-assist

Code Review

This pull request refactors the /init command in crates/tui/src/commands/init.rs to delegate the generation of AGENTS.md to the LLM agent. Instead of using a static template, it introduces a comprehensive context-gathering pipeline that extracts details about the project (such as Cargo/npm configurations, Git repository status, CI/CD systems, build systems, and test frameworks) to construct a rich prompt for the agent. Feedback on the changes highlights two main issues: first, the Git repository check fails in subdirectories or monorepos because it only looks for .git in the immediate workspace directory; second, there is a logical inconsistency in the git status parsing where untracked files are counted despite passing --untracked-files=no to the command, resulting in dead code.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-04T08:44:12Z

+    // Check if we're in a git repo.
+    if !workspace.join(".git").exists() {
+        return None;
+    }


Checking for .git directly in the workspace directory fails when the workspace is a subdirectory of a git repository (e.g., in a monorepo or nested crate structure). This prevents gather_git_info from collecting any git context for sub-projects.

We can resolve this by walking up the directory tree to find the .git directory, ensuring git context is gathered correctly for subdirectories.

    // Check if we're in a git repo (handles subdirectories/monorepos).
    let mut is_git = false;
    let mut current = Some(workspace);
    while let Some(path) = current {
        if path.join(".git").exists() {
            is_git = true;
            break;
        }
        current = path.parent();
    }
    if !is_git {
        return None;
    }
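The same upward walk can be written more compactly with `Path::ancestors`. This is a sketch of an equivalent check, not code from the PR; the function name `is_inside_git_repo` is hypothetical.

```rust
use std::path::Path;

/// Return true if `workspace` or any parent directory contains a `.git`
/// entry. `exists()` accepts both a directory and a file, which matters
/// because linked worktrees and submodules use a `.git` file.
pub fn is_inside_git_repo(workspace: &Path) -> bool {
    workspace
        .ancestors()
        .any(|dir| dir.join(".git").exists())
}
```

Both this version and the loop above only see the components present in the path they are given, so they assume `workspace` is absolute. For a relative path, resolving it first (for example with `std::fs::canonicalize`) would be needed to reach the real parent directories.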

gemini-code-assist · 2026-06-04T08:44:12Z

+        let untracked = status_str.lines().filter(|l| l.starts_with("??")).count();
+        if staged > 0 || unstaged > 0 || untracked > 0 {
+            let mut parts = Vec::new();
+            if staged > 0 {
+                parts.push(format!("{staged} staged"));
+            }
+            if unstaged > 0 {
+                parts.push(format!("{unstaged} modified"));
            }
+            if untracked > 0 {
+                parts.push(format!("{untracked} untracked"));
+            }
+            lines.push(format!("- Working tree: {}", parts.join(", ")));
        }


Since git status is executed with --untracked-files=no on line 397, the output status_str will never contain untracked files (which start with ??). Therefore, untracked will always be 0, making the untracked file counting and formatting logic dead code.

If untracked files should be ignored, we should remove this dead code to simplify the implementation. If untracked files should be counted, --untracked-files=no should be removed from the command arguments.

        if staged > 0 || unstaged > 0 {
            let mut parts = Vec::new();
            if staged > 0 {
                parts.push(format!("{staged} staged"));
            }
            if unstaged > 0 {
                parts.push(format!("{unstaged} modified"));
            }
            lines.push(format!("- Working tree: {}", parts.join(", ")));
        }
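For context, here is a self-contained sketch of how the staged and modified counts might be derived once untracked files are out of the picture. It assumes the command emits the short porcelain format, where each line begins with two status columns (index, then working tree). How the PR actually computes `staged` and `unstaged` is not shown in the review, and `summarize_status` is a hypothetical name.

```rust
/// Summarize `git status --porcelain --untracked-files=no` output.
/// Each line starts with two status columns: X (index) and Y (working tree).
/// Returns `None` when nothing is staged or modified.
pub fn summarize_status(porcelain: &str) -> Option<String> {
    let mut staged = 0usize;
    let mut unstaged = 0usize;
    for line in porcelain.lines() {
        let mut cols = line.chars();
        let (Some(x), Some(y)) = (cols.next(), cols.next()) else {
            continue; // skip lines too short to carry a status code
        };
        // A non-blank X column means the change is in the index.
        if x != ' ' && x != '?' {
            staged += 1;
        }
        // A non-blank Y column means the working tree differs from the index.
        if y != ' ' && y != '?' {
            unstaged += 1;
        }
    }
    let mut parts = Vec::new();
    if staged > 0 {
        parts.push(format!("{staged} staged"));
    }
    if unstaged > 0 {
        parts.push(format!("{unstaged} modified"));
    }
    if parts.is_empty() {
        None
    } else {
        Some(format!("- Working tree: {}", parts.join(", ")))
    }
}
```

A file that is both staged and further modified (for example `MM`) is counted in both totals, which matches how the two columns are meant to be read.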

greptile-apps · 2026-06-04T08:47:09Z

+    if let Some(url) = run(&["remote", "get-url", "origin"]) {
+        lines.push(format!("- Remote: {url}"));
+    }


Git remote URL may leak embedded credentials to the LLM

git remote get-url origin can return a URL that contains inline HTTP credentials in the authority component (username and/or password/token). That raw string is pushed directly into the prompt and forwarded to the model provider. A user whose remote was configured with an embedded token — common in CI-cloned repos or older setups — would unknowingly exfiltrate that secret. Consider stripping the userinfo component before embedding: parse the URL, clear the password and username fields, and re-serialize, or apply a regex to mask the authority before pushing to lines.
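One way to implement the stripping suggested above, as a minimal std-only sketch. The function name `redact_remote_url` is hypothetical, and a production version might prefer a dedicated URL-parsing crate over manual string slicing.

```rust
/// Strip any `user[:password]@` userinfo from the authority of a URL-style
/// remote. scp-like remotes such as `git@github.com:owner/repo.git` have no
/// `://` separator and are returned unchanged.
pub fn redact_remote_url(url: &str) -> String {
    let Some((scheme, rest)) = url.split_once("://") else {
        return url.to_string();
    };
    // The authority ends at the first '/', '?' or '#'.
    let authority_end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let (authority, tail) = rest.split_at(authority_end);
    // Userinfo, if present, is everything up to the last '@' in the authority.
    let host = match authority.rfind('@') {
        Some(at) => &authority[at + 1..],
        None => authority,
    };
    format!("{scheme}://{host}{tail}")
}
```

The call site would then push `redact_remote_url(&url)` instead of the raw `url`. Note that this sketch also drops a bare username (as in `ssh://git@host/...`), which should be harmless for prompt context but is a deliberate simplification.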

greptile-apps · 2026-06-04T08:47:10Z

+    // Dependencies.
+    if let Some(deps) = doc.get("dependencies").and_then(|v| v.as_object()) {
+        let dep_keys: Vec<&str> = deps.keys().map(|k| k.as_str()).collect();
+        if !dep_keys.is_empty() {
+            // Detect frameworks from deps.
+            let frameworks = detect_js_frameworks(&dep_keys);
+            if !frameworks.is_empty() {
+                lines.push(format!("- Frameworks detected: {}", frameworks.join(", ")));
+            }
+            lines.push(format!("- Dependencies: {}", dep_keys.join(", ")));
+        }
    }

-    info
+    // Dev dependencies.
+    if let Some(dev_deps) = doc.get("devDependencies").and_then(|v| v.as_object()) {
+        let dev_keys: Vec<&str> = dev_deps.keys().map(|k| k.as_str()).collect();
+        if !dev_keys.is_empty() {
+            lines.push(format!("- Dev dependencies: {}", dev_keys.join(", ")));
+        }
+    }


Framework detection misses devDependencies — build tools like Vite, Webpack, esbuild, and Turbopack are almost always in devDependencies, not dependencies. Calling detect_js_frameworks only with the runtime dependency keys means those tools are silently omitted from the generated AGENTS.md context.

Suggested change

    // Dependencies.
    if let Some(deps) = doc.get("dependencies").and_then(|v| v.as_object()) {
        let dep_keys: Vec<&str> = deps.keys().map(|k| k.as_str()).collect();
        if !dep_keys.is_empty() {
            // Detect frameworks from runtime deps.
            let frameworks = detect_js_frameworks(&dep_keys);
            if !frameworks.is_empty() {
                lines.push(format!("- Frameworks detected: {}", frameworks.join(", ")));
            }
            lines.push(format!("- Dependencies: {}", dep_keys.join(", ")));
        }
    }

    // Dev dependencies.
    if let Some(dev_deps) = doc.get("devDependencies").and_then(|v| v.as_object()) {
        let dev_keys: Vec<&str> = dev_deps.keys().map(|k| k.as_str()).collect();
        if !dev_keys.is_empty() {
            // Also detect build-tool/framework entries from devDependencies (Vite, webpack, etc.).
            let dev_frameworks = detect_js_frameworks(&dev_keys);
            if !dev_frameworks.is_empty() {
                lines.push(format!("- Dev frameworks/tools: {}", dev_frameworks.join(", ")));
            }
            lines.push(format!("- Dev dependencies: {}", dev_keys.join(", ")));
        }
    }
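An alternative to reporting two separate lines is to run detection once over the combined set of runtime and dev dependency names. The sketch below illustrates that idea only. `detect_js_frameworks` here is a hypothetical stand-in, since the PR's real package-to-label mapping is not shown in the review, and `detect_all_frameworks` is an invented helper name.

```rust
/// Hypothetical stand-in for the PR's `detect_js_frameworks`; the real
/// mapping of package names to labels is not shown in the review.
pub fn detect_js_frameworks(keys: &[&str]) -> Vec<&'static str> {
    const KNOWN: &[(&str, &str)] = &[
        ("react", "React"),
        ("next", "Next.js"),
        ("vue", "Vue"),
        ("vite", "Vite"),
        ("webpack", "webpack"),
        ("esbuild", "esbuild"),
    ];
    KNOWN
        .iter()
        .filter(|(pkg, _)| keys.contains(pkg))
        .map(|(_, label)| *label)
        .collect()
}

/// Detect frameworks across runtime and dev dependency names in one pass.
pub fn detect_all_frameworks(deps: &[&str], dev_deps: &[&str]) -> Vec<&'static str> {
    let combined: Vec<&str> = deps.iter().chain(dev_deps).copied().collect();
    detect_js_frameworks(&combined)
}
```

The trade-off is that a single merged list loses the distinction between runtime frameworks and build tooling, which the two-line version in the suggestion preserves.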

greptile-apps · 2026-06-04T08:47:11Z

+        let untracked = status_str.lines().filter(|l| l.starts_with("??")).count();
+        if staged > 0 || unstaged > 0 || untracked > 0 {
+            let mut parts = Vec::new();
+            if staged > 0 {
+                parts.push(format!("{staged} staged"));
+            }
+            if unstaged > 0 {
+                parts.push(format!("{unstaged} modified"));
            }
+            if untracked > 0 {
+                parts.push(format!("{untracked} untracked"));
+            }
+            lines.push(format!("- Working tree: {}", parts.join(", ")));


Dead untracked counter — the command is invoked with --untracked-files=no, so ?? lines never appear in the output. The untracked variable will always be 0, and the branch that formats "{untracked} untracked" is unreachable. This creates a misleading impression that untracked files are being counted.

Suggested change

        if staged > 0 || unstaged > 0 {
            let mut parts = Vec::new();
            if staged > 0 {
                parts.push(format!("{staged} staged"));
            }
            if unstaged > 0 {
                parts.push(format!("{unstaged} modified"));
            }
            lines.push(format!("- Working tree: {}", parts.join(", ")));

punkcanyang and others added 3 commits June 4, 2026 14:11

fix(init): satisfy clippy let_chains warnings

45abac3

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

punkcanyang wants to merge 3 commits into Hmbown:main from punkcanyang:feat/enhanced-init
