feat(skills): add dependency-pruning skill by blarghmatey · Pull Request #21 · mitodl/agent-kit

blarghmatey · 2026-06-26T20:37:46Z

What are the relevant tickets?

N/A

Description (What does it do?)

Adds a new dependency-pruning skill under skills/process/ that audits a repository's dependencies across Python, JS/TS, Go, Rust, and other ecosystems to surface four categories of action:

Remove: unused packages, confirmed via automated tools (deptry, depcheck, cargo machete, go mod tidy) plus manual grep to catch packages the tools miss
Optimize import style: JS/TS packages imported in a way that prevents tree-shaking (e.g. import _ from 'lodash' vs import { debounce } from 'lodash-es')
Vendor/rewrite candidates: packages where only ≤3 unique symbols are used and the package is small enough to inline (default thresholds: ≤3 symbols, ≤500 LOC — both configurable)
Migrate away from: deprecated, sunset, or abandoned packages with known migration targets (e.g. react-ga → GA4/PostHog after Universal Analytics sunset)
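
The vendor/rewrite rule above reduces to a two-threshold check. The following is a minimal sketch, assuming the defaults quoted in this description (≤3 symbols, ≤500 LOC); `VendorThresholds` and `is_vendor_candidate` are hypothetical names used for illustration and are not taken from the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorThresholds:
    """Hypothetical container for the two configurable limits."""
    max_symbols: int = 3   # default quoted in the PR description
    max_loc: int = 500     # default quoted in the PR description


def is_vendor_candidate(used_symbols, package_loc, thresholds=VendorThresholds()):
    """True when few unique symbols are used from a package small enough to inline."""
    unique = set(used_symbols)
    # Zero used symbols means "remove", not "vendor", so require at least one.
    return 0 < len(unique) <= thresholds.max_symbols and package_loc <= thresholds.max_loc


print(is_vendor_candidate(["default"], 1))                           # True
# 17000 is a made-up LOC figure standing in for a large package.
print(is_vendor_candidate(["debounce", "debounce", "get"], 17000))   # False
```

Passing a different `VendorThresholds` instance is one way the "configurable" part could look; how the skill actually exposes the thresholds is not shown in this PR description.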

The skill includes explicit blind-spot guidance to avoid common false positives:

Django projects: deptry can raise 30 or more false-positive DEP002 flags on a single project, because PyPI names often differ from Python module names; INSTALLED_APPS packages are loaded via strings, not imports
Server runtime packages: gunicorn, uwsgi, granian, hypercorn are invoked via CLI in Dockerfile/K8s — check deployment configs AND git history for in-flight server migrations before flagging for removal
CLI-invoked dev tooling: ipdb, pdbpp, bpython, ptpython, debugpy etc. are terminal tools, not app imports — flag as "move to dev deps" rather than "remove"
Webpack/babel plugins: referenced in config files, not source imports
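
The INSTALLED_APPS blind spot can be checked mechanically before trusting an "unused" flag. The sketch below assumes a settings file that assigns a literal list or tuple to `INSTALLED_APPS` at module level; `installed_app_modules` is a hypothetical helper, and settings built with `+=` or split across several files would need extra handling.

```python
import ast


def installed_app_modules(settings_source):
    """Collect top-level module names from a literal INSTALLED_APPS assignment."""
    modules = set()
    for node in ast.parse(settings_source).body:
        if not isinstance(node, ast.Assign):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if "INSTALLED_APPS" not in targets:
            continue
        if isinstance(node.value, (ast.List, ast.Tuple)):
            for element in node.value.elts:
                if isinstance(element, ast.Constant) and isinstance(element.value, str):
                    # "rest_framework.authtoken" -> top-level module "rest_framework"
                    modules.add(element.value.split(".")[0])
    return modules


example = '''
INSTALLED_APPS = [
    "django.contrib.admin",
    "rest_framework",
    "rest_framework.authtoken",
]
'''
print(sorted(installed_app_modules(example)))  # ['django', 'rest_framework']
```

Any package whose module name appears in this set is in use even when no `import` statement mentions it.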

After reporting, the skill offers to execute safe changes (removals, import-style fixes, vendor stubs) and delegates PR creation to the create-ol-pull-request skill if available.

Evaluation

Evaluated over 2 iterations against ocw-studio (Django 5.2 + React/lodash). The skill achieves a 93% assertion pass rate, versus 60% for the no-skill baseline, across three eval scenarios:

Full-repo dependency audit
Focused lodash vendoring/tree-shaking analysis
Django-specific Python-only audit

Key iteration-2 improvements over iteration-1: added Phase 3b (import style / tree-shaking for JS/TS), Phase 4 (deprecated/sunset detection), Django INSTALLED_APPS blind spot, server runtime caveat, and developer tooling caveat.

How can this be tested?

Point the skill at any repo with Python or JS/TS dependencies:

/dependency-pruning
Audit the dependencies in ~/code/mit/apps/maintained/ocw-studio

Verify the report:

Covers both Python (pyproject.toml) and JS (package.json) ecosystems
Lists unused packages with concrete evidence (not just tool output)
Includes an "Optimize Import Style" section for lodash/ramda
Does NOT flag uwsgi or active server runtimes for removal without checking deployment configs
Does NOT flag ipdb/bpython for removal (marks as "move to dev deps")
Includes a "Migrate Away From" section for react-ga (GA3 sunset)

Additional Context

The skill is designed to be conservative: it requires evidence before flagging anything as removable, and asks the user to confirm before executing any changes. The "Optimize Import Style" category is the highest-ROI output for JS/TS-heavy repos: switching from import _ from 'lodash' to named imports from lodash-es lets the bundler drop unused functions instead of shipping the full ~72 KB build.

Audits a repository's dependencies across Python, JS/TS, Go, Rust, and other ecosystems to surface four categories of action:

- Remove: unused packages (confirmed via tool output + manual grep)
- Optimize: JS/TS packages with import styles that block tree-shaking
- Vendor/rewrite: packages where only ≤3 symbols are used and the package is small enough to inline (configurable thresholds)
- Migrate: deprecated, sunset, or abandoned packages with known migration targets

Includes blind-spot guidance for Django projects (deptry false positives, INSTALLED_APPS string loading), server runtime packages (check Dockerfile + git history for in-flight migrations before flagging for removal), and CLI-invoked developer tooling (ipdb, bpython, pdbpp, etc. that static analysis always marks unused).

Evaluated over 2 iterations against ocw-studio; skill achieves 93% assertion pass rate vs 60% for the no-skill baseline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces a new 'dependency-pruning' skill, including its documentation, evaluation prompts, and reference guides for unused dependency detection across various ecosystems such as Python, JS/TS, Go, and Rust. The reviewer's feedback provides valuable and highly actionable improvements to the fallback scripts and shell commands. Specifically, the feedback addresses portability issues with GNU grep, a performance bug when measuring single-file Python modules, compatibility fallbacks for tomllib on Python versions older than 3.11, shell quoting issues in Node.js, and a regex parsing bug in Cargo.toml that incorrectly treats metadata keys as dependencies.

gemini-code-assist · 2026-06-26T20:39:33Z

+rg "from ${PKG}(\.\w+)? import (\w+)" --no-filename -o --include="*.py" \
+  | grep -oP 'import \K\w+' | sort -u


Using grep -oP relies on GNU grep, which is not pre-installed on macOS by default and can cause portability issues. We can achieve the same result directly using ripgrep's replace feature (-r), which is cleaner and more portable.

Suggested change

-rg "from ${PKG}(\.\w+)? import (\w+)" --no-filename -o --include="*.py" \
-  | grep -oP 'import \K\w+' | sort -u
+rg "from ${PKG}(?:\.\w+)? import (\w+)" -g "*.py" -o -r '$1' --no-filename | sort -u
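
Both the original and the suggested pattern capture a single `\w+` after `import`, so only the first name of `from pkg import a, b` is reported, and parenthesised multi-line imports are not covered. Where a Python interpreter is available, the syntax tree avoids that. This is a sketch under that assumption; `imported_symbols` is a hypothetical helper, not part of the skill.

```python
import ast


def imported_symbols(source, package):
    """Return the unique names imported via `from <package>[.sub] import ...`."""
    symbols = set()
    for node in ast.walk(ast.parse(source)):
        # level == 0 restricts the match to absolute imports.
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            if node.module == package or node.module.startswith(package + "."):
                symbols.update(alias.name for alias in node.names)
    return symbols


sample = (
    "from six import moves, text_type\n"
    "from six.moves import urllib\n"
    "from sixty import nine\n"
)
print(sorted(imported_symbols(sample, "six")))  # ['moves', 'text_type', 'urllib']
```

The size of the returned set is the "unique symbols used" figure that the vendor/rewrite threshold is compared against.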

gemini-code-assist · 2026-06-26T20:39:33Z

+# Python
+python -c "
+import importlib.util, pathlib
+spec = importlib.util.find_spec('${PKG}')
+if spec and spec.origin:
+    root = pathlib.Path(spec.origin).parent
+    lines = sum(len(f.read_text(errors='ignore').splitlines()) for f in root.rglob('*.py'))
+    print(lines)
+"


If the package is a single-file module (e.g., six.py), pathlib.Path(spec.origin).parent will resolve to the site-packages directory. Running root.rglob('*.py') on site-packages will scan every single installed package in the environment, causing massive performance issues and incorrect line counts. We should check if the origin is a directory package (__init__.py) or a single-file module.

Suggested change

-# Python
-python -c "
-import importlib.util, pathlib
-spec = importlib.util.find_spec('${PKG}')
-if spec and spec.origin:
-    root = pathlib.Path(spec.origin).parent
-    lines = sum(len(f.read_text(errors='ignore').splitlines()) for f in root.rglob('*.py'))
-    print(lines)
-"
+# Python
+python -c "
+import importlib.util, pathlib
+spec = importlib.util.find_spec('${PKG}')
+if spec and spec.origin:
+    origin = pathlib.Path(spec.origin)
+    if origin.name == '__init__.py':
+        root = origin.parent
+        lines = sum(len(f.read_text(errors='ignore').splitlines()) for f in root.rglob('*.py'))
+    else:
+        lines = len(origin.read_text(errors='ignore').splitlines())
+    print(lines)
+"

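The difference between the two branches is easy to see on a throwaway directory that stands in for site-packages. The sketch below is illustrative only: `count_python_lines` is a hypothetical function wrapping the same logic as the suggested change, and the file names are invented.

```python
import pathlib
import tempfile


def count_python_lines(origin):
    """Count lines for a module, given the path found in spec.origin."""
    origin = pathlib.Path(origin)
    if origin.name == "__init__.py":
        files = origin.parent.rglob("*.py")   # directory package: walk its own tree
    else:
        files = [origin]                       # single-file module: that file only
    return sum(len(f.read_text(errors="ignore").splitlines()) for f in files)


with tempfile.TemporaryDirectory() as tmp:
    site = pathlib.Path(tmp)                   # stand-in for site-packages
    (site / "single.py").write_text("a = 1\nb = 2\n")
    (site / "pkg").mkdir()
    (site / "pkg" / "__init__.py").write_text("x = 1\n")
    (site / "pkg" / "core.py").write_text("y = 2\nz = 3\n")
    single_loc = count_python_lines(site / "single.py")
    package_loc = count_python_lines(site / "pkg" / "__init__.py")

print(single_loc, package_loc)  # 2 3
```

With the pre-review logic, `single.py` would have been counted together with everything under `pkg/`. A caller may also want to confirm that the origin is a real file first, since built-in modules report origins such as `built-in`.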
gemini-code-assist · 2026-06-26T20:39:33Z

+python -c "
+import tomllib, pathlib, subprocess, sys
+
+with open('pyproject.toml', 'rb') as f:
+    data = tomllib.load(f)
+
+deps = data.get('project', {}).get('dependencies', [])
+# Strip version specifiers
+pkgs = [d.split('[')[0].split('>=')[0].split('==')[0].split('<')[0].strip().lower().replace('-','_') for d in deps]
+
+for pkg in pkgs:
+    result = subprocess.run(['rg', '-l', pkg, '--include=*.py', '--glob=!tests/'], capture_output=True, text=True)
+    if not result.stdout.strip():
+        print(f'UNUSED: {pkg}')
+    else:
+        print(f'used:   {pkg} ({len(result.stdout.strip().splitlines())} files)')
+"


This script has two issues: first, tomllib is only available in Python 3.11+, so it will fail on older Python versions. We should add a fallback to pip._vendor.tomli or tomli. Second, splitting on [ and operators like >= is fragile and fails on complex PEP 508 specifiers (e.g., with environment markers or other operators). Using a simple regex to extract the package name is much more robust.

python -c "
import pathlib, subprocess, sys, re
try:
    import tomllib
except ImportError:
    try:
        import pip._vendor.tomli as tomllib
    except ImportError:
        print('Error: tomllib or tomli required')
        sys.exit(1)

with open('pyproject.toml', 'rb') as f:
    data = tomllib.load(f)

deps = data.get('project', {}).get('dependencies', [])
pkgs = []
for d in deps:
    match = re.match(r'^([a-zA-Z0-9_.-]+)', d)
    if match:
        pkgs.append(match.group(1).lower().replace('-', '_'))

for pkg in pkgs:
    result = subprocess.run(['rg', '-l', pkg, '--include=*.py', '--glob=!tests/'], capture_output=True, text=True)
    if not result.stdout.strip():
        print(f'UNUSED: {pkg}')
    else:
        print(f'used:   {pkg} ({len(result.stdout.strip().splitlines())} files)')
"
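
The name-extraction step can be exercised on its own. The original chain of `split` calls leaves operators such as `~=` and `!=` attached to the name, whereas the leading-run regex from this suggestion stops at the first character that cannot be part of a name. A small sketch, with `requirement_name` as a hypothetical helper and the requirement strings invented for illustration:

```python
import re

# Leading run of name characters, as in the reviewer's suggested regex.
_NAME = re.compile(r"^([a-zA-Z0-9_.-]+)")


def requirement_name(requirement):
    """Extract and normalise the distribution name from a PEP 508 string."""
    match = _NAME.match(requirement.strip())
    if not match:
        return None
    return match.group(1).lower().replace("-", "_")


for spec in [
    "Django>=5.2,<6",
    "psycopg2-binary==2.9.9",
    "requests[security]~=2.31; python_version >= '3.9'",
    "uvicorn @ https://example.invalid/uvicorn.whl",
]:
    print(requirement_name(spec))
# django
# psycopg2_binary
# requests
# uvicorn
```

The result is still a distribution name, not a module name, so packages such as `djangorestframework` need a separate name mapping before searching the source tree. Note also that ripgrep filters files with `-g`/`--glob`, as in the first suggestion in this review, rather than `--include`.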

gemini-code-assist · 2026-06-26T20:39:33Z

+const deps = Object.keys({...(pkg.dependencies||{}), ...(pkg.devDependencies||{})});
+for (const dep of deps) {
+  try {
+    const out = execSync(\`rg -l '\"'\${dep}'\"\\|'\${dep}' ' src/ --include='*.{ts,tsx,js,jsx}'\`, {stdio:['pipe','pipe','pipe']}).toString();


The nested single-quotes inside the shell command can lead to syntax errors or unexpected behavior depending on the shell environment. We can simplify the pattern and use double quotes to make the command much cleaner and more robust.

Suggested change

-    const out = execSync(\`rg -l '\"'\${dep}'\"\\|'\${dep}' ' src/ --include='*.{ts,tsx,js,jsx}'\`, {stdio:['pipe','pipe','pipe']}).toString();
+    const out = execSync("rg -l \"['\\\"]" + dep + "['\\\"]\" src/ --include='*.{ts,tsx,js,jsx}'", {stdio:['pipe','pipe','pipe']}).toString();
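
The quoting layers disappear entirely when no shell parses the command. The sketch below shows the idea in Python by building an argument list; in the Node script under review, `child_process.execFileSync` plays the same role. `rg_argv_for_dependency` is a hypothetical helper, and the choice of fixed-string matching is an assumption made here, not something the PR prescribes.

```python
def rg_argv_for_dependency(dep, search_dir="src/"):
    """Build a ripgrep argv list; each element is passed verbatim, with no shell quoting."""
    return [
        "rg", "--files-with-matches", "--fixed-strings",
        "-e", f"'{dep}'",      # single-quoted import specifier
        "-e", f'"{dep}"',      # double-quoted import specifier
        "--glob", "*.{ts,tsx,js,jsx}",
        search_dir,
    ]


argv = rg_argv_for_dependency("lodash.debounce")
print(argv)
# To execute: subprocess.run(argv, capture_output=True, text=True)
```

Fixed-string matching also means a dot in a name such as `lodash.debounce` is not treated as a regex wildcard. Like the pattern in the suggestion, this only matches the bare specifier, so subpath imports such as `lodash/debounce` would need an additional pattern.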

gemini-code-assist · 2026-06-26T20:39:33Z

+python3 -c "
+import re, subprocess, pathlib
+
+cargo = pathlib.Path('Cargo.toml').read_text()
+deps = re.findall(r'^(\w[\w-]*)\s*=', cargo, re.MULTILINE)
+
+for dep in deps:
+    crate_name = dep.replace('-', '_')
+    result = subprocess.run(['rg', '-l', crate_name, 'src/'], capture_output=True, text=True)
+    if result.stdout.strip():
+        print(f'used:   {dep}')
+    else:
+        result2 = subprocess.run(['rg', '-l', dep, 'src/'], capture_output=True, text=True)
+        print(f'UNUSED: {dep}' if not result2.stdout.strip() else f'used:   {dep}')
+"
+```


Using re.findall(r'^(\w[\w-]*)\s*=') on Cargo.toml will match metadata keys at the start of lines (such as name, version, edition, publish), treating them as dependencies and incorrectly flagging them as unused. We should parse Cargo.toml properly using tomllib (with a fallback to tomli) to extract actual dependencies from the relevant sections.

python3 -c "
import pathlib, subprocess, sys
try:
    import tomllib
except ImportError:
    try:
        import pip._vendor.tomli as tomllib
    except ImportError:
        print('Error: tomllib or tomli required')
        sys.exit(1)

cargo_data = tomllib.loads(pathlib.Path('Cargo.toml').read_text())
deps = []
for section in ['dependencies', 'dev-dependencies', 'build-dependencies']:
    deps.extend(cargo_data.get(section, {}).keys())

for dep in sorted(set(deps)):
    crate_name = dep.replace('-', '_')
    result = subprocess.run(['rg', '-l', crate_name, 'src/'], capture_output=True, text=True)
    if result.stdout.strip():
        print(f'used:   {dep}')
    else:
        result2 = subprocess.run(['rg', '-l', dep, 'src/'], capture_output=True, text=True)
        print(f'UNUSED: {dep}' if not result2.stdout.strip() else f'used:   {dep}')
"
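
The difference between the regex and a real TOML parse shows up on even a tiny manifest. The following sketch runs both on an invented `Cargo.toml`; `cargo_dependencies` is a hypothetical helper, and the fallback import assumes the `tomli` backport is installed on interpreters older than 3.11.

```python
import re

try:
    import tomllib            # Python 3.11+
except ImportError:           # assumption: tomli backport is available
    import tomli as tomllib

SECTIONS = ("dependencies", "dev-dependencies", "build-dependencies")


def cargo_dependencies(cargo_toml_text):
    """Return dependency names from the three standard Cargo sections."""
    data = tomllib.loads(cargo_toml_text)
    names = set()
    for section in SECTIONS:
        names.update(data.get(section, {}))
    return sorted(names)


example = '''
[package]
name = "demo"
version = "0.1.0"
edition = "2021"

[dependencies]
serde = { version = "1", features = ["derive"] }
tokio-util = "0.7"

[dev-dependencies]
tempfile = "3"
'''

# Original regex: package metadata keys are picked up as if they were crates.
print(re.findall(r'^(\w[\w-]*)\s*=', example, re.MULTILINE))
# ['name', 'version', 'edition', 'serde', 'tokio-util', 'tempfile']

print(cargo_dependencies(example))
# ['serde', 'tempfile', 'tokio-util']
```

Target-specific tables and `[workspace.dependencies]` are outside the three sections read here, so a fuller audit would need to walk those as well.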

Copilot

Pull request overview

Adds a new process skill, dependency-pruning, intended to help audit and reduce dependency footprint across multiple ecosystems (Python, JS/TS, Go, Rust, etc.) by producing an evidence-backed report and optionally applying safe changes.

Changes:

Adds skills/process/dependency-pruning/SKILL.md defining a phased dependency-audit workflow (unused deps, vendoring candidates, tree-shaking/import-style issues, deprecation/sunset migrations).
Adds supporting reference material and eval scenarios under skills/process/dependency-pruning/references/ and skills/process/dependency-pruning/evals/.
Registers the new skill in skills/README.md and skills/process/README.md.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

File	Description
skills/README.md	Adds `dependency-pruning` to the top-level skills index.
skills/process/README.md	Adds `dependency-pruning` to the process skills index.
skills/process/dependency-pruning/SKILL.md	Introduces the new dependency-pruning skill instructions and report format.
skills/process/dependency-pruning/references/unused-detection.md	Adds per-ecosystem command reference for detecting unused dependencies.
skills/process/dependency-pruning/evals/evals.json	Adds evaluation prompts/expectations for the new skill.

+**Django / Python projects**: deptry's DEP002 false-positive rate can be very
+high (sometimes 30+ flags for a single project) because PyPI package names
+rarely match their Python module names:
+- `djangorestframework` → `rest_framework`
+- `beautifulsoup4` → `bs4`
+- `pyyaml` → `yaml`
+- `pygithub` → `github`
+- `psycopg2-binary` → `psycopg2`
+
+When you see many DEP002 warnings on a Django project, verify each one manually
+rather than reporting them all as unused. After the audit, suggest adding a
+`[tool.deptry.package_module_name_map]` section to `pyproject.toml` so future
+runs are accurate.
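
The same name mapping is useful to any fallback script that searches source files by module name. A minimal sketch, using only the pairs listed in the hunk above; `PACKAGE_MODULE_NAME_MAP` and `module_name_for` are hypothetical names, and the dash-to-underscore fallback is a guess that will be wrong for some packages.

```python
# Pairs taken from the list above; extend per project.
PACKAGE_MODULE_NAME_MAP = {
    "djangorestframework": "rest_framework",
    "beautifulsoup4": "bs4",
    "pyyaml": "yaml",
    "pygithub": "github",
    "psycopg2-binary": "psycopg2",
}


def module_name_for(distribution):
    """Map a PyPI distribution name to the module name to search for."""
    key = distribution.lower()
    # Fallback guess: same name with dashes turned into underscores.
    return PACKAGE_MODULE_NAME_MAP.get(key, key.replace("-", "_"))


print(module_name_for("PyYAML"))             # yaml
print(module_name_for("requests-oauthlib"))  # requests_oauthlib
```

Searching for `module_name_for(pkg)` instead of the raw distribution name avoids reporting packages such as `beautifulsoup4` as unused.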


+## Remove — Unused Dependencies
+| Package | Ecosystem | Evidence of non-use |
+| ddt     | Python    | No `import ddt` or `from ddt` in any test file |
+
+## Optimize Import Style (JS/TS)
+| Package | Current import | Issue | Fix |
+| lodash  | `import _ from 'lodash'` | Prevents tree-shaking; full ~72KB ships | Switch to `lodash-es` or per-function imports |
+
+## Vendor/Rewrite Candidates
+| Package | Used symbols | Package LOC | Replacement sketch |
+| waait   | default (1)  | 1 LOC       | `const wait = (ms=0) => new Promise(r => setTimeout(r, ms))` |
+
+## Migrate Away From
+| Package | Status | Migration target |
+| react-ga | GA3 sunset Jul 2023 | PostHog (already wired), or GA4 via gtag |
+
+## Dev-only Misclassifications
+| Package | Currently | Should be |
+| ipython | dependencies | dev dependencies |


+for pkg in pkgs:
+    result = subprocess.run(['rg', '-l', pkg, '--include=*.py', '--glob=!tests/'], capture_output=True, text=True)
+    if not result.stdout.strip():
+        print(f'UNUSED: {pkg}')
+    else:
+        print(f'used:   {pkg} ({len(result.stdout.strip().splitlines())} files)')


+node -e "
+const pkg = require('./package.json');
+const { execSync } = require('child_process');
+const deps = Object.keys({...(pkg.dependencies||{}), ...(pkg.devDependencies||{})});
+for (const dep of deps) {
+  try {
+    const out = execSync(\`rg -l '\"'\${dep}'\"\\|'\${dep}' ' src/ --include='*.{ts,tsx,js,jsx}'\`, {stdio:['pipe','pipe','pipe']}).toString();
+    console.log(out.trim() ? 'used: '+dep : 'UNUSED: '+dep);
+  } catch { console.log('UNUSED: '+dep); }
+}
+"


+Run in a temp directory to avoid mutating the real go.mod:
+
+```bash
+# Non-destructive: show what's unused
+cp go.mod /tmp/go.mod.bak && cp go.sum /tmp/go.sum.bak
+go mod tidy -v 2>&1 | grep "^removing"
+# Restore
+cp /tmp/go.mod.bak go.mod && cp /tmp/go.sum.bak go.sum
+```
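
The snippet above backs up `go.mod` and `go.sum`, runs the tidy in place, and restores them afterwards, so an interrupted run can leave the real files modified. One way to make the restore unconditional is a context manager. This is a sketch only: `restored_afterwards` is a hypothetical helper, and the file rewrite inside the `with` block is a stand-in for the actual `go mod tidy` call.

```python
import contextlib
import pathlib
import tempfile


@contextlib.contextmanager
def restored_afterwards(*paths):
    """Snapshot files, yield, then write the original bytes back even on error."""
    snapshots = {pathlib.Path(p): pathlib.Path(p).read_bytes() for p in paths}
    try:
        yield
    finally:
        for path, original in snapshots.items():
            path.write_bytes(original)


with tempfile.TemporaryDirectory() as tmp:
    go_mod = pathlib.Path(tmp) / "go.mod"
    go_mod.write_text("module example.invalid/demo\n")
    with restored_afterwards(go_mod):
        # Stand-in for the tidy step; a real run would call subprocess here.
        go_mod.write_text("module example.invalid/demo\n// tidied\n")
    restored = go_mod.read_text()

print(repr(restored))  # 'module example.invalid/demo\n'
```

Because the restore sits in a `finally` clause, it also runs when the wrapped command raises.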


gemini-code-assist Bot reviewed Jun 26, 2026

blarghmatey requested a review from Copilot June 26, 2026 21:03

Copilot started reviewing on behalf of blarghmatey June 26, 2026 21:03

Copilot AI reviewed Jun 26, 2026

blarghmatey wants to merge 1 commit into main from feat/dependency-pruning-skill
