feat(skills): add dependency-pruning skill #21

Open
blarghmatey wants to merge 1 commit into main from feat/dependency-pruning-skill

Conversation

@blarghmatey

Member

What are the relevant tickets?

N/A

Description (What does it do?)

Adds a new dependency-pruning skill under skills/process/ that audits a repository's dependencies across Python, JS/TS, Go, Rust, and other ecosystems to surface four categories of action:

  • Remove: unused packages, confirmed via automated tools (deptry, depcheck, cargo machete, go mod tidy) plus manual grep to catch packages the tools miss
  • Optimize import style: JS/TS packages imported in a way that prevents tree-shaking (e.g. import _ from 'lodash' vs import { debounce } from 'lodash-es')
  • Vendor/rewrite candidates: packages where only ≤3 unique symbols are used and the package is small enough to inline (default thresholds: ≤3 symbols, ≤500 LOC — both configurable)
  • Migrate away from: deprecated, sunset, or abandoned packages with known migration targets (e.g. react-ga → GA4/PostHog after Universal Analytics sunset)
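
The vendor/rewrite thresholds above can be read as a simple predicate. The following is a minimal sketch, not the skill's implementation: `PackageUsage` and `is_vendor_candidate` are hypothetical names, and treating a package with zero used symbols as a removal rather than a vendoring candidate is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PackageUsage:
    """Hypothetical record of how a repository uses one dependency."""
    name: str
    used_symbols: set = field(default_factory=set)
    package_loc: int = 0


def is_vendor_candidate(usage, max_symbols=3, max_loc=500):
    """Apply the default thresholds from the description (both configurable).

    Assumption: zero used symbols means "unused", which belongs in the
    Remove category instead of Vendor/rewrite.
    """
    symbol_count = len(usage.used_symbols)
    return 0 < symbol_count <= max_symbols and usage.package_loc <= max_loc


# waait figures (1 symbol, 1 LOC) come from the report example in this PR;
# the second entry uses made-up numbers purely for illustration.
tiny = PackageUsage("waait", {"default"}, 1)
large = PackageUsage("some-large-lib", {"a", "b", "c", "d"}, 17000)
print(is_vendor_candidate(tiny), is_vendor_candidate(large))
```

Keeping the thresholds as keyword arguments mirrors the "both configurable" note in the description.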

The skill includes explicit blind-spot guidance to avoid common false positives:

  • Django projects: deptry can raise 30+ false-positive DEP002 flags on a single project because PyPI distribution names often differ from Python module names; INSTALLED_APPS packages are also loaded via strings, not imports
  • Server runtime packages: gunicorn, uwsgi, granian, hypercorn are invoked via CLI in Dockerfile/K8s — check deployment configs AND git history for in-flight server migrations before flagging for removal
  • CLI-invoked dev tooling: ipdb, pdbpp, bpython, ptpython, debugpy etc. are terminal tools, not app imports — flag as "move to dev deps" rather than "remove"
  • Webpack/babel plugins: referenced in config files, not source imports
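
To illustrate the INSTALLED_APPS blind spot, here is a minimal sketch that collects the top-level modules named as strings in a settings file, so they can be excluded from an "unused" list. The helper name `installed_app_modules` is hypothetical, and the sketch assumes INSTALLED_APPS is assigned a literal list or tuple; settings that build the list dynamically (for example with `+=`) are not handled.

```python
import ast


def installed_app_modules(settings_source):
    """Return top-level module names referenced as strings in INSTALLED_APPS."""
    modules = set()
    for node in ast.walk(ast.parse(settings_source)):
        if not isinstance(node, ast.Assign):
            continue
        target_names = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if "INSTALLED_APPS" not in target_names:
            continue
        if not isinstance(node.value, (ast.List, ast.Tuple)):
            continue  # dynamically built lists are out of scope for this sketch
        for element in node.value.elts:
            if isinstance(element, ast.Constant) and isinstance(element.value, str):
                # "django.contrib.admin" -> "django"
                modules.add(element.value.split(".")[0])
    return modules


example_settings = '''
INSTALLED_APPS = [
    "django.contrib.admin",
    "rest_framework",
    "guardian",
]
'''
print(sorted(installed_app_modules(example_settings)))
```

Static import analysis never sees these names, which is why a grep for `import rest_framework` alone can wrongly mark `djangorestframework` as unused.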

After reporting, the skill offers to execute safe changes (removals, import-style fixes, vendor stubs) and delegates PR creation to the create-ol-pull-request skill if available.

Evaluation

Evaluated over 2 iterations against ocw-studio (Django 5.2 + React/lodash). The skill achieves a 93% assertion pass rate vs. 60% for the no-skill baseline across three eval scenarios:

  1. Full-repo dependency audit
  2. Focused lodash vendoring/tree-shaking analysis
  3. Django-specific Python-only audit

Key iteration-2 improvements over iteration-1: added Phase 3b (import style / tree-shaking for JS/TS), Phase 4 (deprecated/sunset detection), Django INSTALLED_APPS blind spot, server runtime caveat, and developer tooling caveat.

How can this be tested?

Point the skill at any repo with Python or JS/TS dependencies:

/dependency-pruning
Audit the dependencies in ~/code/mit/apps/maintained/ocw-studio

Verify the report:

  • Covers both Python (pyproject.toml) and JS (package.json) ecosystems
  • Lists unused packages with concrete evidence (not just tool output)
  • Includes an "Optimize Import Style" section for lodash/ramda
  • Does NOT flag uwsgi or active server runtimes for removal without checking deployment configs
  • Does NOT flag ipdb/bpython for removal (marks as "move to dev deps")
  • Includes a "Migrate Away From" section for react-ga (GA3 sunset)

Additional Context

The skill is designed to be conservative — it requires evidence before flagging anything as removable, and asks the user to confirm before executing any changes. The "Optimize Import Style" category is the highest-ROI output for JS/TS-heavy repos: switching from import _ from 'lodash' to lodash-es typically saves 40–70 KB gzipped.
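
The import-style check described above can be approximated with a line scan. This is a rough sketch under stated assumptions, not the skill's detection logic: the pattern and the `whole_package_imports` helper are hypothetical, and only default and namespace imports of the bare `lodash` specifier are flagged.

```python
import re

# Matches `import _ from 'lodash'` and `import * as _ from "lodash"`.
# Per-function paths such as 'lodash/debounce' and the 'lodash-es' package
# do not match, because the closing quote must follow "lodash" directly.
WHOLE_PACKAGE_IMPORT = re.compile(
    r"""import\s+(?:\*\s+as\s+\w+|\w+)\s+from\s+['"]lodash['"]"""
)


def whole_package_imports(source):
    """Return the source lines that pull in the whole lodash package."""
    return [
        line.strip()
        for line in source.splitlines()
        if WHOLE_PACKAGE_IMPORT.search(line)
    ]


example_module = """
import _ from 'lodash'
import * as lo from "lodash"
import { debounce } from 'lodash-es'
import throttle from 'lodash/throttle'
"""
for hit in whole_package_imports(example_module):
    print("blocks tree-shaking:", hit)
```

A regex scan like this is only a first pass; multi-line import statements and `require('lodash')` calls would need separate handling.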

Audits a repository's dependencies across Python, JS/TS, Go, Rust,
and other ecosystems to surface four categories of action:

- Remove: unused packages (confirmed via tool output + manual grep)
- Optimize: JS/TS packages with import styles that block tree-shaking
- Vendor/rewrite: packages where only ≤3 symbols are used and the
  package is small enough to inline (configurable thresholds)
- Migrate: deprecated, sunset, or abandoned packages with known
  migration targets

Includes blind-spot guidance for Django projects (deptry false
positives, INSTALLED_APPS string loading), server runtime packages
(check Dockerfile + git history for in-flight migrations before
flagging for removal), and CLI-invoked developer tooling (ipdb,
bpython, pdbpp, etc. that static analysis always marks unused).

Evaluated over 2 iterations against ocw-studio; skill achieves 93%
assertion pass rate vs 60% for the no-skill baseline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a new 'dependency-pruning' skill, including its documentation, evaluation prompts, and reference guides for unused dependency detection across various ecosystems such as Python, JS/TS, Go, and Rust. The reviewer's feedback provides valuable and highly actionable improvements to the fallback scripts and shell commands. Specifically, the feedback addresses portability issues with GNU grep, a performance bug when measuring single-file Python modules, compatibility fallbacks for tomllib on Python versions older than 3.11, shell quoting issues in Node.js, and a regex parsing bug in Cargo.toml that incorrectly treats metadata keys as dependencies.

Comment on lines +152 to +153
rg "from ${PKG}(\.\w+)? import (\w+)" --no-filename -o --include="*.py" \
| grep -oP 'import \K\w+' | sort -u

Contributor

medium

grep -oP requires GNU grep with PCRE support; the BSD grep that ships with macOS has no -P option, so this pipeline is not portable. ripgrep's replace feature (-r) produces the same output directly, which is cleaner and removes the second process.

Suggested change
- rg "from ${PKG}(\.\w+)? import (\w+)" --no-filename -o --include="*.py" \
-   | grep -oP 'import \K\w+' | sort -u
+ rg "from ${PKG}(?:\.\w+)? import (\w+)" -g "*.py" -o -r '$1' --no-filename | sort -u
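
Both versions of the command capture a single `\w+` after `import`, so a line such as `from pkg import a, b` contributes only `a` to the symbol count. As an alternative sketch (not part of the PR), Python's `ast` module can list every imported name; the helper name `imported_symbols` is hypothetical.

```python
import ast


def imported_symbols(source, package):
    """Collect names imported via `from <package>[.sub] import ...`."""
    symbols = set()
    for node in ast.walk(ast.parse(source)):
        # level == 0 restricts the scan to absolute imports
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            if node.module == package or node.module.startswith(package + "."):
                symbols.update(alias.name for alias in node.names)
    return symbols


example_source = """
from somepkg import alpha, beta
from somepkg.utils import gamma
from otherpkg import delta
import somepkg
"""
print(sorted(imported_symbols(example_source, "somepkg")))
```

Plain `import somepkg` statements are ignored here because attribute access on the module would have to be traced to count the symbols actually used.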

Comment on lines +175 to +183
# Python
python -c "
import importlib.util, pathlib
spec = importlib.util.find_spec('${PKG}')
if spec and spec.origin:
    root = pathlib.Path(spec.origin).parent
    lines = sum(len(f.read_text(errors='ignore').splitlines()) for f in root.rglob('*.py'))
    print(lines)
"

Contributor

medium

If the package is a single-file module (e.g., six.py), pathlib.Path(spec.origin).parent will resolve to the site-packages directory. Running root.rglob('*.py') on site-packages will scan every single installed package in the environment, causing massive performance issues and incorrect line counts. We should check if the origin is a directory package (__init__.py) or a single-file module.

Suggested change
- # Python
- python -c "
- import importlib.util, pathlib
- spec = importlib.util.find_spec('${PKG}')
- if spec and spec.origin:
-     root = pathlib.Path(spec.origin).parent
-     lines = sum(len(f.read_text(errors='ignore').splitlines()) for f in root.rglob('*.py'))
-     print(lines)
- "
+ # Python
+ python -c "
+ import importlib.util, pathlib
+ spec = importlib.util.find_spec('${PKG}')
+ if spec and spec.origin:
+     origin = pathlib.Path(spec.origin)
+     if origin.name == '__init__.py':
+         root = origin.parent
+         lines = sum(len(f.read_text(errors='ignore').splitlines()) for f in root.rglob('*.py'))
+     else:
+         lines = len(origin.read_text(errors='ignore').splitlines())
+     print(lines)
+ "

Comment on lines +31 to +47
python -c "
import tomllib, pathlib, subprocess, sys

with open('pyproject.toml', 'rb') as f:
    data = tomllib.load(f)

deps = data.get('project', {}).get('dependencies', [])
# Strip version specifiers
pkgs = [d.split('[')[0].split('>=')[0].split('==')[0].split('<')[0].strip().lower().replace('-','_') for d in deps]

for pkg in pkgs:
    result = subprocess.run(['rg', '-l', pkg, '--include=*.py', '--glob=!tests/'], capture_output=True, text=True)
    if not result.stdout.strip():
        print(f'UNUSED: {pkg}')
    else:
        print(f'used:   {pkg} ({len(result.stdout.strip().splitlines())} files)')
"

Contributor

medium

This script has two issues: first, tomllib is only available in Python 3.11+, so it will fail on older Python versions. We should add a fallback to pip._vendor.tomli or tomli. Second, splitting on [ and operators like >= is fragile and fails on complex PEP 508 specifiers (e.g., with environment markers or other operators). Using a simple regex to extract the package name is much more robust.

python -c "
import pathlib, subprocess, sys, re
try:
    import tomllib
except ImportError:
    try:
        import pip._vendor.tomli as tomllib
    except ImportError:
        print('Error: tomllib or tomli required')
        sys.exit(1)

with open('pyproject.toml', 'rb') as f:
    data = tomllib.load(f)

deps = data.get('project', {}).get('dependencies', [])
pkgs = []
for d in deps:
    match = re.match(r'^([a-zA-Z0-9_.-]+)', d)
    if match:
        pkgs.append(match.group(1).lower().replace('-', '_'))

for pkg in pkgs:
    # ripgrep filters files with --glob (it has no --include flag)
    result = subprocess.run(['rg', '-l', pkg, '--glob=*.py', '--glob=!tests/'], capture_output=True, text=True)
    if not result.stdout.strip():
        print(f'UNUSED: {pkg}')
    else:
        print(f'used:   {pkg} ({len(result.stdout.strip().splitlines())} files)')
"
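
To make the fragility concrete, the sketch below runs the original split chain and a regex-based extractor on a few PEP 508 strings. The helper names are hypothetical, and unlike the suggestion above the regex version returns the PEP 503 normalized distribution name instead of a guessed module name, since the two often differ.

```python
import re

_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def naive_name(requirement):
    """The split chain from the script under review."""
    return (requirement.split("[")[0].split(">=")[0].split("==")[0]
            .split("<")[0].strip().lower().replace("-", "_"))


def requirement_name(requirement):
    """Extract the leading name and normalize it as described in PEP 503."""
    match = _NAME.match(requirement)
    if match is None:
        return None
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


examples = [
    "Django~=5.2",
    'psycopg2-binary>=2.9; python_version < "3.13"',
    "requests[socks]!=2.30",
]
for req in examples:
    print(f"{req!r}: naive={naive_name(req)!r} regex={requirement_name(req)!r}")
```

The `~=` and `!=` operators pass straight through the split chain, so the naive result still carries the version specifier and can never match an import.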

const deps = Object.keys({...(pkg.dependencies||{}), ...(pkg.devDependencies||{})});
for (const dep of deps) {
  try {
    const out = execSync(\`rg -l '\"'\${dep}'\"\\|'\${dep}' ' src/ --include='*.{ts,tsx,js,jsx}'\`, {stdio:['pipe','pipe','pipe']}).toString();

Contributor

medium

The nested single-quotes inside the shell command can lead to syntax errors or unexpected behavior depending on the shell environment. We can simplify the pattern and use double quotes to make the command much cleaner and more robust.

Suggested change
- const out = execSync(\`rg -l '\"'\${dep}'\"\\|'\${dep}' ' src/ --include='*.{ts,tsx,js,jsx}'\`, {stdio:['pipe','pipe','pipe']}).toString();
+ const out = execSync("rg -l \"['\\\"]" + dep + "['\\\"]\" src/ --include='*.{ts,tsx,js,jsx}'", {stdio:['pipe','pipe','pipe']}).toString();

Comment on lines +161 to +176
python3 -c "
import re, subprocess, pathlib

cargo = pathlib.Path('Cargo.toml').read_text()
deps = re.findall(r'^(\w[\w-]*)\s*=', cargo, re.MULTILINE)

for dep in deps:
    crate_name = dep.replace('-', '_')
    result = subprocess.run(['rg', '-l', crate_name, 'src/'], capture_output=True, text=True)
    if result.stdout.strip():
        print(f'used:   {dep}')
    else:
        result2 = subprocess.run(['rg', '-l', dep, 'src/'], capture_output=True, text=True)
        print(f'UNUSED: {dep}' if not result2.stdout.strip() else f'used:   {dep}')
"

Contributor

medium

Using re.findall(r'^(\w[\w-]*)\s*=') on Cargo.toml will match metadata keys at the start of lines (such as name, version, edition, publish), treating them as dependencies and incorrectly flagging them as unused. We should parse Cargo.toml properly using tomllib (with a fallback to tomli) to extract actual dependencies from the relevant sections.

python3 -c "
import pathlib, subprocess, sys
try:
    import tomllib
except ImportError:
    try:
        import pip._vendor.tomli as tomllib
    except ImportError:
        print('Error: tomllib or tomli required')
        sys.exit(1)

cargo_data = tomllib.loads(pathlib.Path('Cargo.toml').read_text())
deps = []
for section in ['dependencies', 'dev-dependencies', 'build-dependencies']:
    deps.extend(cargo_data.get(section, {}).keys())

for dep in sorted(set(deps)):
    crate_name = dep.replace('-', '_')
    result = subprocess.run(['rg', '-l', crate_name, 'src/'], capture_output=True, text=True)
    if result.stdout.strip():
        print(f'used:   {dep}')
    else:
        result2 = subprocess.run(['rg', '-l', dep, 'src/'], capture_output=True, text=True)
        print(f'UNUSED: {dep}' if not result2.stdout.strip() else f'used:   {dep}')
"
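
The failure mode in this comment is easy to reproduce on a small manifest. The sketch below contrasts the original regex with a section-aware scan that needs no TOML library. The sample manifest and helper names are hypothetical, and the scan is a dependency-free approximation that ignores tables such as `[target.'cfg(unix)'.dependencies]`; a real TOML parser, as suggested above, remains the more robust choice.

```python
import re

CARGO_EXAMPLE = """\
[package]
name = "demo"
version = "0.1.0"
edition = "2021"

[dependencies]
serde = { version = "1", features = ["derive"] }
tokio-util = "0.7"

[dev-dependencies]
criterion = "0.5"
"""

DEP_SECTIONS = {"dependencies", "dev-dependencies", "build-dependencies"}


def naive_keys(text):
    """The regex from the script under review: every `key =` at line start."""
    return re.findall(r"^(\w[\w-]*)\s*=", text, re.MULTILINE)


def section_aware_deps(text):
    """Collect keys only while inside a dependency table."""
    deps, section = [], None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = re.fullmatch(r"\[{1,2}\s*([^\[\]]+?)\s*\]{1,2}", line)
        if header:
            section = header.group(1)
            continue
        key = re.match(r"(\w[\w-]*)\s*=", line)
        if key and section in DEP_SECTIONS:
            deps.append(key.group(1))
    return deps


print("naive:        ", naive_keys(CARGO_EXAMPLE))
print("section-aware:", section_aware_deps(CARGO_EXAMPLE))
```

The naive list starts with `name`, `version`, and `edition`, which are exactly the metadata keys the comment warns would be reported as unused crates.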

Copilot AI left a comment


Pull request overview

Adds a new process skill, dependency-pruning, intended to help audit and reduce dependency footprint across multiple ecosystems (Python, JS/TS, Go, Rust, etc.) by producing an evidence-backed report and optionally applying safe changes.

Changes:

  • Adds skills/process/dependency-pruning/SKILL.md defining a phased dependency-audit workflow (unused deps, vendoring candidates, tree-shaking/import-style issues, deprecation/sunset migrations).
  • Adds supporting reference material and eval scenarios under skills/process/dependency-pruning/references/ and skills/process/dependency-pruning/evals/.
  • Registers the new skill in skills/README.md and skills/process/README.md.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| skills/README.md | Adds dependency-pruning to the top-level skills index. |
| skills/process/README.md | Adds dependency-pruning to the process skills index. |
| skills/process/dependency-pruning/SKILL.md | Introduces the new dependency-pruning skill instructions and report format. |
| skills/process/dependency-pruning/references/unused-detection.md | Adds per-ecosystem command reference for detecting unused dependencies. |
| skills/process/dependency-pruning/evals/evals.json | Adds evaluation prompts/expectations for the new skill. |

Comment on lines +90 to +102
**Django / Python projects**: deptry's DEP002 false-positive rate can be very
high (sometimes 30+ flags for a single project) because PyPI package names
rarely match their Python module names:
- `djangorestframework` → `rest_framework`
- `beautifulsoup4` → `bs4`
- `pyyaml` → `yaml`
- `pygithub` → `github`
- `psycopg2-binary` → `psycopg2`

When you see many DEP002 warnings on a Django project, verify each one manually
rather than reporting them all as unused. After the audit, suggest adding a
`[tool.deptry.package_module_name_map]` section to `pyproject.toml` so future
runs are accurate.

Comment on lines +254 to +272
## Remove — Unused Dependencies

| Package | Ecosystem | Evidence of non-use |
| --- | --- | --- |
| ddt | Python | No `import ddt` or `from ddt` in any test file |

## Optimize Import Style (JS/TS)

| Package | Current import | Issue | Fix |
| --- | --- | --- | --- |
| lodash | `import _ from 'lodash'` | Prevents tree-shaking; full ~72KB ships | Switch to `lodash-es` or per-function imports |

## Vendor/Rewrite Candidates

| Package | Used symbols | Package LOC | Replacement sketch |
| --- | --- | --- | --- |
| waait | default (1) | 1 LOC | `const wait = (ms=0) => new Promise(r => setTimeout(r, ms))` |

## Migrate Away From

| Package | Status | Migration target |
| --- | --- | --- |
| react-ga | GA3 sunset Jul 2023 | PostHog (already wired), or GA4 via gtag |

## Dev-only Misclassifications

| Package | Currently | Should be |
| --- | --- | --- |
| ipython | dependencies | dev dependencies |

Comment on lines +41 to +46
for pkg in pkgs:
    result = subprocess.run(['rg', '-l', pkg, '--include=*.py', '--glob=!tests/'], capture_output=True, text=True)
    if not result.stdout.strip():
        print(f'UNUSED: {pkg}')
    else:
        print(f'used:   {pkg} ({len(result.stdout.strip().splitlines())} files)')

Comment on lines +83 to +93
node -e "
const pkg = require('./package.json');
const { execSync } = require('child_process');
const deps = Object.keys({...(pkg.dependencies||{}), ...(pkg.devDependencies||{})});
for (const dep of deps) {
  try {
    const out = execSync(\`rg -l '\"'\${dep}'\"\\|'\${dep}' ' src/ --include='*.{ts,tsx,js,jsx}'\`, {stdio:['pipe','pipe','pipe']}).toString();
    console.log(out.trim() ? 'used:   '+dep : 'UNUSED: '+dep);
  } catch { console.log('UNUSED: '+dep); }
}
"

Comment on lines +102 to +110
Run in a temp directory to avoid mutating the real go.mod:

```bash
# Non-destructive: show what's unused
cp go.mod /tmp/go.mod.bak && cp go.sum /tmp/go.sum.bak
go mod tidy -v 2>&1 | grep "^removing"
# Restore
cp /tmp/go.mod.bak go.mod && cp /tmp/go.sum.bak go.sum
```
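
The snippet above edits go.mod in place and relies on the final `cp` to restore it, so an interrupted run leaves the files modified. As a hedged alternative sketch (not part of the PR), a context manager can guarantee the restore; `preserved` and `tidy_report` are hypothetical names, and `tidy_report` assumes the `go` toolchain is on PATH.

```python
import contextlib
import pathlib
import subprocess


@contextlib.contextmanager
def preserved(*paths):
    """Snapshot the given files and write them back on exit, even on error.

    Files that did not exist beforehand are not snapshotted and are left
    untouched on exit.
    """
    snapshots = {}
    for path in map(pathlib.Path, paths):
        if path.exists():
            snapshots[path] = path.read_bytes()
    try:
        yield
    finally:
        for path, data in snapshots.items():
            path.write_bytes(data)


def tidy_report(repo_dir="."):
    """Run `go mod tidy -v` and return its stderr, restoring go.mod/go.sum."""
    repo = pathlib.Path(repo_dir)
    with preserved(repo / "go.mod", repo / "go.sum"):
        result = subprocess.run(
            ["go", "mod", "tidy", "-v"],
            cwd=repo, capture_output=True, text=True,
        )
    return result.stderr
```

Calling `tidy_report()` from the module root would return the verbose tidy output while leaving both files byte-for-byte as they were.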