17 changes: 13 additions & 4 deletions .github/docker/Dockerfile.ci
@@ -77,17 +77,26 @@ RUN npx playwright install-deps chromium
# render in DejaVu Sans. playwright install-deps happens to pull this in today,
# but the dep is implicit and could change — install explicitly so upgrades
# can't silently regress rendering.
#
# Xvfb is also installed here so the browse --headed integration tests
# (headed-xvfb, headed-orphan-cleanup) can exercise the Linux container
# auto-spawn path on every CI run. Without Xvfb in the image, the most
# common production --headed path goes untested.
RUN for i in 1 2 3; do \
apt-get update && apt-get install -y --no-install-recommends fonts-liberation fontconfig && break || \
apt-get update && apt-get install -y --no-install-recommends fonts-liberation fontconfig xvfb x11-utils && break || \
(echo "fonts-liberation install retry $i/3"; sleep 10); \
done \
&& fc-cache -f \
&& rm -rf /var/lib/apt/lists/*

# Pre-install dependencies (cached layer — only rebuilds when package.json changes)
COPY package.json /workspace/
# Pre-install dependencies (cached layer — only rebuilds when package.json or
# bun.lock changes). Copy BOTH so install is deterministic and matches local
# resolution. Without bun.lock here, bun install resolved transitive deps
# differently in CI vs local (observed on v1.28.0.0: socks landed but
# smart-buffer + ip-address didn't make it into the cached node_modules).
COPY package.json bun.lock /workspace/
WORKDIR /workspace
RUN bun install && rm -rf /tmp/*
RUN bun install --frozen-lockfile && rm -rf /tmp/*

# Install Playwright Chromium to a shared location accessible by all users
ENV PLAYWRIGHT_BROWSERS_PATH=/opt/playwright-browsers
3 changes: 2 additions & 1 deletion .github/workflows/ci-image.yml
@@ -9,6 +9,7 @@ on:
paths:
- '.github/docker/Dockerfile.ci'
- 'package.json'
- 'bun.lock'
# Manual trigger
workflow_dispatch:

@@ -22,7 +23,7 @@ jobs:
- uses: actions/checkout@v4

# Copy lockfile + package.json into Docker build context
- run: cp package.json .github/docker/
- run: cp package.json bun.lock .github/docker/

- uses: docker/login-action@v3
with:
10 changes: 7 additions & 3 deletions .github/workflows/evals-periodic.yml
@@ -25,7 +25,7 @@ jobs:
- uses: actions/checkout@v4

- id: meta
run: echo "tag=${{ env.IMAGE }}:${{ hashFiles('.github/docker/Dockerfile.ci', 'package.json') }}" >> "$GITHUB_OUTPUT"
run: echo "tag=${{ env.IMAGE }}:${{ hashFiles('.github/docker/Dockerfile.ci', 'package.json', 'bun.lock') }}" >> "$GITHUB_OUTPUT"

- uses: docker/login-action@v3
with:
@@ -43,7 +43,7 @@ jobs:
fi

- if: steps.check.outputs.exists == 'false'
run: cp package.json .github/docker/
run: cp package.json bun.lock .github/docker/

- if: steps.check.outputs.exists == 'false'
uses: docker/build-push-action@v6
@@ -101,10 +101,14 @@ jobs:
echo "TMPDIR=/home/runner/.cache"
} >> "$GITHUB_ENV"

# Recursive copy (cp -r) instead of symlink: bun build resolves a
# file's realpath when looking for sibling deps. See evals.yml for the
# full explanation. cp -al would be faster but /opt and /workspace
# are on different overlay-fs layers, so cross-device hardlink fails.
- name: Restore deps
run: |
if [ -d /opt/node_modules_cache ] && diff -q /opt/node_modules_cache/.package.json package.json >/dev/null 2>&1; then
ln -s /opt/node_modules_cache node_modules
cp -r /opt/node_modules_cache node_modules
else
bun install
fi
16 changes: 12 additions & 4 deletions .github/workflows/evals.yml
@@ -25,7 +25,7 @@ jobs:
- uses: actions/checkout@v4

- id: meta
run: echo "tag=${{ env.IMAGE }}:${{ hashFiles('.github/docker/Dockerfile.ci', 'package.json') }}" >> "$GITHUB_OUTPUT"
run: echo "tag=${{ env.IMAGE }}:${{ hashFiles('.github/docker/Dockerfile.ci', 'package.json', 'bun.lock') }}" >> "$GITHUB_OUTPUT"

- uses: docker/login-action@v3
with:
@@ -43,7 +43,7 @@ jobs:
fi

- if: steps.check.outputs.exists == 'false'
run: cp package.json .github/docker/
run: cp package.json bun.lock .github/docker/

- if: steps.check.outputs.exists == 'false'
uses: docker/build-push-action@v6
@@ -110,11 +110,19 @@ jobs:
echo "TMPDIR=/home/runner/.cache"
} >> "$GITHUB_ENV"

# Restore pre-installed node_modules from Docker image via symlink (~0s vs ~15s install)
# Restore pre-installed node_modules from Docker image via recursive
# copy. Symlink (`ln -s`) breaks bun's module resolution because bun
# resolves a file's realpath when walking up to find node_modules/<dep>;
# from a symlinked path, realpath escapes the workspace and sibling
# deps no longer resolve. Hardlink copy (`cp -al`) fails because /opt
# and /workspace are on different overlay-fs layers ("Invalid
# cross-device link"). Recursive copy works on every layout. Cost:
# ~5s for ~200 packages of small JS files vs ~0s for symlink — still
# vastly cheaper than rerunning `bun install` (network + resolution).
- name: Restore deps
run: |
if [ -d /opt/node_modules_cache ] && diff -q /opt/node_modules_cache/.package.json package.json >/dev/null 2>&1; then
ln -s /opt/node_modules_cache node_modules
cp -r /opt/node_modules_cache node_modules
else
bun install
fi
64 changes: 63 additions & 1 deletion BROWSER.md
@@ -49,7 +49,7 @@ $B connect # headed Chromium + Side Panel extension
5. [Snapshot system + ref-based selection](#snapshot-system)
6. [Browser-skills runtime](#browser-skills-runtime)
7. [Domain-skills (per-site agent notes)](#domain-skills)
8. [Real-browser mode (`$B connect`)](#real-browser-mode)
8. [Real-browser mode (`$B connect`)](#real-browser-mode) — including [`--headed` + `--proxy` + `--navigate` (v1.28.0.0)](#headed-mode--proxy--browser-native-downloads-v12800)
9. [Side Panel + sidebar agent](#side-panel--sidebar-agent)
10. [Pair-agent — remote agents over an ngrok tunnel](#pair-agent)
11. [Authentication + tokens](#authentication)
@@ -545,6 +545,63 @@ When in real-browser mode, `/qa` and `/design-review` automatically skip
cookie import prompts and headless workarounds — the headed browser already
has whatever session you logged into.

### Headed mode + proxy + browser-native downloads (v1.28.0.0)

Three coordinated flags for sites that block headless browsers, fingerprint
Playwright defaults, or sit behind authenticated upstream proxies:

```bash
# Visible Chromium. Auto-spawns Xvfb on Linux containers without DISPLAY.
$B --headed goto https://example.com

# SOCKS5 with auth — Chromium can't prompt for SOCKS5 creds, so $B runs a
# local 127.0.0.1 bridge that handles the auth handshake.
$B --proxy socks5://user:pass@residential.proxy.host:1080 goto https://example.com

# HTTP/HTTPS proxy passes through to Chromium directly.
$B --proxy http://corp-proxy:3128 goto https://example.com

# Browser-native download for Content-Disposition, redirect chains, anti-bot
# CDNs where page.request.fetch() falls over.
$B download "https://protected.example.com/file" /tmp/file.bin --navigate

# Combined.
$B --headed --proxy socks5://user:pass@host:1080 \
download "https://protected.example.com/file" /tmp/file.bin --navigate
```

**Credential policy.** Pass creds via the URL (`socks5://user:pass@host`) OR
the env vars `BROWSE_PROXY_USER` / `BROWSE_PROXY_PASS` — never both. `$B`
refuses with a clear hint when both are set; silent override created
"works on my machine" debugging traps.
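
The either-or rule can be sketched as a small pure function. This is a minimal illustration, not the actual `proxy-config.ts` code: the name `resolveProxyCreds`, the `ProxyCreds` shape, and the error text are all assumptions made for the example. Only the two env var names come from the text above.

```typescript
// Hypothetical helper: names and error text are illustrative only.
interface ProxyCreds {
  username: string;
  password: string;
}

function resolveProxyCreds(
  proxyUrl: string,
  env: Record<string, string | undefined>,
): ProxyCreds | null {
  const url = new URL(proxyUrl);
  const fromUrl = url.username !== "" || url.password !== "";
  const fromEnv =
    env.BROWSE_PROXY_USER !== undefined || env.BROWSE_PROXY_PASS !== undefined;

  // Two sources at once is ambiguous, so refuse instead of picking a winner.
  if (fromUrl && fromEnv) {
    throw new Error(
      "proxy credentials set in both the URL and BROWSE_PROXY_USER/BROWSE_PROXY_PASS; use one source",
    );
  }
  if (fromUrl) {
    // URL components are percent-encoded, so decode before use.
    return {
      username: decodeURIComponent(url.username),
      password: decodeURIComponent(url.password),
    };
  }
  if (fromEnv) {
    return {
      username: env.BROWSE_PROXY_USER ?? "",
      password: env.BROWSE_PROXY_PASS ?? "",
    };
  }
  return null; // unauthenticated proxy
}
```

Taking `env` as a parameter rather than reading `process.env` directly keeps the sketch testable in isolation.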

**Daemon discipline.** `--proxy` and `--headed` are daemon-startup config.
A running daemon with config A meeting a new invocation with config B exits
1 with a `browse disconnect` hint instead of silently restarting and dropping
tab state, cookies, or sessions.
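
A rough sketch of the mismatch check follows. The `DaemonConfig` shape and the function name `checkDaemonConfig` are hypothetical. The real daemon state may record more fields, and the actual message wording is not shown in this document.

```typescript
// Hypothetical shape of the startup config a running daemon would record.
interface DaemonConfig {
  headed: boolean;
  proxy: string | null;
}

// Returns null when the configs agree, otherwise an error message for the
// caller to print before exiting with status 1.
function checkDaemonConfig(
  running: DaemonConfig,
  requested: DaemonConfig,
): string | null {
  const mismatches: string[] = [];
  if (running.headed !== requested.headed) mismatches.push("--headed");
  if (running.proxy !== requested.proxy) mismatches.push("--proxy");
  if (mismatches.length === 0) return null;
  // Name only the flags, never the proxy URL, so credentials stay out of output.
  return `daemon already running with different ${mismatches.join(", ")} config; run \`browse disconnect\` first`;
}
```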

**Stealth scope.** When `--headed` or `--proxy` is set, `$B` masks
`navigator.webdriver` only, via Chromium's
`--disable-blink-features=AutomationControlled` plus a small init script.
We do NOT fake `navigator.plugins`, `navigator.languages`, or `window.chrome`:
modern fingerprinters check those for consistency, and synthesized fixed
values can make the browser look MORE bot-like, not less. ChromeDriver's
`cdc_` runtime artifacts are still cleaned up, and the Permissions API patch
is kept.
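
For illustration, the init-script half could look roughly like the function below. This is an assumption about the shape of the mask, not the contents of `stealth.ts`: the name `maskWebdriver` is hypothetical, and reporting `false` is a guess at the value a non-automated Chromium would report.

```typescript
// Hypothetical sketch: redefine `webdriver` on a navigator-like object.
// In a page this would run as an init script with the real `navigator`
// (Playwright's `addInitScript` is one way to register such a script).
function maskWebdriver(nav: object): void {
  Object.defineProperty(nav, "webdriver", {
    get: () => false, // assumed value for a non-automated browser
    configurable: true,
  });
}
```

Nothing else on the object is touched, which matches the narrow scope described above: no synthesized plugins, languages, or `window.chrome`.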

**Container support.** `--headed` on Linux without `DISPLAY` walks the
display range (`:99`, `:100`, ...) until `xdpyinfo` reports a free slot,
then spawns Xvfb. Cleanup-on-disconnect validates the recorded PID's
`/proc/<pid>/cmdline` matches `Xvfb` AND start-time matches before sending
any signal — no PID-reuse footguns. Skips spawn entirely when
`WAYLAND_DISPLAY` is set (Chromium uses Wayland natively). Standard
Debian/Ubuntu containers work out of the box; minimal images (alpine,
distroless) may need fonts/dbus/gtk libs for headed Chromium to render.
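
The display walk and the ownership check can be sketched as two pure functions. Everything named here (`findFreeDisplay`, `ownsXvfb`, `RecordedXvfb`, the `maxAttempts` cap) is hypothetical. A real implementation would presumably back `isInUse` with an `xdpyinfo -display :<n>` probe and read the cmdline and start-time from `/proc/<pid>`.

```typescript
// Walk display numbers from `start` until the probe reports a free slot.
// The `maxAttempts` bound is an assumption; the source does not state one.
function findFreeDisplay(
  isInUse: (display: number) => boolean,
  start = 99,
  maxAttempts = 100,
): number | null {
  for (let n = start; n < start + maxAttempts; n++) {
    if (!isInUse(n)) return n;
  }
  return null;
}

// Hypothetical record of the Xvfb child this daemon spawned.
interface RecordedXvfb {
  pid: number;
  startTime: string;
}

// Signal only if the live process is still Xvfb AND has the recorded start
// time. A reused PID fails at least one of the two checks.
function ownsXvfb(
  recorded: RecordedXvfb,
  cmdline: string | null, // raw /proc/<pid>/cmdline, or null if the process is gone
  startTime: string | null,
): boolean {
  if (cmdline === null || startTime === null) return false;
  // /proc/<pid>/cmdline separates argv entries with NUL bytes.
  const argv0 = cmdline.split("\0")[0] ?? "";
  const isXvfb = argv0 === "Xvfb" || argv0.endsWith("/Xvfb");
  return isXvfb && startTime === recorded.startTime;
}
```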

**Failure modes.** SOCKS5 upstream rejected or unreachable — fail-fast at
startup with a redacted error after 3 retries (5s budget). Mid-stream
upstream drop — bridge kills the affected client connection only; no
transport retries that could corrupt browser traffic.
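
As a sketch of what "redacted" could mean here, the helper below strips credentials from a proxy URL before it reaches a log line or error message. The function name and the `***` placeholder are assumptions. The actual `proxy-redact.ts` output format is not shown in this document.

```typescript
// Hypothetical redaction helper: keeps scheme, host, and port; hides userinfo.
function redactProxyUrl(raw: string): string {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    // Never echo an unparseable string: it may still contain a password.
    return "<unparseable proxy url>";
  }
  const hasCreds = url.username !== "" || url.password !== "";
  const auth = hasCreds ? "***:***@" : "";
  return `${url.protocol}//${auth}${url.host}${url.pathname}${url.search}`;
}
```

Rebuilding the string from parsed components, rather than regex-replacing the raw input, avoids leaking credentials that contain unusual characters.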

---

## Side Panel + sidebar agent
@@ -1117,6 +1174,11 @@ browse/
│ ├── cli.ts # Thin client — reads state, sends HTTP, prints
│ ├── server.ts # Bun HTTP daemon — routes commands, dual-listener
│ ├── browser-manager.ts # Chromium lifecycle, tabs, ref map, crash detection
│ ├── socks-bridge.ts # Local 127.0.0.1 SOCKS5 bridge that handles auth handshakes Chromium can't speak
│ ├── proxy-config.ts # --proxy URL parsing + cred resolution (URL vs env, fail-fast on both)
│ ├── proxy-redact.ts # Cred-redaction helper for any proxy URL surfaced to logs/errors
│ ├── xvfb.ts # Xvfb auto-spawn + orphan cleanup with PID + start-time validation
│ ├── stealth.ts # navigator.webdriver mask + cdc_ cleanup + Permissions API patch
│ ├── browse-client.ts # Canonical SDK — what skills import as _lib/browse-client.ts
│ ├── snapshot.ts # AX tree → @e/@c refs → Locator map; -D/-a/-C handling
│ ├── read-commands.ts # Non-mutating: text, html, links, js, css, is, dialog, ...
116 changes: 116 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,121 @@
# Changelog

## [1.28.0.0] - 2026-05-07

## **Browse handles real-world automation now: SOCKS5 with auth, container Xvfb, browser-native downloads. Plus a single-file `llms.txt` index agents can crawl in one read.**

Five capabilities ship in one PR. Browse picks up `--proxy` (with an
embedded SOCKS5 bridge so Chromium can speak to authenticated
upstreams it can't speak to natively), `--headed` (auto-spawns Xvfb
on Linux containers without DISPLAY), and `download --navigate` (uses
the browser's native download handler for Content-Disposition,
multi-hop CDN redirects, and anti-bot CDN chains where
`page.request.fetch()` falls over). Stealth is narrowed to
`navigator.webdriver` masking only — modern fingerprinters punish
inconsistent fakes, so faking plugins/languages was making
detection easier, not harder. And `gstack/llms.txt` is now
auto-generated from the same source as every SKILL.md, so any agent
that reads `llms.txt` boots into the full surface (47 skills, 75
browse commands) in one fetch.

### The numbers that matter

End-to-end verified via `bun test browse/test/{socks-bridge,proxy-config,proxy-redact,xvfb,stealth-webdriver,bridge-chromium-e2e}.test.ts test/llms-txt-shape.test.ts`:

| Surface | Before | After | Δ |
|---|---|---|---|
| `browse --proxy` (SOCKS5 with auth) | not supported | works end-to-end | new capability |
| `browse --headed` on Linux without DISPLAY | not supported | auto-Xvfb on first free display | new capability |
| `download --navigate` (browser-native) | only `page.request.fetch()` | added native download path | new capability |
| `gstack/llms.txt` index for agents | none | 47 skills + 75 commands in 11KB | new capability |
| Xvfb PID validation defenses | n/a | both `/proc/<pid>/cmdline` AND start-time | full safety |
| Tests covering proxy + headed + navigate | 0 | 70+ tests across 7 files | from zero to comprehensive |

The `bridge-chromium-e2e.test.ts` is the one that proves the feature
actually works: real Chromium launches with `proxy.server =
socks5://127.0.0.1:<bridgePort>`, navigates to a local HTTP fixture,
and we assert the auth upstream's connect counter and the HTTP
fixture's hit counter both increment. Without that test we could
ship a working byte-relay and a broken Chromium integration and never
notice.

### What this means for AI agents

Agents can now reach many sites that used to block them. DDoS-Guard'd CDN
behind an auth-required residential SOCKS5 → `browse --proxy
socks5://user:pass@host:1080 --headed download <url> /tmp/file
--navigate` and the file lands. Linux container without DISPLAY →
`--headed` auto-spawns Xvfb, no manual setup. The `llms.txt` index
makes discovery a one-fetch operation: agents stop scanning 47
SKILL.md files and start with the right skill on the first try.

### Itemized changes

#### Added
- `browse --proxy <url>` flag. Supports SOCKS5 with username/password
auth, HTTP, and HTTPS. SOCKS5+auth runs through an embedded local
bridge (`browse/src/socks-bridge.ts`, ~250 LOC) bound to 127.0.0.1
on an ephemeral port. The bridge handles the SOCKS5 auth handshake
so Chromium (which can't prompt for SOCKS5 creds) can still use
authenticated upstreams.
- Pre-flight `testUpstream()` runs before Chromium launches: 5s total
budget, 3 retries with 500ms backoff (handles VPN warm-up race).
On failure, exits 1 with a redacted error message — no confusing
"connection refused" on first navigation.
- `browse --headed` flag with auto-Xvfb on Linux. Walks the display
range (`:99`, `:100`, ...) until `xdpyinfo` says free; never
hardcodes `:99` and never unlinks `/tmp/.X<n>-lock` for displays
it didn't create. Xvfb child PID + start-time + display recorded
in `~/.gstack/browse.json` so cleanup-on-disconnect can validate
ownership before signaling. Skips spawn when `WAYLAND_DISPLAY` is
set (Chromium uses Wayland natively).
- `download --navigate` flag (community PR #1355, attribution preserved).
Uses `page.waitForEvent('download')` and `page.goto(url, {
waitUntil: 'commit' })` instead of `page.request.fetch()`.
Required for sites where the download is triggered by browser
navigation (Content-Disposition headers, redirect chains, anti-bot
CDNs).
- `gstack/llms.txt` auto-generated from skill frontmatter and the
browse `COMMAND_DESCRIPTIONS` registry. Regenerates on every
`bun run gen:skill-docs`. Strict mode (used in tests) refuses any
skill missing `name` or `description` in its frontmatter.
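
The strict-mode rule for `llms.txt` can be illustrated with a small formatter. This is a sketch under assumptions: the function name `llmsTxtLine`, the `SkillFrontmatter` type, the link-style line format, and the skip-in-non-strict behavior are all hypothetical. Only the rule that strict mode rejects a skill missing `name` or `description` comes from the changelog.

```typescript
// Hypothetical frontmatter shape: only the two required fields are modeled.
interface SkillFrontmatter {
  name?: string;
  description?: string;
}

// Returns one index line for a skill. Strict mode throws on missing fields.
// Non-strict mode (assumed behavior) skips the skill by returning null.
function llmsTxtLine(
  path: string,
  fm: SkillFrontmatter,
  strict: boolean,
): string | null {
  const { name, description } = fm;
  if (!name || !description) {
    const missing = [
      !name ? "name" : null,
      !description ? "description" : null,
    ].filter((field) => field !== null);
    if (strict) {
      throw new Error(`${path}: frontmatter missing ${missing.join(", ")}`);
    }
    return null;
  }
  return `- [${name}](${path}): ${description}`;
}
```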

#### Changed
- Stealth narrowed to `navigator.webdriver` masking only. The
pre-existing `launchHeaded` patches that faked `navigator.plugins`
and `navigator.languages` were removed because modern
fingerprinters check those for consistency with `userAgent`/
`platform`, and synthesized fixed values can make the browser look MORE bot-like,
not less. The cdc_/__webdriver runtime cleanup and Permissions API
patch are kept — those remove ChromeDriver-injected artifacts
rather than synthesize natural-browser values.
- Browse daemon refuses to silently restart on `--proxy`/`--headed`
flag mismatch. Existing daemon with config A + new invocation with
config B → exits 1 with a `browse disconnect` hint. No silent
state loss.
- Cred policy: passing creds in BOTH the URL and `BROWSE_PROXY_USER`/
`BROWSE_PROXY_PASS` env vars now fails fast with a clear error.
Silent override was a debugging trap.

#### Fixed
- N/A — all-new code paths.

#### For contributors
- New module boundary: `browse/src/socks-bridge.ts`,
`browse/src/proxy-config.ts`, `browse/src/proxy-redact.ts`,
`browse/src/xvfb.ts`, `browse/src/stealth.ts`. Each is small,
testable in isolation, and has matching `*.test.ts` coverage.
- 70+ new tests across 7 files. The `bridge-chromium-e2e.test.ts`
test launches real Chromium through the bridge and asserts the
request actually traversed it (upstream connect counter + HTTP
fixture hit counter both increment).
- `socks` npm dependency added (~30KB).
- Xvfb + x11-utils added to `.github/docker/Dockerfile.ci` so
`headed-xvfb`/`headed-orphan-cleanup` exercise the Linux container
path on every CI run instead of only manual smoke tests.
- Community PR #1355 from @garrytan-agents merged; attribution
preserved on the merging commit.

## [1.27.1.0] - 2026-05-06

## **Plan-mode reviews now refuse to dump findings without asking. Four gate-tier tests catch the regression on every PR.**
2 changes: 1 addition & 1 deletion SKILL.md
Expand Up @@ -862,7 +862,7 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
| Command | Description |
|---------|-------------|
| `archive [path]` | Save complete page as MHTML via CDP |
| `download <url|@ref> [path] [--base64]` | Download URL or media element to disk using browser cookies |
| `download <url|@ref> [path] [--base64] [--navigate]` | Download URL or media element to disk using browser cookies. Use --navigate for URLs that trigger browser downloads (CDN redirects, Content-Disposition, anti-bot protected sites) |
| `scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]` | Bulk download all media from page. Writes manifest.json |

### Interaction
2 changes: 1 addition & 1 deletion TODOS.md
@@ -1562,7 +1562,7 @@ Shipped in v0.6.5. TemplateContext in gen-skill-docs.ts bakes skill name into pr

**What:** Write a postinstall script that patches Playwright's CDP layer to suppress `Runtime.enable` and use `addBinding` for context ID discovery, same approach as rebrowser-patches. Eliminates the `navigator.webdriver`, `cdc_` markers, and other CDP artifacts that sites like Google use to detect automation.

**Why:** Our current stealth patches (UA override, navigator.webdriver=false, fake plugins) work on most sites but Google still triggers captchas. The real detection is at the CDP protocol level. rebrowser-patches proved the approach works but their patches target Playwright 1.52.0 and don't apply to our 1.58.2. We need our own patcher using string matching instead of line-number diffs. 6 files, ~200 lines of patches total.
**Why:** Our current stealth narrows to `navigator.webdriver` masking + ChromeDriver `cdc_` runtime cleanup + Permissions API patch (v1.28.0.0 narrowed it from also faking plugins/languages, since modern fingerprinters punish inconsistent fakes more than they punish admitted defaults). That's enough for most sites but Google still triggers captchas, because the real detection is at the CDP protocol level. rebrowser-patches proved the approach works but their patches target Playwright 1.52.0 and don't apply to our 1.58.2. We need our own patcher using string matching instead of line-number diffs. 6 files, ~200 lines of patches total.

**Context:** Full analysis of rebrowser-patches source: patches 6 files in `playwright-core/lib/server/` (crConnection.js, crDevTools.js, crPage.js, crServiceWorker.js, frames.js, page.js). Key technique: suppress `Runtime.enable` (the main CDP detection vector), use `Runtime.addBinding` + `CustomEvent` trick to discover execution context IDs without it. Our extension communicates via Chrome extension APIs, not CDP Runtime, so it should be unaffected. Write E2E tests that verify: (1) extension still loads and connects, (2) Google.com loads without captcha, (3) sidebar chat still works.

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
1.27.1.0
1.28.0.0