Incremental Sync Architecture
Motivation
WP Packages currently runs a full pipeline on every sync cycle: discover packages → fetch updates → build ~140k files to disk → deploy via symlink swap → upload to R2. This worked when Composer v1 required a complete provider tree, but since dropping v1 support, the build directory is vestigial overhead. Every run rewrites all files regardless of whether anything changed, and the R2 sync walks the entire build directory doing byte comparisons — O(total packages) instead of O(changed packages).
Goal
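Replace the build-directory pipeline with a DB-driven architecture where SQLite is the single source of truth. Each package gets a content_hash (a hash of its current metadata) and a deployed_hash (the hash of the version that is live on R2). Finding what needs uploading becomes a single query: WHERE content_hash != deployed_hash. If deployed_hash starts out NULL for never-deployed packages, use a NULL-safe comparison such as SQLite's IS NOT, because != against NULL never matches. No intermediate files, no filesystem walking, no manifest.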
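A minimal Go sketch of the dirty check follows. It assumes content_hash is a SHA-256 over the package's JSON-serialized metadata; the hash function, the Package type and the package names are all illustrative, not taken from the codebase. DeployedHash is a pointer, so a package that has never been deployed (NULL) counts as dirty.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Package is a hypothetical in-memory stand-in for one row of the packages table.
// DeployedHash is a pointer so that "never deployed" (SQL NULL) is distinct
// from every real hash value.
type Package struct {
	Name         string
	ContentHash  string
	DeployedHash *string
}

// contentHash returns the hex SHA-256 of the JSON-serialized metadata.
// encoding/json writes map keys in sorted order, so equal maps hash identically.
func contentHash(meta map[string]any) (string, error) {
	b, err := json.Marshal(meta)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// needsSync mirrors the predicate
//
//	WHERE deployed_hash IS NULL OR content_hash != deployed_hash
func needsSync(p Package) bool {
	return p.DeployedHash == nil || *p.DeployedHash != p.ContentHash
}

// dirtyPackages returns the names of packages whose live copy is missing or stale.
func dirtyPackages(pkgs []Package) []string {
	var out []string
	for _, p := range pkgs {
		if needsSync(p) {
			out = append(out, p.Name)
		}
	}
	return out
}

func main() {
	h, err := contentHash(map[string]any{"version": "1.0.0"})
	if err != nil {
		panic(err)
	}
	stale := "hash-of-an-older-version"
	pkgs := []Package{
		{Name: "demo/alpha", ContentHash: h, DeployedHash: &h},    // up to date
		{Name: "demo/beta", ContentHash: h, DeployedHash: &stale}, // content changed
		{Name: "demo/gamma", ContentHash: h},                      // never deployed
	}
	fmt.Println(dirtyPackages(pkgs)) // [demo/beta demo/gamma]
}
```

Hashing the serialized form rather than the raw wp.org response means that only changes that would alter the published JSON mark a package dirty.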
How It Works
Three-step pipeline: Discover → Update → Sync
- Discover checks which packages exist and which have changed, using the SVN revision log. It is cheap: no API calls.
- Update fetches full metadata from wp.org only for changed packages, normalizes versions, and computes content_hash. If the hash changed, the package is marked dirty.
- Sync queries for dirty packages, serializes their Composer JSON, uploads them to R2 in parallel, then stamps deployed_hash. It is crash-safe: deployed_hash is stamped only after a successful upload, so if a run is interrupted, the remaining packages stay dirty and the next run picks up where it left off.
DB-backed serving for local dev: the HTTP server serializes Composer metadata directly from SQLite on each request, eliminating the build step entirely for development.
Conditional packages.json upload: the root Composer config is effectively static, so it's uploaded with If-None-Match — a no-op on most runs.
Phases
1. Schema + Content Hash: Add content_hash, deployed_hash, and content_changed_at columns. Extract serialization logic into a pure composer package. Compute hashes at update time.
2. DB-Backed Serve Layer: Serve /p2/{type}/{name}.json and /packages.json directly from SQLite. Remove the dev command in favor of Makefile-composed CLI commands.
3. R2 Sync: The main cut-over. Replace the filesystem-based build + deploy with DB-driven sync. Combine the builds and sync_runs tables into a single pipeline_runs table. Delete ~1,200 lines of build/deploy/filesystem code.
4. Test Infrastructure: Update the existing integration tests (the mock wp.org server and gofakes3 setup already exist) for the new architecture. Add a full round-trip test: seed DB → sync to fake S3 → resolve with Composer.
5. Metadata Changes Feed: Add a Packagist-compatible /metadata/changes.json endpoint powered by the content_changed_at column, so third-party mirrors can poll for updates efficiently.
Phases are sequential, each building on the previous one. However, Phase 2 (DB-Backed Serve Layer) can coexist with the old pipeline: the serve layer reads from the DB while the old pipeline keeps running, so the transition can be made incrementally.