Skip to content

feat(core): clip-model hf- ids minted at parse, emitted as data-hf-id (R1)#1270

Open
vanceingalls wants to merge 4 commits into
mainfrom
06-07-feat_core_clip-model_hf-_ids_minted_at_parse_emitted_as_data-hf-id_r1_
Open

feat(core): clip-model hf- ids minted at parse, emitted as data-hf-id (R1)#1270
vanceingalls wants to merge 4 commits into
mainfrom
06-07-feat_core_clip-model_hf-_ids_minted_at_parse_emitted_as_data-hf-id_r1_

Conversation

@vanceingalls

@vanceingalls vanceingalls commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

What

Wires ensureHfIds (PR #1269) into parseHtml so every parsed composition gets stable data-hf-id attributes before the element model is built.

Clip model id chain:

const id = el.getAttribute("data-hf-id") || el.id || `element-${++idCounter}`;

data-hf-id is now the primary identity; existing HTML id attrs and the counter fallback are preserved for backwards compatibility.

Bridge for legacy callers: updateElementInHtml and removeElementFromHtml in htmlParser.ts now try [data-hf-id="${id}"] when getElementById fails, so string-based callers that pass a hf-xxxx value continue to work without knowing about the new field.

Why

Without this PR, ensureHfIds is a no-op — the helper exists but nothing calls it. This is the commit that makes T2 tests turn green and makes every parseHtml call produce stable hf- ids. Depends on PR #1269.

How

// before DOMParser.parseFromString:
const withIds = ensureHfIds(html);
const doc = parser.parseFromString(withIds, "text/html");

htmlParser.ts baseline snapshots updated to reflect the hf- id format (see 168c0452c).

Test plan

  • T2 spec: 3 previously-.fails tests now pass — elements get hf- prefixed ids, format matches /^hf-[a-z0-9]{4}$/, adding a sibling doesn't change existing ids
  • All pre-existing htmlParser.test.ts baselines updated and passing
  • Existing stableIds.test.ts baselines (round-trip, uniqueness, existing-id preservation) still green

@vanceingalls vanceingalls changed the base branch from 06-07-feat_core_ensurehfids_node-id_pass_shared_hf-_mint_helper_r1_ to graphite-base/1270 June 8, 2026 17:24
@vanceingalls vanceingalls marked this pull request as draft June 8, 2026 18:09
@vanceingalls vanceingalls marked this pull request as ready for review June 8, 2026 21:44
miguel-heygen
miguel-heygen previously approved these changes Jun 8, 2026

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minimal and correct wire-up. The three-level fallback data-hf-id → el.id → counter is the right priority order, and emitting data-hf-id="${element.id}" from generateElementHtml is the correct write-back strategy that pins ids across round-trips.

One concern:

P2 — linkedom in the browser bundle:
hfIds.ts imports from linkedom, which htmlParser.ts now depends on. If htmlParser.ts runs in browser context (it uses DOMParser), linkedom gets bundled into the browser payload (~500KB minified). Is this module tree-shaken or only used in the Node.js/worker path? If it's in the browser bundle, it's a non-trivial addition for a parsing helper. Worth confirming the bundle impact is acceptable before this merges, or considering whether ensureHfIds can be called upstream (in the server-side render pipeline) instead.

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny wiring PR — 12/8 across 3 files. generateElementHtml now emits data-hf-id alongside id, and parseHtml calls ensureHfIds then reads data-hf-id || el.id || \element-${++idCounter}`for the clip id. Test updates flip hardcoded id literals to/^hf-[a-z0-9]{4}$/` regex matches.

Concern

[Legacy clip-model id-roundtrip produces non-conformant data-hf-id values] When a clip-model with id="my-title" (user-set, no data-hf-id) goes through parse → emit:

  1. parseHtml: reads data-hf-id || el.id → falls through to el.id = "my-title". So element.id = "my-title".
  2. generateElementHtml: emits id="my-title" AND data-hf-id="my-title". Both share the same value.
  3. Re-parse: ensureHfIds sees data-hf-id="my-title", pins it. parseHtml reads "my-title" again. Stable.

The targeting code in #1271/#1272 doesn't care about the value pattern — it uses [data-hf-id="..."] exact-match — so functionally this works. But the spec test it("[spec] generated hf- ids match /^hf-[a-z0-9]{4}$/") only fires for elements WITHOUT a pre-existing id. So a clip-model authored before R1 lands carries non-conformant data-hf-id values forever (or until manually re-minted).

Two options worth considering:

  1. Force-mint a hf- prefix in generateElementHtml when element.id doesn't match /^hf-[a-z0-9]{4}$/ — the element.id stays whatever it was, but data-hf-id gets a freshly-minted hf-prefixed value. This converges all clip-models to hf-spec on next save.
  2. Document the migration as intentional — call out in the PR body / a code comment that legacy clip-models will carry their user-set ids in data-hf-id until the round-3 write-back lands. Then it's not a surprise.

If the latter is the plan (which is consistent with "not wired in yet"), worth a one-liner so future archaeologists don't think it's a bug.

Nits

[generateElementHtml change is one line — should the constant be derived?] \data-hf-id="${element.id}"`duplicates theidvalue verbatim. If element.id ever drifts away from the hf-id contract (e.g. a future emitter path sets element.id to something legacy-shaped), the data-hf-id will silently follow. A`data-hf-id="${getStableHfId(element)}"`indirection — wheregetStableHfId` validates or re-mints — would isolate the contract.

Not blocking; the current shape is fine for R1.

[Tests flipped from id-literal to regex-match are appropriately strict] toMatch(/^hf-[a-z0-9]{4}$/) matches the spec exactly. The one find-by-type change (result.elements.find((e) => e.type === "video")) instead of find-by-id is a clean adaptation — keeps the test's intent (verify the video clip is parsed) without coupling to the now-dynamic id.

What I didn't verify

  • That generateElementHtml is the only emission path for clip HTML. The briefing called out "DOM render, JSON snapshot, debug dump, any test fixtures" as paths that might also serialize. A quick grep for data-start= or data-end= in other generators would close that, but the diff scope here only touches one emitter — if there's another path, that's a follow-up not a blocker on this PR.

Verdict

Clean wiring. Stacked on top of #1269 correctly. Main thing worth deciding is whether the legacy-id roundtrip (non-conformant data-hf-id values) is the intended migration path or whether generateElementHtml should force-mint a hf- prefix. Both are defensible — pick whichever you prefer; if it's the former, one-line comment closes the loop. Leaving as a comment.

Review by Rames D Jusso

vanceingalls added a commit that referenced this pull request Jun 8, 2026
…review)

Addresses Rames' review on #1270: clarifies that a pre-R1 clip authored with
id="my-title" round-trips as data-hf-id="my-title" (non-hf-shaped but stable,
exact-match) by design — targeting uses exact [data-hf-id="…"] match and does
not require the hf- shape; legacy values re-mint only at the R7 write-back. Not
a bug. Comment-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vanceingalls

Copy link
Copy Markdown
Collaborator Author

Addressed review (86562c2): documented the legacy-id round-trip — a pre-R1 id="my-title" round-trips as data-hf-id="my-title" by design (exact-match targeting doesn't require the hf- shape; re-mint happens at the R7 write-back). Re P2 (linkedom in browser bundle): confirmed it's not — studio/src doesn't import parseHtml/htmlParser; linkedom's usages are all server-side (studio-api routes, sourceMutation, compiler). The browser-safe property is for future SDK embedded mode, no current bundle impact.

@vanceingalls vanceingalls force-pushed the 06-07-feat_core_clip-model_hf-_ids_minted_at_parse_emitted_as_data-hf-id_r1_ branch from 86562c2 to beb8b9f Compare June 9, 2026 04:01
@vanceingalls vanceingalls changed the base branch from graphite-base/1270 to main June 9, 2026 04:01
@vanceingalls vanceingalls dismissed miguel-heygen’s stale review June 9, 2026 04:01

The base branch was changed.

@vanceingalls vanceingalls changed the base branch from main to graphite-base/1270 June 9, 2026 04:02
@vanceingalls vanceingalls force-pushed the 06-07-feat_core_clip-model_hf-_ids_minted_at_parse_emitted_as_data-hf-id_r1_ branch from beb8b9f to fde1c2d Compare June 9, 2026 04:03
@vanceingalls vanceingalls changed the base branch from graphite-base/1270 to 06-07-feat_core_ensurehfids_node-id_pass_shared_hf-_mint_helper_r1_ June 9, 2026 04:03

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building on Rames's review.

Rames flagged two things I want to sharpen:

Escalating: legacy id roundtrip is documented now, but the R7 write-back path must not re-mint over a pre-existing non-conformant data-hf-id. The comment in htmlParser.ts (L192-200) correctly explains that a clip with id="my-title" will get data-hf-id="my-title" and "will be re-minted once R7 write-back persists freshly-minted ids." But persistHfIdsIfNeeded in #1289 calls ensureHfIds, and ensureHfIds (per hfIds.ts:51) skips elements that already have data-hf-id set — so the legacy data-hf-id="my-title" will NOT be re-minted. The comment's promised migration will never fire. Either update the comment to say "legacy values persist forever unless the user hand-edits the source" or update persistHfIdsIfNeeded to force-mint hf-prefixed ids for non-conformant values. The current behavior is safe (exact-match targeting still works on "my-title"), but the comment is misleading.

Confirming: linkedom bundle concern. Rames flagged that hfIds.ts imports linkedom which now flows into htmlParser.ts. Worth a bundle analyzer check — this matters for any consumer that ships htmlParser into a browser context. If it's workers-only, fine. If it's browser-main, linkedom at ~500KB needs to be justified or replaced with a document.createTreeWalker path in-browser.

New: generateElementHtml emits data-hf-id="${element.id}" unconditionally, but element.id may already be an hf- id (round-trip) or a legacy id. For the round-trip case this is correct (same id pinned). For a freshly-created element from the Studio, element.id is whatever the Studio assigned — typically a counter like element-12. That gets emitted as data-hf-id="element-12", which then gets pinned by ensureHfIds and survives forever. The hf- prefix is never applied in this case — the Studio path never goes through parseHtml's ensureHfIds call before emitting. This is a second case where the expected mint never fires.

Overall the wiring is correct and the T2 tests turning green confirms the primary contract. But the migration story for non-hf-prefixed ids needs to be either fixed or explicitly documented as "not part of R1."

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 2 (delta since f391075 — one new commit fde1c2d docs(core): document legacy-id round-trip in clip-model readback (R1 review)).

Vance picked Option 2 from my round-1 ask — comment-as-documentation rather than force-mint. The added comment in htmlParser.ts:196-202 is the right shape: explicitly calls out the legacy round-trip path (id="my-title"data-hf-id="my-title"), spells out why it's safe (targeting uses exact-match, not shape-match), and pins when migration happens (R7 write-back). Future archaeologists won't read this as a bug.

My round-1 nit about getStableHfId(element) indirection in generateElementHtml isn't addressed — that was a stylistic call from me and your current shape is fine for R1. Leaving it.

Clean execution; leaving as a comment.

Review by Rames D Jusso

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

31d9074 fixes my migration comment concern — the comment now correctly says legacy values are NOT re-minted automatically. ✅

The Studio-created element path (emitting data-hf-id="${element.id}" where element.id may be element-12) is still in the diff but is a separate issue from the comment; behavior-identical to before. If that's out of scope for R1, fine — just confirming it's a known gap, not a surprise.

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 3 (delta since fde1c2d — one new commit 31d9074e updating the legacy-id comment).

Pulled the current file at 31d9074e head. The comment at htmlParser.ts:195-203 now reads:

"...This is safe indefinitely: targeting uses exact [data-hf-id="…"] match (it does not require the hf- prefix). ensureHfIds skips elements that already carry data-hf-id, so legacy values are NOT re-minted automatically — they persist until the user re-saves the composition through Studio. Not a bug."

That's Option (a) landed clean. The corrected comment now accurately describes the actual behavior across the R1/R7 boundary — ensureHfIds skip-pinned semantics are stated explicitly, the "re-mints only at R7 write-back" claim is removed, and the user-action-required reset path ("user re-saves through Studio") is named. This is what I should have caught in my round-2 ack — the comment is now what I'd want to see.

Owning my round-2 miss: I acked the prior comment without verifying its R7-behavior claim against ensureHfIds's skip-pinned condition, despite having reviewed that exact behavior in my own #1289 round-1. Saved the lesson to memory (feedback_verify_cross_pr_claims_against_other_pr_code.md) — applies to any future review that ack's a cross-PR claim. Won't repeat.

Ready from where I sit; leaving as a comment.

Review by Rames D Jusso

jrusso1020
jrusso1020 previously approved these changes Jun 9, 2026

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved on behalf of James.

@vanceingalls vanceingalls changed the base branch from 06-07-feat_core_ensurehfids_node-id_pass_shared_hf-_mint_helper_r1_ to graphite-base/1270 June 9, 2026 06:28
vanceingalls added a commit that referenced this pull request Jun 9, 2026
…1269)

## What

Adds `ensureHfIds(html: string): string` in `packages/core/src/parsers/hfIds.ts`. Single DOM pass (via linkedom) that mints a `data-hf-id` attribute on every eligible element before the caller sees the markup.

**Id derivation:** FNV-1a 32-bit hash of `tagName | sorted-attrs(\x00/\x01 separated) | ownText`, last 4 chars of base-36, `hf-` prefix. Collision resolution appends a sibling counter and re-hashes. Preserves existing ids (elements with `data-hf-id` already set are skipped). Excludes non-visual tags: `script`, `style`, `template`, `meta`, `link`, `noscript`, `base`.

**Fragment handling:** detects bare HTML fragments (no `<!doctype` / `<html`) and wraps in a full document shell before parsing, then returns `body.innerHTML` — matching the pattern used by `parseSourceDocument` in `sourceMutation`.

## Why

Counter-based ids (`element-0`, `element-1`, …) are positional. Inserting a new layer at position 0 shifts every id below it. The R1 milestone requires content-based, stable ids so that targeting operations (split, patch, probe) stay valid across re-parses and element insertions. T2 spec (`stableIds.test.ts`) defines the contract: same content → same id, adding a sibling doesn't change other ids, format matches `/^hf-[a-z0-9]{4}$/`.

## How

- `toHfId(hash)` — `slice(-4)` of `hash.toString(36)` for better distribution across the suffix space
- `data-hf-id` is excluded from the hash input (prevents circular dependency on the attribute being set)
- Already-assigned ids tracked in a `Set`; duplicates get a counter suffix before re-hashing
- Fragment detection: `/<!doctype|<html[\s>]/i.test(html)` — if bare, wrap→parse→`body.innerHTML`

## Test plan

- [x] T2 spec: `packages/core/src/parsers/stableIds.test.ts` — 7 tests pass (3 were `.fails` stubs targeting R1; 4 were pre-existing baselines that must not regress)
- [x] No changes to existing htmlParser tests needed at this layer (wiring is PR #1270)
vanceingalls and others added 2 commits June 9, 2026 06:28
…review)

Addresses Rames' review on #1270: clarifies that a pre-R1 clip authored with
id="my-title" round-trips as data-hf-id="my-title" (non-hf-shaped but stable,
exact-match) by design — targeting uses exact [data-hf-id="…"] match and does
not require the hf- shape; legacy values re-mint only at the R7 write-back. Not
a bug. Comment-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vanceingalls vanceingalls force-pushed the 06-07-feat_core_clip-model_hf-_ids_minted_at_parse_emitted_as_data-hf-id_r1_ branch from 31d9074 to abad19f Compare June 9, 2026 06:28
@graphite-app graphite-app Bot changed the base branch from graphite-base/1270 to main June 9, 2026 06:29
@graphite-app graphite-app Bot dismissed jrusso1020’s stale review June 9, 2026 06:29

The base branch was changed.

The original comment said legacy data-hf-id values "are re-minted only
once the R7 write-back persists freshly-minted ids to source" — which is
incorrect. ensureHfIds skips elements that already carry data-hf-id, so
legacy values (e.g. data-hf-id="my-title") persist indefinitely and are
NOT automatically re-minted. Exact-match targeting still works correctly.
Update comment to reflect actual behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vanceingalls vanceingalls force-pushed the 06-07-feat_core_clip-model_hf-_ids_minted_at_parse_emitted_as_data-hf-id_r1_ branch from abad19f to 9ddd145 Compare June 9, 2026 06:29
const withIds = ensureHfIds(html);
const parser = new DOMParser();
const doc = parser.parseFromString(html, "text/html");
const doc = parser.parseFromString(withIds, "text/html");
Pre-R1 tests expected clip ids to reflect legacy `id=` attributes.
After R1, ensureHfIds runs first and mints data-hf-id — so clip.id
reflects the minted hf- value unless the element already has data-hf-id.

Fix: add explicit data-hf-id to test HTML elements where tests assert
specific id values. Update no-id test to expect hf- format (/^hf-[a-z0-9]{4}$/)
instead of the pre-R1 generated-id fallback (/^element-\d+$/).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants