
FE-769: Add String token dimension type (string interning) #8956

Open
kube wants to merge 2 commits into cf/fe-1121-add-uuid-discrete-type-to-petrinaut from cf/fe-769-add-string-discrete-type-support

kube (Collaborator) commented Jul 4, 2026

🌟 What is the purpose of this PR?

Adds string as a fifth colour element type. Strings are variable-length, so they cannot live in the fixed-stride packed token structs — instead each frame stores a 64-bit reference into an append-only per-run string intern pool that is owned by the simulation, not the frame. A full design document ships with the PR: libs/@hashintel/petrinaut-core/docs/string-interning.md.

🔗 Related links

🚫 Blocked by

🔍 What does this change?

The pool (engine/string-pool.ts):

  • Append-only for the duration of a run; entries immutable once assigned; never compacted mid-run — an ID written into frame 3 must still resolve when scrubbing at frame 900, since the interactive worker retains the full frame history. Fresh pool per init.
  • ID 0 is pre-seeded as "", so zeroed buffers decode sanely and string-free nets ship zero protocol overhead.
  • maxSize guard (1M distinct values) fails loudly with a targeted message if a kernel generates unbounded unique strings — the pathology that made us reject interning for UUIDs, contained here by design.
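A minimal TypeScript sketch of the pool invariants listed above: ID 0 reserved for "", deduplication through a lookup map, append-only IDs, a `maxSize` guard, and a `valuesFrom` accessor for deltas. The class name `StringPoolSketch`, its `intern`/`get` methods, and the `valuesFrom` signature are assumptions for illustration; the real engine/string-pool.ts may look different.

```typescript
// Hypothetical sketch of an append-only, per-run string intern pool.
// Not the real engine/string-pool.ts API; names are illustrative.
class StringPoolSketch {
  // Index = pool ID. ID 0 is pre-seeded as "" so zeroed buffers decode to "".
  private readonly values: string[] = [""];
  private readonly ids = new Map<string, number>([["", 0]]);
  private readonly maxSize: number;

  constructor(maxSize = 1_000_000) {
    this.maxSize = maxSize;
  }

  /** Returns the ID for `value`, appending it if unseen. Equal strings share an ID. */
  intern(value: string): number {
    const existing = this.ids.get(value);
    if (existing !== undefined) {
      return existing;
    }
    if (this.values.length >= this.maxSize) {
      // Fail loudly instead of growing without bound for the rest of the run.
      throw new Error(
        `String pool limit of ${this.maxSize} distinct values reached; ` +
          "is a kernel generating unbounded unique strings?",
      );
    }
    const id = this.values.length;
    this.values.push(value);
    this.ids.set(value, id);
    return id;
  }

  /** Resolves an ID. Entries are never mutated or compacted mid-run. */
  get(id: number): string {
    const value = this.values[id];
    if (value === undefined) {
      throw new RangeError(`Unknown string pool ID ${id}`);
    }
    return value;
  }

  /** All entries with ID >= baseId, in ID order (the shape of a delta). */
  valuesFrom(baseId: number): string[] {
    return this.values.slice(baseId);
  }

  get size(): number {
    return this.values.length;
  }
}
```

Because an ID is just a position in an array that only grows, an ID written into an early frame still resolves when scrubbing to a late one; a new run starts from a freshly constructed pool.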

Buffers: new u64 physical kind (8 B, align 8) holding the pool ID via the existing BigUint64Array view. Stride math, byte-range compaction, and all whole-token moves are untouched — references are just bytes.
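The "references are just bytes" point can be illustrated with a hypothetical 16-byte token layout: an `f64` field at byte 0 and a string field stored as a `u64` pool ID at byte 8. The stride, offsets, and helper names are assumptions; only the use of a `BigUint64Array` view comes from the description.

```typescript
// Hypothetical layout: [f64 value @ 0][u64 string pool ID @ 8], stride 16.
const STRIDE = 16;
const F64_OFFSET = 0;
const STRING_REF_OFFSET = 8; // 8-byte aligned, as a u64 view requires

function writeToken(buffer: ArrayBuffer, index: number, x: number, stringId: number): void {
  const base = index * STRIDE;
  new Float64Array(buffer)[(base + F64_OFFSET) / 8] = x;
  // BigUint64Array elements are bigints, so the numeric pool ID is widened.
  new BigUint64Array(buffer)[(base + STRING_REF_OFFSET) / 8] = BigInt(stringId);
}

function readStringId(buffer: ArrayBuffer, index: number): number {
  const base = index * STRIDE;
  return Number(new BigUint64Array(buffer)[(base + STRING_REF_OFFSET) / 8]);
}

// Whole-token moves stay plain byte copies; the pool is never consulted.
function moveToken(buffer: ArrayBuffer, from: number, to: number): void {
  new Uint8Array(buffer).copyWithin(to * STRIDE, from * STRIDE, (from + 1) * STRIDE);
}
```

A zero-filled buffer reads back ID 0, which the pool maps to "", matching the reserved-entry rule.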

Pool distribution (the "not part of the frame" consequence):

  • Interactive: each SimulationFramePayload carries an append-only newStrings: { baseId, values } delta; the main-thread frame store accumulates its own pool copy (ordered, baseId-asserted) and frame readers decode through it. Delta ordering guarantees every stored frame is decodable on arrival.
  • Monte Carlo: one pool per run, never crossing threads — MC frames are ephemeral and metrics read them in-worker.
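A sketch of the interactive delta protocol, assuming both sides pre-seed ID 0 as "" and the worker tracks how many entries it has already shipped. `NewStringsDelta` follows the `newStrings: { baseId, values }` shape from the description; `makeDelta` and `PoolCopy` are hypothetical names, and the real worker and frame store may be structured differently.

```typescript
// Shape of the optional per-frame delta, following `newStrings: { baseId, values }`.
interface NewStringsDelta {
  baseId: number;
  values: string[];
}

// Worker side (hypothetical): ship only entries appended since the last frame.
function makeDelta(poolValues: readonly string[], shipped: number): NewStringsDelta | undefined {
  if (poolValues.length === shipped) {
    return undefined; // nothing new, so no protocol overhead
  }
  return { baseId: shipped, values: poolValues.slice(shipped) };
}

// Main-thread side (hypothetical): accumulate an ordered copy of the pool.
class PoolCopy {
  private values: string[] = [""]; // ID 0 is "" on both sides

  apply(delta: NewStringsDelta | undefined): void {
    if (delta === undefined) {
      return;
    }
    // Deltas must extend the copy exactly where it ends.
    if (delta.baseId !== this.values.length) {
      throw new Error(
        `String delta out of order: expected baseId ${this.values.length}, got ${delta.baseId}`,
      );
    }
    for (const value of delta.values) {
      this.values.push(value);
    }
  }

  get(id: number): string {
    const value = this.values[id];
    if (value === undefined) {
      throw new RangeError(`String pool ID ${id} has not arrived yet`);
    }
    return value;
  }

  // Reset alongside the frame history when a new run starts.
  clear(): void {
    this.values = [""];
  }
}
```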

Semantics: runtime TokenRecords hold plain JS strings; coercion is total (String(value), missing → ""); interning is deterministic (same run ⇒ same IDs), and equal strings always share an ID; kernels, markings, and scenarios can write strings, while dynamics can only read them (strings never get a derivative); a Distribution on a string field remains an error (in both the LSP and the runtime).
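The total-coercion rule fits in a small helper. This is a hypothetical sketch: it assumes "missing" means the field is absent or `undefined` on the output record, and it takes the intern function as a parameter instead of guessing the engine's real call shape.

```typescript
// Hypothetical helper: total coercion for a string attribute.
// Assumption: "missing" means the key is absent or undefined. Note that null
// is NOT treated as missing here and becomes "null" via String().
function coerceStringAttribute(value: unknown): string {
  if (value === undefined) {
    return "";
  }
  return typeof value === "string" ? value : String(value);
}

// Encode a record's string field as a pool ID via any interning function.
function encodeStringField(
  record: Record<string, unknown>,
  field: string,
  intern: (value: string) => number,
): number {
  return intern(coerceStringAttribute(record[field]));
}
```

With a deterministic intern function, two tokens carrying the same text always end up with the same ID in their bytes.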

UI: String option in the dimension type select and the playground; spreadsheet string columns (text editing, identity parse, Delete → ""); the playground memory view shows the pool-reference round-trip (input "hello world" → pool id 1 → "hello world").

Docs: dimension-type list + kernel notes in the user guide; string row in the architecture format-v2 table; the design document.

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • modifies an npm-publishable library and I have added a changeset file(s)

📜 Does this require a change to the docs?

The changes in this PR:

  • require changes to docs which are made as part of this PR
    • ⚠️ the token-type screenshot in petri-net-extensions.md shows a four-option dropdown; the UI now has five — please re-capture

🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

  • do not affect the execution graph

⚠️ Known issues

  • Unbounded distinct-string workloads grow the pool for the whole run (inherent to interning; guarded by maxSize with a clear error — see the design doc's trade-off section).
  • A frame alone is no longer sufficient to decode string fields: it needs the pool prefix up to its highest referenced ID (guaranteed by delta ordering in the store; called out in the design doc).
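The second issue reduces to a simple check: a frame can be decoded once the local pool copy holds every ID the frame references, i.e. the highest referenced ID is below the pool's length. The function below is a hypothetical illustration of that invariant, not code from the PR.

```typescript
// Hypothetical check: can a frame be decoded with the pool prefix received so far?
// ID 0 ("") is pre-seeded, so poolLength is always at least 1.
function canDecodeFrame(referencedIds: readonly number[], poolLength: number): boolean {
  let highest = 0;
  for (const id of referencedIds) {
    if (id > highest) {
      highest = id;
    }
  }
  return highest < poolLength;
}
```

Per the description, delta ordering in the frame store guarantees this holds for every frame on arrival.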

🐾 Next steps

  • Enum element type (per the parent epic) can build on the same pool mechanics with ahead-of-time values.
  • Pool statistics in run summaries; refcounting/GC between runs if profiling ever motivates it.

🛡 What tests cover this?

  • New string-pool.test.ts (dedup, reserved "", valuesFrom, maxSize guard) and frame-store.test.ts (delta accumulation, ordering assertion, reset)
  • token-layout u64 field tests; kernel interning (same string twice → same ID in the bytes), forwarding, String() coercion, missing → "", Distribution-on-string throws
  • MC run with a string element + expression metric; compile-scenario passthrough; LSP checker typing tests; spreadsheet/playground tests
  • Totals: 623 core + 141 UI tests, lint:tsc + lint:eslint clean

❓ How to test this?

  1. Check out the branch and run yarn dev in libs/@hashintel/petrinaut.
  2. Add a String dimension to a type; type free text into initial-state cells; run a simulation and confirm values survive transitions and display everywhere (visualizers, metrics via state.places.X.tokens[0].label).
  3. Author a kernel forwarding a string (label: input.Source[0].label) and one producing new strings; confirm equal strings behave identically.
  4. Open Dev / Token Encoding Playground, add a String dimension: the memory view shows the u64 pool reference and the input → pool id → value round-trip.

📹 Demo

The playground story shows the wire format: a string field as a u64 pool reference with its round-trip in the hover panel.

kube and others added 2 commits July 4, 2026 02:55
Variable-length strings cannot live in fixed-stride packed token
structs, so frames store a 64-bit reference (new u64 physical kind)
into an append-only per-run StringPool owned by the simulation — not
the frame:

- StringPool: id 0 pre-seeded as "" (zeroed buffers decode sanely),
  entries immutable once assigned, never compacted mid-run (IDs stay
  valid for the whole retained frame history), fresh pool per init,
  maxSize guard (1M distinct values) fails loudly on unbounded
  unique-string workloads.
- Interactive runs ship append-only `newStrings` payload deltas; the
  main-thread frame store accumulates its own pool copy (ordered,
  baseId-asserted) and frame readers decode through it. Monte Carlo
  pools are per-run and never cross threads.
- Runtime records hold plain JS strings; coercion is total
  (String(value), missing → ""); interning is deterministic and equal
  strings always share an ID. Distribution on string stays an error;
  dynamics read strings but cannot write them.
- LSP types string elements as `string` end to end; spreadsheet gains
  text cells; type properties and the playground gain the String
  option (the memory view shows the pool-reference round-trip).
- Design record: docs/string-interning.md (options considered,
  mutability analysis, trade-offs, future iterations).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e-769-add-string-discrete-type-support

# Conflicts:
#	libs/@hashintel/petrinaut-core/docs/architecture/engine.html
Copilot AI review requested due to automatic review settings July 4, 2026 00:59
@cursor

cursor Bot commented Jul 4, 2026


PR Summary

Medium Risk
Touches core simulation encoding, worker frame protocol, and frame history decoding—incorrect pool delta ordering would corrupt string fields on scrub/replay, though tests assert ordering and round-trips.

Overview
Introduces string as a fifth token dimension: user code sees plain JS strings (value equality), while format-v2 frames keep fixed stride via 8-byte u64 pool references into an append-only per-run StringPool on SimulationInstance (id 0 = "").

Engine & protocol: token-layout gains a u64 physical kind; encode/decode and kernel output paths intern/coerce strings (missing → "", non-strings via String(value)); distributions on string fields still error; dynamics read strings but cannot write them. Interactive runs attach optional newStrings: { baseId, values } on frame payloads; the main-thread frame store applies deltas in order and decodes through an accumulated pool copy. Monte Carlo keeps one pool per run in-worker (no protocol change).

Surface area: schema/LSP virtual types/AI cheatsheet, scenario compile & clipboard validation, type properties + initial-state/scenario spreadsheets, token-encoding playground, user guide and string-interning.md architecture docs. Minor changeset bumps @hashintel/petrinaut and @hashintel/petrinaut-core.

Reviewed by Cursor Bugbot for commit 8d623ca. Bugbot is set up for automated code reviews on this repo.

@github-actions github-actions Bot added area/infra Relates to version control, CI, CD or IaC (area) area/libs Relates to first-party libraries/crates/packages (area) type/eng > frontend Owned by the @frontend team area/apps > hash.design Affects the `hash.design` design site (app) labels Jul 4, 2026
@vercel

vercel Bot commented Jul 4, 2026


The latest updates on your projects:

Project | Deployment | Actions | Updated (UTC)
hash | Ready | Preview, Comment | Jul 4, 2026 1:08am
petrinaut | Ready | Preview, Comment | Jul 4, 2026 1:08am

1 skipped deployment:

Project | Deployment | Actions | Updated (UTC)
hashdotdesign-tokens | Ignored |  | Jul 4, 2026 1:08am

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 8d623ca.

     block.byteLength,
   );
-  return readTokenRecord(layout, views, 0);
+  return readTokenRecord(layout, views, 0, stringPool);

Test helper omits string pool

Low Severity

In the same file, buildTokenBytes, decodeTokenBlock, and makeTestFrame pass a StringPool into readTokenRecord / encodeTokenToBytes, but decodePlaceTokens still calls readTokenRecord without a pool. For layouts with string elements, that call throws the new programmer error from token-layout.ts, so engine tests that decode frames through this helper cannot exercise string tokens, even when the simulation instance has a pool.
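A simplified, hypothetical sketch of the suggested fix: the place-level helper accepts an optional pool and forwards it to the per-token reader, which throws when a layout has a string field but no pool. The `Sketch` suffixes, the `Field` shape, and little-endian encoding are assumptions; the real helpers take different arguments.

```typescript
// Hypothetical, simplified stand-ins; not the real token-layout helpers.
interface StringPoolReader {
  get(id: number): string;
}

interface Field {
  name: string;
  kind: "f64" | "string"; // "string" is stored as a u64 pool ID
  byteOffset: number;
}

type TokenRecord = Record<string, number | string>;

function readTokenRecordSketch(
  fields: readonly Field[],
  view: DataView,
  base: number,
  stringPool?: StringPoolReader,
): TokenRecord {
  const record: TokenRecord = {};
  for (const field of fields) {
    const offset = base + field.byteOffset;
    if (field.kind === "f64") {
      record[field.name] = view.getFloat64(offset, true);
    } else if (stringPool === undefined) {
      // Programmer error: string layouts cannot be decoded without a pool.
      throw new Error(`Field "${field.name}" is a string but no string pool was passed`);
    } else {
      record[field.name] = stringPool.get(Number(view.getBigUint64(offset, true)));
    }
  }
  return record;
}

// The fix: accept the pool and forward it for every token in the place.
function decodePlaceTokensSketch(
  fields: readonly Field[],
  view: DataView,
  count: number,
  stride: number,
  stringPool?: StringPoolReader,
): TokenRecord[] {
  const tokens: TokenRecord[] = [];
  for (let i = 0; i < count; i++) {
    tokens.push(readTokenRecordSketch(fields, view, i * stride, stringPool));
  }
  return tokens;
}
```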

Reviewed by Cursor Bugbot for commit 8d623ca.

Copilot AI (Contributor) left a comment

Pull request overview

Adds string as a new token dimension type across Petrinaut UI + @hashintel/petrinaut-core, implementing per-run string interning via an append-only StringPool and shipping append-only pool deltas alongside interactive worker frame payloads so main-thread frame history remains decodable.

Changes:

  • Introduces StringPool and a new u64 physical kind for string pool references in packed token buffers, with encode/decode wired through the pool.
  • Extends the interactive worker protocol + main-thread frame store to accumulate newStrings deltas and decode historical frames against the accumulated pool.
  • Updates UI editors (type dropdown, spreadsheets, playground) and documentation to support string dimensions end-to-end.

Reviewed changes

Copilot reviewed 48 out of 48 changed files in this pull request and generated 2 comments.

File Description
libs/@hashintel/petrinaut/src/ui/views/Editor/panels/SimulateView/scenarios/scenario-mapping.ts Keeps string scenario columns as literal text when mapping rows to spreadsheet values.
libs/@hashintel/petrinaut/src/ui/views/Editor/panels/PropertiesPanel/type-properties/subviews/main.tsx Adds “String” to the dimension type selector in type properties.
libs/@hashintel/petrinaut/src/ui/views/Editor/panels/PropertiesPanel/place-properties/subviews/place-initial-state/initial-state-editor.tsx Preserves string initial-marking values as text in the spreadsheet editor.
libs/@hashintel/petrinaut/src/ui/lib/compile-visualizer.ts Extends visualizer token prop typing to include string values.
libs/@hashintel/petrinaut/src/ui/dev/token-encoding-playground/token-memory-view.tsx Updates playground memory view to display string pool IDs + decoded text round-trip.
libs/@hashintel/petrinaut/src/ui/dev/token-encoding-playground/playground-monaco.ts Updates Monaco defs generation to type string dimensions as string.
libs/@hashintel/petrinaut/src/ui/dev/token-encoding-playground/physical-layout.ts Uses a throwaway StringPool in the playground encoder/decoder path for string fields.
libs/@hashintel/petrinaut/src/ui/dev/token-encoding-playground/physical-layout.test.ts Adds playground tests for string field layout and pool-reference round-trips.
libs/@hashintel/petrinaut/src/ui/dev/token-encoding-playground/dimension-editor.tsx Adds “String” to the playground dimension type selector.
libs/@hashintel/petrinaut/src/ui/components/spreadsheet.tsx Adds string column type support (parsing, tooltips, input type handling).
libs/@hashintel/petrinaut/src/ui/components/spreadsheet.stories.tsx Extends spreadsheet Storybook story with a string column + data.
libs/@hashintel/petrinaut/docs/petri-net-extensions.md Documents the new String dimension type and its discrete semantics.
libs/@hashintel/petrinaut-core/src/types/sdcpn.ts Adds "string" to ColorElementType and includes string in token attribute runtime union.
libs/@hashintel/petrinaut-core/src/simulation/worker/simulation.worker.ts Ships append-only newStrings deltas and resets delta state per init/reset.
libs/@hashintel/petrinaut-core/src/simulation/worker/simulation.worker.test.ts Tests initial-marking delta shipping and omission when no string fields exist.
libs/@hashintel/petrinaut-core/src/simulation/worker/frame-payload.ts Extends worker frame payload type with optional newStrings delta.
libs/@hashintel/petrinaut-core/src/simulation/runtime/frame-store.ts Accumulates main-thread string pool copy and asserts delta ordering before storing frames.
libs/@hashintel/petrinaut-core/src/simulation/runtime/frame-store.test.ts New tests for pool accumulation, ordering assertions, and clear() reset behavior.
libs/@hashintel/petrinaut-core/src/simulation/monte-carlo/transition-effect.ts Ensures MC decode/encode paths use the run’s string pool; adjusts error formatting.
libs/@hashintel/petrinaut-core/src/simulation/monte-carlo/monte-carlo-simulator.test.ts Adds an end-to-end MC test covering string interning + metric decoding.
libs/@hashintel/petrinaut-core/src/simulation/monte-carlo/frame-reader.ts Decodes MC tokens using the run-local string pool.
libs/@hashintel/petrinaut-core/src/simulation/frames/frame-reader.ts Extends frame reader compilation to accept a StringPoolReader for string decoding.
libs/@hashintel/petrinaut-core/src/simulation/frames/frame-reader.test.ts Adds coverage for decoding string fields through a provided pool accessor.
libs/@hashintel/petrinaut-core/src/simulation/engine/types.ts Adds stringPool to SimulationInstance so the pool is owned per run/init.
libs/@hashintel/petrinaut-core/src/simulation/engine/token-values.ts Adds string default/coercion and guards against encoding/decoding strings without a pool.
libs/@hashintel/petrinaut-core/src/simulation/engine/token-layout.ts Adds u64 physical kind, pool reader/writer types, and string pool integration for read/write/encode.
libs/@hashintel/petrinaut-core/src/simulation/engine/token-layout.test.ts Adds layout + round-trip tests for string fields stored as u64 pool references.
libs/@hashintel/petrinaut-core/src/simulation/engine/token-layout.test-helpers.ts Threads stringPool through test helpers so decoding works for string layouts.
libs/@hashintel/petrinaut-core/src/simulation/engine/string-pool.ts New append-only StringPool implementation with max-size guard and delta support.
libs/@hashintel/petrinaut-core/src/simulation/engine/string-pool.test.ts New unit tests for deduping, reserved "", valuesFrom, and max-size guard.
libs/@hashintel/petrinaut-core/src/simulation/engine/execute-transitions.test.ts Ensures test simulation instances include a string pool.
libs/@hashintel/petrinaut-core/src/simulation/engine/encode-kernel-token.ts Interns string outputs in kernel encoding and stores u64 pool IDs into buffers.
libs/@hashintel/petrinaut-core/src/simulation/engine/compute-possible-transition.ts Decodes inputs via the simulation pool and interns outputs via the pool; adjusts error formatting.
libs/@hashintel/petrinaut-core/src/simulation/engine/compute-possible-transition.test.ts Adds kernel output tests for string interning, forwarding, defaults, and Distribution rejection.
libs/@hashintel/petrinaut-core/src/simulation/engine/build-simulation.ts Constructs a per-run StringPool and uses it while packing the initial marking + decoding dynamics input.
libs/@hashintel/petrinaut-core/src/simulation/authoring/scenario/compile-scenario.test.ts Adds compile-scenario tests ensuring string columns pass through literally and default correctly.
libs/@hashintel/petrinaut-core/src/simulation/api.ts Clarifies initial marking value semantics for string attributes.
libs/@hashintel/petrinaut-core/src/schemas/scenario-schema.ts Clarifies schema docs: strings are literal for string elements; uuid strings still coerce for uuid elements.
libs/@hashintel/petrinaut-core/src/schemas/entity-schemas.ts Extends element type enum and schema descriptions to include string semantics + interning.
libs/@hashintel/petrinaut-core/src/lsp/lib/generate-virtual-files.ts Types string elements as string in LSP-generated TS defs (incl. metric session token record unions).
libs/@hashintel/petrinaut-core/src/lsp/lib/checker.test.ts Adds LSP checker tests for string typing, kernel output acceptance, and Distribution rejection.
libs/@hashintel/petrinaut-core/src/index.ts Exports StringPool and pool reader/writer types from the package entrypoint.
libs/@hashintel/petrinaut-core/src/default-codes.ts Adds default source literals for string attributes in generated templates.
libs/@hashintel/petrinaut-core/src/clipboard/serialize.test.ts Updates an invalid-type fixture now that "string" is a valid element type.
libs/@hashintel/petrinaut-core/src/ai.ts Updates code-surface guidance to include string typing and scenario semantics.
libs/@hashintel/petrinaut-core/docs/string-interning.md New design/decision doc describing the string interning architecture and trade-offs.
libs/@hashintel/petrinaut-core/docs/architecture/engine.html Updates the format-v2 table and notes the interactive newStrings delta protocol.
.changeset/fe-769-string-token-dimension-type.md Changeset documenting the new string element type and storage semantics.

Comment on lines 754 to +758

   <tr
     // eslint-disable-next-line react/no-array-index-key -- Row position is stable and meaningful
-    key={`row-${rowIndex}-${row.map(formatCellValue).join("-")}`}
+    key={`row-${rowIndex}-${row
+      .map(formatCellValue)
+      .join("-")}`}

Comment on lines 74 to +78

   * - `uuid` attributes are OPTIONAL (omitted values are auto-generated from
   *   the seeded simulation RNG) and also accept UUID strings and the
   *   `Uuid.generate()` / `Uuid.from(value)` sentinels.
-  * - Other discrete attributes (`integer`, `boolean`) must be plain values.
+  * - Other discrete attributes (`integer`, `boolean`, `string`) must be plain
+  *   values (`string` never takes a Distribution or a sentinel).

Labels

area/apps > hash.design Affects the `hash.design` design site (app) area/infra Relates to version control, CI, CD or IaC (area) area/libs Relates to first-party libraries/crates/packages (area) type/eng > frontend Owned by the @frontend team

2 participants