hashintel · kube · Jul 4, 2026 · Jul 4, 2026
diff --git a/.changeset/fe-769-string-token-dimension-type.md b/.changeset/fe-769-string-token-dimension-type.md
@@ -0,0 +1,6 @@
+---
+"@hashintel/petrinaut": minor
+"@hashintel/petrinaut-core": minor
+---
+
+Add the `string` token element type: free-form text represented as plain JS strings in all user code, compared by value, and stored via per-run string interning — frame buffers hold 64-bit references into an append-only `StringPool` owned by the simulation (interactive runs ship append-only `newStrings` pool deltas alongside frame payloads; Monte Carlo runs keep one pool per run). Kernel outputs take plain strings (missing values become `""`, non-strings stringify via `String(value)`); Distributions on string elements are rejected and dynamics cannot write them. The type-properties panel, initial-state and scenario spreadsheets, and the token-encoding playground gain String columns/dimensions.
diff --git a/libs/@hashintel/petrinaut-core/docs/architecture/engine.html b/libs/@hashintel/petrinaut-core/docs/architecture/engine.html
@@ -331,6 +331,13 @@ <h2>Format v2 — packed token structs (current)</h2>
         lane-compare without combining for equality, ~4× faster). Never routed
         through <code>number</code> (NaN-payload hazard). Kernel outputs are
         optional — omitted values auto-generate from the seeded RNG.</td></tr>
+    <tr><td><code>string</code> (FE-769)</td><td><code>u64</code> pool reference — 8 B</td>
+        <td>The frame stores an ID into an append-only per-run string intern
+        pool (<code>engine/string-pool.ts</code>); the pool lives on
+        <code>SimulationInstance</code>, not on the frame, so frames stay
+        fixed-stride and byte-copyable. Equal strings share one ID; id 0 is
+        the pre-seeded <code>""</code>, so zeroed buffers decode cleanly.
+        See <code>docs/string-interning.md</code>.</td></tr>
   </tbody>
 </table>
 <ul>
@@ -344,7 +351,10 @@ <h2>Format v2 — packed token structs (current)</h2>
     <code>encodeTokenToBytes</code>) is the only code that indexes token bytes;
     all whole-token moves are byte-range copies (<code>Uint8Array.set</code>).
     The raw <code>getPlaceTokenValues</code> reader was removed from the public
-    API — raw f64 access is meaningless under mixed widths.</li>
+    API — raw f64 access is meaningless under mixed widths. Because the string
+    pool never crosses the worker boundary with the frames, each frame payload
+    ships an append-only <code>newStrings</code> delta that the main-thread
+    frame store accumulates and hands to the frame reader for decoding.</li>
 </ul>
 
 <h2>Determinism</h2>
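
Note on the `newStrings` delta added in the hunk above: the hand-off between the worker and the main-thread frame store can be pictured roughly as follows. This is a minimal sketch, not the patch's implementation. `takeDelta`, `applyDelta` and the plain `string[]` pools are hypothetical stand-ins; only `newStrings`, `baseId`, `values` and `sentStringCount` are names taken from the patch.

```ts
// Hypothetical helpers for illustration; the real worker and frame store
// are not shown in this diff.
type NewStrings = { baseId: number; values: string[] };

/** Worker side: entries not yet sent, or undefined when nothing is new. */
function takeDelta(
  workerPool: readonly string[],
  sentStringCount: number,
): NewStrings | undefined {
  if (workerPool.length <= sentStringCount) return undefined;
  return { baseId: sentStringCount, values: workerPool.slice(sentStringCount) };
}

/** Main thread: append a delta to the accumulated copy, rejecting gaps or replays. */
function applyDelta(mainPool: string[], delta: NewStrings | undefined): void {
  if (delta === undefined) return;
  if (delta.baseId !== mainPool.length) {
    throw new Error(
      `string delta out of order: baseId ${delta.baseId}, pool length ${mainPool.length}`,
    );
  }
  mainPool.push(...delta.values);
}

// Both sides start with ID 0 pre-seeded as "".
const workerPool: string[] = ["", "red", "green"];
const mainPool: string[] = [""];
let sentStringCount = 1;

// First frame: two new strings travel with the payload.
const first = takeDelta(workerPool, sentStringCount);
applyDelta(mainPool, first);
sentStringCount = workerPool.length;

// A later frame that interned nothing new carries no delta.
const second = takeDelta(workerPool, sentStringCount);
```

Because each delta starts exactly where the previous one ended, the main-thread copy is always a prefix of the worker pool, which is what lets a stored frame decode without the pool itself crossing the worker boundary.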

diff --git a/libs/@hashintel/petrinaut-core/docs/string-interning.md b/libs/@hashintel/petrinaut-core/docs/string-interning.md
@@ -0,0 +1,137 @@
+# String interning for `string` token elements
+
+Status: implemented (FE-769). Companion to the format-v2 packed token layout
+described in `docs/architecture/engine.html`.
+
+## Problem
+
+Format v2 stores each place's tokens as fixed-stride packed structs inside one
+contiguous frame buffer. Everything the engine does well depends on that
+fixed stride:
+
+- **O(1) token addressing** — `byteOffset + tokenIndex * strideBytes`.
+- **Byte-range moves** — removals/additions compact and append tokens with
+  `Uint8Array.set`, never interpreting field contents.
+- **Shared typed-array views** — f64/u8/u64 views over the whole token
+  region, no `DataView` in hot paths.
+
+`string` elements are variable-length. A JS string cannot be dropped into a
+fixed-width struct field without either truncating it or breaking the stride
+invariants, so the value itself cannot live in the frame.
+
+## Options considered and rejected
+
+1. **Inline variable-length bytes in the token region.** Tokens would no
+   longer have a per-colour `sizeof`; addressing becomes a prefix-sum walk,
+   stride math dies, and every byte-range compaction in
+   `compute-place-next-state` / Monte Carlo `frame-operations` would need to
+   understand field contents. Rejected — it destroys the core property of the
+   format.
+2. **Fixed-width truncated char arrays** (e.g. 32 bytes of UTF-8 per field).
+   Keeps the stride but is lossy (silent truncation is a correctness bug, not
+   a trade-off), wasteful for short strings, and still forces a max length
+   into the schema. Rejected.
+3. **Per-frame string tables.** Self-contained frames, but the table is
+   duplicated into every retained frame. The interactive worker retains the
+   whole frame history for scrubbing — thousands of frames sharing mostly
+   identical string sets would multiply memory by the frame count. Rejected.
+4. **Strings only in decoded `TokenRecord`s** (never in frames). The frame
+   would no longer round-trip: re-encoding a frame's tokens, replaying, or
+   reading a historical frame would lose string values. Rejected — frames are
+   the source of truth for simulation state.
+
+## Chosen design: append-only per-run intern pool
+
+- `engine/string-pool.ts` defines `StringPool`: append-only map from string →
+  small integer ID. **id 0 is pre-seeded as `""`**, so a zeroed buffer decodes
+  to the empty string and "missing value" needs no sentinel.
+- Frames store a **u64 pool reference** per string field
+  (`PhysicalKind "u64"`, 8 bytes, align 8), read and written through the
+  existing `BigUint64Array` view. IDs are small integers; `Number(id)` /
+  `BigInt(id)` convert at the boundary.
+- **The pool is part of the simulation, not of the frame.** It lives on
+  `SimulationInstance.stringPool`, created fresh by `buildSimulation` per
+  init/run.
+- Encoding is total: kernel/marking/scenario values coerce via
+  `String(value)` with `undefined`/`null` → `""`, then intern. Distributions
+  on string fields keep throwing the discrete-element error. Dynamics can
+  read string fields (the decode passes the pool) but never write them
+  (derivative type is `?: never`).
+
+### Interactive worker
+
+The pool lives in the worker; frames are posted to the main thread. Rather
+than shipping the pool (or the strings) with every frame, each
+`SimulationFramePayload` carries an **append-only delta**:
+
+```ts
+newStrings?: { baseId: number; values: string[] }
+```
+
+The worker tracks `sentStringCount` (starting at 1 — `""` is pre-seeded on
+both sides) and attaches `pool.valuesFrom(sentStringCount)` whenever new
+entries exist. The main-thread frame store owns an accumulated `string[]`
+copy: `appendBatch` asserts `baseId === pool.length` and pushes the values
+before storing the frame, then hands a pool accessor to the compiled frame
+reader. Delta ordering guarantees the invariant a reader needs: every frame
+only references IDs strictly below the pool length reached once its own delta
+is applied (IDs are zero-based, so the largest valid ID is length - 1).
+
+### Monte Carlo
+
+Each `MonteCarloRun` builds its own simulation via `buildSimulation`, so each
+run has its own pool. Frames never leave the worker; the run-local metric
+frame reader decodes with the run's pool and metric frames carry plain
+decoded values. No protocol change.
+
+## Mutability analysis
+
+- **Append-only, immutable entries.** An ID written into any retained frame
+  must decode to the same string for the whole frame history — scrubbing and
+  replay read old frames against the current pool. Reassigning or compacting
+  IDs mid-run would silently corrupt history.
+- **No mid-run GC/compaction.** Deciding an entry is dead requires scanning
+  every retained frame's string fields; the savings don't justify the cost or
+  the invalidation risk. Growth is bounded by the number of _distinct_
+  strings, not tokens.
+- **Reset boundaries.** A fresh pool per `init` (interactive) and per run
+  (Monte Carlo); the main-thread copy resets on `frameStore.clear()`
+  (`Simulation.reset`). Nothing survives a run.
+
+## Determinism
+
+Intern order is execution order: the same net, marking, parameters, seed and
+dt produce the same sequence of interned strings and therefore the same IDs
+and identical frame bytes. Equal strings always share an ID (interned
+equality), and no RNG state is consumed by interning — string handling cannot
+perturb stochastic draws.
+
+## Known issues / accepted trade-offs
+
+- **Unbounded growth for unique-string workloads.** A kernel that emits a
+  fresh `order-${n}` string per firing grows the pool for the whole run. The
+  constructor's `maxSize` guard (default 1,000,000 distinct values) turns the
+  pathological case into a clear error ("string pool exceeded N distinct
+  values — are kernels generating unbounded unique strings?") instead of
+  silent memory exhaustion.
+- **Frames alone no longer decode string fields.** A frame is only meaningful
+  together with a pool prefix of sufficient length. The delta ordering above
+  guarantees this on the main thread; `readTokenRecord` throws if a layout
+  contains string fields and no pool is supplied (programmer error, not a
+  runtime condition).
+- **Main-thread pool copy duplicates memory.** The strings themselves are
+  shared by reference after structured clone materialises them once per
+  delta; the duplicated cost is the array of references, which is negligible
+  next to frame retention.
+
+## Future iterations
+
+- **Refcounting/GC between runs** if pools are ever shared across runs (they
+  currently are not — reset makes GC unnecessary).
+- **Enum element type** building on the same pool mechanics with a closed,
+  schema-declared value set (ticket exists in the parent epic): same u64
+  representation, but validation instead of open interning, and stable IDs
+  derivable from the schema.
+- **Pool statistics in run summaries** (distinct-string count, byte
+  estimate), cheap to expose from `StringPool.size` for diagnosing
+  string-heavy models.
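
Note on `docs/string-interning.md` above: the pool behaviour the document describes (ID 0 pre-seeded as `""`, total coercion, append-only entries, a `maxSize` guard) can be illustrated with a small sketch. This is not the contents of `engine/string-pool.ts`. The method names `intern` and `get`, the number-typed IDs, and the error text are assumptions made for illustration; `StringPool`, `maxSize`, `size` and `valuesFrom` are the names the document mentions.

```ts
// Hypothetical sketch of an append-only intern pool. IDs are plain numbers
// here; the document says frames store them as u64 and convert at the boundary.
class StringPool {
  // ID 0 is pre-seeded as "", so a zeroed buffer decodes to the empty string.
  private readonly values: string[] = [""];
  private readonly ids = new Map<string, number>([["", 0]]);
  private readonly maxSize: number;

  constructor(maxSize: number = 1000000) {
    this.maxSize = maxSize;
  }

  /** Number of distinct strings, including the pre-seeded "". */
  get size(): number {
    return this.values.length;
  }

  /** Total encoding: null/undefined become "", everything else goes through String(). */
  intern(value: unknown): number {
    const text = value === undefined || value === null ? "" : String(value);
    const existing = this.ids.get(text);
    if (existing !== undefined) return existing;
    if (this.values.length >= this.maxSize) {
      throw new Error(`string pool exceeded ${this.maxSize} distinct values`);
    }
    const id = this.values.length; // IDs are assigned in first-seen order
    this.values.push(text);
    this.ids.set(text, id);
    return id;
  }

  /** Decode an ID; entries are never reassigned, so old frames stay readable. */
  get(id: number): string {
    const value = this.values[id];
    if (value === undefined) throw new RangeError(`unknown string ID ${id}`);
    return value;
  }

  /** Entries from `start` onwards, e.g. for building a delta. */
  valuesFrom(start: number): string[] {
    return this.values.slice(start);
  }
}

const pool = new StringPool();
const idA = pool.intern("order-1");
const idB = pool.intern("order-1"); // equal strings share one ID
const idMissing = pool.intern(undefined); // missing value maps to ID 0
```

Since IDs depend only on the order in which distinct strings are first interned, replaying the same sequence of `intern` calls yields the same IDs, which is the determinism property the document relies on.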
diff --git a/libs/@hashintel/petrinaut-core/src/ai.ts b/libs/@hashintel/petrinaut-core/src/ai.ts
@@ -269,12 +269,12 @@ Validate every code-writing change. After any tool call that writes code — lam
 Place names are part of the code surface: lambdas/kernels read \`input.PlaceName\`, metrics read \`state.places.PlaceName.count\`, and scenario code-mode initial state keys are place names. Renaming a place via \`updatePlace\` requires updating every dependent lambda, kernel, dynamics, metric, visualizer, and scenario in the same batch — otherwise you will silently break references.
 
 Code-surface cheatsheet (exact shapes expected by the runtime):
-- Transition lambda (\`transition.lambdaCode\`): \`export default Lambda((input, parameters) => …)\`. Available when stochasticity is enabled OR when colours are enabled and the transition has at least one standard or read input arc from a coloured place. \`input.PlaceName\` is a tuple sized to the input arc weight for coloured standard and read input arcs; token attributes are typed by the colour element: real/integer → number, boolean → boolean, uuid → bigint. Read arcs expose tokens in \`input\` but do not consume them when the transition fires. Inhibitor arcs and uncoloured input places are NOT in \`input\`. Predicate → boolean; stochastic → non-negative rate in firings per simulation second (0 disables, Infinity always fires). Must be deterministic. If unavailable or empty, the runtime uses true for predicate-style transitions and Infinity for stochastic-style transitions.
-- Transition kernel (\`transition.transitionKernelCode\`): \`export default TransitionKernel((input, parameters) => …)\`. Available only for transitions with coloured output places. Return \`{ OutputPlaceName: [token, …] }\` sized to the output arc weight. Include only coloured output places; uncoloured output places are auto-populated. Output values must match element types: real/integer use numbers, boolean uses booleans. uuid attributes are OPTIONAL in output tokens: omit them to auto-generate a fresh UUID deterministically from the seeded simulation RNG, use \`Uuid.generate()\` for an explicit fresh UUID, \`Uuid.from(value)\` to derive one from any value, or forward an input token's uuid bigint unchanged; plain non-UUID values (numbers, arbitrary strings) are converted deterministically via UUIDv5. When stochasticity is enabled, real attributes may use \`Distribution.Gaussian(mean, sd)\` / \`Distribution.Uniform(min, max)\` / \`Distribution.Lognormal(mu, sigma)\` (never integer/boolean/uuid attributes), and chained \`.map(fn)\` on the same distribution shares one draw. When stochasticity is disabled, kernel outputs must use plain values only. Leave empty when no coloured outputs exist.
-- Differential equation (\`differentialEquation.code\`): \`export default Dynamics((tokens, parameters) => …)\`. \`tokens\` is THIS place's tokens only. Return an array of the same length whose entries provide derivatives for real-valued elements only (i.e. dx/dt, not the new value); integer, boolean, and uuid elements are discrete and remain unchanged by dynamics. The equation's \`colorId\` MUST match every referencing place's \`colorId\`.
+- Transition lambda (\`transition.lambdaCode\`): \`export default Lambda((input, parameters) => …)\`. Available when stochasticity is enabled OR when colours are enabled and the transition has at least one standard or read input arc from a coloured place. \`input.PlaceName\` is a tuple sized to the input arc weight for coloured standard and read input arcs; token attributes are typed by the colour element: real/integer → number, boolean → boolean, uuid → bigint, string → string (plain JS strings everywhere, compared by value). Read arcs expose tokens in \`input\` but do not consume them when the transition fires. Inhibitor arcs and uncoloured input places are NOT in \`input\`. Predicate → boolean; stochastic → non-negative rate in firings per simulation second (0 disables, Infinity always fires). Must be deterministic. If unavailable or empty, the runtime uses true for predicate-style transitions and Infinity for stochastic-style transitions.
+- Transition kernel (\`transition.transitionKernelCode\`): \`export default TransitionKernel((input, parameters) => …)\`. Available only for transitions with coloured output places. Return \`{ OutputPlaceName: [token, …] }\` sized to the output arc weight. Include only coloured output places; uncoloured output places are auto-populated. Output values must match element types: real/integer use numbers, boolean uses booleans, string uses plain strings (REQUIRED in the token type; a missing/undefined value becomes the empty string \`""\`, and non-string values are stringified via \`String(value)\`). uuid attributes are OPTIONAL in output tokens: omit them to auto-generate a fresh UUID deterministically from the seeded simulation RNG, use \`Uuid.generate()\` for an explicit fresh UUID, \`Uuid.from(value)\` to derive one from any value, or forward an input token's uuid bigint unchanged; plain non-UUID values (numbers, arbitrary strings) are converted deterministically via UUIDv5. When stochasticity is enabled, real attributes may use \`Distribution.Gaussian(mean, sd)\` / \`Distribution.Uniform(min, max)\` / \`Distribution.Lognormal(mu, sigma)\` (never integer/boolean/uuid/string attributes), and chained \`.map(fn)\` on the same distribution shares one draw. When stochasticity is disabled, kernel outputs must use plain values only. Leave empty when no coloured outputs exist.
+- Differential equation (\`differentialEquation.code\`): \`export default Dynamics((tokens, parameters) => …)\`. \`tokens\` is THIS place's tokens only. Return an array of the same length whose entries provide derivatives for real-valued elements only (i.e. dx/dt, not the new value); integer, boolean, uuid, and string elements are discrete and remain unchanged by dynamics (they can be read from input tokens but never written). The equation's \`colorId\` MUST match every referencing place's \`colorId\`.
 - Place visualizer (\`place.visualizerCode\`): \`export default Visualization(({ tokens, parameters }) => <JSX/>)\`. Classic React runtime — do NOT import React, do NOT use \`<>…</>\` fragments, do NOT use hooks. Convention: return a sized \`<svg viewBox="0 0 W H">…</svg>\`.
 - Metric (\`metric.code\`): a plain function body — NOT a module, no \`export default\`, no wrapper. The only variable in scope is \`state\`. Must \`return\` a finite number. Example: \`return state.places.Infected.count / (state.places.Susceptible.count + state.places.Infected.count + state.places.Recovered.count);\`. \`parameters\` and \`scenario\` are NOT available inside metrics.
-- Scenario per_place initial state: \`content\` keys are place IDs; uncoloured values are expressions with \`parameters\` and \`scenario\` in scope; coloured values are row arrays in colour element order using numbers and booleans; uuid columns accept UUID strings (any other text converts deterministically to a UUID via UUIDv5).
+- Scenario per_place initial state: \`content\` keys are place IDs; uncoloured values are expressions with \`parameters\` and \`scenario\` in scope; coloured values are row arrays in colour element order using numbers and booleans; string columns take literal text; uuid columns accept UUID strings (any other text converts deterministically to a UUID via UUIDv5).
 - Scenario code-mode initial state: function body returning \`{ PlaceName: tokens }\` keyed by NAME (asymmetric with per_place IDs); unknown names are silently dropped.
 - Parameter access in any code surface: use \`parameters.<variableName>\` where \`<variableName>\` is the parameter's lower_snake_case \`variableName\` value (e.g. \`parameters.crash_threshold\`, never \`parameters.crashThreshold\`).
 

diff --git a/libs/@hashintel/petrinaut-core/src/clipboard/serialize.test.ts b/libs/@hashintel/petrinaut-core/src/clipboard/serialize.test.ts
@@ -494,7 +494,7 @@ describe("parseClipboardPayload", () => {
             name: "Token",
             iconSlug: "circle",
             displayColor: "#FF0000",
-            elements: [{ elementId: "e1", name: "val", type: "string" }],
+            elements: [{ elementId: "e1", name: "val", type: "complex" }],
           },
         ],
         differentialEquations: [],

diff --git a/libs/@hashintel/petrinaut-core/src/default-codes.ts b/libs/@hashintel/petrinaut-core/src/default-codes.ts
@@ -9,17 +9,23 @@ const defaultTokenAttributeSource = (
     case "integer":
     case "real":
       return "0";
+    case "string":
+      return '""';
     case "uuid":
       return "Uuid.generate()";
   }
 };
 
 export function generateDefaultVisualizerCode(type: Color): string {
-  return `// This function defines how to visualize the tokens in the place of type "${type.name}".
+  return `// This function defines how to visualize the tokens in the place of type "${
+    type.name
+  }".
 // It receives the current tokens and parameters.
 export default Visualization(({ tokens, parameters }) => {
   return <svg viewBox="0 0 800 600">
-    {tokens.map(({ ${type.elements.map((el) => el.name).join(", ")} }, index) => (
+    {tokens.map(({ ${type.elements
+      .map((el) => el.name)
+      .join(", ")} }, index) => (
       // Example: simple circle for each token
       <circle />
     ))}
@@ -47,7 +53,9 @@ export function generateDefaultDifferentialEquationCode(type: Color): string {
     }; // Example: all real-valued derivatives = 1`
       : `{}; // This type has no real-valued attributes; discrete values are unchanged by dynamics`;
 
-  return `// This function defines the differential equation for the place of type "${type.name}".
+  return `// This function defines the differential equation for the place of type "${
+    type.name
+  }".
 // The function receives the current tokens in this place and the parameters.
 // It should return derivatives for real-valued token attributes in this place.
 export default Dynamics((tokens, parameters) => {
@@ -90,7 +98,11 @@ export default Lambda((tokensByPlace, parameters) => {
   //  2. Infinity means always enabled
   //  3. Any other number is the average rate per second
 
-  ${lambdaType === "predicate" ? "return true; // Always enabled (alternative: return Infinity;)" : "return 1.0; // Average firing rate of once per second"}
+  ${
+    lambdaType === "predicate"
+      ? "return true; // Always enabled (alternative: return Infinity;)"
+      : "return 1.0; // Average firing rate of once per second"
+  }
 });`;
 
 export function generateDefaultTransitionKernelCode(
@@ -117,7 +129,9 @@ export default TransitionKernel((tokensByPlace, parameters) => {
       ${Array.from({ length: arc.weight })
         .map(
           () =>
-            `{ ${arc.type.elements.map((el) => `${el.name}: ${defaultTokenAttributeSource(el)}`).join(", ")} }`,
+            `{ ${arc.type.elements
+              .map((el) => `${el.name}: ${defaultTokenAttributeSource(el)}`)
+              .join(", ")} }`,
         )
         .join(",\n      ")}
     ],`,