6 changes: 6 additions & 0 deletions .changeset/fe-769-string-token-dimension-type.md
@@ -0,0 +1,6 @@
---
"@hashintel/petrinaut": minor
"@hashintel/petrinaut-core": minor
---

Add the `string` token element type: free-form text represented as plain JS strings in all user code, compared by value, and stored via per-run string interning. Frame buffers hold 64-bit references into an append-only `StringPool` owned by the simulation (interactive runs ship append-only `newStrings` pool deltas alongside frame payloads; Monte Carlo runs keep one pool per run). Kernel outputs take plain strings (missing values become `""`, non-strings stringify via `String(value)`); Distributions on string elements are rejected and dynamics cannot write them. The type-properties panel, initial-state and scenario spreadsheets, and the token-encoding playground gain String columns/dimensions.
12 changes: 11 additions & 1 deletion libs/@hashintel/petrinaut-core/docs/architecture/engine.html
@@ -331,6 +331,13 @@ <h2>Format v2 - packed token structs (current)</h2>
lane-compare without combining for equality, ~4× faster). Never routed
through <code>number</code> (NaN-payload hazard). Kernel outputs are
optional; omitted values auto-generate from the seeded RNG.</td></tr>
<tr><td><code>string</code> (FE-769)</td><td><code>u64</code> pool reference β€” 8 B</td>
<td>The frame stores an ID into an append-only per-run string intern
pool (<code>engine/string-pool.ts</code>); the pool lives on
<code>SimulationInstance</code>, not on the frame, so frames stay
fixed-stride and byte-copyable. Equal strings share one ID; id 0 is
the pre-seeded <code>""</code>, so zeroed buffers decode cleanly.
See <code>docs/string-interning.md</code>.</td></tr>
</tbody>
</table>
<ul>
@@ -344,7 +351,10 @@ <h2>Format v2 - packed token structs (current)</h2>
<code>encodeTokenToBytes</code>) is the only code that indexes token bytes;
all whole-token moves are byte-range copies (<code>Uint8Array.set</code>).
The raw <code>getPlaceTokenValues</code> reader was removed from the public
API; raw f64 access is meaningless under mixed widths.</li>
API; raw f64 access is meaningless under mixed widths. Because the string
pool never crosses the worker boundary with the frames, each frame payload
ships an append-only <code>newStrings</code> delta that the main-thread
frame store accumulates and hands to the frame reader for decoding.</li>
</ul>

<h2>Determinism</h2>
137 changes: 137 additions & 0 deletions libs/@hashintel/petrinaut-core/docs/string-interning.md
@@ -0,0 +1,137 @@
# String interning for `string` token elements

Status: implemented (FE-769). Companion to the format-v2 packed token layout
described in `docs/architecture/engine.html`.

## Problem

Format v2 stores each place's tokens as fixed-stride packed structs inside one
contiguous frame buffer. Everything the engine does well depends on that
fixed stride:

- **O(1) token addressing**: `byteOffset + tokenIndex * strideBytes`.
- **Byte-range moves**: removals/additions compact and append tokens with
`Uint8Array.set`, never interpreting field contents.
- **Shared typed-array views**: f64/u8/u64 views over the whole token
region, no `DataView` in hot paths.

`string` elements are variable-length. A JS string cannot be dropped into a
fixed-width struct field without either truncating it or breaking the stride
invariants, so the value itself cannot live in the frame.
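
As a rough illustration of why the fixed stride matters, the sketch below addresses and removes tokens using nothing but stride arithmetic and a byte-range copy. The 16-byte layout and the `tokenOffset` / `removeToken` helpers are hypothetical names chosen for this example, not the engine's actual functions.

```typescript
// Hypothetical layout for illustration: each token is one f64 field
// followed by one u64 string reference, so the stride is 16 bytes.
const STRIDE_BYTES = 16;

// O(1) addressing: the byte position of a token depends only on its index.
function tokenOffset(
  byteOffset: number,
  tokenIndex: number,
  strideBytes: number,
): number {
  return byteOffset + tokenIndex * strideBytes;
}

// Order-preserving removal: shift the tail down by one stride with a single
// byte-range copy. Field contents are never interpreted.
// Returns the new token count.
function removeToken(
  region: Uint8Array,
  tokenCount: number,
  index: number,
  strideBytes: number,
): number {
  if (index < 0 || index >= tokenCount) {
    throw new RangeError(`token index ${index} out of range`);
  }
  const tail = region.subarray(
    (index + 1) * strideBytes,
    tokenCount * strideBytes,
  );
  // set() copes with source and target sharing one underlying buffer.
  region.set(tail, index * strideBytes);
  return tokenCount - 1;
}
```

A variable-length field would break both helpers: `tokenOffset` would need a prefix-sum walk, and `removeToken` would have to parse every token to find its end.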

## Options considered and rejected

1. **Inline variable-length bytes in the token region.** Tokens would no
longer have a per-colour `sizeof`; addressing becomes a prefix-sum walk,
stride math dies, and every byte-range compaction in
`compute-place-next-state` / Monte Carlo `frame-operations` would need to
understand field contents. Rejected: it destroys the core property of the
format.
2. **Fixed-width truncated char arrays** (e.g. 32 bytes of UTF-8 per field).
Keeps the stride but is lossy (silent truncation is a correctness bug, not
a trade-off), wasteful for short strings, and still forces a max length
into the schema. Rejected.
3. **Per-frame string tables.** Self-contained frames, but the table is
duplicated into every retained frame. The interactive worker retains the
whole frame history for scrubbing, so thousands of frames sharing mostly
identical string sets would multiply memory by the frame count. Rejected.
4. **Strings only in decoded `TokenRecord`s** (never in frames). The frame
would no longer round-trip: re-encoding a frame's tokens, replaying, or
reading a historical frame would lose string values. Rejected: frames are
the source of truth for simulation state.

## Chosen design: append-only per-run intern pool

- `engine/string-pool.ts` defines `StringPool`: an append-only map from string →
small integer ID. **id 0 is pre-seeded as `""`**, so a zeroed buffer decodes
to the empty string and "missing value" needs no sentinel.
- Frames store a **u64 pool reference** per string field (`PhysicalKind
"u64"`, 8 bytes, align 8), read and written through the existing
`BigUint64Array` view. IDs are small integers; `Number(id)` / `BigInt(id)`
convert at the boundary.
- **The pool is part of the simulation, not of the frame.** It lives on
`SimulationInstance.stringPool`, created fresh by `buildSimulation` per
init/run.
- Encoding is total: kernel/marking/scenario values coerce via
`String(value)` with `undefined`/`null` → `""`, then intern. Distributions
on string fields keep throwing the discrete-element error. Dynamics can
read string fields (the decode passes the pool) but never write them
(derivative type is `?: never`).
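
The pool mechanics described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the contents of `engine/string-pool.ts`: the class name `StringPoolSketch`, the `intern` and `get` method names, and the `coerceToString` helper are hypothetical; only `valuesFrom`, `size`, `maxSize`, and the pre-seeded id 0 come from the text.

```typescript
// Illustrative append-only intern pool. Names other than valuesFrom, size
// and maxSize are assumptions made for this sketch.
class StringPoolSketch {
  private readonly ids = new Map<string, number>([["", 0]]);
  private readonly values: string[] = [""]; // id 0 is pre-seeded as ""
  private readonly maxSize: number;

  constructor(maxSize: number = 1000000) {
    this.maxSize = maxSize;
  }

  get size(): number {
    return this.values.length;
  }

  // Returns the existing ID for an equal string, or appends a new entry.
  intern(value: string): number {
    const existing = this.ids.get(value);
    if (existing !== undefined) {
      return existing;
    }
    if (this.values.length >= this.maxSize) {
      throw new Error(
        `string pool exceeded ${this.maxSize} distinct values`,
      );
    }
    const id = this.values.length;
    this.ids.set(value, id);
    this.values.push(value);
    return id;
  }

  get(id: number): string {
    const value = this.values[id];
    if (value === undefined) {
      throw new RangeError(`unknown string id ${id}`);
    }
    return value;
  }

  // Entries with id >= startId, in ID order (the shape of a pool delta).
  valuesFrom(startId: number): string[] {
    return this.values.slice(startId);
  }
}

// Total encoding: undefined/null become "", everything else is stringified.
function coerceToString(value: unknown): string {
  return value === undefined || value === null ? "" : String(value);
}
```

Writing a field would then amount to `u64View[slot] = BigInt(pool.intern(coerceToString(value)))`, and reading it back to `pool.get(Number(u64View[slot]))`.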

### Interactive worker

The pool lives in the worker; frames are posted to the main thread. Rather
than shipping the pool (or the strings) with every frame, each
`SimulationFramePayload` carries an **append-only delta**:

```
newStrings?: { baseId: number; values: string[] }
```

The worker tracks `sentStringCount` (starting at 1, because `""` is pre-seeded on
both sides) and attaches `pool.valuesFrom(sentStringCount)` whenever new
entries exist. The main-thread frame store owns an accumulated `string[]`
copy: `appendBatch` asserts `baseId === pool.length` and pushes the values
before storing the frame, then hands a pool accessor to the compiled frame
reader. Delta ordering guarantees the invariant a reader needs: every frame
only references IDs below the pool length reached once its own delta is
applied, so the pool prefix a frame needs never arrives after the frame.
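
The two halves of the delta protocol can be sketched as follows. The `NewStrings` shape matches the payload field above; the `DeltaSender` and `MainThreadPoolCopy` classes and their method names are hypothetical stand-ins for the worker bookkeeping and the frame store, which in the real code do considerably more.

```typescript
interface NewStrings {
  baseId: number;
  values: string[];
}

// Worker side (illustrative): remember how many pool entries were already
// shipped and emit only the suffix that is new.
class DeltaSender {
  private sentStringCount = 1; // "" is pre-seeded on both sides

  nextDelta(poolValues: readonly string[]): NewStrings | undefined {
    if (poolValues.length <= this.sentStringCount) {
      return undefined; // nothing new, so the field is omitted
    }
    const delta: NewStrings = {
      baseId: this.sentStringCount,
      values: poolValues.slice(this.sentStringCount),
    };
    this.sentStringCount = poolValues.length;
    return delta;
  }
}

// Main-thread side (illustrative): accumulate deltas in arrival order.
class MainThreadPoolCopy {
  private readonly pool: string[] = [""];

  get length(): number {
    return this.pool.length;
  }

  applyDelta(delta: NewStrings | undefined): void {
    if (delta === undefined) {
      return;
    }
    if (delta.baseId !== this.pool.length) {
      throw new Error(
        `delta baseId ${delta.baseId} does not match pool length ${this.pool.length}`,
      );
    }
    for (const value of delta.values) {
      this.pool.push(value);
    }
  }

  get(id: number): string {
    const value = this.pool[id];
    if (value === undefined) {
      throw new RangeError(`string id ${id} not yet received`);
    }
    return value;
  }
}
```

Because a delta is applied before its frame is stored, any ID the frame contains is already resolvable by the time a reader sees it. The `baseId` check turns a dropped or reordered payload into an immediate error rather than silently shifted IDs.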

### Monte Carlo

Each `MonteCarloRun` builds its own simulation via `buildSimulation`, so each
run has its own pool. Frames never leave the worker; the run-local metric
frame reader decodes with the run's pool and metric frames carry plain
decoded values. No protocol change.

## Mutability analysis

- **Append-only, immutable entries.** An ID written into any retained frame
must decode to the same string for the whole frame history, because scrubbing and
replay read old frames against the current pool. Reassigning or compacting
IDs mid-run would silently corrupt history.
- **No mid-run GC/compaction.** Deciding an entry is dead requires scanning
every retained frame's string fields; the savings don't justify the cost or
the invalidation risk. Growth is bounded by the number of _distinct_
strings, not tokens.
- **Reset boundaries.** A fresh pool per `init` (interactive) and per run
(Monte Carlo); the main-thread copy resets on `frameStore.clear()`
(`Simulation.reset`). Nothing survives a run.

## Determinism

Intern order is execution order: the same net, marking, parameters, seed and
dt produce the same sequence of interned strings and therefore the same IDs
and identical frame bytes. Equal strings always share an ID (interned
equality), and no RNG state is consumed by interning, so string handling cannot
perturb stochastic draws.
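
A small sketch makes the "intern order is execution order" point concrete. The `internSequence` helper is hypothetical and stands in for a whole run: it interns a sequence of strings into a fresh pool and returns the IDs that would be written into frames.

```typescript
// Interns each value into a fresh pool (id 0 pre-seeded as "") and returns
// the resulting ID sequence. Purely a function of the input order.
function internSequence(values: readonly string[]): number[] {
  const ids = new Map<string, number>([["", 0]]);
  return values.map((value) => {
    let id = ids.get(value);
    if (id === undefined) {
      id = ids.size; // next free ID is the current entry count
      ids.set(value, id);
    }
    return id;
  });
}
```

Here `internSequence(["b", "a", "b", ""])` yields `[1, 2, 1, 0]` on every call. The IDs depend only on first-occurrence order, which is why identical inputs give identical frame bytes, and also why IDs from two different runs are not comparable with each other.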

## Known issues / accepted trade-offs

- **Unbounded growth for unique-string workloads.** A kernel emitting
`order-${n}` interned per firing grows the pool for the whole run. The
constructor's `maxSize` guard (default 1,000,000 distinct values) turns the
pathological case into a clear error ("string pool exceeded N distinct
values - are kernels generating unbounded unique strings?") instead of
silent memory exhaustion.
- **Frames alone no longer decode string fields.** A frame is only meaningful
together with a pool prefix of sufficient length. The delta ordering above
guarantees this on the main thread; `readTokenRecord` throws if a layout
contains string fields and no pool is supplied (programmer error, not a
runtime condition).
- **Main-thread pool copy duplicates memory.** The strings themselves are
shared by reference after structured clone materialises them once per
delta; the duplicated cost is the array of references, which is negligible
next to frame retention.
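
The "frame plus pool prefix" dependency can be illustrated with a field reader. This is a sketch under assumptions: `readStringField` and `PoolAccessor` are hypothetical names, and the real `readTokenRecord` works on a compiled layout rather than a single slot.

```typescript
// A pool accessor maps an ID to its string; how it is backed is up to the
// caller (worker pool, main-thread copy, run-local pool).
type PoolAccessor = (id: number) => string;

// Decodes one u64 string reference from the shared BigUint64Array view.
// A missing pool is treated as programmer error, mirroring the text above.
function readStringField(
  u64View: BigUint64Array,
  slotIndex: number,
  pool?: PoolAccessor,
): string {
  if (pool === undefined) {
    throw new Error("layout contains string fields but no pool was supplied");
  }
  if (slotIndex < 0 || slotIndex >= u64View.length) {
    throw new RangeError(`slot ${slotIndex} out of range`);
  }
  // IDs are small integers, so the bigint -> number conversion is lossless.
  return pool(Number(u64View[slotIndex]));
}
```

A zero-filled view decodes to `""` through any accessor that maps id 0 to the empty string, which is the reason no separate "missing" sentinel is needed.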

## Future iterations

- **Refcounting/GC between runs** if pools are ever shared across runs (they
currently are not; reset makes GC unnecessary).
- **Enum element type** building on the same pool mechanics with a closed,
schema-declared value set (ticket exists in the parent epic): same u64
representation, but validation instead of open interning, and stable IDs
derivable from the schema.
- **Pool statistics in run summaries** (distinct-string count, byte
estimate), cheap to expose from `StringPool.size` for diagnosing
string-heavy models.
8 changes: 4 additions & 4 deletions libs/@hashintel/petrinaut-core/src/ai.ts
@@ -269,12 +269,12 @@ Validate every code-writing change. After any tool call that writes code - lam
Place names are part of the code surface: lambdas/kernels read \`input.PlaceName\`, metrics read \`state.places.PlaceName.count\`, and scenario code-mode initial state keys are place names. Renaming a place via \`updatePlace\` requires updating every dependent lambda, kernel, dynamics, metric, visualizer, and scenario in the same batch; otherwise you will silently break references.

Code-surface cheatsheet (exact shapes expected by the runtime):
- Transition lambda (\`transition.lambdaCode\`): \`export default Lambda((input, parameters) => …)\`. Available when stochasticity is enabled OR when colours are enabled and the transition has at least one standard or read input arc from a coloured place. \`input.PlaceName\` is a tuple sized to the input arc weight for coloured standard and read input arcs; token attributes are typed by the colour element: real/integer → number, boolean → boolean, uuid → bigint. Read arcs expose tokens in \`input\` but do not consume them when the transition fires. Inhibitor arcs and uncoloured input places are NOT in \`input\`. Predicate → boolean; stochastic → non-negative rate in firings per simulation second (0 disables, Infinity always fires). Must be deterministic. If unavailable or empty, the runtime uses true for predicate-style transitions and Infinity for stochastic-style transitions.
- Transition kernel (\`transition.transitionKernelCode\`): \`export default TransitionKernel((input, parameters) => …)\`. Available only for transitions with coloured output places. Return \`{ OutputPlaceName: [token, …] }\` sized to the output arc weight. Include only coloured output places; uncoloured output places are auto-populated. Output values must match element types: real/integer use numbers, boolean uses booleans. uuid attributes are OPTIONAL in output tokens: omit them to auto-generate a fresh UUID deterministically from the seeded simulation RNG, use \`Uuid.generate()\` for an explicit fresh UUID, \`Uuid.from(value)\` to derive one from any value, or forward an input token's uuid bigint unchanged; plain non-UUID values (numbers, arbitrary strings) are converted deterministically via UUIDv5. When stochasticity is enabled, real attributes may use \`Distribution.Gaussian(mean, sd)\` / \`Distribution.Uniform(min, max)\` / \`Distribution.Lognormal(mu, sigma)\` (never integer/boolean/uuid attributes), and chained \`.map(fn)\` on the same distribution shares one draw. When stochasticity is disabled, kernel outputs must use plain values only. Leave empty when no coloured outputs exist.
- Differential equation (\`differentialEquation.code\`): \`export default Dynamics((tokens, parameters) => …)\`. \`tokens\` is THIS place's tokens only. Return an array of the same length whose entries provide derivatives for real-valued elements only (i.e. dx/dt, not the new value); integer, boolean, and uuid elements are discrete and remain unchanged by dynamics. The equation's \`colorId\` MUST match every referencing place's \`colorId\`.
- Transition lambda (\`transition.lambdaCode\`): \`export default Lambda((input, parameters) => …)\`. Available when stochasticity is enabled OR when colours are enabled and the transition has at least one standard or read input arc from a coloured place. \`input.PlaceName\` is a tuple sized to the input arc weight for coloured standard and read input arcs; token attributes are typed by the colour element: real/integer → number, boolean → boolean, uuid → bigint, string → string (plain JS strings everywhere, compared by value). Read arcs expose tokens in \`input\` but do not consume them when the transition fires. Inhibitor arcs and uncoloured input places are NOT in \`input\`. Predicate → boolean; stochastic → non-negative rate in firings per simulation second (0 disables, Infinity always fires). Must be deterministic. If unavailable or empty, the runtime uses true for predicate-style transitions and Infinity for stochastic-style transitions.
- Transition kernel (\`transition.transitionKernelCode\`): \`export default TransitionKernel((input, parameters) => …)\`. Available only for transitions with coloured output places. Return \`{ OutputPlaceName: [token, …] }\` sized to the output arc weight. Include only coloured output places; uncoloured output places are auto-populated. Output values must match element types: real/integer use numbers, boolean uses booleans, string uses plain strings (REQUIRED in the token type; a missing/undefined value becomes the empty string \`""\`, and non-string values are stringified via \`String(value)\`). uuid attributes are OPTIONAL in output tokens: omit them to auto-generate a fresh UUID deterministically from the seeded simulation RNG, use \`Uuid.generate()\` for an explicit fresh UUID, \`Uuid.from(value)\` to derive one from any value, or forward an input token's uuid bigint unchanged; plain non-UUID values (numbers, arbitrary strings) are converted deterministically via UUIDv5. When stochasticity is enabled, real attributes may use \`Distribution.Gaussian(mean, sd)\` / \`Distribution.Uniform(min, max)\` / \`Distribution.Lognormal(mu, sigma)\` (never integer/boolean/uuid/string attributes), and chained \`.map(fn)\` on the same distribution shares one draw. When stochasticity is disabled, kernel outputs must use plain values only. Leave empty when no coloured outputs exist.
- Differential equation (\`differentialEquation.code\`): \`export default Dynamics((tokens, parameters) => …)\`. \`tokens\` is THIS place's tokens only. Return an array of the same length whose entries provide derivatives for real-valued elements only (i.e. dx/dt, not the new value); integer, boolean, uuid, and string elements are discrete and remain unchanged by dynamics (they can be read from input tokens but never written). The equation's \`colorId\` MUST match every referencing place's \`colorId\`.
- Place visualizer (\`place.visualizerCode\`): \`export default Visualization(({ tokens, parameters }) => <JSX/>)\`. Classic React runtime: do NOT import React, do NOT use \`<>…</>\` fragments, do NOT use hooks. Convention: return a sized \`<svg viewBox="0 0 W H">…</svg>\`.
- Metric (\`metric.code\`): a plain function body (NOT a module, no \`export default\`, no wrapper). The only variable in scope is \`state\`. Must \`return\` a finite number. Example: \`return state.places.Infected.count / (state.places.Susceptible.count + state.places.Infected.count + state.places.Recovered.count);\`. \`parameters\` and \`scenario\` are NOT available inside metrics.
- Scenario per_place initial state: \`content\` keys are place IDs; uncoloured values are expressions with \`parameters\` and \`scenario\` in scope; coloured values are row arrays in colour element order using numbers and booleans; uuid columns accept UUID strings (any other text converts deterministically to a UUID via UUIDv5).
- Scenario per_place initial state: \`content\` keys are place IDs; uncoloured values are expressions with \`parameters\` and \`scenario\` in scope; coloured values are row arrays in colour element order using numbers and booleans; string columns take literal text; uuid columns accept UUID strings (any other text converts deterministically to a UUID via UUIDv5).
- Scenario code-mode initial state: function body returning \`{ PlaceName: tokens }\` keyed by NAME (asymmetric with per_place IDs); unknown names are silently dropped.
- Parameter access in any code surface: use \`parameters.<variableName>\` where \`<variableName>\` is the parameter's lower_snake_case \`variableName\` value (e.g. \`parameters.crash_threshold\`, never \`parameters.crashThreshold\`).

@@ -494,7 +494,7 @@ describe("parseClipboardPayload", () => {
name: "Token",
iconSlug: "circle",
displayColor: "#FF0000",
elements: [{ elementId: "e1", name: "val", type: "string" }],
elements: [{ elementId: "e1", name: "val", type: "complex" }],
},
],
differentialEquations: [],
24 changes: 19 additions & 5 deletions libs/@hashintel/petrinaut-core/src/default-codes.ts
@@ -9,17 +9,23 @@ const defaultTokenAttributeSource = (
case "integer":
case "real":
return "0";
case "string":
return '""';
case "uuid":
return "Uuid.generate()";
}
};

export function generateDefaultVisualizerCode(type: Color): string {
return `// This function defines how to visualize the tokens in the place of type "${type.name}".
return `// This function defines how to visualize the tokens in the place of type "${
type.name
}".
// It receives the current tokens and parameters.
export default Visualization(({ tokens, parameters }) => {
return <svg viewBox="0 0 800 600">
{tokens.map(({ ${type.elements.map((el) => el.name).join(", ")} }, index) => (
{tokens.map(({ ${type.elements
.map((el) => el.name)
.join(", ")} }, index) => (
// Example: simple circle for each token
<circle />
))}
Expand Down Expand Up @@ -47,7 +53,9 @@ export function generateDefaultDifferentialEquationCode(type: Color): string {
}; // Example: all real-valued derivatives = 1`
: `{}; // This type has no real-valued attributes; discrete values are unchanged by dynamics`;

return `// This function defines the differential equation for the place of type "${type.name}".
return `// This function defines the differential equation for the place of type "${
type.name
}".
// The function receives the current tokens in this place and the parameters.
// It should return derivatives for real-valued token attributes in this place.
export default Dynamics((tokens, parameters) => {
@@ -90,7 +98,11 @@ export default Lambda((tokensByPlace, parameters) => {
// 2. Infinity means always enabled
// 3. Any other number is the average rate per second

${lambdaType === "predicate" ? "return true; // Always enabled (alternative: return Infinity;)" : "return 1.0; // Average firing rate of once per second"}
${
lambdaType === "predicate"
? "return true; // Always enabled (alternative: return Infinity;)"
: "return 1.0; // Average firing rate of once per second"
}
});`;

export function generateDefaultTransitionKernelCode(
@@ -117,7 +129,9 @@ export default TransitionKernel((tokensByPlace, parameters) => {
${Array.from({ length: arc.weight })
.map(
() =>
`{ ${arc.type.elements.map((el) => `${el.name}: ${defaultTokenAttributeSource(el)}`).join(", ")} }`,
`{ ${arc.type.elements
.map((el) => `${el.name}: ${defaultTokenAttributeSource(el)}`)
.join(", ")} }`,
)
.join(",\n ")}
],`,