2 changes: 1 addition & 1 deletion .agent_rules/README.md
@@ -90,4 +90,4 @@ uv pip install -e . # Install deps

---

**Pipeline Version**: 1.9.0 | **Steps**: 25 | **Tests**: latest recorded full suite with Ollama integration excludes: 2,381 passed, 17 skipped, 1 xfailed; collect-only inventory is 2,399 tests | **MCP Tools**: verify with `src/tests/mcp/test_mcp_audit.py`
**Pipeline Version**: 2.0.0 | **Steps**: 25 | **Tests**: latest recorded full suite with Ollama integration excludes: 2,393 passed, 17 skipped, 1 xfailed; collect-only inventory is 2,411 tests | **MCP Tools**: verify with `src/tests/mcp/test_mcp_audit.py`
4 changes: 2 additions & 2 deletions ARCHITECTURE.md
@@ -3,7 +3,7 @@
This guide details the architecture of the Generalized Notation Notation (GNN) system. It complements `DOCS.md` and `doc/pipeline/README.md` with an implementation-oriented perspective for developers.

**Last Updated**: 2026-06-12
**Version**: 1.9.0
**Version**: 2.0.0
**Status**: Maintained
**Pipeline Steps**: 25 (0-24)

@@ -323,7 +323,7 @@ Each agent implements comprehensive performance monitoring:

---

**Architecture Version**: 1.9.0
**Architecture Version**: 2.0.0
**Last Updated**: 2026-06-12
**Status**: ✅ Production Ready
**Compliance**: Thin orchestrator pattern
21 changes: 20 additions & 1 deletion CHANGELOG.md
@@ -12,6 +12,24 @@ No unreleased changes yet.

---

## [2.0.0] — 2026-06-12

### Added
- **Semantic fidelity release gate**: `scripts/run_semantic_fidelity_gate.py` writes `gnn_semantic_fidelity_ledger_v1` artifacts for maintained model families.
- **Strict semantic contracts**: representative fixtures now preserve model identity, variables, edges, dimensions, parameter shapes, equations, time, and ontology mappings across JSON parse/serialize/parse checks.
- **Cross-framework reliability release gate**: `scripts/run_cross_framework_reliability.py` writes `gnn_cross_framework_reliability_ledger_v1` artifacts with compatible, required, and unsupported backend statuses.
- **GridWorld three-backend comparison**: GridWorld is profiled for PyMDP, RxInfer, and ActiveInference.jl, including seed, trace length, matrix-shape, and matrix-provenance parity.
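
The compatible/required/unsupported split described above can be pictured with a small sketch. This is not the project's implementation: `classify_backends`, the status labels, and the framework identifiers other than `pymdp`, `rxinfer`, and `activeinference_jl` are hypothetical, chosen only to illustrate how a family's profiled backends might be separated from the rest of the maintained list.

```python
from __future__ import annotations

# Hypothetical list of maintained backends; only the first three identifiers
# appear in the manifest shown in this change, the rest are assumed spellings.
MAINTAINED = (
    "pymdp", "rxinfer", "activeinference_jl",
    "jax", "numpyro", "pytorch", "discopy",
)


def classify_backends(profiled: str, maintained: tuple[str, ...] = MAINTAINED) -> dict[str, str]:
    """Label each maintained backend for one family (illustrative only)."""
    requested = {name.strip() for name in profiled.split(",") if name.strip()}
    unknown = requested - set(maintained)
    if unknown:
        # Fail loudly rather than silently ignoring a misspelled backend.
        raise ValueError(f"unknown backends: {sorted(unknown)}")
    return {
        name: ("required" if name in requested else "unsupported")
        for name in maintained
    }


statuses = classify_backends("pymdp,rxinfer,activeinference_jl")
```

Under these assumptions, every backend receives an explicit status, so a missing backend is recorded as unsupported instead of being omitted from the ledger.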

### Changed
- GridWorld model-family acceptance now requests PyMDP, RxInfer, and ActiveInference.jl for the v2 comparison fixture instead of a PyMDP-only profile.
- Roadmap next target moves to v3.0.0 for durable streams, long-running sessions, and auditable container plans.

### Fixed
- JSON serialization now emits equation objects instead of lossy stringified dataclasses, preventing silent semantic round-trip drift.
- Cross-framework reliability no longer certifies aggregate Step 12 success without successful non-skipped execution-detail rows and current simulation payloads for required backends.
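
To see why stringified dataclasses drift on a round trip, consider the following minimal sketch. The `Equation` class here is a hypothetical stand-in with the same four fields the serializer now emits (`label`, `content`, `format`, `description`); it uses `dataclasses.asdict` for brevity, whereas the actual change reads the fields with `getattr` defaults.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class Equation:
    """Hypothetical stand-in for the project's equation type."""
    label: str | None
    content: str
    format: str = "latex"
    description: str = ""


eq = Equation(label="eq1", content=r"s_{t+1} = B s_t")

# Old behaviour: str(eq) stores the dataclass repr, which is just a string
# after parsing and cannot be mapped back to fields without ad hoc parsing.
lossy = json.loads(json.dumps({"equations": [str(eq)]}))

# New behaviour: a field-by-field object survives parse/serialize/parse.
structured = json.loads(json.dumps({"equations": [asdict(eq)]}))
restored = Equation(**structured["equations"][0])
```

In the lossy case the parsed value is a plain `str`, while the structured case rebuilds an object equal to the original.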

---

## [1.9.0] — 2026-06-12

### Added
@@ -149,7 +167,8 @@ No unreleased changes yet.
- pytest test suite with comprehensive coverage
- MCP tool registration framework

[Unreleased]: https://github.com/ActiveInferenceInstitute/GeneralizedNotationNotation/compare/v1.9.0...HEAD
[Unreleased]: https://github.com/ActiveInferenceInstitute/GeneralizedNotationNotation/compare/v2.0.0...HEAD
[2.0.0]: https://github.com/ActiveInferenceInstitute/GeneralizedNotationNotation/compare/v1.9.0...v2.0.0
[1.9.0]: https://github.com/ActiveInferenceInstitute/GeneralizedNotationNotation/compare/v1.8.0...v1.9.0
[1.8.0]: https://github.com/ActiveInferenceInstitute/GeneralizedNotationNotation/compare/v1.6.0...v1.8.0
[1.6.0]: https://github.com/ActiveInferenceInstitute/GeneralizedNotationNotation/compare/v1.3.0...v1.6.0
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -9,7 +9,7 @@ authors:
# This entry acknowledges all contributors. Individual contributors can be listed above if desired.

title: "GeneralizedNotationNotation (GNN)"
version: 1.9.0 # Current stable release
version: 2.0.0 # Current stable release
date-released: 2026-06-12

abstract: |
8 changes: 4 additions & 4 deletions README.md
@@ -49,11 +49,11 @@

**Smékal, J., & Friedman, D. A. (2023)**. *Generalized Notation Notation for Active Inference Models*. Active Inference Journal.
**Last Updated**: 2026-06-12
**Version**: 1.9.0
**Version**: 2.0.0
**Status**: ✅ Production Ready (Active Inference Institute)
**Test Suite Inventory (measured 2026-06-12)**: 184 `test_*.py` files under `src/tests/`; `uv run --extra dev python -m pytest --collect-only src/tests/ -q --tb=no --ignore=src/tests/llm/test_llm_ollama.py --ignore=src/tests/llm/test_llm_ollama_integration.py` collected 2,399 tests. Latest recorded full suite evidence with the same Ollama integration excludes is 2,381 passed, 17 skipped, 1 xfailed.
**Features (v1.9.0)**: model-family acceptance and interpretability ledgers for basics, discrete, continuous, hierarchical, multi-agent, precision, structured, gridworld, and scaling-study fixtures; explicit profiled unsupported Step 11/12 skips for continuous/hierarchical families; maintained template CLI (`gnn templates list`, `gnn templates show`, `gnn pull`); packaged template assets with checksum/collision handling; authenticated local MCP HTTP orchestration; pre-commit/devcontainer tooling; structured PyMDP 1.0 POMDP execution; PyMDP Scaling Study; and MCP Full Module Exposure.
**Next Target**: v2.0.0 semantic fidelity and cross-framework reliability hardening.
**Test Suite Inventory (measured 2026-06-12)**: 186 `test_*.py` files under `src/tests/`; `uv run --extra dev python -m pytest --collect-only src/tests/ -q --tb=no --ignore=src/tests/llm/test_llm_ollama.py --ignore=src/tests/llm/test_llm_ollama_integration.py` collected 2,411 tests. Latest recorded full suite evidence with the same Ollama integration excludes is 2,393 passed, 17 skipped, 1 xfailed.
**Features (v2.0.0)**: semantic fidelity ledgers across all maintained model families, strict JSON parse/serialize/parse preservation for variables, edges, dimensions, parameter shapes, equations, time, and ontology mappings; cross-framework reliability ledgers with explicit compatible/unsupported backend statuses; GridWorld comparison across PyMDP, RxInfer, and ActiveInference.jl; model-family acceptance and interpretability ledgers; maintained template CLI (`gnn templates list`, `gnn templates show`, `gnn pull`); authenticated local MCP HTTP orchestration; structured PyMDP 1.0 POMDP execution; PyMDP Scaling Study; and MCP Full Module Exposure.
**Next Target**: v3.0.0 long-running orchestration, durable observation streams, and auditable container plans.
📖 **DOI:** [10.5281/zenodo.7803328](https://doi.org/10.5281/zenodo.7803328)
📁 **Archive:** [zenodo.org/records/7803328](https://zenodo.org/records/7803328)

34 changes: 17 additions & 17 deletions TO-DO.md
@@ -1,20 +1,11 @@
# TO-DO — GNN Pipeline Roadmap

**Last Updated**: 2026-06-12
**Current Version**: 1.9.0
**Next Target**: v2.0.0 (semantic fidelity and cross-framework reliability)

**Current Evidence (2026-06-12)**: v1.9.0 focused family/report suite
`17 passed`; command-of-record collect-only inventory is `2399` collected tests
across 184 `test_*.py` files with the documented Ollama integration ignores.
Latest full local suite evidence with the same Ollama ignores is
`2381 passed, 17 skipped, 1 xfailed`. The all-family strict acceptance passed
for 9 families; continuous/hierarchical Step 11/12 recorded as profiled
unsupported skips with `0` raw failed Step 11/12 counts. v1.8.0 focused
release smokes passed for `gnn templates list`, `gnn templates show
pomdp-gridworld-3x3`, dry-run `gnn pull` to `/tmp/gnn-pull`, and authenticated
MCP HTTP tests (`12 passed`; combined CLI/MCP/capability suite `32 passed`);
`just lint` passes.
**Current Version**: 2.0.0
**Next Target**: v3.0.0 (long-running orchestration, durable streams, and auditable container plans)

**Current Evidence (2026-06-12)**: v2.0.0 semantic fidelity gate passed for 9 families (`gnn_semantic_fidelity_ledger_v1`); cross-framework reliability gate passed for 9 families (`gnn_cross_framework_reliability_ledger_v1`) with GridWorld compared PyMDP, RxInfer, and ActiveInference.jl and all other unprofiled backends recorded with explicit unsupported statuses. Command-of-record collect-only inventory is `2411` collected tests across 186 `test_*.py` files with the documented Ollama integration ignores. Latest full local suite evidence with the same Ollama ignores is `2393 passed, 17 skipped, 1 xfailed`. v1.9 all-family strict acceptance remains green for 9 families; continuous and hierarchical Step 11/12 remain profiled unsupported skips with `0` raw failed Step 11/12 counts. v1.8.0 focused release smokes passed for `gnn templates list`, `gnn templates show pomdp-gridworld-3x3`, dry-run `gnn pull` to `/tmp/gnn-pull`, and authenticated
MCP HTTP tests (`12 passed`; combined CLI/MCP/capability suite `32 passed`); `just lint` passes.

---

@@ -111,12 +102,21 @@ uv run --extra dev python src/main.py --target-dir input/gnn_files/discrete --ou

---

## 🧪 v2.0.0 — Semantic Fidelity & Cross-Framework Reliability
## v2.0.0 — Semantic Fidelity & Cross-Framework Reliability (Released)

> **Scope**: Upgrade GNN from broad fixture acceptance to stronger semantic preservation, cross-format round trips, and cross-framework equivalence checks.
> **Released**: 2026-06-12 (tag: `v2.0.0`)

- [x] **Semantic Round-Trip Gates** — Require representative model families to preserve variables, edges, dimensions, parameter shapes, equations, time, and ontology mappings across the maintained strict JSON interchange path. `scripts/run_semantic_fidelity_gate.py` passed for all 9 manifest families and wrote `gnn_semantic_fidelity_ledger_v1` artifacts.
- [x] **Cross-Framework Result Comparisons** — Compare compatible backend outputs through `scripts/run_cross_framework_reliability.py`; required backends need Step 11/12 evidence, successful non-skipped Step 12 execution detail rows, current simulation payloads, matching seeds when present, trace lengths, and matrix-shape/provenance parity. The all-family gate passed for 9 families; GridWorld compared PyMDP, RxInfer, and ActiveInference.jl, while JAX, NumPyro, PyTorch, and DisCoPy remain explicit unsupported statuses unless profiled for a compatible family.

- [ ] **Semantic Round-Trip Gates** — Require representative model families to preserve variables, edges, dimensions, and key matrix contracts across maintained formats.
- [ ] **Cross-Framework Result Comparisons** — Compare compatible PyMDP, RxInfer, JAX, NumPyro, PyTorch, ActiveInference.jl, and DisCoPy outputs with explicit skipped/failed states for unavailable frameworks.
### Acceptance
```bash
uv run --extra dev python -m pytest src/tests/pipeline/test_semantic_fidelity_gate.py src/tests/pipeline/test_cross_framework_reliability.py -q
uv run --extra dev python scripts/run_semantic_fidelity_gate.py --manifest input/model_family_manifest.json --output-dir /tmp/gnn-semantic-fidelity --strict
uv run --extra dev python scripts/run_cross_framework_reliability.py --manifest input/model_family_manifest.json --output-dir /tmp/gnn-cross-framework --strict
uv run --extra dev python scripts/run_model_family_acceptance.py --manifest input/model_family_manifest.json --output-dir /tmp/gnn-family-acceptance-all --strict
```
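
The parity checks named in the cross-framework item above (matching seeds when present, trace lengths, and matrix shapes) can be sketched roughly as follows. The payload layout (`seed`, `trace`, `matrix_shapes`) and the function name `compare_payloads` are assumptions made for illustration; the real ledger schema is not shown in this change.

```python
from __future__ import annotations


def compare_payloads(reference: dict, candidate: dict) -> list[str]:
    """Return human-readable mismatches between two backend payloads (hypothetical schema)."""
    mismatches: list[str] = []

    # Seeds are compared only when both backends report one.
    ref_seed, cand_seed = reference.get("seed"), candidate.get("seed")
    if ref_seed is not None and cand_seed is not None and ref_seed != cand_seed:
        mismatches.append(f"seed: {ref_seed} != {cand_seed}")

    # Trace lengths must agree for a step-by-step comparison to make sense.
    if len(reference["trace"]) != len(candidate["trace"]):
        mismatches.append(
            f"trace length: {len(reference['trace'])} != {len(candidate['trace'])}"
        )

    # Every matrix named by either backend must have the same shape in both.
    ref_shapes = reference["matrix_shapes"]
    cand_shapes = candidate["matrix_shapes"]
    for name in sorted(set(ref_shapes) | set(cand_shapes)):
        if ref_shapes.get(name) != cand_shapes.get(name):
            mismatches.append(
                f"matrix {name}: {ref_shapes.get(name)} != {cand_shapes.get(name)}"
            )
    return mismatches


pymdp_payload = {"seed": 42, "trace": [0, 1, 2], "matrix_shapes": {"A": (9, 9), "B": (9, 9, 4)}}
rxinfer_payload = {"seed": 42, "trace": [0, 1, 2], "matrix_shapes": {"A": (9, 9), "B": (9, 9, 4)}}
drifted_payload = {"seed": None, "trace": [0, 1], "matrix_shapes": {"A": (9, 9)}}

clean = compare_payloads(pymdp_payload, rxinfer_payload)
drift = compare_payloads(pymdp_payload, drifted_payload)
```

Returning a list of mismatches rather than a boolean keeps the evidence available for a ledger entry; a strict gate could then fail whenever the list is non-empty.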

---

2 changes: 1 addition & 1 deletion input/model_family_manifest.json
@@ -75,7 +75,7 @@
"name": "gridworld",
"description": "Gridworld POMDP fixture used for cross-framework acceptance checks.",
"target_dir": "input/gnn_files/pomdp_gridworld",
"frameworks": "pymdp",
"frameworks": "pymdp,rxinfer,activeinference_jl",
"representative_files": ["pomdp_gridworld_3x3.md"]
},
{
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "generalized-notation-notation"
version = "1.9.0"
version = "2.0.0"
description = "A text-based language for standardizing Active Inference generative models"
readme = "README.md"
requires-python = ">=3.11,<3.14"
38 changes: 38 additions & 0 deletions scripts/check_capability_contracts.py
@@ -114,6 +114,11 @@ def run_audit() -> List[str]:
        and "**Next Target**: v2.0.0" not in todo_text
    ):
        failures.append("TO-DO.md: v1.9.0 release must set v2.0.0 as next target")
    if (
        "**Current Version**: 2.0.0" in todo_text
        and "**Next Target**: v3.0.0" not in todo_text
    ):
        failures.append("TO-DO.md: v2.0.0 release must set v3.0.0 as next target")

    readme_tests = _read("src/tests/README.md")
    maintained_dirs, direct_test_dirs = _maintained_test_directory_counts()
@@ -190,6 +195,19 @@ def run_audit() -> List[str]:
            "collect-only inventory",
            "full suite evidence",
        ),
        "Semantic Round-Trip Gates": (
            "semantic fidelity gate passed for 9 families",
            "gnn_semantic_fidelity_ledger_v1",
            "variables, edges, dimensions, parameter shapes, equations, time, and ontology mappings",
            "scripts/run_semantic_fidelity_gate.py",
        ),
        "Cross-Framework Result Comparisons": (
            "cross-framework reliability gate passed for 9 families",
            "gnn_cross_framework_reliability_ledger_v1",
            "GridWorld compared PyMDP, RxInfer, and ActiveInference.jl",
            "explicit unsupported statuses",
            "scripts/run_cross_framework_reliability.py",
        ),
    }
    for item in guarded_pending_items:
        if f"- [x] **{item}**" in todo_text:
@@ -245,6 +263,26 @@ def run_audit() -> List[str]:
        if not _exists(required):
            failures.append(f"v1.9 model-family contract missing: {required}")

    for required in (
        "scripts/run_semantic_fidelity_gate.py",
        "scripts/run_cross_framework_reliability.py",
        "src/pipeline/semantic_fidelity.py",
        "src/pipeline/cross_framework_reliability.py",
        "src/report/semantic_fidelity.py",
        "src/report/cross_framework_reliability.py",
        "src/tests/pipeline/test_semantic_fidelity_gate.py",
        "src/tests/pipeline/test_cross_framework_reliability.py",
    ):
        if not _exists(required):
            failures.append(f"v2.0 reliability contract missing: {required}")

    if "pymdp,rxinfer,activeinference_jl" not in _read(
        "input/model_family_manifest.json"
    ):
        failures.append(
            "input/model_family_manifest.json: GridWorld must profile a real multi-backend comparison"
        )

    if "WebSocket" in todo_text:
        if not _contains(
            "src/gui/websocket_bridge.py",
66 changes: 66 additions & 0 deletions scripts/run_cross_framework_reliability.py
@@ -0,0 +1,66 @@
#!/usr/bin/env python3
"""Run profiled cross-framework reliability checks for maintained families."""

from __future__ import annotations

import argparse
import sys
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parents[1]
SRC_DIR = REPO_ROOT / "src"
if str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))

from pipeline.cross_framework_reliability import (
    MAINTAINED_FRAMEWORKS,
    run_cross_framework_reliability,
)


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--manifest",
        type=Path,
        default=Path("input/model_family_manifest.json"),
        help="Path to the model-family manifest",
    )
    parser.add_argument(
        "--families",
        default="",
        help="Comma-separated family names to run; defaults to all families",
    )
    parser.add_argument(
        "--frameworks",
        default=",".join(MAINTAINED_FRAMEWORKS),
        help="Comma-separated maintained frameworks to profile",
    )
    parser.add_argument(
        "--output-dir",
        type=Path,
        required=True,
        help="Directory for reliability artifacts",
    )
    parser.add_argument("--strict", action="store_true", help="Fail on mismatch")
    args = parser.parse_args(argv)

    families = [item.strip() for item in args.families.split(",") if item.strip()]
    frameworks = [item.strip() for item in args.frameworks.split(",") if item.strip()]
    try:
        ledger = run_cross_framework_reliability(
            args.manifest,
            args.output_dir,
            family_names=families,
            frameworks=frameworks,
            strict=args.strict,
        )
    except (FileNotFoundError, KeyError, RuntimeError, ValueError) as exc:
        print(f"FAIL: {exc}", file=sys.stderr)
        return 1
    print(f"Cross-framework reliability {ledger['status']}: {args.output_dir}")
    return 0 if ledger["status"] == "passed" else 1


if __name__ == "__main__":
    raise SystemExit(main())
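
Both gate scripts parse their comma-separated options with the same list comprehension. The sketch below isolates that expression in a helper named `split_csv` (a name introduced here for illustration, not present in the scripts) to show how whitespace and empty entries are handled. An empty `--families` value produces an empty list, which the help text says means "all families".

```python
from __future__ import annotations


def split_csv(value: str) -> list[str]:
    """Split a comma-separated option, dropping blanks and surrounding whitespace."""
    return [item.strip() for item in value.split(",") if item.strip()]


default_families = split_csv("")                       # -> []
selected = split_csv(" gridworld, discrete ,")         # -> ["gridworld", "discrete"]
frameworks = split_csv("pymdp,rxinfer,activeinference_jl")
```

Because blanks are filtered out, a trailing comma or stray space on the command line does not create an empty family or framework name.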
63 changes: 63 additions & 0 deletions scripts/run_semantic_fidelity_gate.py
@@ -0,0 +1,63 @@
#!/usr/bin/env python3
"""Run strict semantic fidelity checks for maintained model families."""

from __future__ import annotations

import argparse
import sys
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parents[1]
SRC_DIR = REPO_ROOT / "src"
if str(SRC_DIR) not in sys.path:
    sys.path.insert(0, str(SRC_DIR))

from pipeline.semantic_fidelity import run_semantic_fidelity_gate


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--manifest",
        type=Path,
        default=Path("input/model_family_manifest.json"),
        help="Path to the model-family manifest",
    )
    parser.add_argument(
        "--families",
        default="",
        help="Comma-separated family names to run; defaults to all families",
    )
    parser.add_argument(
        "--formats",
        default="json",
        help="Comma-separated serializer/parser formats to check",
    )
    parser.add_argument(
        "--output-dir",
        type=Path,
        required=True,
        help="Directory for semantic fidelity artifacts",
    )
    parser.add_argument("--strict", action="store_true", help="Fail on mismatch")
    args = parser.parse_args(argv)

    families = [item.strip() for item in args.families.split(",") if item.strip()]
    formats = [item.strip() for item in args.formats.split(",") if item.strip()]
    try:
        ledger = run_semantic_fidelity_gate(
            args.manifest,
            args.output_dir,
            family_names=families,
            formats=formats,
            strict=args.strict,
        )
    except (FileNotFoundError, KeyError, RuntimeError, ValueError) as exc:
        print(f"FAIL: {exc}", file=sys.stderr)
        return 1
    print(f"Semantic fidelity {ledger['status']}: {args.output_dir}")
    return 0 if ledger["status"] == "passed" else 1


if __name__ == "__main__":
    raise SystemExit(main())
6 changes: 3 additions & 3 deletions src/AGENTS.md
@@ -190,9 +190,9 @@ graph TD
--ignore=src/tests/llm/test_llm_ollama_integration.py`. Re-include the two Ollama files
when `ollama` is installed and reachable.
- **Current test inventory (2026-06-12)**: 184 `test_*.py` files under `src/tests/`;
the command-of-record collect pass with Ollama integration tests ignored collected 2,399 tests.
the command-of-record collect pass with Ollama integration tests ignored collected 2,411 tests.
Latest recorded full suite evidence with the same Ollama integration excludes is
2,381 passed, 17 skipped, 1 xfailed.
2,393 passed, 17 skipped, 1 xfailed.
- All 25 orchestrator scripts comply with the <150 line thin orchestrator pattern.
- Maintained source/test documentation coverage is enforced by `doc/development/docs_audit.py --strict`.

@@ -342,6 +342,6 @@ pytest --cov=src --cov-report=term-missing
---

**Last Updated**: 2026-06-12
**Pipeline Version**: 1.9.0
**Pipeline Version**: 2.0.0
**Total Steps**: 25 (0-24)
**Status**: Maintained
7 changes: 6 additions & 1 deletion src/gnn/parsers/json_serializer.py
@@ -63,7 +63,12 @@ def serialize(self, model: GNNInternalRepresentation) -> str:
                for param in model.parameters
            ],
            "equations": [
                str(eq)
                {
                    "label": getattr(eq, "label", None),
                    "content": getattr(eq, "content", ""),
                    "format": getattr(eq, "format", "latex"),
                    "description": getattr(eq, "description", ""),
                }
                for eq in (model.equations if hasattr(model, "equations") else [])
            ],
            "time_specification": self._serialize_time_spec(model.time_specification)