bsesic · May 28, 2026 · May 22, 2026
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # TRACE
 
-**Textual Reuse, Alignment, and Collation Engine** — a Python library for pairwise philological alignment with pluggable language packs.
+**Textual Reuse, Alignment, and Collation Engine** — a Python library for philological alignment with pluggable language packs. Pairwise (v0.1) and simultaneous multi-witness (v0.2) alignment.
 
 [![CI](https://github.com/bsesic/trace/actions/workflows/workflow.yml/badge.svg)](https://github.com/bsesic/trace/actions/workflows/workflow.yml)
 [![PyPI version](https://img.shields.io/pypi/v/tracealign.svg)](https://pypi.org/project/tracealign/)
@@ -17,20 +17,39 @@ TRACE is designed for textual criticism, manuscript witness comparison, and the
 
 - **Tokenizer pipeline** with editorial-marker awareness (`[reconstructed]`, `⟦deletion⟧`, `〈insertion〉`, `(expanded)`, lacunae).
 - **Tiered scoring** returning `(score, reason)` per token pair — `EXACT`, `NIQQUD_STRIPPED`, `PLENE_DEFECTIVE`, `ABBREVIATION`, `ORTHOGRAPHIC`, `INSERTION`, `OMISSION`, `NO_MATCH`.
-- **Semi-global Needleman–Wunsch** with affine gap penalties (Gotoh) and a **multi-token abbreviation lookahead** (`ר"י` ↔ `רבי ישמעאל`).
+- **Pairwise aligner** — semi-global Needleman–Wunsch with affine gap penalties (Gotoh) and a multi-token abbreviation lookahead (`ר"י` ↔ `רבי ישמעאל`).
+- **Multi-witness aligner** (v0.2) — N witnesses aligned simultaneously into a canonical variant graph (DAG) plus a derived aligned table view, via pairwise distances → UPGMA guide tree → POA-based progressive merge. Determinism is pinned by a permutation-invariance property test; correctness by a lossless-reconstruction property test.
 - **Hebrew language pack** with niqqud strip, plene/defective skeleton matching, gershayim/maqqef tokenizer hooks, and a seed lexicon of rabbinic abbreviations (extendable via `Lexica.merge()`).
-- **I/O** for plain text, JSON (round-trip), eScriptorium exports (with bbox + line metadata), and TEI XML (`<tei:w>` mode + flow-text fallback).
-- **Reproducible** — every `AlignmentResult` carries `trace_version` and `language_pack_version` in its params.
+- **I/O** for plain text, JSON (round-trip for both pairwise and multi-witness results), eScriptorium exports (with bbox + line metadata), and TEI XML (`<tei:w>` mode + flow-text fallback).
+- **Reproducible** — every `AlignmentResult` / `MultiAlignmentResult` carries `trace_version` and `language_pack_version` in its params.
 
 ## Installation
 
 ```bash
 pip install tracealign
 ```
 
-Requires Python 3.10+. Pulls `pydantic`, `numpy`, `lxml`, and `rapidfuzz`.
+Requires Python 3.10, 3.11, or 3.12. Pulls `pydantic`, `numpy`, `lxml`, and `rapidfuzz`.
 
-## Quick start
+### From source
+
+```bash
+git clone https://github.com/bsesic/trace.git
+cd trace
+pip install -e ".[dev]"
+```
+
+The `dev` extra adds `pytest` and `flake8` (the project's quality gates). For documentation contributions, use `pip install -e ".[docs]"` to add Sphinx, furo, and myst-parser.
+
+### Verifying the install
+
+```bash
+python -c "import tracealign; print(tracealign.__version__, tracealign.list_languages())"
+```
+
+This should print the current version and `['hbo']` (the Hebrew language pack registers itself on import).
+
+## Quick start — pairwise
 
 ```python
 import tracealign
@@ -62,31 +81,67 @@ summary: {EXACT: 3, NIQQUD_STRIPPED: 1, PLENE_DEFECTIVE: 1, ABBREVIATION: 1}
         אמר ↔ אמר          exact              1.00
 ```
 
-See **[the documentation](https://tracealign.readthedocs.io/en/latest/)** for installation details, the full API, FAQs, and the design rationale.
+## Quick start — multi-witness (v0.2)
+
+```python
+import tracealign
+
+witnesses = {
+    "W1": tracealign.tokenize("שלום עולם רַבִּי דויד אמר",  lang="hbo", seq_label="W1"),
+    "W2": tracealign.tokenize("שלום עולם רבי דוד אמר",       lang="hbo", seq_label="W2"),
+    "W3": tracealign.tokenize("שלום עולם ר\"י אמר",          lang="hbo", seq_label="W3"),
+    "W4": tracealign.tokenize("שלום עולם רבי דוד אמר טוב",   lang="hbo", seq_label="W4"),
+}
+
+result = tracealign.align_multi(witnesses, lang="hbo")
+
+print(result.guide_tree.format_text())
+print(result.table.format_text())
+
+for node in result.graph.variants():
+    readings = {wid: t.text for wid, t in node.tokens.items()}
+    print(node.id, readings)
+```
+
+The `MultiAlignmentResult` exposes a canonical `VariantGraph` (DAG with witness trails), a derived `AlignedTable` (re-anchorable to any witness for presentation), a `GuideTree` (UPGMA-built, carrying the original distance matrix — useful for downstream stemmatic work), and the same reproducibility-aware `params` snapshot the pairwise aligner produces.
+
+JSON persistence works the same way as for pairwise results, but lives in its own module:
+
+```python
+from tracealign.io import multi_result as mr_io
+
+mr_io.dump(result, "alignment.json")
+restored = mr_io.load("alignment.json")
+```
+
+See **[the documentation](https://tracealign.readthedocs.io/en/latest/)** for the full API, more usage examples, the algorithm details, FAQs, and the design rationale.
 
 ## Documentation
 
 | Section | What it covers |
 |---|---|
-| [Installation](https://tracealign.readthedocs.io/en/latest/installation.html) | pip / from source / dev setup |
-| [Usage](https://tracealign.readthedocs.io/en/latest/usage.html) | Tokenize, align, work with the result, custom lexica |
-| [Details](https://tracealign.readthedocs.io/en/latest/details.html) | Tokenizer pipeline, scoring tiers, DP algorithm |
-| [FAQ](https://tracealign.readthedocs.io/en/latest/faq.html) | Common questions about scope, language packs, performance |
+| [Installation](https://tracealign.readthedocs.io/en/latest/installation.html) | pip / from source / dev setup / docs build |
+| [Usage](https://tracealign.readthedocs.io/en/latest/usage.html) | Tokenize, pairwise align, multi-witness align, work with the result, custom lexica, I/O |
+| [Details](https://tracealign.readthedocs.io/en/latest/details.html) | Tokenizer pipeline, scoring tiers, pairwise DP algorithm, multi-witness POA pipeline |
+| [FAQ](https://tracealign.readthedocs.io/en/latest/faq.html) | Common questions about scope, language packs, performance, multi-witness semantics |
 | [Contributing](https://tracealign.readthedocs.io/en/latest/contributing.html) | Development workflow, TDD discipline, branch model |
 
 ## Project status
 
 | | |
 |---|---|
-| Current release | 0.1.1 |
-| Roadmap | [docs/ROADMAP.md](docs/ROADMAP.md) |
-| Design spec | [docs/superpowers/specs/2026-04-28-trace-v0.1-design.md](docs/superpowers/specs/2026-04-28-trace-v0.1-design.md) |
-| Future sub-projects | Multi-witness master graph · Geniza anchor detection · Text-reuse · Critical edition / apparatus |
+| Current PyPI release | 0.1.3 (v0.2.0 in flight on `feature/v0.2-multi-witness`) |
+| Roadmap | [docs/ROADMAP.md](docs/ROADMAP.md) — ten-stage long-term vision |
+| v0.1 design spec | [docs/superpowers/specs/2026-04-28-trace-v0.1-design.md](docs/superpowers/specs/2026-04-28-trace-v0.1-design.md) |
+| v0.2 design spec | [docs/superpowers/specs/2026-05-21-trace-v0.2-multi-witness-design.md](docs/superpowers/specs/2026-05-21-trace-v0.2-multi-witness-design.md) |
+| Released stages | 1 (pairwise + Hebrew pack) |
+| In progress | 2 (master alignment graph / multi-witness) |
+| Future sub-projects | Geniza anchor detection · Text-reuse · Apparatus / critical edition · Cross-tradition Hexapla · Stemmatic reconstruction · Allusion detection · Citation graphs · Reception history |
 
-## License
+## Citation
 
-[MIT](LICENSE) © 2026 Benjamin Schnabel.
+If you use TRACE in academic work, please cite via the [Zenodo concept DOI](https://doi.org/10.5281/zenodo.20315408) (always resolves to the latest archived release) or pick a specific version DOI from the Zenodo record. A `CITATION.cff` is at the repo root — GitHub's "Cite this repository" button generates APA / BibTeX / RIS automatically from it.
 
-## Citation
+## License
 
-If you use TRACE in academic work, please cite the repository — a Zenodo DOI will follow with the first non-pre-release tag.
+[MIT](LICENSE) © 2026 Benjamin Schnabel.
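To make the README's vocabulary concrete (variant graph, witness trails, lossless reconstruction), here is a minimal toy sketch. It does not use the tracealign API: `Node`, `ToyVariantGraph`, `reconstruct`, and the transliterated tokens are hypothetical stand-ins, and the rule that an absent witness also counts as a variant is an assumption of this sketch, not something the patch states.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One graph node: the token each witness reads at this position."""
    id: int
    tokens: dict[str, str] = field(default_factory=dict)  # witness id -> token


@dataclass
class ToyVariantGraph:
    """Hypothetical stand-in for a variant graph; not the tracealign class."""
    nodes: dict[int, Node]
    trails: dict[str, list[int]]  # witness id -> ordered node ids on its path

    def reconstruct(self, witness_id: str) -> list[str]:
        """Follow one witness's trail and read back its tokens in order."""
        return [self.nodes[n].tokens[witness_id] for n in self.trails[witness_id]]

    def variants(self) -> list[Node]:
        """Nodes where readings disagree or where some witness is absent."""
        everyone = set(self.trails)
        return [
            node for node in self.nodes.values()
            if len(set(node.tokens.values())) > 1 or set(node.tokens) != everyone
        ]


# Three toy witnesses (transliterated, invented for illustration).
witnesses = {
    "W1": ["shalom", "olam", "david", "amar"],
    "W2": ["shalom", "olam", "dawid", "amar"],
    "W3": ["shalom", "olam", "amar", "tov"],
}

graph = ToyVariantGraph(
    nodes={
        0: Node(0, {"W1": "shalom", "W2": "shalom", "W3": "shalom"}),
        1: Node(1, {"W1": "olam", "W2": "olam", "W3": "olam"}),
        2: Node(2, {"W1": "david", "W2": "dawid"}),  # W3 bypasses this node
        3: Node(3, {"W1": "amar", "W2": "amar", "W3": "amar"}),
        4: Node(4, {"W3": "tov"}),                   # only W3 continues here
    },
    trails={"W1": [0, 1, 2, 3], "W2": [0, 1, 2, 3], "W3": [0, 1, 3, 4]},
)

# Lossless reconstruction: every trail reads back the original token list.
for wid, tokens in witnesses.items():
    assert graph.reconstruct(wid) == tokens

print([node.id for node in graph.variants()])  # [2, 4]
```

The same check, run over real inputs, is in spirit what a lossless-reconstruction property test asserts: no token is dropped, duplicated, or reordered by the merge.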
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
@@ -34,7 +34,7 @@ The full ambition spans ten stages, each its own brainstorm → spec → plan
 | # | Stage | Capability it unlocks | Status |
 |---|---|---|---|
 | 1 | **Pairwise aligner + Hebrew pack** | TRACE v0.1: pairwise alignment kernel | ✅ released 0.1.3 |
-| 2 | **Master alignment graph** | Simultaneous multi-witness alignment (Sifra full witness set, Tanhuma) | planned (v0.2) |
+| 2 | **Master alignment graph** | Simultaneous multi-witness alignment (Sifra full witness set, Tanhuma) | in progress (v0.2, on `feature/v0.2-multi-witness`) |
 | 3 | **Geniza fragment anchor detection** | Matching small fragments against a large candidate pool (hundreds of Sifra Genizah fragments) | planned |
 | 4 | **Text-reuse detection** | Finding recurring phrases and verbatim citations across a corpus (biblical citations in rabbinic literature, recurring rabbinic formulae) | planned |
 | 5 | **Apparatus / critical-edition generation** | Producing publication-grade critical editions (lemmas, sigla, running text) directly from alignment output | planned |

diff --git a/docs/details.md b/docs/details.md
@@ -198,3 +198,55 @@ src/tracealign/
     escriptorium.py      # eScriptorium JSON importer
     tei.py               # TEI XML importer
 ```
+
+## Multi-witness alignment (v0.2)
+
+`align_multi` extends the pairwise aligner to N witnesses. The pipeline is three-phase:
+
+### Phase 1 — Pairwise distances
+
+Every pair of witnesses is aligned with `tracealign.align()` (the v0.1 pairwise aligner) and the distance is computed as `1 − total_score`. The result is a symmetric `N × N` distance matrix; the diagonal is zero. Witness ids are sorted lexicographically before computing, making the matrix independent of dict insertion order.
+
+### Phase 2 — UPGMA guide tree
+
+A binary guide tree is built from the distance matrix using **UPGMA** (Unweighted Pair Group Method with Arithmetic Mean). At every iteration the closest cluster pair is merged. Ties are broken on the canonical `(min, max)` lexicographic order of cluster members, guaranteeing determinism. The tree's `height` field carries the cumulative UPGMA distance — a starting point for later stemmatic work.
+
+### Phase 3 — Progressive POA-based merge
+
+The guide tree is walked in post-order to produce a canonical merge sequence (closely related witnesses are merged first). The first witness seeds the graph as a linear chain. Each subsequent witness is aligned to the current graph via **partial-order alignment (POA)**, a dynamic program over the topologically sorted graph nodes, with three possible transitions:
+
+| Transition | Effect on graph |
+|---|---|
+| Match | Merge the new token into an existing node's `tokens[witness_id]`; extend the witness set on the incoming edge. |
+| Insertion in sequence (gap in graph) | Add a new node holding only this witness's token; new edge `prev → new`. |
+| Deletion (skip graph node) | The new witness's path bypasses this node — recorded by an edge that skips it. |
+
+`node_match_score` aggregates the per-constituent tiered score across the witnesses already in the target node. The default mode `"max"` is permissive (CollateX-aligned); `"mean"` and `"min"` are configurable.
+
+### Correctness guarantees
+
+Two properties are pinned by tests:
+
+- **Lossless reconstruction.** For every input witness `w`, the path through the result graph yields exactly the original token sequence.
+- **Permutation invariance.** The same set of witnesses in any input dict order produces the same alignment (same witness paths, same variant loci).
+
+### Data flow
+
+```
+align_multi(witnesses, lang, config)
+   │
+   ▼
+pairwise_distances        — Phase 1: N(N−1)/2 pairwise alignments
+   │
+   ▼
+build_upgma               — Phase 2: deterministic binary tree
+   │
+   ▼
+progressive_merge         — Phase 3: post-order POA-based merge
+   │
+   ▼
+VariantGraph
+   │
+   ├──► AlignedTable      — derived view, re-anchorable
+   └──► MultiAlignmentResult (graph + table + guide_tree + summary + params)
+```
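Phases 1 and 2 described in this hunk can be sketched in plain Python. This is a toy under stated assumptions, not the library's code: `difflib.SequenceMatcher.ratio()` stands in for the aligner's normalized score, the function bodies are written from the prose description only, the tie-break key is one reading of "canonical `(min, max)` order", and the third tuple element records the merge distance rather than whatever the real `height` field holds.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_distances(witnesses):
    """Return (sorted ids, symmetric distance dict) for tokenized witnesses."""
    ids = sorted(witnesses)  # lexicographic, so dict insertion order is irrelevant
    dist = {(w, w): 0.0 for w in ids}
    for a, b in combinations(ids, 2):
        # Stand-in similarity in [0, 1]; an assumption, not the aligner's score.
        score = SequenceMatcher(None, witnesses[a], witnesses[b]).ratio()
        dist[(a, b)] = dist[(b, a)] = 1.0 - score
    return ids, dist


def build_upgma(ids, dist):
    """Merge the closest clusters until one remains; return a nested tuple tree."""
    # Each cluster maps its sorted member tuple to the subtree built so far.
    clusters = {(w,): w for w in ids}

    def avg(c1, c2):
        # UPGMA linkage: arithmetic mean over all cross-cluster member pairs.
        total = sum(dist[(a, b)] for a in c1 for b in c2)
        return round(total / (len(c1) * len(c2)), 9)

    while len(clusters) > 1:
        # Ties on distance fall back to the ordered (min, max) member tuples.
        d, left, right = min(
            (avg(c1, c2), min(c1, c2), max(c1, c2))
            for c1, c2 in combinations(clusters, 2)
        )
        subtree = (clusters.pop(left), clusters.pop(right), d)
        clusters[tuple(sorted(left + right))] = subtree
    return next(iter(clusters.values()))


# Toy token lists, invented for illustration.
witnesses = {
    "W1": "a b c d e".split(),
    "W2": "a b c d x".split(),
    "W3": "a q c r s".split(),
    "W4": "a b c d e f".split(),
}

ids, dist = pairwise_distances(witnesses)
tree = build_upgma(ids, dist)
print(tree)
```

With these inputs W1 and W4 merge first, then W2 joins them, and W3 joins last. Feeding the same witnesses in a different dict order yields the identical tree, because both the id sort and the tie-break ignore insertion order.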
diff --git a/docs/faq.md b/docs/faq.md
@@ -88,4 +88,43 @@ Not specced yet. Candidates from the v0.1 spec:
 - Per-project editorial-bracket preset bundles.
 - Performance pass (NumPy vectorization or Cython hot path).
 
-Plus the four long-term sub-projects: master alignment graph, Geniza anchor detection, text-reuse, apparatus generation.
+The master alignment graph (multi-witness alignment) ships in v0.2; see below. Future long-term stages: Geniza anchor detection, text-reuse, apparatus generation, cross-tradition Hexapla, stemmatic reconstruction, allusion detection, citation graphs, reception history.
+
+## How does multi-witness alignment differ from pairwise?
+
+`tracealign.align()` aligns exactly two witnesses. `tracealign.align_multi()` (v0.2) aligns N witnesses at once into a single canonical structure — a variant graph (DAG) where every witness has a trail through the graph, plus a derived aligned table view. Variant loci surface as nodes whose constituent witnesses disagree.
+
+For two witnesses, `align()` and `align_multi()` give similar information. For three or more, the multi-witness graph is much more useful than running every pair separately, because it yields one consistent set of variant positions rather than O(N²) overlapping pairwise alignments.
+
+## Is `align_multi` deterministic?
+
+Yes. The result is independent of the dict insertion order of the witnesses. Three sources of order-stability are pinned by tests:
+
+1. `pairwise_distances` sorts witness ids lexicographically before computing the matrix.
+2. UPGMA tie-breaking uses the canonical `(min, max)` lexicographic order of cluster members.
+3. The topological sort during sequence-vs-graph alignment is stable with respect to node id.
+
+A dedicated property test (`test_permutation_invariance`) re-runs `align_multi` with reordered inputs and asserts that witness paths and variant loci are identical.
+
+## How big can multi-witness alignments get?
+
+The v0.2 target is Sifra-scale: 5–15 witnesses, 1000–5000 tokens each. Larger witness sets (NT-scale, hundreds of witnesses) need anchor-based decomposition, which is a future stage. Geniza fragments specifically are handled in their own future stage (anchor detection against a large candidate pool), not by adding them all to one master graph.
+
+## Why UPGMA and not Neighbor-Joining for the guide tree?
+
+UPGMA is simpler and gives a binary tree with clear cumulative-distance heights — useful as a draft stemma input for the eventual stemmatic-reconstruction stage. UPGMA's "molecular clock" assumption is a known limitation in phylogenetics but is acceptable for ordering the merge sequence in v0.2. Neighbor-Joining is a future v0.x candidate when proper stemmatic reconstruction goes live.
+
+## Can I add a new witness to an existing alignment incrementally?
+
+Not in v0.2.0 — `align_multi` builds the entire graph in a single call. An incremental "add one witness" API is a v0.2.x candidate; it builds naturally on the existing `align_sequence_to_graph` primitive but requires API design (e.g. should the guide tree be re-balanced? should existing alignment relationships be allowed to change?). Open a discussion or issue if you need this.
+
+## How do I persist a multi-witness result?
+
+```python
+from tracealign.io import multi_result as mr_io
+
+mr_io.dump(result, "alignment.json")
+restored = mr_io.load("alignment.json")
+```
+
+`tracealign.io.multi_result` is a dedicated module separate from `tracealign.io.result` (the pairwise JSON I/O). The round-trip preserves the entire result, including the guide tree's distance matrix — important for later stages that reuse it.
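The determinism answer in this FAQ hunk rests on a permutation-invariance property. A minimal sketch of such a check follows, assuming nothing about the project's actual `test_permutation_invariance`: the two `merge_order_*` functions are hypothetical toys, chosen so that one depends on dict insertion order and the other does not.

```python
from itertools import permutations


def merge_order_naive(witnesses):
    """Order-dependent: ties keep dict insertion order (sorted() is stable)."""
    return sorted(witnesses, key=lambda w: len(witnesses[w]))


def merge_order_canonical(witnesses):
    """Order-independent: ties fall back to the witness id itself."""
    return sorted(witnesses, key=lambda w: (len(witnesses[w]), w))


def is_permutation_invariant(fn, witnesses):
    """True if fn gives the same answer for every insertion order of the dict."""
    baseline = fn(witnesses)
    return all(
        fn(dict(items)) == baseline
        for items in permutations(witnesses.items())
    )


# W1 and W2 have the same length, so they tie under the naive key.
witnesses = {
    "W2": ["a", "b", "c"],
    "W1": ["a", "x", "c"],
    "W3": ["a", "b", "c", "d"],
}

print(is_permutation_invariant(merge_order_naive, witnesses))      # False
print(is_permutation_invariant(merge_order_canonical, witnesses))  # True
```

Exhaustive permutation is only practical for a handful of witnesses. For larger sets, a property test would more plausibly sample a few random reorderings instead.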
diff --git a/docs/index.md b/docs/index.md
@@ -1,17 +1,18 @@
 # TRACE
 
-**Textual Reuse, Alignment, and Collation Engine** — a Python library for pairwise philological alignment with pluggable language packs.
+**Textual Reuse, Alignment, and Collation Engine** — a Python library for philological alignment with pluggable language packs. Pairwise (v0.1) and simultaneous multi-witness (v0.2) alignment.
 
 TRACE is built for textual criticism, manuscript witness comparison, and the creation of digital synopses and critical editions. The core is language-agnostic; the first shipped language pack covers Biblical and Rabbinic Hebrew (`hbo`).
 
 ## At a glance
 
 - **Tokenizer pipeline** with editorial-marker awareness (`[reconstructed]`, `⟦deletion⟧`, `〈insertion〉`, `(expanded)`, lacunae).
 - **Tiered scoring** that returns *(score, reason)* per token pair — `EXACT`, `NIQQUD_STRIPPED`, `PLENE_DEFECTIVE`, `ABBREVIATION`, `ORTHOGRAPHIC`, `INSERTION`, `OMISSION`, `NO_MATCH`.
-- **Semi-global Needleman–Wunsch** with affine gap penalties (Gotoh) and a multi-token abbreviation lookahead (`ר"י` ↔ `רבי ישמעאל`).
+- **Pairwise aligner** — semi-global Needleman–Wunsch with affine gap penalties (Gotoh) and a multi-token abbreviation lookahead (`ר"י` ↔ `רבי ישמעאל`).
+- **Multi-witness aligner** (v0.2) — N witnesses aligned simultaneously into a canonical variant graph plus a derived aligned table, via pairwise distances → UPGMA guide tree → POA-based progressive merge. Determinism and lossless reconstruction are pinned by property tests.
 - **Hebrew language pack** with niqqud strip, plene/defective skeleton matching, gershayim/maqqef tokenizer hooks, and a seed lexicon of rabbinic abbreviations (extendable via `Lexica.merge()`).
-- **I/O** for plain text, JSON (round-trip), eScriptorium exports, and TEI XML.
-- **Reproducible**: every `AlignmentResult` carries `trace_version` and `language_pack_version` in its params.
+- **I/O** for plain text, JSON (round-trip for both pairwise and multi-witness results), eScriptorium exports, and TEI XML.
+- **Reproducible**: every `AlignmentResult` / `MultiAlignmentResult` carries `trace_version` and `language_pack_version` in its params.
 
 ## Get going
 
@@ -28,7 +29,7 @@ contributing
 
 ## Project status
 
-TRACE is an early-stage research library. v0.1.x ships the pairwise aligner and the Hebrew pack; future sub-projects cover multi-witness master graphs, Geniza fragment anchor detection, text-reuse detection, and apparatus / critical-edition generation. See the [roadmap](https://github.com/bsesic/trace/blob/main/docs/ROADMAP.md) for the long-term plan.
+TRACE is an early-stage research library. v0.1.x ships the pairwise aligner and the Hebrew pack; v0.2 adds the multi-witness master alignment graph. Future stages cover Geniza fragment anchor detection, text-reuse detection, apparatus / critical-edition generation, cross-tradition Hexapla-style alignment, stemmatic reconstruction, allusion detection, citation graphs, and multi-millennial reception history. See the [roadmap](https://github.com/bsesic/trace/blob/main/docs/ROADMAP.md) for the long-term ten-stage plan.
 
 ## License