Summary
Add a cross-lingual alignment path to TRACE: an anchor-based alignment mode plus a
delegation hook for filling the spans between anchors with an external embedding aligner.
This is the central problem in the scholarly-editions use case (Judeo-Arabic ↔ Tibbonide
Hebrew, Arabic ↔ Latin) and is roadmap Stage 6 (cross-tradition / Hexapla-style), pulled
forward by a concrete adopter need.
Motivation
TRACE today is monolingual: all scoring tiers assume same-language token comparison. A
translation shares essentially no surface form with its source, so Needleman-Wunsch (NW)
alignment over the token stream has nothing to match on. The robust, defensible approach for
this material is a hybrid:
- a high-precision anchor net — named entities, numerals, divine names, citation formulae,
and a calque-cognate lexicon bootstrapped from the text (the Tibbonides' mechanical calquing
makes this unusually tractable) — as the alignment skeleton;
- an embedding aligner (dicta / LaBSE / Bertalign-style) filling only the short monotonic
spans between anchors, where out-of-distribution embeddings still behave.
TRACE's reason-tagged, standoff variant-graph model is the natural home for the anchor skeleton;
the embedding fill should be an external adapter TRACE delegates to, not a built-in dependency.
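To make the anchor-skeleton half of this hybrid concrete, here is a minimal Python sketch. It is not existing TRACE code: `AnchorExtractor`, `numeral_key`, and `anchor_skeleton` are hypothetical names, and keeping the longest monotonic chain of equal anchor keys (a plain LCS over the anchor-bearing tokens) is an assumed strategy, not a decided one. Numerals stand in for the other anchor classes because they need no lexicon.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical type: maps a token to a language-neutral anchor key, or None.
AnchorExtractor = Callable[[str], Optional[str]]


def numeral_key(token: str) -> Optional[str]:
    """Key numerals by value so ASCII and Arabic-Indic digits agree."""
    return f"num:{int(token)}" if token.isdecimal() else None


def anchor_skeleton(
    seq_a: Sequence[str], seq_b: Sequence[str], extract: AnchorExtractor
) -> List[Tuple[int, int]]:
    """Longest monotonic chain of (i, j) pairs with equal anchor keys."""
    a = [(i, extract(t)) for i, t in enumerate(seq_a)]
    b = [(j, extract(t)) for j, t in enumerate(seq_b)]
    a = [(i, k) for i, k in a if k is not None]
    b = [(j, k) for j, k in b if k is not None]

    # Classic LCS table, filled from the end so the traceback runs forward.
    lcs = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for x in range(len(a) - 1, -1, -1):
        for y in range(len(b) - 1, -1, -1):
            if a[x][1] == b[y][1]:
                lcs[x][y] = 1 + lcs[x + 1][y + 1]
            else:
                lcs[x][y] = max(lcs[x + 1][y], lcs[x][y + 1])

    pairs: List[Tuple[int, int]] = []
    x = y = 0
    while x < len(a) and y < len(b):
        if a[x][1] == b[y][1]:
            pairs.append((a[x][0], b[y][0]))  # original token indices
            x, y = x + 1, y + 1
        elif lcs[x + 1][y] >= lcs[x][y + 1]:
            x += 1
        else:
            y += 1
    return pairs
```

Because `numeral_key` keys on the numeric value, an Arabic-Indic "١٢" and an ASCII "12" produce the same anchor key. Anchors that would cross each other are dropped by the LCS step, which keeps the skeleton monotonic.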
Scope
- Anchor detection / alignment mode: a cross-lingual alignment entry point that aligns two
sequences in different languages by first matching anchors (pluggable anchor extractors:
numerals, NE lists, citation-formula patterns, a user-supplied bilingual lemma/calque
lexicon), producing fixed correspondence points expressed as matches with a new ANCHOR
reason tag.
- Between-anchor fill via delegation: define a clean interface (a callback/protocol) so the
caller supplies an embedding-aligner adapter for the spans between anchors. TRACE constrains
it to short monotonic windows fenced by anchors; TRACE does not vendor any embedding model.
- Output: anchored + filled alignment in the existing standoff/variant-graph representation,
every match carrying its reason (ANCHOR vs. delegated-fill) and the producing adapter id, so
the result stays auditable.
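The delegation hook and the reason-tagged output described in this list could look roughly like the sketch below. Everything here is an assumption for discussion, not a settled API: the `FillAdapter` protocol, its `align_span` method, the `Match` record, the `"FILL"` reason name, and the `max_window` cap are all hypothetical, and `anchors` is taken to be a list of monotonic `(i, j)` index pairs that an anchor pass has already produced.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol, Sequence, Tuple


class FillAdapter(Protocol):
    """Hypothetical protocol; TRACE would call it but ship no model."""

    adapter_id: str

    def align_span(
        self, span_a: Sequence[str], span_b: Sequence[str],
        lang_a: str, lang_b: str,
    ) -> List[Tuple[int, int]]:
        """Return span-local (i, j) pairs, increasing in both indices."""
        ...


@dataclass(frozen=True)
class Match:
    a: int
    b: int
    reason: str  # "ANCHOR", or "FILL" (an assumed name for delegated fill)
    adapter_id: Optional[str] = None


def align_crosslingual(seq_a, seq_b, lang_a, lang_b, anchors,
                       fill_adapter=None, max_window=50):
    """Sketch: emit anchors, delegating each fenced gap to the adapter."""
    out: List[Match] = []
    # A sentinel fence closes the gap after the last real anchor.
    fences = list(anchors) + [(len(seq_a), len(seq_b))]
    prev_a = prev_b = 0
    for idx, (i, j) in enumerate(fences):
        win_a, win_b = seq_a[prev_a:i], seq_b[prev_b:j]
        short = max(len(win_a), len(win_b)) <= max_window
        if fill_adapter is not None and win_a and win_b and short:
            last = (-1, -1)
            pairs = fill_adapter.align_span(win_a, win_b, lang_a, lang_b)
            for x, y in pairs:
                # Enforce the fence: in-window and strictly monotonic.
                ok_a = last[0] < x < len(win_a)
                ok_b = last[1] < y < len(win_b)
                if not (ok_a and ok_b):
                    raise ValueError("fill adapter broke the window")
                out.append(Match(prev_a + x, prev_b + y, "FILL",
                                 fill_adapter.adapter_id))
                last = (x, y)
        if idx < len(anchors):  # the sentinel itself is not a match
            out.append(Match(i, j, "ANCHOR"))
        prev_a, prev_b = i + 1, j + 1
    return out


class DiagonalAdapter:
    """Toy stand-in for an embedding aligner: pairs tokens by position."""

    adapter_id = "diagonal-demo"

    def align_span(self, span_a, span_b, lang_a, lang_b):
        return [(k, k) for k in range(min(len(span_a), len(span_b)))]
```

In this sketch TRACE, not the adapter, owns the window boundaries and the monotonicity check, so a misbehaving adapter fails loudly instead of silently corrupting the skeleton. Gaps longer than `max_window` are simply left unaligned, which is one possible reading of "short monotonic windows".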
Acceptance criteria
- […] align_crosslingual(seq_a, seq_b, lang_a, lang_b, anchors=..., fill_adapter=...)).
- […] ANCHOR reason tag added; provenance of fill matches recorded.
- […] flake8 clean.
Notes / risks
- […] docs/superpowers/specs/ and brainstorm before implementation, consistent with prior stages.
- […] quality are only meaningful measured together.
Roadmap: brings forward Stage 6 (cross-tradition alignment).