Cross-lingual alignment path: anchor-based mode + embedding-fill delegation hook #17

## Summary

Add a cross-lingual alignment path to TRACE: an **anchor-based alignment mode** plus a
**delegation hook** for filling the spans between anchors with an external embedding aligner.
This is the central problem in the scholarly-editions use case (Judeo-Arabic ↔ Tibbonide
Hebrew, Arabic ↔ Latin) and is roadmap **Stage 6 (cross-tradition / Hexapla-style)**, pulled
forward by a concrete adopter need.

## Motivation

TRACE today is monolingual: all scoring tiers assume same-language token comparison. A
translation shares essentially no surface form with its source, so Needleman-Wunsch (NW)
alignment over the token stream does not apply. The robust, defensible approach for this
material is a **hybrid**:

1. a high-precision **anchor net** — named entities, numerals, divine names, citation formulae,
   and a calque-cognate lexicon bootstrapped from the text (the Tibbonides' mechanical calquing
   makes this unusually tractable) — as the alignment skeleton;
2. an **embedding aligner** (dicta / LaBSE / Bertalign-style) filling only the short monotonic
   spans *between* anchors, where out-of-distribution embeddings still behave.

TRACE's reason-tagged, standoff variant-graph model is the natural home for the anchor skeleton;
the embedding fill should be an external adapter TRACE delegates to, not a built-in dependency.

## Scope

- **Anchor detection / alignment mode:** a cross-lingual alignment entry point that aligns two
  sequences in *different* languages by first matching anchors (pluggable anchor extractors:
  numerals, NE lists, citation-formula patterns, a user-supplied bilingual lemma/calque
  lexicon), producing fixed correspondence points expressed as matches with a new `ANCHOR`
  reason tag.
- **Between-anchor fill via delegation:** define a clean interface (a callback/protocol) so the
  caller supplies an embedding-aligner adapter for the spans between anchors. TRACE constrains
  it to short monotonic windows fenced by anchors; TRACE does **not** vendor any embedding model.
- **Output:** anchored + filled alignment in the existing standoff/variant-graph representation,
  every match carrying its reason (`ANCHOR` vs. delegated-fill) and the producing adapter id, so
  the result stays auditable.
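
A minimal sketch of the anchor step described above, under stated assumptions: `Match`, `AnchorExtractor`, `numeral_extractor`, `lexicon_extractor`, `match_anchors` and `windows_between` are hypothetical names, not existing TRACE API, and the longest-common-subsequence chaining is one possible way to keep anchors monotonic, not a decided design. Hebrew letter-numerals and NE lists are not handled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical: an extractor maps a token to a language-neutral key, or None.
AnchorExtractor = Callable[[str], Optional[str]]

# Map Arabic-Indic digits (U+0660..U+0669) to ASCII digits.
_ARABIC_INDIC = {0x0660 + d: str(d) for d in range(10)}


@dataclass(frozen=True)
class Match:
    a_index: int
    b_index: int
    reason: str                      # "ANCHOR" here
    adapter_id: Optional[str] = None


def numeral_extractor(token: str) -> Optional[str]:
    """Normalise Western and Arabic-Indic digit strings to one key."""
    digits = token.translate(_ARABIC_INDIC)
    return f"num:{digits}" if digits.isdigit() else None


def lexicon_extractor(lexicon: Dict[str, str]) -> AnchorExtractor:
    """Lexicon maps a surface form (either language) to a shared concept id."""
    return lambda token: f"lex:{lexicon[token]}" if token in lexicon else None


def find_anchors(seq: Sequence[str],
                 extractors: Sequence[AnchorExtractor]) -> List[Tuple[int, str]]:
    found = []
    for i, tok in enumerate(seq):
        for extract in extractors:
            key = extract(tok)
            if key is not None:
                found.append((i, key))
                break                # first extractor that fires wins
    return found


def match_anchors(seq_a: Sequence[str], seq_b: Sequence[str],
                  extractors: Sequence[AnchorExtractor]) -> List[Match]:
    """Longest monotonic chain of equal anchor keys (LCS over anchor keys)."""
    a, b = find_anchors(seq_a, extractors), find_anchors(seq_b, extractors)
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i][1] == b[j][1]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    matches, i, j = [], 0, 0
    while i < n and j < m:
        if a[i][1] == b[j][1]:
            matches.append(Match(a[i][0], b[j][0], "ANCHOR"))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return matches


def windows_between(matches: Sequence[Match], len_a: int,
                    len_b: int) -> List[Tuple[int, int, int, int]]:
    """Half-open (a_start, a_end, b_start, b_end) spans fenced by anchors.

    Spans that are empty on either side are skipped: there is nothing to fill.
    """
    spans, prev_a, prev_b = [], -1, -1
    for a_end, b_end in [(m.a_index, m.b_index) for m in matches] + [(len_a, len_b)]:
        if a_end > prev_a + 1 and b_end > prev_b + 1:
            spans.append((prev_a + 1, a_end, prev_b + 1, b_end))
        prev_a, prev_b = a_end, b_end
    return spans


# Tiny synthetic Arabic / Hebrew example (illustrative tokens only).
lexicon = {"موسى": "moses", "משה": "moses"}
seq_a = ["قال", "موسى", "في", "\u0663", "ابواب"]
seq_b = ["אמר", "משה", "ב", "3", "שערים"]
extractors = [numeral_extractor, lexicon_extractor(lexicon)]
anchors = match_anchors(seq_a, seq_b, extractors)
windows = windows_between(anchors, len(seq_a), len(seq_b))
# anchors pair index 1 with 1 and index 3 with 3;
# windows == [(0, 1, 0, 1), (2, 3, 2, 3), (4, 5, 4, 5)]
```

Each window is what the delegated fill adapter would receive; the anchors themselves are never handed to it.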

## Acceptance criteria

- [ ] A documented cross-lingual API (e.g. `align_crosslingual(seq_a, seq_b, lang_a, lang_b, anchors=..., fill_adapter=...)`).
- [ ] Anchor extractors for numerals and a user-supplied bilingual lexicon, with tests on a small synthetic Arabic↔Hebrew fixture.
- [ ] A no-op / pluggable fill adapter interface with a reference stub (so the path is testable without an embedding model dependency).
- [ ] `ANCHOR` reason tag added; provenance of fill matches (the producing adapter id) recorded.
- [ ] No new heavy/runtime ML dependency added to the core package (embedding adapters live behind the interface, optional extras at most).
- [ ] Developed test-first (TDD); test suite green on Python 3.10/3.11/3.12; `flake8` clean.
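
One possible shape for the fill-adapter interface and its reference stubs, offered as a sketch rather than a settled API. `FillAdapter`, `NoOpFillAdapter`, `DiagonalStubAdapter`, `fill_between`, the `"FILL"` reason string and the `max_window` cap are all hypothetical names and values chosen for illustration; the caller-side guard shows how TRACE could enforce "short monotonic windows" regardless of what the adapter returns.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class Match:
    a_index: int
    b_index: int
    reason: str                      # "ANCHOR" or (hypothetical) "FILL"
    adapter_id: Optional[str] = None


class FillAdapter(Protocol):
    """Hypothetical protocol: align one window, return window-local index pairs."""
    adapter_id: str

    def align_window(self, window_a: Sequence[str],
                     window_b: Sequence[str]) -> List[Tuple[int, int]]: ...


class NoOpFillAdapter:
    """Reference stub: leaves every window unfilled."""
    adapter_id = "noop"

    def align_window(self, window_a, window_b):
        return []


class DiagonalStubAdapter:
    """Test-only stub: pairs tokens positionally, no model involved."""
    adapter_id = "diagonal-stub"

    def align_window(self, window_a, window_b):
        return [(k, k) for k in range(min(len(window_a), len(window_b)))]


def fill_between(seq_a: Sequence[str], seq_b: Sequence[str],
                 anchors: Sequence[Match], adapter: FillAdapter,
                 max_window: int = 32) -> List[Match]:
    """Merge anchors with adapter output; anchors must already be monotonic."""
    out: List[Match] = []
    prev_a = prev_b = -1
    bounds = [(m.a_index, m.b_index) for m in anchors] + [(len(seq_a), len(seq_b))]
    for (a_end, b_end), anchor in zip(bounds, list(anchors) + [None]):
        a0, b0 = prev_a + 1, prev_b + 1
        wa, wb = seq_a[a0:a_end], seq_b[b0:b_end]
        # Delegate only non-empty windows no longer than max_window.
        if wa and wb and max(len(wa), len(wb)) <= max_window:
            last_i = last_j = -1
            for i, j in sorted(adapter.align_window(wa, wb)):
                in_bounds = 0 <= i < len(wa) and 0 <= j < len(wb)
                # Drop out-of-window or non-monotonic pairs from the adapter.
                if in_bounds and i > last_i and j > last_j:
                    out.append(Match(a0 + i, b0 + j, "FILL", adapter.adapter_id))
                    last_i, last_j = i, j
        if anchor is not None:
            out.append(anchor)
        prev_a, prev_b = a_end, b_end
    return out


seq_a, seq_b = list("abcde"), list("vwxyz")
anchors = [Match(2, 2, "ANCHOR")]
filled = fill_between(seq_a, seq_b, anchors, DiagonalStubAdapter())
unfilled = fill_between(seq_a, seq_b, anchors, NoOpFillAdapter())
```

With the no-op stub the result is the anchor skeleton alone, which keeps the whole path testable without any embedding model installed.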

## Notes / risks

- This is a design-heavy item — write a short spec under `docs/superpowers/specs/` and
  brainstorm before implementation, consistent with prior stages.
- Validation must be against a hand-aligned gold sample; segmentation (#18) and alignment
  quality are only meaningful measured together.
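
For the gold-sample validation, a small scoring helper along these lines may be enough to start with. It is a sketch: `score_links` is a hypothetical name, links are assumed to be `(a_index, b_index)` pairs, and the comparison only makes sense when predicted and gold alignments share the same segmentation, which is why the two have to be evaluated together.

```python
from typing import Dict, Iterable, Tuple

Link = Tuple[int, int]


def score_links(predicted: Iterable[Link], gold: Iterable[Link]) -> Dict[str, float]:
    """Precision, recall and F1 of predicted alignment links against gold links."""
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative numbers only, not results from any real sample.
scores = score_links(
    predicted=[(0, 0), (1, 1), (2, 3)],
    gold=[(0, 0), (1, 1), (2, 2), (3, 3)],
)
```

Scoring `ANCHOR` and fill matches separately would show whether errors come from the skeleton or from the delegated adapter.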

Roadmap: brings forward **Stage 6 (cross-tradition alignment)**.

