
Upgrade UMLS data from 2021 to latest release #56

@AlexMikhalev

Summary

The current words_cui.tsv dates from April 2021. Download a fresh UMLS release (2025AA or 2026AA) using the new UMLS account credentials, then rebuild the umls_automata.bin.zst artifact.

Details

  • The current automaton has ~1.4M patterns from the 2021 UMLS release
  • New release will have updated concepts, retired CUIs, and new terms
  • Validate pattern count and entity extraction quality against 18 evaluation cases
  • Ensure no regression in safety gate behavior (e.g., Pembrolizumab/EGFR blocking)
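
One way to surface retired CUIs, new terms, and missing safety-critical terms before rebuilding is to diff the old and new term tables. The sketch below assumes words_cui.tsv is a two-column, tab-separated file (term, then CUI), which this issue does not confirm. The function names and the `sentinel_terms` parameter are hypothetical and not part of the existing codebase.

```python
import csv
from pathlib import Path


def load_term_cuis(path):
    """Read a TSV into a dict of lowercased term -> set of CUIs.

    Assumption: column 0 is the term and column 1 is the CUI.
    """
    table = {}
    with Path(path).open(newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quote characters inside terms as literal text.
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            term, cui = row[0].strip().lower(), row[1].strip()
            table.setdefault(term, set()).add(cui)
    return table


def diff_releases(old, new, sentinel_terms=()):
    """Summarise the differences between two term -> CUI tables."""
    old_cuis = set().union(*old.values())
    new_cuis = set().union(*new.values())
    return {
        "retired_cuis": old_cuis - new_cuis,
        "added_cuis": new_cuis - old_cuis,
        "added_terms": set(new) - set(old),
        "removed_terms": set(old) - set(new),
        # Terms the safety gates depend on that vanished from the new table.
        "missing_sentinels": {t for t in sentinel_terms if t.lower() not in new},
    }
```

Passing terms such as "Pembrolizumab" and "EGFR" as `sentinel_terms` would flag a release in which a term the safety gate relies on has disappeared. It is a cheap pre-check and does not replace the 3-gate harness.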

Acceptance Criteria

  • Download latest UMLS release (2025AA or 2026AA)
  • Rebuild umls_automata.bin.zst with updated data
  • All 18 evaluation cases still pass the 3-gate harness
  • Document pattern count delta (old vs new)
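
For the pattern count delta, a small helper can produce a line that is ready to paste into the PR description. This is a minimal sketch that assumes one pattern per non-empty line in the source file, which may not match how the automaton builder actually counts patterns. Both function names are hypothetical.

```python
def count_patterns(path):
    """Count non-empty lines, assuming each one is a single pattern."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def format_delta(old_count, new_count):
    """Render the old vs new pattern counts with absolute and relative change."""
    delta = new_count - old_count
    if old_count:
        pct = f"{delta / old_count * 100:+.1f}%"
    else:
        pct = "n/a"  # relative change is undefined when the old count is zero
    return f"patterns: {old_count:,} -> {new_count:,} ({delta:+,}, {pct})"
```

If the builder deduplicates or filters terms, the counts reported by the build itself are the better input to `format_delta`.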

Priority: P2

Labels: enhancement
