Skip to content

feat: enhance meaning units and utility#156

Merged
asteier2026 merged 6 commits into
mainfrom
asteier2026/feature/quality-improvements
May 15, 2026
Merged

feat: enhance meaning units and utility#156
asteier2026 merged 6 commits into
mainfrom
asteier2026/feature/quality-improvements

Conversation

@asteier2026
Copy link
Copy Markdown
Contributor

Changes include:

  • Meaning unit prompt changes:
    -- Meaning units should preserve only the minimum semantic detail
    necessary to retain the text's functional utility.
    -- <segmentation_rules> updated (emphasizing one idea per unit)
    -- Each meaning unit now has an importance. We later use this to weight quality_qa failures. Eventually we may use it if we repair quality_qa. We may also use it if we implement rare semantic bundles.
  • Quality QA Compare: <grading_rules> updated (be more lenient on generalization and take importance into account)
  • Updates to do weighted utility and carry importance through

@asteier2026 asteier2026 requested a review from a team as a code owner May 12, 2026 22:43
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR enhances the meaning-unit pipeline with a two-level importance field (critical / important) that flows from extraction through QA generation into a weighted utility score calculation. The grading prompt is updated to score abstracted answers more leniently for "important" units and more strictly for "critical" ones.

  • Adds MeaningUnitImportance enum and attaches it as a required field on MeaningUnitSchema and QualityQAItemSchema; both updated prompts instruct the LLM to carry the value through.
  • Introduces importance-weighted mean in compute_utility_score (critical = 2×, important = 1×), with a safe flat-mean fallback; _compute_metrics_columns is updated to look up importances from COL_QUALITY_QA and COL_QUALITY_QA is now correctly listed as a required column in its decorator.
  • Tests cover the weighted, mismatched-length, and None fallback cases for compute_utility_score.

Confidence Score: 5/5

Safe to merge. The importance weighting flows correctly through extraction, QA generation, and scoring with a safe flat-mean fallback when data is missing.

All code paths for the new importance field are correctly wired: the enum is declared, carried through both LLM prompt stages, stored in the schemas, looked up by ID in _compute_metrics_columns, and passed correctly to compute_utility_score. The required_columns decorator was updated to include COL_QUALITY_QA, preventing silent missing-data bugs. Tests cover the weighted, mismatch, and None fallback cases. The only observation is a prompt-level ambiguity about how strictly to score critical units with abstracted answers, which does not affect code correctness.

No files require special attention.

Important Files Changed

Filename Overview
src/anonymizer/engine/rewrite/evaluate.py Adds importance-weighted utility scoring; correctly updates required_columns, ids lookup, and fallback logic.
src/anonymizer/engine/schemas/rewrite.py Adds MeaningUnitImportance enum and importance field to MeaningUnitSchema and QualityQAItemSchema; no backward-compat default was intentionally omitted per team decision.
src/anonymizer/engine/rewrite/qa_generation.py Prompt additions correctly define importance criteria and instruct the LLM to carry importance through to output items.
tests/engine/test_evaluate.py Good new tests covering weighted scoring, ordering, mismatch, and None fallback for compute_utility_score.
tests/engine/test_parsers.py Parser tests updated to include the new required importance field in fixture payloads.
tests/engine/test_qa_generation.py Stub meaning units updated to supply the required importance field on MeaningUnitSchema.
src/anonymizer/engine/schemas/init.py MeaningUnitImportance exported correctly from the schemas package.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["LLM: Meaning Unit Extraction\n(MeaningUnitsSchema)"] -->|"units with importance:\ncritical | important"| B["Serialize to JSON\n(_serialize_meaning_units)"]
    B --> C["LLM: QA Generation\n(QualityQAPairsSchema)"]
    C -->|"items with importance\ncarried through"| D["LLM: Quality Re-answer\n(QualityAnswersSchema)"]
    D --> E["LLM: Quality Compare\n(QACompareResultsSchema)\ngrading rules now importance-aware"]
    E -->|"ids + scores"| F["_compute_metrics_columns"]
    C -->|"importance lookup\nby item.id"| F
    F -->|"importance_levels list"| G["compute_utility_score\ncritical=2.0x, important=1.0x\nweighted mean"]
    G --> H["COL_UTILITY_SCORE"]
Loading

Reviews (6): Last reviewed commit: "Update tests/engine/test_evaluate.py" | Re-trigger Greptile

Comment thread src/anonymizer/engine/schemas/rewrite.py
Comment thread src/anonymizer/engine/schemas/rewrite.py
Comment thread src/anonymizer/engine/schemas/rewrite.py
@asteier2026
Copy link
Copy Markdown
Contributor Author

We will never have the legacy problem greptile is referring to. There are couple small changes needed by the sensitivity PR that are duplicated in this PR, so that both will merge smoothly: Removing reference to "sensitiive_attribute" and adding checks in validate_protection_consistency for combined_risk_level.

Comment thread src/anonymizer/engine/schemas/rewrite.py
Comment thread src/anonymizer/engine/rewrite/evaluate.py
@lipikaramaswamy
Copy link
Copy Markdown
Collaborator

I think this looks good, but to be sure, could you merge/rebase latest main into this PR? I did that locally and the diff becomes much clearer: the duplicated sensitivity-disposition changes are absorbed from main, and this PR is left with the intended quality/importance changes. But I wanna make sure that propagates correctly.

Also one small test ask: could we add direct coverage for the new weighted utility behavior in compute_utility_score? For example, [0.0, 1.0] with ["critical", "important"] should differ from the reversed ordering, and a mismatched importance list should fall back to the flat mean.

@asteier2026 asteier2026 requested a review from a team as a code owner May 15, 2026 20:43
@asteier2026 asteier2026 force-pushed the asteier2026/feature/quality-improvements branch from 8111523 to 99ed115 Compare May 15, 2026 20:51
Copy link
Copy Markdown
Collaborator

@lipikaramaswamy lipikaramaswamy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The rebase makes the diff much cleaner: the sensitivity-disposition changes are coming from main, and this PR is focused on meaning-unit importance / weighted utility.

Thanks for adding the weighted utility tests.

Q/non-blocking: do we still want compute_utility_score to fall back to a flat mean when importance_levels is missing or length-mismatched? Since importance is now required and this is going into a new release (0.2.0), a mismatch could arguably fail loudly instead of silently using the old scoring behavior.

Comment thread tests/engine/test_evaluate.py Outdated
Co-authored-by: lipikaramaswamy <31832945+lipikaramaswamy@users.noreply.github.com>
@asteier2026 asteier2026 merged commit 4eefca9 into main May 15, 2026
11 checks passed
@asteier2026 asteier2026 deleted the asteier2026/feature/quality-improvements branch May 15, 2026 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants