Fix stale calibration targets by deriving time_period from dataset#505
- Remove hardcoded CBO_YEAR and TREASURY_YEAR constants
- Add --dataset CLI argument to etl_national_targets.py
- Derive time_period from sim.default_calculation_period
- Default to HuggingFace production dataset

The dataset itself is now the single source of truth for the calibration year, preventing future drift when updating to new base years.

Closes #503

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The CBO income_tax parameter represents positive-only receipts (refundable credit payments in excess of liability are classified as outlays, not negative receipts). Using income_tax_positive matches this definition. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
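To illustrate the distinction, here is a minimal sketch. It assumes income_tax_positive is each tax unit's income tax floored at zero, which is an assumption (the variable's formula is not shown in this PR). The function names, amounts, and weights are made up for illustration.

```python
def positive_only_receipts(income_tax, weights):
    """Weighted total of income tax, counting only positive liabilities.

    Negative values (refundable credits exceeding liability) are floored
    at zero, mirroring the CBO convention of classifying them as outlays.
    """
    return sum(w * max(t, 0.0) for t, w in zip(income_tax, weights))


def net_receipts(income_tax, weights):
    """Weighted total of income tax, letting negative values offset receipts."""
    return sum(w * t for t, w in zip(income_tax, weights))


# Illustrative (made-up) tax units: one of them has a net refund.
income_tax = [5_000.0, -2_000.0, 0.0, 12_000.0]
weights = [1.0, 1.0, 1.0, 1.0]

positive_total = positive_only_receipts(income_tax, weights)
net_total = net_receipts(income_tax, weights)
```

In this toy data the net total is lower than the positive-only total by exactly the refund amount, which is why comparing a net aggregate against a positive-only target would understate the match.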
All ETL scripts now derive their target year from the dataset's default_calculation_period instead of hardcoding years. This ensures all calibration targets stay synchronized when updating to a new base year annually.

Updated scripts:
- create_initial_strata.py
- etl_age.py
- etl_irs_soi.py (with configurable --lag for IRS data delay)
- etl_medicaid.py
- etl_snap.py
- etl_state_income_tax.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
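The --lag option for etl_irs_soi.py can be pictured roughly as below. This is only a sketch: the default of 2 years, the helper names, and the subtraction rule (target year = dataset year minus lag) are assumptions, since the script itself is not shown in this conversation.

```python
import argparse


def build_parser():
    """Build a parser with a --lag option (the default value is an assumption)."""
    parser = argparse.ArgumentParser(description="Sketch of an IRS SOI ETL CLI")
    parser.add_argument(
        "--lag",
        type=int,
        default=2,  # hypothetical default; the real script may differ
        help="Years by which IRS SOI data trails the dataset year",
    )
    return parser


def irs_target_year(dataset_year, lag):
    """Return the IRS data year implied by the dataset year and the lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return dataset_year - lag


# Example: a 2024 dataset with a two-year lag points at 2022 IRS data.
args = build_parser().parse_args(["--lag", "2"])
year = irs_target_year(2024, args.lag)
```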
- Update parse_ucgid to recognize both 5001800US (118th) and 5001900US (119th Congress)
- Expand Puerto Rico and territory filters to handle both Congress code formats
- Update TERRITORY_UCGIDS and NON_VOTING_GEO_IDS with 119th Congress codes

This ensures consistent redistricting alignment: 2024 ACS data uses 119th Congress codes natively, and IRS SOI data is converted via the 116th→119th mapping matrix.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
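A parser that accepts both prefixes might look like the sketch below. It is not the repository's parse_ucgid: the function name, the returned dictionary, and the assumption that the prefix is followed by a two-digit state FIPS code and a two-digit district code are all illustrative.

```python
# Congressional-district UCGID prefixes named in the commit message.
CD_PREFIXES = {
    "5001800US": 118,  # 118th Congress
    "5001900US": 119,  # 119th Congress
}


def parse_cd_ucgid_sketch(ucgid):
    """Split a congressional-district UCGID into congress, state, and district.

    Assumes the prefix is followed by exactly four digits:
    two for the state FIPS code and two for the district.
    """
    for prefix, congress in CD_PREFIXES.items():
        if ucgid.startswith(prefix):
            suffix = ucgid[len(prefix):]
            if len(suffix) != 4 or not suffix.isdigit():
                raise ValueError(f"Unexpected district suffix: {ucgid!r}")
            return {
                "congress": congress,
                "state_fips": suffix[:2],
                "district": suffix[2:],
            }
    raise ValueError(f"Unrecognized UCGID prefix: {ucgid!r}")


def is_puerto_rico(ucgid):
    """True for either Congress format when the state FIPS code is 72."""
    return parse_cd_ucgid_sketch(ucgid)["state_fips"] == "72"
```

Keying the filter on the parsed state FIPS code rather than on a full hardcoded identifier is one way to keep a territory filter working across both Congress formats.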
Revert deterministic hash-based medicaid/SSI seed logic in cps.py, update Makefile seed to 3526. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Needed for income_tax_positive variable used in loss.py. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@MaxGhenis we're doing pretty well on the new income tax target from CBO. The SNAP CBO target looks equally good. We're roughly 25% off on Social Security, SSI, and EITC, which is obviously not great. I would still highly recommend pushing this through, and we can adjust from here. We'll finally be in 2024 for local areas and mapped to the 119th Congress.
PR Review
🔴 Critical (Must Fix)
🟡 Should Address
🟢 Suggestions
Validation Summary
Recommendation: COMMENT
The core fix (deriving CBO/Treasury year from the dataset) is sound and addresses the 18% income tax gap described in #503. The

Summary
- Remove hardcoded CBO_YEAR and TREASURY_YEAR constants from etl_national_targets.py
- Add --dataset CLI argument to specify the source dataset
- Derive time_period from sim.default_calculation_period - the dataset itself is now the single source of truth

Root Cause
The ETL had hardcoded year constants:
But the calibration runs at time_period=2024. This caused an 18% gap for income tax alone ($2,051B vs $2,426B).

The Fix
Instead of hardcoding years, we now derive the time period from the dataset:
This ensures CBO/Treasury targets always match the dataset's year, preventing future drift when updating to new base years annually.
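As a rough sketch of the idea (not the code from this PR), the year lookup could be wired up as follows. The DEFAULT_DATASET value is a hypothetical placeholder, the helper names are made up, and StubSimulation stands in for a real simulation object so the example runs on its own; only --dataset and default_calculation_period come from the description above.

```python
import argparse

# Hypothetical placeholder: the PR defaults to a HuggingFace production
# dataset, but its actual path is not shown here.
DEFAULT_DATASET = "hf://<org>/<repo>/<file>"


def build_parser():
    """Build a parser exposing the --dataset argument."""
    parser = argparse.ArgumentParser(description="Sketch of the national targets ETL CLI")
    parser.add_argument(
        "--dataset",
        default=DEFAULT_DATASET,
        help="Source dataset whose default period sets the target year",
    )
    return parser


def derive_time_period(sim):
    """Read the calibration year from the simulation instead of a constant."""
    return int(sim.default_calculation_period)


class StubSimulation:
    """Stand-in for a real simulation built from the chosen dataset."""

    def __init__(self, dataset, default_calculation_period):
        self.dataset = dataset
        self.default_calculation_period = default_calculation_period


args = build_parser().parse_args(["--dataset", "local_dataset.h5"])
sim = StubSimulation(args.dataset, default_calculation_period=2024)
time_period = derive_time_period(sim)  # used wherever CBO_YEAR / TREASURY_YEAR were
```

Because every target lookup reads the same derived value, switching to a dataset with a different default period changes the target year without touching the ETL code.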
Usage
Test plan
- make database to regenerate policy_data.db
- income_tax target is ~$2,426B (not $2,051B)

Closes #503
🤖 Generated with Claude Code