Fix off-by-one in score_nuisances for discrete treatments (closes #1006) by immu4989 · Pull Request #1029 · py-why/EconML

immu4989 · 2026-05-18T21:00:27Z

Evidence

Reproducer: ternary T (3 classes) with signal in X, LinearDML +
LogisticRegression nuisance, cv=3. Mean score_nuisances across folds:

Metric	Pre-fix	Post-fix	Notes
T default score (acc)	0.13	0.63	1/3 chance floor for 3-class
Y default score (R²)	0.44	0.44	unchanged — bug isolates to T path

Pre-fix the classifier scores below random because pd.get_dummies(T)
keeps all k columns and inverse_onehot decodes labels as {1..k} while
the classifier was trained on {0..k-1}. Post-fix the same classifier
recovers genuine signal.

The existing test_forest_dml_score_fns already exercised this path and
its expected values silently encoded the bug; they're updated here to the
correct post-fix values:

Assertion (`T_*`)	Was (pre-fix)	Now (post-fix)
`mean_squared_error`	1.5	0.48
`mean_absolute_error`	1.0	0.48
`r2`	−5.1	−0.93

roc_auc, log_loss, and pearsonr are approximately invariant under
the off-by-one label shift, which is why the existing test didn't fail
loud — the three metrics where label distance matters were the ones that
silently absorbed the gap.

kbattocchi

Thanks for the contribution! I really appreciate your taking the time to track down the issue and fix it, but I think the scope should be slightly broader - see comment in _rlearner.py. Once that's addressed I'd be happy to approve and merge this in.

kbattocchi · 2026-05-27T18:49:20Z

+        if self.discrete_treatment and (len(np.shape(T)) == 1 or np.shape(T)[1] == 1):
+            T_2_score = self.transformer.transform(np.asarray(T).reshape(-1, 1))
+        else:
+            T_2_score = T


Thanks for submitting this fix!

Unfortunately, I believe the issue is slightly broader than the bug that prompted this - a similar issue will occur even with continuous treatments if a treatment featurizer is used. I think that this slight modification to your logic will cover it, but please add to one of the tests in test_treatment_featurization to also cover this logic.

Suggested change

if self.discrete_treatment and (len(np.shape(T)) == 1 or np.shape(T)[1] == 1):

T_2_score = self.transformer.transform(np.asarray(T).reshape(-1, 1))

else:

T_2_score = T

if self.transformer:

T_2_score = self.transformer.transform(T)

else:

T_2_score = T

kbattocchi · 2026-05-27T18:50:52Z


 # compiled C files
-*.c
+*.cscripts/


Was this inadvertent? Please revert.

score_nuisances built T_2_score with pd.get_dummies(T) for discrete treatments and passed raw T otherwise. Both paths could mismatch the encoding the nuisance model learned at fit time: - Discrete T: training used OneHotEncoder(drop='first'); inverse_onehot at score time decoded the (k-column) get_dummies output as labels {1..k} while the classifier learned {0..k-1}, silently corrupting the reported score. - Continuous T with a treatment_featurizer: model_t was trained on featurized T, so passing raw T to score caused a multi-output regressor shape mismatch. Re-use self.transformer at score time (the same OneHotEncoder or treatment featurizer fit during _fit_nuisances), mirroring the existing pattern in _OrthoLearner._fit_nuisances. Updates three assertions in test_forest_dml_score_fns whose expected values silently encoded the discrete bug (T_MSE 1.5→0.48, T_MAE 1.0→0.48, T_R² -5.1→-0.93). Adds two regression tests: - test_score_nuisances_discrete_treatment_label_alignment (test_dml.py) - test_score_nuisances_applies_treatment_featurizer (test_treatment_featurization.py) Signed-off-by: Imran Ahamed <immu4989@gmail.com>

immu4989 · 2026-05-27T20:28:35Z

Thanks for the review @kbattocchi — both points were on the money.

Pushed an updated commit that:

Broader fix in _rlearner.py: applied your suggestion, with one small adjustment — kept the reshape(T, (-1, 1)) step when discrete_treatment=True since OneHotEncoder.transform requires 2D input (the literal self.transformer.transform(T) broke the existing discrete test). The result mirrors the existing pattern in _OrthoLearner._fit_nuisances:
```
if self.transformer:
    if self.discrete_treatment:
        T = reshape(T, (-1, 1))
    T_2_score = self.transformer.transform(T)
else:
    T_2_score = T
```
Reverted the .gitignore change — inadvertent, sorry about that.
Added test_score_nuisances_applies_treatment_featurizer in test_treatment_featurization.py. Uses LinearDML with the existing polynomial_treatment_featurizer, fits on continuous T, and asserts score_nuisances returns finite scores. Verified pre fix it raises ValueError: y_true and y_pred have different number of output (1!=2) (the shape-mismatch path you anticipated); post-fix it passes.

Existing tests still pass: test_score_nuisances_discrete_treatment_label_alignment, test_forest_dml_score_fns, and a broader sanity sweep (27 passed across test_dml.py + test_treatment_featurization.py, no regressions outside the pre-existing *_with_ray env failures on main).

Why this is a good outcome

Substantive maintainer engagement — kbattocchi gave specific, actionable feedback; you responded within hours with a tested broader fix and a sensible departure from his literal suggestion (with justification).
Refined fix is more general than the original PR scope. The petition reviewer reading the contribution graph end to end sees: investigation comment → PR → maintainer-requested broadening → broader fix landed. That's the "originality / significance / recognition" trifecta in a single artifact.
Two regression tests now cover both bug modes (discrete-T encoding off by one and continuous-T+featurizer shape mismatch).

kbattocchi

Thanks for the updates, this looks great.

immu4989 force-pushed the fix-1006-score-nuisances-discrete branch from 414ea37 to e75b3eb Compare May 18, 2026 21:06

kbattocchi requested changes May 27, 2026

View reviewed changes

immu4989 force-pushed the fix-1006-score-nuisances-discrete branch from e75b3eb to 89b1297 Compare May 27, 2026 20:25

kbattocchi approved these changes May 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix off-by-one in score_nuisances for discrete treatments (closes #1006)#1029

Fix off-by-one in score_nuisances for discrete treatments (closes #1006)#1029
immu4989 wants to merge 1 commit into
py-why:mainfrom
immu4989:fix-1006-score-nuisances-discrete

immu4989 commented May 18, 2026

Uh oh!

kbattocchi left a comment

Uh oh!

kbattocchi May 27, 2026

Uh oh!

kbattocchi May 27, 2026

Uh oh!

immu4989 commented May 27, 2026

Uh oh!

kbattocchi left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

immu4989 commented May 18, 2026

Evidence

Uh oh!

kbattocchi left a comment

Choose a reason for hiding this comment

Uh oh!

kbattocchi May 27, 2026

Choose a reason for hiding this comment

Uh oh!

kbattocchi May 27, 2026

Choose a reason for hiding this comment

Uh oh!

immu4989 commented May 27, 2026

Why this is a good outcome

Uh oh!

kbattocchi left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants