Skip to content

fix: Thread n_jobs to first-stage 'auto' selector (closes #1009)#1030

Open
immu4989 wants to merge 1 commit into
py-why:mainfrom
immu4989:fix-1009-n_jobs-first-stage
Open

fix: Thread n_jobs to first-stage 'auto' selector (closes #1009)#1030
immu4989 wants to merge 1 commit into
py-why:mainfrom
immu4989:fix-1009-n_jobs-first-stage

Conversation

@immu4989

Copy link
Copy Markdown
Contributor

Closes #1009.

Investigation comment with the full diagnosis and selector-tree evidence:
#1009 (comment)

What changed

  • get_selector / _make_first_stage_selector gain an n_jobs kwarg
    and thread it through every estimator that supports it.
  • DML / NonParamDML use getattr(self, 'n_jobs', None) so the fix
    applies to SparseLinearDML without touching classes that don't
    expose n_jobs (LinearDML, KernelDML).
  • CausalForestDML passes self.n_jobs directly.
  • Drive-by: gate the existing n_jobs copy in _to_logisticRegression
    on sklearn < 1.8. sklearn 1.8 deprecated n_jobs on
    LogisticRegression; the copy triggered a FutureWarning once
    n_jobs actually started reaching LogisticRegressionCV.

Out of scope

  • LinearDML and KernelDML don't currently expose n_jobs. Adding it
    would be a separate API decision; flagged for follow up.
  • DRLearner family: the duplicate _make_first_stage_selector in
    econml/dr/_drlearner.py gains the kwarg for API consistency, but no
    DR call site exercises it yet. Follow-up if/when those classes grow
    an n_jobs parameter.

Verification

New regression test test_n_jobs_propagates_to_first_stage_auto_selector:

  • Pre fix: fails (SparseLinearDML -> GridSearchCV.n_jobs = None, expected 3)
  • Post fix: passes
  • Exercises SparseLinearDML (continuous + discrete) and CausalForestDML

SparseLinearDML and CausalForestDML accept n_jobs but only threaded it
into the second-stage model (Lasso / causal forest). The 'auto' first
stage built by _make_first_stage_selector → get_selector(['forest',
'linear'], ...) constructed RandomForest, GridSearchCV,
LogisticRegressionCV, and WeightedLassoCVWrapper without n_jobs, so
first-stage fits ran single-core regardless of what the user passed.

- get_selector / _make_first_stage_selector gain an n_jobs kwarg and
  thread it into every estimator whose constructor supports it.
- DML and NonParamDML pass getattr(self, 'n_jobs', None) so subclasses
  that already expose n_jobs (SparseLinearDML) get it for free without
  touching subclasses that don't (LinearDML, KernelDML).
- CausalForestDML passes self.n_jobs directly.
- Gate the existing 'n_jobs' copy in _to_logisticRegression on
  sklearn < 1.8 (sklearn 1.8 deprecated n_jobs on LogisticRegression;
  copying it post-fit was triggering a FutureWarning once n_jobs
  actually started reaching LogisticRegressionCV).

Adds test_n_jobs_propagates_to_first_stage_auto_selector — static
introspection test that walks the selector tree for SparseLinearDML
(continuous + discrete treatment) and CausalForestDML and asserts
every n_jobs-bearing leaf got the user-supplied value.

Signed-off-by: Imran Ahamed <immu4989@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

n_jobs not passed to 'auto' learners in SparseLinearDML()

1 participant