fix: Thread n_jobs to first-stage 'auto' selector (closes #1009)#1030
Open
immu4989 wants to merge 1 commit into
Open
fix: Thread n_jobs to first-stage 'auto' selector (closes #1009)#1030immu4989 wants to merge 1 commit into
immu4989 wants to merge 1 commit into
Conversation
SparseLinearDML and CausalForestDML accept n_jobs but only threaded it into the second-stage model (Lasso / causal forest). The 'auto' first stage built by _make_first_stage_selector → get_selector(['forest', 'linear'], ...) constructed RandomForest, GridSearchCV, LogisticRegressionCV, and WeightedLassoCVWrapper without n_jobs, so first-stage fits ran single-core regardless of what the user passed. - get_selector / _make_first_stage_selector gain an n_jobs kwarg and thread it into every estimator whose constructor supports it. - DML and NonParamDML pass getattr(self, 'n_jobs', None) so subclasses that already expose n_jobs (SparseLinearDML) get it for free without touching subclasses that don't (LinearDML, KernelDML). - CausalForestDML passes self.n_jobs directly. - Gate the existing 'n_jobs' copy in _to_logisticRegression on sklearn < 1.8 (sklearn 1.8 deprecated n_jobs on LogisticRegression; copying it post-fit was triggering a FutureWarning once n_jobs actually started reaching LogisticRegressionCV). Adds test_n_jobs_propagates_to_first_stage_auto_selector — static introspection test that walks the selector tree for SparseLinearDML (continuous + discrete treatment) and CausalForestDML and asserts every n_jobs-bearing leaf got the user-supplied value. Signed-off-by: Imran Ahamed <immu4989@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #1009.
Investigation comment with the full diagnosis and selector-tree evidence:
#1009 (comment)
What changed
get_selector/_make_first_stage_selectorgain ann_jobskwargand thread it through every estimator that supports it.
DML/NonParamDMLusegetattr(self, 'n_jobs', None)so the fixapplies to
SparseLinearDMLwithout touching classes that don'texpose
n_jobs(LinearDML,KernelDML).CausalForestDMLpassesself.n_jobsdirectly.n_jobscopy in_to_logisticRegressionon
sklearn < 1.8. sklearn 1.8 deprecatedn_jobsonLogisticRegression; the copy triggered aFutureWarningoncen_jobsactually started reachingLogisticRegressionCV.Out of scope
LinearDMLandKernelDMLdon't currently exposen_jobs. Adding itwould be a separate API decision; flagged for follow up.
DRLearnerfamily: the duplicate_make_first_stage_selectorineconml/dr/_drlearner.pygains the kwarg for API consistency, but noDR call site exercises it yet. Follow-up if/when those classes grow
an
n_jobsparameter.Verification
New regression test
test_n_jobs_propagates_to_first_stage_auto_selector:SparseLinearDML -> GridSearchCV.n_jobs = None, expected 3)SparseLinearDML(continuous + discrete) andCausalForestDML