Refactor model logic to include all information in one folder by Innixma · Pull Request #308 · autogluon/tabarena

Innixma · 2026-05-15T23:29:16Z

Refactor: per-model folder layout, auto-discovered registry, foundation/aggregation layering

Summary

This branch reorganises everything about how models are declared, discovered, and exposed in TabArena. Every model now lives in a single canonical folder; the registry is auto-built from those folders; the top-level tabarena.models namespace is a stable public surface; and a long-standing latent circular import has been removed at the root.

The change is mechanical at the file-system level but eliminates a class of hidden bugs and removes ~five hand-maintained lists that used to drift.

40 commits, ~220 files touched (net +1.9k LOC of which a large fraction is tests + per-model __init__/hpo/info boilerplate that replaces an older diffuse structure).

Why

The pre-refactor layout had grown three problems that compounded over time:

Each model was scattered across multiple locations. The AutoGluon wrapper class lived under tabarena/benchmark/models/ag/<key>/, the search-space generator under tabarena/models/<key>/generate.py, and the MethodMetadata artefact entry inside tabarena/nips2025_utils/artifacts/. Adding a model meant editing 7+ unrelated places and keeping several hand-coded import lists in sync.
The registry was hand-maintained. Multiple parallel mappings — friendly-name → generator, AG-key → wrapper class, model → pip extras — each had to be updated for every new model. Drift between them silently dropped models from one surface while keeping them in another.
A latent circular import was masking real bugs. Per-model info.py files imported MethodMetadata from a package whose __init__.py eagerly aggregated metadata back from every per-model info.py. The discovery walk swallowed the resulting ImportError silently and moved on. As of main, CatBoost was the alphabetically-first model in the walk and was therefore the model that hit the cycle — meaning the most-used GBDT had been silently missing from MODEL_REGISTRY for an extended period without anyone noticing.

What changed

A single canonical home per model

Every model now lives in exactly one folder at tabarena/models/<key>/, with a uniform contract:

tabarena/models/<key>/
  __init__.py     # re-exports gen_<key>, <key>_info, <key>_method_metadata
  hpo.py          # the ConfigGenerator + search space
  info.py         # ModelInfo + MethodMetadata (registry visibility)
  model.py        # the AutoGluon wrapper class
  _internal/      # (optional) hand-written helpers
  _vendor/        # (optional) verbatim upstream code, with original license

hpo.py replaces the older generate.py convention. model.py is the canonical home of the wrapper class — the previous tabarena/benchmark/models/ag/ location has been retired entirely, including its top-level namespace. Multi-file models follow a standardised layout where private helpers go in _internal/ and copied-upstream code (currently only LimiX) goes in _vendor/ next to its license. Both subfolders can coexist in the same model folder for models that mix hand-written wrappers around vendored libraries.

Registry auto-discovery as the single source of truth

tabarena.models.discover_models() walks every per-model folder, imports each info.py, and collects the ModelInfo instances declared there. Everything downstream — tabarena_model_registry (AutoGluon registry), get_configs_generator_from_name (friendly-name lookup), pip_extra aggregation for pyproject extras — now derives from this single source instead of maintaining parallel lists.
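A minimal sketch of the "single source" idea, not the actual TabArena code. The `ModelInfo` below is a simplified stand-in (the real class carries a `MethodMetadata` and a model class), and the model names, requirement strings, and helper names are all hypothetical. It only illustrates how several views can be computed from one list instead of being kept as parallel hand-written mappings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """Simplified stand-in for the real ModelInfo (fields are assumptions)."""
    method: str                       # unique registry key
    display_name: str                 # friendly name shown to users
    ag_key: str                       # key of the AutoGluon wrapper class
    pip_extra: tuple[str, ...] = ()   # pip requirements for this model


# In the real code this list would come from the discovery walk.
INFOS = [
    ModelInfo("ToyMLP", "ToyMLP", "TOYMLP", ("toy-mlp-lib",)),
    ModelInfo("ToyMLP_GPU", "ToyMLP", "TOYMLP", ("toy-mlp-lib",)),
    ModelInfo("ToyKNN", "ToyKNN", "TOYKNN"),
]


def by_method(infos):
    """Registry view keyed by the unique method name."""
    return {info.method: info for info in infos}


def ag_keys(infos):
    """Deduplicated AutoGluon keys (CPU/GPU variants share one class)."""
    return sorted({info.ag_key for info in infos})


def pip_extras(infos):
    """Union of all pip requirements declared by the models."""
    return sorted({req for info in infos for req in info.pip_extra})
```

Because every view is a pure function of `INFOS`, a model cannot appear in one view and be missing from another.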

Adding a new model is now a 2-place edit (the _LAZY_CLASSES map in tabarena/models/__init__.py + the relevant pyproject.toml extra) plus the per-model folder; the rest is auto-derived. The add-model skill has been rewritten to match.

The friendly-name lookup (get_configs_generator_from_name) collapsed from a hand-coded 27-entry dict into a 6-line registry lookup keyed by display_name or method. A parametrised regression test locks identity against the old behaviour: every previously-supported friendly name still returns exactly the same ConfigGenerator object.
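The lookup described above could look roughly like the following. This is a hedged sketch: `get_generator`, the simplified `ModelInfo`, and the toy registry are hypothetical, and a bare `object()` stands in for a real ConfigGenerator. It shows a name resolving by either `method` or `display_name`, and the identity check that the regression test relies on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """Simplified stand-in; the real ModelInfo has more fields."""
    method: str
    display_name: str
    search_space: object   # stands in for the ConfigGenerator instance


def get_generator(name, registry):
    """Return the generator whose method or display_name equals `name`."""
    for info in registry.values():
        if name in (info.method, info.display_name):
            return info.search_space
    raise ValueError(f"Unknown model name: {name!r}")


gen_toy = object()   # placeholder for a real ConfigGenerator
REGISTRY = {"ToyModel_GPU": ModelInfo("ToyModel_GPU", "ToyModel", gen_toy)}

# Both spellings resolve to the very same object, not a copy.
assert get_generator("ToyModel", REGISTRY) is gen_toy
assert get_generator("ToyModel_GPU", REGISTRY) is gen_toy
```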

Top-level `tabarena.models` namespace

tabarena.models is now a stable public surface. Consumers can write:

from tabarena.models import (
    RealMLPModel, TabPFNWideModel, MethodMetadata, ModelInfo,
    discover_models, get_model_registry, register_model_info,
)

Model wrapper classes and MethodMetadata are exposed via PEP 562 __getattr__ so import tabarena.models stays cheap — heavy ML libraries are loaded only on first attribute access and cached thereafter. The previous tabarena.benchmark.models.ag top-level surface was removed (it was already a thin re-export shim by the end of the migration); deep imports through that namespace are also gone.

__all__ is derived from _LAZY_CLASSES plus an explicit eager-export tuple, so it can't drift from the actual surface.

Foundation/aggregation layering: the cycle is fixed at the root

MethodMetadata moved out of the aggregator's package and into a foundation-layer module alongside ModelInfo. The structural consequence:

Per-model info.py files no longer transit a package whose __init__.py aggregates from them. The cycle is severed; the silent-skip in discover_models() is no longer load-bearing.
Cold-leaf imports of any per-model model.py work without a discover_models() warm-up — a workaround that several prior tests had to invoke.
The previously-invisible CatBoost is now correctly discovered. Registry size went from 29 → 30 entries as a direct side effect of the structural fix.

A back-compat shim at the legacy MethodMetadata path is kept so external imports remain stable.

Hardened discovery

discover_models() now logs a WARNING via stdlib logging when a per-model info.py fails to import, instead of silently dropping the model. The skip-and-continue behaviour is preserved — a broken model doesn't take down the rest of the registry — but the failure surfaces in logs so future regressions of the CatBoost-bug shape become immediately visible.
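The skip-and-continue behaviour with a logged warning can be sketched as below. This is an illustration under assumptions, not the real `discover_models()`: the function name, logger name, and return shape are hypothetical, and the module names are passed in explicitly (a real walk might produce them with `pkgutil.iter_modules`).

```python
import importlib
import logging

logger = logging.getLogger("discovery_sketch")   # hypothetical logger name


def import_info_modules(module_names):
    """Import each module; log and skip the ones that fail.

    Returns (imported_modules, skipped_names).
    """
    imported, skipped = [], []
    for name in module_names:
        try:
            imported.append(importlib.import_module(name))
        except Exception as exc:   # broad on purpose: one bad model must not stop the walk
            logger.warning("Skipping %s: %s: %s", name, type(exc).__name__, exc)
            skipped.append(name)
    return imported, skipped


modules, skipped = import_info_modules(["json", "no_such_module_for_sketch"])
```

The only difference from a silent skip is the `logger.warning` call, which is enough to make a missing registry entry visible in logs.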

Extension registration surface

A register_model_info() API was added for extension packages whose model folders aren't reachable by discover_models()'s walk over tabarena.models. Extensions can now ship their own models that join the same registry, with automatic disambiguation when they re-declare a method name that already exists in the core registry.
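A rough sketch of the disambiguation rule follows. The `f"{method}@{artifact_name}"` key format is taken from the commit message in this PR; everything else is an assumption, including the simplified `ModelInfo` (the real one stores these values on its `MethodMetadata`), the explicit `registry` argument, and the artifact names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """Simplified stand-in with only the two fields the rule needs."""
    method: str
    artifact_name: str


def register_model_info(info, registry):
    """Add `info` to `registry` and return the key it was stored under."""
    key = info.method
    if key in registry and registry[key] != info:
        # The method name is taken by a different entry: keep the existing
        # entry under the bare name and store the new one under a qualified key.
        key = f"{info.method}@{info.artifact_name}"
    registry[key] = info
    return key


registry = {}
core = ModelInfo("LinearModel", "core-artifact")
extension = ModelInfo("LinearModel", "extension-artifact")
core_key = register_model_info(core, registry)
extension_key = register_model_info(extension, registry)
```

Here `core_key` is `"LinearModel"` and `extension_key` is `"LinearModel@extension-artifact"`, so the extension entry never overwrites the core one.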

Skill + documentation refresh

The .claude/skills/add-model/ skill (used by Claude to add new models) was rewritten end-to-end:

Describes the new per-model folder layout
Drops all references to the now-removed benchmark/models/ag/ namespace
Explains the auto-derived registries that no longer need manual edits
Includes templates for model.py, hpo.py, info.py, __init__.py, and the test file
Documents the _internal/ vs _vendor/ convention for multi-file models

Test coverage

A new tst/models/ test area was added with three focused suites:

test_registry.py — unit-tests the discovery walk (caching, duplicate detection, ignored-symbol filtering, extension registration, the new warning-on-failure behaviour) with a fully synthetic package fixture so the tests don't depend on the actual installed models.
test_lazy_imports.py — locks in the PEP 562 lazy property of tabarena.models, verifies MethodMetadata works through the lazy surface, confirms __all__ derivation is correct, guards against eager-re-export regressions.
test_utils.py — parametrises every previously-supported friendly model name and asserts the new registry-driven lookup returns the same ConfigGenerator object as the old hand-coded dict did.

Behaviours preserved

All previously-supported import paths for model classes resolve to the same class objects (verified by is-identity checks in tests).
MethodMetadata is still importable from its legacy tabarena.nips2025_utils.artifacts.method_metadata location for external callers, via a back-compat re-export shim.
All 27 friendly model names accepted by get_configs_generator_from_name still resolve to the same ConfigGenerator instance as before, including CPU/GPU variant ties.
The AutoGluon registry (tabarena_model_registry) still surfaces the same model classes; it just now auto-derives them instead of needing manual list edits.

Behaviours intentionally removed

The legacy top-level tabarena.benchmark.models.ag namespace and every per-model shim under it. Code that imported from these paths needs to switch to from tabarena.models import … or from tabarena.models.<key>.model import …. All in-repo call sites have already been updated.

Follow-ups deferred to subsequent PRs

These were considered in scope but explicitly deferred so this PR could land as a focused refactor:

Split _method_metadata.py (currently 962 lines mixing the foundation dataclass with S3/paper-runner/repository orchestration). After the split, MethodMetadata could be exposed eagerly at the top level without the cost concerns that motivated lazy loading.
Make nips2025_utils/artifacts/__init__.py lazy. The cycle is structurally fixed, but the eager aggregation in that package's init still forces every consumer of downloaders/uploaders to load the full metadata aggregator on first touch.
Factor CPU/GPU MethodMetadata pairs in realmlp/info.py, modernnca/info.py, tabm/info.py into a cpu_gpu_pair(...) helper.
Naming consistency in pyproject.toml: a couple of model extras (sap-rpt-oss, perpetualboosting) don't match their ModelKey, which trips up the pip-extras drift-detection.
Optional paper-output extras: the heavy plotting deps (tueplots, autorank, seaborn, matplotlib, plotly) are currently in the base install and could move into an optional paper extra.

Test results

The full model-area test sweep (tst/models/ + per-model tests + yaml-serialization tests) passes locally. Failures observed in the broader sweep are pre-existing and reproduce on main without any of these changes; none are caused by the refactor.

The two pre-existing failure modes (cold-leaf import of any per-model module without warm-up, and CatBoost silently missing from the registry) are now positively resolved by the layering fix.

LennartPurucker

Great first step, some comments:

We still have several places we need to edit for each model and several files. Can we consolidate this even more? Maybe moving the generate function into the abstract model class would be best. Can we not move the model code into the same folder? Or, in some way or form, make it all one folder for all models.
Do we have documentation for how to fill in the MethodMetadata for a new model submission? How much of it could be stored in the abstract model class in some way or form?
nit: let us wait to merge iLTM before we go ahead with this.
nit: We need to update the add-model skill later.
Can you (with Claude) add a short explanation somewhere (maybe in the PR) and tests for how the discovery functions and should be understood?

LennartPurucker · 2026-05-17T13:19:35Z

@@ -24,37 +24,40 @@ def convert_numpy_dtypes(data: dict) -> dict:


 def get_configs_generator_from_name(model_name: str):


note: we might want to move to a function that returns the model info now in future refactor steps.

Innixma · 2026-05-27T00:40:49Z

@LennartPurucker Finished the refactor, addressed comments, and updated PR description

LennartPurucker

Very cool progress, added more comments/thoughts

LennartPurucker · 2026-05-28T07:58:47Z

+    display_name="{ModelName}",
+    compute="gpu",                          # or "cpu"
+    date="YYYY-MM-DD",                     # date of the benchmarking run (or planning date if unbenchmarked)
+    ag_key="{ag_key_without_TA}",          # e.g. "TABSTAR" (matches {ClassName}Model.ag_key without the TA- prefix)


Why do we want to have it without TA prefix here? It seems a bit confusing now. Can we simplify this / rename the field in the metadata? Moreover, if possible, could we rely solely on ag_key or ag_name in the future?

Would prefer to save skill edits like this to post-refactor once we have a clearer view on what is still left to be refined.

Sure, just commented on this as the skill was added/changed in the PR already

LennartPurucker · 2026-05-28T08:05:18Z

Can we already think about a way/structure of how we want to deprecate models now? We can only deprecate them on the LB and keep them as is here, or add some flag for it.

Prefer to think about this post-merge

LennartPurucker · 2026-05-28T08:07:28Z

+    ag_key="{ag_key_without_TA}",          # e.g. "TABSTAR" (matches {ClassName}Model.ag_key without the TA- prefix)
+    model_key="{ag_key_without_TA}",
+    config_default="{ModelName}_c1_BAG_L1",
+    can_hpo=True,


Maybe ask Claude to go over these again from examples in models/ as some of this is wrong, like the postfix is always _c1_BAG_L1 or is_bag and can_hpo is True

Would prefer to wait on this, we anyways may want to change MethodMetadata itself to not have certain keys or require certain information.

LennartPurucker · 2026-05-28T08:15:15Z

+    from tabarena.models import RealMLPModel
    from autogluon.tabular.models import LGBModel


Could I do "from tabarena.models import LGBModel" and it would do the same as "from autogluon.tabular.models import LGBModel"?

Not currently, but we could support this if we want to.

LennartPurucker · 2026-05-28T08:18:16Z

+
+
+# FIXME: Implement `best` and `best-N`
+class MethodMetadata:


Ask Claude to document this class and check that it aligns with your understanding of the class. Also, feel free to refactor this class as needed, as this might be a good time now

I'd prefer as a separate PR. This is a massive refactor already and I'm mainly just having it move files, not alter them. Otherwise it will become impossible to know if a bug sneaks in.

LennartPurucker · 2026-05-28T08:19:33Z

We need these files for backward compatibility?

Or could we move reference models and portfolios also to the /models part somehow?
E.g. /models/reference or models/portfolios

And things like TabPrep artifacts to models/experimental?

Prefer to think about this for a future PR.

LennartPurucker · 2026-05-28T08:24:29Z

Can we move the test code / function also to the /models? And here, just have one file calling all tests (or a subset)? I guess we are not testing most models in CI or so, but it would be good to have all code that is relevant in a bundle for contributors

100% but I prefer to do in a separate PR.

Apply the TabSTAR Stage 1 recipe to limix, mitra, orionmsp, sap_rpt_oss, tabdpt, and tabpfn_3: each gets `hpo.py` and `info.py`, with `generate.py` reduced to a thin shim so the legacy `models/utils.py:name_to_import_map` dispatch continues to work. The corresponding `<model>_metadata` entries in `_tabarena_method_metadata_*.py` are now sourced from each model's `info.py` (single source of truth, legacy names preserved as aliases). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Same TabSTAR recipe: `hpo.py` becomes the canonical home for each model's search-space generator (including the `generate_configs_*` / `generate_single_config_*` helpers that the previous `generate.py` defined inline); `info.py` carries the MethodMetadata + ModelInfo bundle. `generate.py` shims preserve the legacy `models/utils.py:name_to_import_map` dispatch. The `realmlp_gpu_metadata` and `xrfm_metadata` entries in `_tabarena_method_metadata_2025_09_03.py` are now sourced from each model's `info.py` (legacy names kept as aliases). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Apply the TabSTAR Stage 1 recipe to ebm, knn, lr, and perpetual_booster. Each gets `hpo.py` (search space + `generate_configs_*` helpers) and `info.py` (MethodMetadata + ModelInfo bundle); `generate.py` becomes a thin shim so `models/utils.py:name_to_import_map` keeps working. The `ebm_metadata`, `knn_metadata`, `lr_metadata`, and `perpetualbooster_metadata` entries in the dated metadata files are now sourced from each model's `info.py` (legacy names kept as aliases). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

These folders host multiple AutoGluon model classes:
- tabicl/: TabICLModel + TabICLv2Model (only v2 has dedicated MethodMetadata)
- tabpfnv2_5/: RealTabPFNv25Model + TabPFNv26Model

`hpo.py` holds the shared search-space machinery and exports both `gen_*` objects. `info.py` declares one `ModelInfo` per model class with a dedicated MethodMetadata entry; auto-discovery picks them up by `ag_name`. The dated metadata files (_2026_02_16, _2025_11_12, _2026_03_18) now source `tabiclv2_metadata`, `realtabpfn25_metadata`, and `tabpfn26_metadata` from each model's `info.py`. Legacy aliases preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Apply the TabSTAR Stage 1 recipe to the three GBDT models that previously relied on the factory loop in `_tabarena_method_metadata_2025_06_12.py`. Each gets `hpo.py` (search-space generator + `generate_configs_*` helper) and `info.py` (standalone MethodMetadata + ModelInfo bundle). The factory file now imports each model's `*_method_metadata` from its `info.py` and skips them in the loop, so there's a single source of truth per model and no duplicate registry entries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Same recipe: random_forest, extra_trees, fastai, and nn_torch each get `hpo.py` + `info.py` with a standalone MethodMetadata. The factory loop in `_tabarena_method_metadata_2025_06_12.py` now imports each metadata from its `info.py` and skips the corresponding method in the loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

These three models have a single `model_cls` shared between a CPU and a GPU MethodMetadata variant. Each `info.py` declares two `ModelInfo` instances (e.g. `tabm_info` + `tabm_gpu_info`) that share `model_cls` and `search_space` but differ in `method_metadata`. To support distinct keys for shared-model-class variants, the registry now keys on `method_metadata.method` (guaranteed unique) instead of `model_cls.ag_name`. No external consumers depend on the prior keying scheme. The factory in `_tabarena_method_metadata_2025_06_12.py` now sources all 12 migrated entries (3 GBDTs + 4 AG tabular + 5 multi-compute variants) from their respective `info.py` modules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The tabicl/ folder already hosted both `gen_tabicl` (TabICLModel) and `gen_tabiclv2` (TabICLv2Model) since chunk 4, but only TabICLv2 had a dedicated `MethodMetadata` entry — the older TabICL_GPU lived in the 2025_06_12 factory loop. This commit adds `tabicl_method_metadata` and `tabicl_info` so TabICL_GPU is a first-class registry entry alongside TabICLv2. `TabPFNv2_GPU` is left in the factory loop: there is no corresponding `tabarena/models/tabpfnv2/` wrapper (the `models/utils.py` dispatch entry points at a non-existent module), so it has no model_cls to attach to a ModelInfo. That dispatch entry was dead before this refactor and stays dead after; investigating it is out of scope here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Each migrated model previously kept a thin `generate.py` re-exporting `gen_<key>` from `hpo.py`, solely so `models/utils.py:name_to_import_map` and a handful of scripts could keep working. With the migration complete:
- `name_to_import_map` now imports `tabarena.models.<key>.hpo.gen_<key>` directly. The dead `TabPFNv2_GPU` entry (its target module never existed) is also removed.
- 5 external consumers (`tabflow/scripts/run_jobs_*`, `examples/...`, and `tst/benchmark/test_yaml_experiment_serialization.py`) updated from `<key>.generate` to `<key>.hpo`.
- 24 `generate.py` shim files deleted.

Verified: top-level import succeeds, registry still holds 29 entries, `get_configs_generator_from_name` dispatches correctly for all 26 friendly names, and the metadata aggregator is unchanged at 35 methods. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Each dated `_tabarena_method_metadata_*.py` previously kept a `<key>_metadata = <key>_method_metadata` alias for back-compat. With Stage 2 complete, the canonical home is per-model `info.py` — the dated files become near-empty placeholders.
- 2 active consumers updated to import from `tabarena.models.<key>.info`:
  - `tabflow/scripts/run_evaluate_linear_model.py` (lr_metadata → lr_method_metadata)
  - `examples/!old/run_download_url_and_cache_to_s3_2025_09_03.py` (8 aliases)
  - `examples/!old/run_limix_upload.py` (3 aliases)
- The aggregator `_tabarena_method_metadata.py` now imports every migrated entry directly from its per-model `info.py` (`<key>_method_metadata as <key>_metadata` re-aliasing keeps internal references stable).
- Legacy aliases dropped from 6 dated files (_2025_09_03, _2025_10_20, _2025_11_12, _2026_02_16, _2026_03_18, _2026_05_13). Each now either carries only the unmigrated factory entries or is a placeholder comment.

Verified: aggregator size unchanged at 35 methods, zero duplicates, registry still holds 29 entries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`model_registry.py` no longer maintains a hand-curated list of TabArena custom model classes — it derives them from each `ModelInfo` in `tabarena/models/<key>/info.py`, deduplicated by `model_cls` and filtered to skip AG-builtins (whose `ag_key` is already in `ag_model_registry`). To avoid a circular import (this module is transitively required by `experiment_constructor` → `config_utils` → per-model `hpo.py`, which `get_model_registry()` triggers via `discover_models()`), the derivation runs lazily via PEP 562 `__getattr__`. `tabarena_model_registry` and `_models_to_add` are built on first access. Verified: 17 TabArena-custom classes auto-derived (matching the prior hand-curated count exactly), all registered with the expected ag_keys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`ag_130_metadata` (AutoGluon 1.3 baseline) and `portfolio_metadata` (Portfolio-N200-4h) aren't per-model wrappers — they're standalone MethodMetadata entries for a baseline and a portfolio result, with no `model_cls` / `search_space` to attach. They previously lived inline in the 2025_06_12 factory file; move them to a dedicated `tabarena/baselines/info.py` to clarify the separation between per-model contributions and non-model baselines/portfolios. The 5 historical config entries the factory still produces (ExplainableBM/KNeighbors/LinearModel/RealMLP_GPU/TabDPT_GPU at `artifact_name="tabarena-2025-06-12"`) intentionally stay — they represent the original-paper artifact snapshot, distinct from each model's newer artifact-name entry. The aggregator's existing `replaced_methods` filter drops them from the latest collection while the complete collection keeps them. Verified: aggregator size unchanged at 35; complete collection still 54; `AutoGluon_v130` and `Portfolio-N200-4h` continue to flow through the complete collection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Each per-model `info.py` is now self-describing about its pip dependencies via the `pip_extra` tuple on `ModelInfo`. Backfilled for 13 models with non-empty extras: ebm, limix, modernnca, orionmsp, perpetual_booster, realmlp (both CPU + GPU), sap_rpt_oss, tabdpt, tabicl (both), tabm (both), tabpfn_3, tabpfnv2_5 (both), xrfm. `tabarena/tabarena/tools/sync_pyproject_extras.py` compares the aggregated `ModelInfo.pip_extra` against `pyproject.toml` `[project.optional-dependencies]` and reports drift. Run with `--check` to exit non-zero on mismatch (suitable for CI / precommit). Current report flags legitimate naming differences (e.g. `perpetual_booster` folder vs `perpetualboosting` extra, `sap_rpt_oss` vs `sap-rpt-oss`) — a follow-up can either align names or add synonym handling to the comparator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
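To make the drift check concrete, here is a hedged sketch of the comparison step only. It is not the code of `sync_pyproject_extras.py`: the function names and the dict-based inputs are hypothetical, reading `pyproject.toml` is left out, and `example-lib` is a made-up requirement. The sample data reuses the `perpetual_booster` / `perpetualboosting` mismatch mentioned above to show why a naming difference is reported as drift on both sides.

```python
def find_drift(declared, pyproject_extras):
    """Compare two mappings of extra name -> iterable of requirement strings."""
    drift = {}
    for name in sorted(set(declared) | set(pyproject_extras)):
        want = set(declared.get(name, ()))
        have = set(pyproject_extras.get(name, ()))
        if want != have:
            drift[name] = {
                "missing": sorted(want - have),      # declared by a model, absent in pyproject
                "unexpected": sorted(have - want),   # in pyproject, declared by no model
            }
    return drift


def run(declared, pyproject_extras, check=False):
    """Print a report; with check=True return 1 on any drift (like a --check flag)."""
    drift = find_drift(declared, pyproject_extras)
    for name, diff in drift.items():
        print(f"{name}: missing={diff['missing']} unexpected={diff['unexpected']}")
    return 1 if (check and drift) else 0


declared = {"perpetual_booster": ["example-lib"]}
pyproject = {"perpetualboosting": ["example-lib"]}
drift = find_drift(declared, pyproject)
```

A synonym table applied to the keys before comparison would be one way to silence such name-only differences.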

`betatabpfn`, `tabflex`, `TabPFNv2_GPU`, and the tabprep variants (`PrepLightGBM`, `PrepLinearModel`, `PrepTabM`, `PrepRealTabPFN-v2.5`) have benchmark-result MethodMetadata but no tabarena-side model wrapper class. `ModelInfo` requires a `model_cls` and `search_space`, so these entries can't be migrated to per-model `info.py` modules as-is. Stage D is therefore a documentation-only stage: add module docstrings to `_tabarena_method_metadata_2025_09_03.py` and `_tabarena_method_metadata_2026_01_23_tabprep.py` explaining why these entries stay there rather than moving to `tabarena/models/<key>/info.py`. If wrappers are ever added for these models (or if the tabprep entries are folded into the underlying model's `info.py` as additional `ModelInfo` instances), Stage D can revisit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Full Stage A (physically moving 15 wrapper directories from `benchmark/models/ag/<key>/` to `tabarena/models/<key>/` and updating ~69 import paths) is high blast-radius for a single commit. This commit takes the smaller intermediate step: each per-model folder gains a `model.py` that re-exports the wrapper class(es) from their canonical location. The per-model layout is now uniform on the import surface (`tabarena.models.<key>` has `hpo.py` + `info.py` + `model.py` + `__init__.py`); 69 legacy import paths keep working unchanged.

15 shims added:
- ebm, knn, limix, modernnca, orionmsp, perpetual_booster, realmlp, sap_rpt_oss, tabdpt, tabm, tabpfn_3, tabstar, xrfm
- tabicl (re-exports both TabICLModel + TabICLv2Model)
- tabpfnv2_5 (re-exports both RealTabPFNv25Model + TabPFNv26Model)

Follow-up to fully complete Stage A: physically relocate the wrapper files to `tabarena/models/<key>/`, flip the shim direction so the legacy path re-imports from the new location, then phase out the legacy benchmark/models/ag/ tree once all 69 consumers migrate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Exposes `tabarena.models.register_model_info(info: ModelInfo)` so external packages (e.g. `tabarena_extensions`) can declare additional models without needing `discover_models()` to walk their package tree. Extensions sometimes redeclare a method already in the core registry (e.g. a re-benchmarked LinearModel with a different `artifact_name`). When that happens, the function keys the new entry as `f"{method}@{artifact_name}"`, preserving the core entry under the bare method name. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous lazy access pattern predefined `_tabarena_model_registry` and `_models_to_add` as module-level `None` globals. That short-circuits Python's `__getattr__` fallback (which only fires on missing attributes), so `from tabarena.benchmark.models.model_registry import _models_to_add` returned `None` instead of building the list. Move the lazy cache into a module-level `_lazy_state: dict` and remove the predefined globals. Now first access via `__getattr__` builds and caches both values; subsequent accesses read from the dict. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
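The pitfall in this commit can be reproduced in isolation. The sketch below builds two throwaway modules with `types.ModuleType` rather than touching `model_registry.py`; the module name, the helper `make_module`, and the placeholder list contents are hypothetical. It shows that a predefined `None` global wins over the module-level `__getattr__`, and that keeping the cache in a separate dict avoids this.

```python
import types


def make_module(predefine_global):
    """Build a module whose __getattr__ lazily builds `_models_to_add`."""
    mod = types.ModuleType("registry_sketch")
    lazy_state = {}   # cache lives here, not in the module namespace

    def module_getattr(name):
        if name == "_models_to_add":
            if name not in lazy_state:
                lazy_state[name] = ["ModelA", "ModelB"]   # stands in for the real build
            return lazy_state[name]
        raise AttributeError(name)

    mod.__getattr__ = module_getattr
    if predefine_global:
        # The bug: the attribute now exists, so the hook is never consulted.
        mod._models_to_add = None
    return mod


buggy = make_module(predefine_global=True)
fixed = make_module(predefine_global=False)
print(buggy._models_to_add)   # None
print(fixed._models_to_add)   # ['ModelA', 'ModelB']
```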

`ConfigSpace` is declared as an optional extra (`search_spaces`) — base installs don't include it. Pre-refactor, the per-model `generate.py` files that needed ConfigSpace were only loaded on demand via `name_to_import_map` lambdas, so the import never fired during a plain `import tabarena`. Stage 2's per-model migration moved the same code into `hpo.py`, which `info.py` imports eagerly so `discover_models()` can build the registry. That made the top-level `from ConfigSpace import ...` fire on every `import tabarena`, breaking CI (`ModuleNotFoundError: No module named 'ConfigSpace'`). Fix: in catboost, lightgbm, xgboost, extra_trees, random_forest, and xrfm hpo.py, move the `from ConfigSpace import ...` inside the `generate_configs_*` function body. Module import is now ConfigSpace-free; the actual search-space construction (which only fires when someone calls `gen_<key>.generate_all_bag_experiments(...)` or `generate_configs_<key>(...)`) still requires it as before. Verified: with ConfigSpace blocked at import, all 6 hpo.py modules load, `MODEL_REGISTRY` builds (29 entries), and `_models_to_add` resolves (17 TabArena-custom classes). With ConfigSpace available, the config generators still produce configs unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
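A minimal sketch of the deferred-import fix, assuming a ConfigSpace version that accepts a dict via the `space=` keyword (0.6 or later, to the best of my knowledge). The function name `generate_configs_toy` and the two hyperparameters are hypothetical and not taken from any real `hpo.py`.

```python
def generate_configs_toy(num_random_configs=3, seed=0):
    """Sample configurations; ConfigSpace is needed only when this is called."""
    # Imported here rather than at module top, so that importing this module
    # (and therefore building the registry) works without ConfigSpace installed.
    from ConfigSpace import ConfigurationSpace

    space = ConfigurationSpace(
        space={"learning_rate": (0.001, 0.1), "depth": (2, 10)},
        seed=seed,
    )
    return [dict(space.sample_configuration()) for _ in range(num_random_configs)]
```

On a base install the module still imports cleanly, and the `ModuleNotFoundError` surfaces only when someone actually asks for configs.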

Move the 6 S3/R2 transfer modules out of nips2025_utils/artifacts/ and into a new tabarena/models/_artifacts/ sub-package, co-located with the MethodMetadata they depend on. Drop the unused AbstractArtifactLoader / AbstractArtifactUploader bases. Module names drop the redundant `method_` prefix since the package scope already implies it. No shims: all real consumers (the lazy imports inside _method_metadata.py and method_artifact_manager.py) are updated in-place. Includes ruff --fix cleanups on the touched files.

Tests for tabarena/tabarena/models/<key>/ now live at tst/models/test_<key>.py, matching the existing flat layout that already houses test_lazy_imports.py, test_registry.py, and test_utils.py. tst/benchmark/models/ is removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

LennartPurucker

Merge as you see fit, I think the general new workflow is great. I leave it up to you what to add here and what to add in another PR

LennartPurucker · 2026-05-29T17:02:10Z

Big one, let's goooo!

Innixma requested a review from LennartPurucker May 15, 2026 23:29

LennartPurucker reviewed May 17, 2026

Innixma force-pushed the refactor_model_directories branch 3 times, most recently from 6e722c9 to d99ca6b Compare May 27, 2026 00:05

Innixma force-pushed the refactor_model_directories branch from deaab65 to dc47de4 Compare May 27, 2026 22:31

LennartPurucker reviewed May 28, 2026

Innixma force-pushed the refactor_model_directories branch 2 times, most recently from 697cbc4 to ffca867 Compare May 28, 2026 23:16

Innixma and others added 20 commits May 28, 2026 23:50

Refactor TabSTAR location

9ce1191

Update rf/xt default to use_child_oof=False

e4612c3

Innixma and others added 24 commits May 28, 2026 23:50

Move model files and utils

16cc6a8

Move knn files and utils

ac44d7b

Move knn files and utils

abf015c

Removed unused shims

f1d6bfd

Update skills

9e04b25

Update model imports

427ba4e

Update model imports

b8627ab

Update model imports

95cd569

Update method_metadata location

3e6d3fe

Update method_metadata import

7935d0d

Cleanup

8124b4f

Update

9d2ec23

Update

67d5f80

fix test

06c3144

Update method metadata collection location

ed75302

Update plotting logic

2cac0ae

Fix plotting pareto fronts

8589593

Refactor iltm format

8475b64

Minor plotting improvements

c27a2ff

address comment

5968855

Update skill

f1b3791

Update EBM

245d3bd

Innixma force-pushed the refactor_model_directories branch from e0aff69 to a1d33b8 Compare May 28, 2026 23:51

LennartPurucker self-requested a review May 29, 2026 07:42

LennartPurucker approved these changes May 29, 2026

Innixma merged commit 8e0e89d into main May 29, 2026
6 checks passed

LennartPurucker deleted the refactor_model_directories branch June 2, 2026 10:21
