
Refactor model logic to include all information in one folder #308

Merged
Innixma merged 51 commits into main from refactor_model_directories
May 29, 2026

@Innixma

@Innixma Innixma commented May 15, 2026

Collaborator

Refactor: per-model folder layout, auto-discovered registry, foundation/aggregation layering

Summary

This branch reorganises everything about how models are declared, discovered, and exposed in TabArena. Every model now lives in a single canonical folder; the registry is auto-built from those folders; the top-level tabarena.models namespace is a stable public surface; and a long-standing latent circular import has been removed at the root.

The change is mechanical at the file-system level but eliminates a class of hidden bugs and removes ~five hand-maintained lists that used to drift.

40 commits, ~220 files touched (net +1.9k LOC of which a large fraction is tests + per-model __init__/hpo/info boilerplate that replaces an older diffuse structure).

Why

The pre-refactor layout had grown three problems that compounded over time:

  1. Each model was scattered across multiple locations. The AutoGluon wrapper class lived under tabarena/benchmark/models/ag/<key>/, the search-space generator under tabarena/models/<key>/generate.py, and the MethodMetadata artefact entry inside tabarena/nips2025_utils/artifacts/. Adding a model meant editing 7+ unrelated places and keeping several hand-coded import lists in sync.

  2. The registry was hand-maintained. Multiple parallel mappings — friendly-name → generator, AG-key → wrapper class, model → pip extras — each had to be updated for every new model. Drift between them silently dropped models from one surface while keeping them in another.

  3. A latent circular import was masking real bugs. Per-model info.py files imported MethodMetadata from a package whose __init__.py eagerly aggregated metadata back from every per-model info.py. The discovery walk swallowed the resulting ImportError silently and moved on. As of main, CatBoost was the alphabetically-first model in the walk and was therefore the model that hit the cycle — meaning the most-used GBDT had been silently missing from MODEL_REGISTRY for an extended period without anyone noticing.

What changed

A single canonical home per model

Every model now lives in exactly one folder at tabarena/models/<key>/, with a uniform contract:

tabarena/models/<key>/
  __init__.py     # re-exports gen_<key>, <key>_info, <key>_method_metadata
  hpo.py          # the ConfigGenerator + search space
  info.py         # ModelInfo + MethodMetadata (registry visibility)
  model.py        # the AutoGluon wrapper class
  _internal/      # (optional) hand-written helpers
  _vendor/        # (optional) verbatim upstream code, with original license

hpo.py replaces the older generate.py convention. model.py is the canonical home of the wrapper class — the previous tabarena/benchmark/models/ag/ location has been retired entirely, including its top-level namespace. Multi-file models follow a standardised layout where private helpers go in _internal/ and copied-upstream code (currently only LimiX) goes in _vendor/ next to its license. Both subfolders can coexist in the same model folder for models that mix hand-written wrappers around vendored libraries.
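A minimal, self-contained sketch of what one model folder declares, collapsed into a single file for illustration. `MethodMetadata` and `ModelInfo` here are simplified stand-ins rather than TabArena's real classes; the field set is an assumption based on the names this PR mentions, and `ExampleModel`, `gen_example`, and the `example-lib` requirement are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class MethodMetadata:
    """Simplified stand-in for TabArena's MethodMetadata (fields assumed)."""

    method: str                       # unique registry key
    display_name: str                 # friendly name, e.g. for leaderboards
    compute: str = "cpu"              # "cpu" or "gpu"
    artifact_name: str | None = None  # benchmark artifact the results belong to


@dataclass(frozen=True)
class ModelInfo:
    """Simplified stand-in for the bundle each info.py exposes to discovery."""

    model_cls: type
    search_space: Any                 # the ConfigGenerator built in hpo.py
    method_metadata: MethodMetadata
    pip_extra: tuple[str, ...] = ()   # assumed: pip requirement strings


# model.py: the AutoGluon wrapper class (hypothetical, body omitted).
class ExampleModel:
    ag_key = "TA-EXAMPLE"


# hpo.py: placeholder for the model's ConfigGenerator.
gen_example = object()

# info.py: the module-level objects that discover_models() collects.
example_method_metadata = MethodMetadata(method="Example", display_name="Example")
example_info = ModelInfo(
    model_cls=ExampleModel,
    search_space=gen_example,
    method_metadata=example_method_metadata,
    pip_extra=("example-lib>=1.0",),
)
```

Following the contract above, the folder's `__init__.py` would then re-export `gen_example`, `example_info`, and `example_method_metadata`.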

Registry auto-discovery as the single source of truth

tabarena.models.discover_models() walks every per-model folder, imports each info.py, and collects the ModelInfo instances declared there. Everything downstream — tabarena_model_registry (AutoGluon registry), get_configs_generator_from_name (friendly-name lookup), pip_extra aggregation for pyproject extras — now derives from this single source instead of maintaining parallel lists.
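A rough sketch of such a walk, assuming each model folder is a sub-package with an `info` module. It uses only `importlib` and `pkgutil`; the `ModelInfo` stand-in, the underscore rule for skipping private folders, and the duplicate check are assumptions about how a walk like this could work, not TabArena's actual code. Failed imports are logged at WARNING and skipped, mirroring the behaviour described under "Hardened discovery".

```python
from __future__ import annotations

import importlib
import logging
import pkgutil
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class ModelInfo:
    """Minimal stand-in; only the registry key matters for this sketch."""

    method: str


def discover_models(package_name: str, info_type: type = ModelInfo) -> dict[str, object]:
    """Import `<package>.<folder>.info` for every model folder and collect
    each module-level `info_type` instance, keyed by its `method`."""
    package = importlib.import_module(package_name)
    registry: dict[str, object] = {}
    for sub in sorted(pkgutil.iter_modules(package.__path__), key=lambda m: m.name):
        # Only sub-packages are model folders; private ones (e.g. _artifacts) are skipped.
        if not sub.ispkg or sub.name.startswith("_"):
            continue
        module_name = f"{package_name}.{sub.name}.info"
        try:
            info_module = importlib.import_module(module_name)
        except ImportError as exc:
            # Skip-and-continue, but surface the failure instead of hiding it.
            logger.warning("Skipping %s: failed to import (%s)", module_name, exc)
            continue
        for value in vars(info_module).values():
            if isinstance(value, info_type):
                if value.method in registry:
                    raise ValueError(f"Duplicate method {value.method!r} in {module_name}")
                registry[value.method] = value
    return registry
```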

Adding a new model is now a 2-place edit (the _LAZY_CLASSES map in tabarena/models/__init__.py + the relevant pyproject.toml extra) plus the per-model folder; the rest is auto-derived. The add-model skill has been rewritten to match.

The friendly-name lookup (get_configs_generator_from_name) collapsed from a hand-coded 27-entry dict into a 6-line registry lookup keyed by display_name or method. A parametrised regression test locks identity against the old behaviour: every previously-supported friendly name still returns exactly the same ConfigGenerator object.
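A sketch of what a registry-driven lookup can look like. The registry is passed in explicitly so the example is self-contained (the real function takes only `model_name`), and the `ExampleNet` entries are hypothetical. Because the CPU and GPU variants share one generator object, both names resolve to the identical object, which is what an `is`-based regression test can pin down.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MethodMetadata:
    method: str        # unique, e.g. "ExampleNet_GPU"
    display_name: str  # friendly name, may be shared across variants


@dataclass(frozen=True)
class ModelInfo:
    method_metadata: MethodMetadata
    search_space: object  # stands in for a ConfigGenerator


def get_configs_generator_from_name(model_name: str, registry: dict[str, ModelInfo]):
    """Return the generator of the first entry whose display_name or method matches."""
    for info in registry.values():
        meta = info.method_metadata
        if model_name in (meta.display_name, meta.method):
            return info.search_space
    raise ValueError(f"Unknown model name: {model_name!r}")


gen_examplenet = object()  # one generator shared by the CPU and GPU variants
REGISTRY = {
    "ExampleNet": ModelInfo(MethodMetadata("ExampleNet", "ExampleNet"), gen_examplenet),
    "ExampleNet_GPU": ModelInfo(MethodMetadata("ExampleNet_GPU", "ExampleNet"), gen_examplenet),
}
```

A parametrised regression test would iterate over every key of the old hand-coded dict and assert that the new lookup returns an object that `is` the old entry.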

Top-level tabarena.models namespace

tabarena.models is now a stable public surface. Consumers can write:

from tabarena.models import (
    RealMLPModel, TabPFNWideModel, MethodMetadata, ModelInfo,
    discover_models, get_model_registry, register_model_info,
)

Model wrapper classes and MethodMetadata are exposed via PEP 562 __getattr__ so import tabarena.models stays cheap — heavy ML libraries are loaded only on first attribute access and cached thereafter. The previous tabarena.benchmark.models.ag top-level surface was removed (it was already a thin re-export shim by the end of the migration); deep imports through that namespace are also gone.

__all__ is derived from _LAZY_CLASSES plus an explicit eager-export tuple, so it can't drift from the actual surface.

Foundation/aggregation layering: the cycle is fixed at the root

MethodMetadata moved out of the aggregator's package and into a foundation-layer module alongside ModelInfo. The structural consequence:

  • Per-model info.py files no longer transit a package whose __init__.py aggregates from them. The cycle is severed; the silent-skip in discover_models() is no longer load-bearing.
  • Cold-leaf imports of any per-model model.py work without a discover_models() warm-up — a workaround that several prior tests had to invoke.
  • The previously-invisible CatBoost is now correctly discovered. Registry size went from 29 → 30 entries as a direct side effect of the structural fix.

A back-compat shim at the legacy MethodMetadata path is kept so external imports remain stable.

Hardened discovery

discover_models() now logs a WARNING via stdlib logging when a per-model info.py fails to import, instead of silently dropping the model. The skip-and-continue behaviour is preserved — a broken model doesn't take down the rest of the registry — but the failure surfaces in logs so future regressions of the CatBoost-bug shape become immediately visible.

Extension registration surface

A register_model_info() API was added for extension packages whose model folders aren't reachable by discover_models()'s walk over tabarena.models. Extensions can now ship their own models that join the same registry, with automatic disambiguation when they re-declare a method name that already exists in the core registry.
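A minimal sketch of the disambiguation rule, using the `f"{method}@{artifact_name}"` key format described in the commit that introduced the API. The flattened `ModelInfo` (with `artifact_name` on it directly rather than on its `MethodMetadata`), the module-level `_REGISTRY`, and the `my-rerun` artifact name are simplifications for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    method: str
    artifact_name: str


_REGISTRY: dict[str, ModelInfo] = {}


def register_model_info(info: ModelInfo, registry: dict[str, ModelInfo] | None = None) -> str:
    """Add `info` to the registry and return the key it was stored under."""
    if registry is None:
        registry = _REGISTRY
    key = info.method
    if key in registry:
        # The core entry keeps the bare method name; the re-declaration is
        # disambiguated by its artifact name instead of overwriting it.
        key = f"{info.method}@{info.artifact_name}"
    if key in registry:
        raise ValueError(f"{key!r} is already registered")
    registry[key] = info
    return key
```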

Skill + documentation refresh

The .claude/skills/add-model/ skill (used by Claude to add new models) was rewritten end-to-end:

  • Describes the new per-model folder layout
  • Drops all references to the now-removed benchmark/models/ag/ namespace
  • Explains the auto-derived registries that no longer need manual edits
  • Includes templates for model.py, hpo.py, info.py, __init__.py, and the test file
  • Documents the _internal/ vs _vendor/ convention for multi-file models

Test coverage

A new tst/models/ test area was added with three focused suites:

  • test_registry.py — unit-tests the discovery walk (caching, duplicate detection, ignored-symbol filtering, extension registration, the new warning-on-failure behaviour) with a fully synthetic package fixture so the tests don't depend on the actual installed models.
  • test_lazy_imports.py — locks in the PEP 562 lazy property of tabarena.models, verifies MethodMetadata works through the lazy surface, confirms __all__ derivation is correct, guards against eager-re-export regressions.
  • test_utils.py — parametrises every previously-supported friendly model name and asserts the new registry-driven lookup returns the same ConfigGenerator object as the old hand-coded dict did.
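One way to lock in a lazy-import property is to import the package in a fresh interpreter and inspect `sys.modules`, because the test process itself has usually already imported the heavy libraries. The helper below is a hypothetical sketch of that technique, not the contents of `test_lazy_imports.py`.

```python
from __future__ import annotations

import json
import subprocess
import sys


def modules_loaded_by(import_stmt: str) -> set[str]:
    """Run `import_stmt` in a fresh interpreter and return the module names
    it added to sys.modules."""
    probe = (
        "import json, sys\n"
        "before = set(sys.modules)\n"
        f"{import_stmt}\n"
        "print(json.dumps(sorted(set(sys.modules) - before)))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True, check=True
    )
    return set(json.loads(result.stdout))
```

In a TabArena test this could read `assert "torch" not in modules_loaded_by("import tabarena.models")`, assuming torch is among the dependencies the lazy surface is meant to defer.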

Behaviours preserved

  • All previously-supported import paths for model classes resolve to the same class objects (verified by is-identity checks in tests).
  • MethodMetadata is still importable from its legacy tabarena.nips2025_utils.artifacts.method_metadata location for external callers, via a back-compat re-export shim.
  • All 27 friendly model names accepted by get_configs_generator_from_name still resolve to the same ConfigGenerator instance as before, including CPU/GPU variant ties.
  • The AutoGluon registry (tabarena_model_registry) still surfaces the same model classes; it just now auto-derives them instead of needing manual list edits.

Behaviours intentionally removed

  • The legacy top-level tabarena.benchmark.models.ag namespace and every per-model shim under it. Code that imported from these paths needs to switch to from tabarena.models import … or from tabarena.models.<key>.model import …. All in-repo call sites have already been updated.

Follow-ups deferred to subsequent PRs

These were considered in scope but explicitly deferred so this PR could land as a focused refactor:

  • Split _method_metadata.py (currently 962 lines mixing the foundation dataclass with S3/paper-runner/repository orchestration). After the split, MethodMetadata could be exposed eagerly at the top level without the cost concerns that motivated lazy loading.
  • Make nips2025_utils/artifacts/__init__.py lazy. The cycle is structurally fixed, but the eager aggregation in that package's init still forces every consumer of downloaders/uploaders to load the full metadata aggregator on first touch.
  • Factor CPU/GPU MethodMetadata pairs in realmlp/info.py, modernnca/info.py, tabm/info.py into a cpu_gpu_pair(...) helper.
  • Naming consistency in pyproject.toml: a couple of model extras (sap-rpt-oss, perpetualboosting) don't match their ModelKey, which trips up the pip-extras drift-detection.
  • Optional paper-output extras: the heavy plotting deps (tueplots, autorank, seaborn, matplotlib, plotly) are currently in the base install and could move into an optional paper extra.

Test results

The full model-area test sweep (tst/models/ + per-model tests + yaml-serialization tests) passes locally. Failures observed in the broader sweep are pre-existing and reproduce on main without any of these changes; none are caused by the refactor.

The two pre-existing failure modes (cold-leaf import of any per-model module without warm-up, and CatBoost silently missing from the registry) are now positively resolved by the layering fix.

@Innixma Innixma requested a review from LennartPurucker May 15, 2026 23:29

@LennartPurucker LennartPurucker left a comment

Collaborator

Great first step, some comments:

  • We still have several places we need to edit for each model and several files. Can we consolidate this even more? Maybe moving the generate function into the abstract model class would be best. Can we not move the model code into the same folder? Or, in some way or form, make it all one folder for all models.
  • Do we have documentation for how to fill in the MethodMetadata for a new model submission? How much of it could be stored in the abstract model class in some way or form?
  • nit: let us wait to merge iLTM before we go ahead with this.
  • nit: We need to update the add-model skill later.
  • Can you (with Claude) add a short explanation somewhere (maybe in the PR), plus tests, for how the discovery works and how it should be understood?

Comment thread tabarena/tabarena/paper/tabarena_evaluator.py Outdated
@@ -24,37 +24,40 @@ def convert_numpy_dtypes(data: dict) -> dict:


def get_configs_generator_from_name(model_name: str):

Collaborator

note: we might want to move to a function that returns the model info now in future refactor steps.

@Innixma Innixma force-pushed the refactor_model_directories branch 3 times, most recently from 6e722c9 to d99ca6b May 27, 2026 00:05
@Innixma

Innixma commented May 27, 2026

Collaborator Author

@LennartPurucker Finished the refactor, addressed comments, and updated PR description

@Innixma Innixma force-pushed the refactor_model_directories branch from deaab65 to dc47de4 May 27, 2026 22:31

@LennartPurucker LennartPurucker left a comment

Collaborator

Very cool progress, added more comments/thoughts.

display_name="{ModelName}",
compute="gpu", # or "cpu"
date="YYYY-MM-DD", # date of the benchmarking run (or planning date if unbenchmarked)
ag_key="{ag_key_without_TA}", # e.g. "TABSTAR" (matches {ClassName}Model.ag_key without the TA- prefix)

Collaborator

Why do we want to have it without TA prefix here? It seems a bit confusing now. Can we simplify this / rename the field in the metadata? Moreover, if possible, could we rely solely on ag_key or ag_name in the future?

Collaborator Author

Would prefer to save skill edits like this to post-refactor once we have a clearer view on what is still left to be refined.

Collaborator

Sure, just commented on this as the skill was added/changed in the PR already

Comment thread tabarena/tabarena/models/ebm/model.py Outdated

Collaborator

Can we already think about a way/structure for how we want to deprecate models? We could deprecate them only on the LB and keep them as-is here, or add some flag for it.

Collaborator Author

Prefer to think about this post-merge

ag_key="{ag_key_without_TA}", # e.g. "TABSTAR" (matches {ClassName}Model.ag_key without the TA- prefix)
model_key="{ag_key_without_TA}",
config_default="{ModelName}_c1_BAG_L1",
can_hpo=True,

Collaborator

Maybe ask Claude to go over these again using the examples in models/, as some of this is wrong, like assuming the postfix is always _c1_BAG_L1, or that is_bag and can_hpo are always True.

Collaborator Author

Would prefer to wait on this, we anyways may want to change MethodMetadata itself to not have certain keys or require certain information.

Comment thread tabarena/tabarena/models/nn_torch/hpo.py
Comment on lines +27 to 28
from tabarena.models import RealMLPModel
from autogluon.tabular.models import LGBModel

Collaborator

Could I do "from tabarena.models import LGBModel" and it would do the same as "from autogluon.tabular.models import LGBModel"?

Collaborator Author

Not currently, but we could support this if we want to.



# FIXME: Implement `best` and `best-N`
class MethodMetadata:

Collaborator

Ask Claude to document this class and check that it aligns with your understanding of the class. Also, feel free to refactor this class as needed, as now might be a good time.

Collaborator Author

I'd prefer as a separate PR. This is a massive refactor already and I'm mainly just having it move files, not alter them. Otherwise it will become impossible to know if a bug sneaks in.

Collaborator

Do we need these files for backward compatibility?

Or could we also move reference models and portfolios into /models somehow?
E.g. /models/reference or /models/portfolios

Collaborator

And things like TabPrep artifacts to models/experimental?

Collaborator Author

Prefer to think about this for a future PR.

Comment thread tabarena/tabarena/tools/sync_pyproject_extras.py

Collaborator

Can we also move the test code/functions to /models? And here, just have one file calling all tests (or a subset)? I guess we are not testing most models in CI, but it would be good for contributors to have all the relevant code in one bundle.

Collaborator Author

100% but I prefer to do in a separate PR.

@Innixma Innixma force-pushed the refactor_model_directories branch 2 times, most recently from 697cbc4 to ffca867 May 28, 2026 23:16
Innixma and others added 20 commits May 28, 2026 23:50
Apply the TabSTAR Stage 1 recipe to limix, mitra, orionmsp, sap_rpt_oss,
tabdpt, and tabpfn_3: each gets `hpo.py` and `info.py`, with `generate.py`
reduced to a thin shim so the legacy `models/utils.py:name_to_import_map`
dispatch continues to work. The corresponding `<model>_metadata` entries
in `_tabarena_method_metadata_*.py` are now sourced from each model's
`info.py` (single source of truth, legacy names preserved as aliases).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same TabSTAR recipe: `hpo.py` becomes the canonical home for each model's
search-space generator (including the `generate_configs_*` / `generate_single_config_*`
helpers that the previous `generate.py` defined inline); `info.py` carries
the MethodMetadata + ModelInfo bundle. `generate.py` shims preserve the
legacy `models/utils.py:name_to_import_map` dispatch.

The `realmlp_gpu_metadata` and `xrfm_metadata` entries in
`_tabarena_method_metadata_2025_09_03.py` are now sourced from each model's
`info.py` (legacy names kept as aliases).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply the TabSTAR Stage 1 recipe to ebm, knn, lr, and perpetual_booster.
Each gets `hpo.py` (search space + `generate_configs_*` helpers) and
`info.py` (MethodMetadata + ModelInfo bundle); `generate.py` becomes a
thin shim so `models/utils.py:name_to_import_map` keeps working.

The `ebm_metadata`, `knn_metadata`, `lr_metadata`, and
`perpetualbooster_metadata` entries in the dated metadata files are now
sourced from each model's `info.py` (legacy names kept as aliases).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These folders host multiple AutoGluon model classes:
- tabicl/: TabICLModel + TabICLv2Model (only v2 has dedicated MethodMetadata)
- tabpfnv2_5/: RealTabPFNv25Model + TabPFNv26Model

`hpo.py` holds the shared search-space machinery and exports both `gen_*`
objects. `info.py` declares one `ModelInfo` per model class with a dedicated
MethodMetadata entry; auto-discovery picks them up by `ag_name`.

The dated metadata files (_2026_02_16, _2025_11_12, _2026_03_18) now source
`tabiclv2_metadata`, `realtabpfn25_metadata`, and `tabpfn26_metadata` from
each model's `info.py`. Legacy aliases preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply the TabSTAR Stage 1 recipe to the three GBDT models that previously
relied on the factory loop in `_tabarena_method_metadata_2025_06_12.py`.
Each gets `hpo.py` (search-space generator + `generate_configs_*` helper)
and `info.py` (standalone MethodMetadata + ModelInfo bundle).

The factory file now imports each model's `*_method_metadata` from its
`info.py` and skips them in the loop, so there's a single source of truth
per model and no duplicate registry entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same recipe: random_forest, extra_trees, fastai, and nn_torch each get
`hpo.py` + `info.py` with a standalone MethodMetadata. The factory loop
in `_tabarena_method_metadata_2025_06_12.py` now imports each metadata
from its `info.py` and skips the corresponding method in the loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These three models have a single `model_cls` shared between a CPU and a GPU
MethodMetadata variant. Each `info.py` declares two `ModelInfo` instances
(e.g. `tabm_info` + `tabm_gpu_info`) that share `model_cls` and `search_space`
but differ in `method_metadata`.

To support distinct keys for shared-model-class variants, the registry now
keys on `method_metadata.method` (guaranteed unique) instead of
`model_cls.ag_name`. No external consumers depend on the prior keying
scheme.

The factory in `_tabarena_method_metadata_2025_06_12.py` now sources all
12 migrated entries (3 GBDTs + 4 AG tabular + 5 multi-compute variants)
from their respective `info.py` modules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
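Why the key had to change can be shown in a few lines: two variants that share one wrapper class collide when keyed by `ag_name` but stay distinct when keyed by `method_metadata.method`. The flattened `ModelInfo` and the `ExampleNet` names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    ag_name: str  # from model_cls; shared by the CPU and GPU variants
    method: str   # from method_metadata; unique per variant


def build_registry(infos: list[ModelInfo], key: str) -> dict[str, ModelInfo]:
    """Key `infos` by the given attribute, refusing to silently overwrite entries."""
    counts = Counter(getattr(info, key) for info in infos)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate {key} values: {duplicates}")
    return {getattr(info, key): info for info in infos}


variants = [
    ModelInfo(ag_name="ExampleNet", method="ExampleNet"),
    ModelInfo(ag_name="ExampleNet", method="ExampleNet_GPU"),
]
```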
The tabicl/ folder already hosted both `gen_tabicl` (TabICLModel) and
`gen_tabiclv2` (TabICLv2Model) since chunk 4, but only TabICLv2 had a
dedicated `MethodMetadata` entry — the older TabICL_GPU lived in the
2025_06_12 factory loop. This commit adds `tabicl_method_metadata` and
`tabicl_info` so TabICL_GPU is a first-class registry entry alongside
TabICLv2.

`TabPFNv2_GPU` is left in the factory loop: there is no corresponding
`tabarena/models/tabpfnv2/` wrapper (the `models/utils.py` dispatch entry
points at a non-existent module), so it has no model_cls to attach to a
ModelInfo. That dispatch entry was dead before this refactor and stays
dead after; investigating it is out of scope here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each migrated model previously kept a thin `generate.py` re-exporting
`gen_<key>` from `hpo.py`, solely so `models/utils.py:name_to_import_map`
and a handful of scripts could keep working. With the migration complete:

- `name_to_import_map` now imports `tabarena.models.<key>.hpo.gen_<key>`
  directly. The dead `TabPFNv2_GPU` entry (its target module never existed)
  is also removed.
- 5 external consumers (`tabflow/scripts/run_jobs_*`, `examples/...`, and
  `tst/benchmark/test_yaml_experiment_serialization.py`) updated from
  `<key>.generate` to `<key>.hpo`.
- 24 `generate.py` shim files deleted.

Verified: top-level import succeeds, registry still holds 29 entries,
`get_configs_generator_from_name` dispatches correctly for all 26 friendly
names, and the metadata aggregator is unchanged at 35 methods.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each dated `_tabarena_method_metadata_*.py` previously kept a
`<key>_metadata = <key>_method_metadata` alias for back-compat. With Stage 2
complete, the canonical home is per-model `info.py` — the dated files
become near-empty placeholders.

- 2 active consumers updated to import from `tabarena.models.<key>.info`:
  - `tabflow/scripts/run_evaluate_linear_model.py` (lr_metadata → lr_method_metadata)
  - `examples/!old/run_download_url_and_cache_to_s3_2025_09_03.py` (8 aliases)
  - `examples/!old/run_limix_upload.py` (3 aliases)
- The aggregator `_tabarena_method_metadata.py` now imports every
  migrated entry directly from its per-model `info.py` (`<key>_method_metadata
  as <key>_metadata` re-aliasing keeps internal references stable).
- Legacy aliases dropped from 6 dated files (_2025_09_03, _2025_10_20,
  _2025_11_12, _2026_02_16, _2026_03_18, _2026_05_13). Each now either
  carries only the unmigrated factory entries or is a placeholder comment.

Verified: aggregator size unchanged at 35 methods, zero duplicates,
registry still holds 29 entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`model_registry.py` no longer maintains a hand-curated list of TabArena
custom model classes — it derives them from each `ModelInfo` in
`tabarena/models/<key>/info.py`, deduplicated by `model_cls` and filtered
to skip AG-builtins (whose `ag_key` is already in `ag_model_registry`).

To avoid a circular import (this module is transitively required by
`experiment_constructor` → `config_utils` → per-model `hpo.py`, which
`get_model_registry()` triggers via `discover_models()`), the derivation
runs lazily via PEP 562 `__getattr__`. `tabarena_model_registry` and
`_models_to_add` are built on first access.

Verified: 17 TabArena-custom classes auto-derived (matching the prior
hand-curated count exactly), all registered with the expected ag_keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
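The derivation step itself is small. A sketch, assuming each `ModelInfo` exposes `model_cls` and each class carries an `ag_key`; the example classes and keys are hypothetical, and the real code checks against AutoGluon's `ag_model_registry` rather than a plain set of keys.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    model_cls: type


def custom_model_classes(infos: list[ModelInfo], builtin_ag_keys: set[str]) -> list[type]:
    """Unique wrapper classes in first-seen order, skipping AutoGluon built-ins."""
    seen: set[type] = set()
    result: list[type] = []
    for info in infos:
        cls = info.model_cls
        if cls in seen or cls.ag_key in builtin_ag_keys:
            continue
        seen.add(cls)
        result.append(cls)
    return result


class BuiltinGBM:    # hypothetical stand-in for an AG built-in model
    ag_key = "GBM"


class ExampleModel:  # hypothetical TabArena-custom model
    ag_key = "TA-EXAMPLE"


# ExampleModel appears twice, as CPU/GPU variants sharing one class would.
infos = [ModelInfo(BuiltinGBM), ModelInfo(ExampleModel), ModelInfo(ExampleModel)]
```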
`ag_130_metadata` (AutoGluon 1.3 baseline) and `portfolio_metadata`
(Portfolio-N200-4h) aren't per-model wrappers — they're standalone
MethodMetadata entries for a baseline and a portfolio result, with no
`model_cls` / `search_space` to attach. They previously lived inline in
the 2025_06_12 factory file; move them to a dedicated
`tabarena/baselines/info.py` to clarify the separation between
per-model contributions and non-model baselines/portfolios.

The 5 historical config entries the factory still produces
(ExplainableBM/KNeighbors/LinearModel/RealMLP_GPU/TabDPT_GPU at
`artifact_name="tabarena-2025-06-12"`) intentionally stay — they
represent the original-paper artifact snapshot, distinct from each
model's newer artifact-name entry. The aggregator's existing
`replaced_methods` filter drops them from the latest collection while
the complete collection keeps them.

Verified: aggregator size unchanged at 35; complete collection still 54;
`AutoGluon_v130` and `Portfolio-N200-4h` continue to flow through the
complete collection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each per-model `info.py` is now self-describing about its pip dependencies
via the `pip_extra` tuple on `ModelInfo`. Backfilled for 13 models with
non-empty extras: ebm, limix, modernnca, orionmsp, perpetual_booster,
realmlp (both CPU + GPU), sap_rpt_oss, tabdpt, tabicl (both), tabm (both),
tabpfn_3, tabpfnv2_5 (both), xrfm.

`tabarena/tabarena/tools/sync_pyproject_extras.py` compares the aggregated
`ModelInfo.pip_extra` against `pyproject.toml` `[project.optional-dependencies]`
and reports drift. Run with `--check` to exit non-zero on mismatch (suitable
for CI / precommit). Current report flags legitimate naming differences
(e.g. `perpetual_booster` folder vs `perpetualboosting` extra, `sap_rpt_oss`
vs `sap-rpt-oss`) — a follow-up can either align names or add synonym
handling to the comparator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
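A sketch of such a comparison, with one design choice worth noting: extra names are compared after PEP 685 normalization (lower-case, runs of `-`, `_`, `.` collapsed to `-`). Under that rule `sap_rpt_oss` and `sap-rpt-oss` match, while `perpetual_booster` vs `perpetualboosting` still needs a rename or a synonym. The function names, the assumption that each extra is named after its model key, and the requirement strings are hypothetical; parsing `pyproject.toml` (e.g. with `tomllib`) is left out.

```python
from __future__ import annotations

import re


def normalize_extra(name: str) -> str:
    """Normalize an extra name as specified by PEP 685."""
    return re.sub(r"[-_.]+", "-", name).lower()


def extras_drift(
    model_extras: dict[str, tuple[str, ...]],
    pyproject_extras: dict[str, list[str]],
) -> list[str]:
    """Report models whose pip_extra has no matching, identical pyproject extra.

    model_extras: model key -> requirements aggregated from ModelInfo.pip_extra
    pyproject_extras: contents of [project.optional-dependencies]
    """
    declared = {normalize_extra(name): set(reqs) for name, reqs in pyproject_extras.items()}
    problems: list[str] = []
    for key, reqs in sorted(model_extras.items()):
        if not reqs:
            continue  # models without extra dependencies need no extra
        extra = normalize_extra(key)
        if extra not in declared:
            problems.append(f"{key}: no extra named {extra!r} in pyproject.toml")
        elif declared[extra] != set(reqs):
            problems.append(f"{key}: requirements differ from extra {extra!r}")
    return problems


def main(model_extras, pyproject_extras, check: bool = False) -> int:
    """Print drift; with check=True, return a non-zero exit code if any is found."""
    problems = extras_drift(model_extras, pyproject_extras)
    for problem in problems:
        print(problem)
    return 1 if check and problems else 0
```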
`betatabpfn`, `tabflex`, `TabPFNv2_GPU`, and the tabprep variants
(`PrepLightGBM`, `PrepLinearModel`, `PrepTabM`, `PrepRealTabPFN-v2.5`)
have benchmark-result MethodMetadata but no tabarena-side model wrapper
class. `ModelInfo` requires a `model_cls` and `search_space`, so these
entries can't be migrated to per-model `info.py` modules as-is.

Stage D is therefore a documentation-only stage: add module docstrings to
`_tabarena_method_metadata_2025_09_03.py` and
`_tabarena_method_metadata_2026_01_23_tabprep.py` explaining why these
entries stay there rather than moving to `tabarena/models/<key>/info.py`.

If wrappers are ever added for these models (or if the tabprep entries are
folded into the underlying model's `info.py` as additional `ModelInfo`
instances), Stage D can revisit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full Stage A (physically moving 15 wrapper directories from
`benchmark/models/ag/<key>/` to `tabarena/models/<key>/` and updating
~69 import paths) is high blast-radius for a single commit. This commit
takes the smaller intermediate step: each per-model folder gains a
`model.py` that re-exports the wrapper class(es) from their canonical
location. The per-model layout is now uniform on the import surface
(`tabarena.models.<key>` has `hpo.py` + `info.py` + `model.py` +
`__init__.py`); 69 legacy import paths keep working unchanged.

15 shims added:
- ebm, knn, limix, modernnca, orionmsp, perpetual_booster, realmlp,
  sap_rpt_oss, tabdpt, tabm, tabpfn_3, tabstar, xrfm
- tabicl (re-exports both TabICLModel + TabICLv2Model)
- tabpfnv2_5 (re-exports both RealTabPFNv25Model + TabPFNv26Model)

Follow-up to fully complete Stage A: physically relocate the wrapper
files to `tabarena/models/<key>/`, flip the shim direction so the legacy
path re-imports from the new location, then phase out the legacy
benchmark/models/ag/ tree once all 69 consumers migrate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Exposes `tabarena.models.register_model_info(info: ModelInfo)` so external
packages (e.g. `tabarena_extensions`) can declare additional models
without needing `discover_models()` to walk their package tree.

Extensions sometimes redeclare a method already in the core registry
(e.g. a re-benchmarked LinearModel with a different `artifact_name`).
When that happens, the function keys the new entry as
`f"{method}@{artifact_name}"`, preserving the core entry under the bare
method name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous lazy access pattern predefined `_tabarena_model_registry`
and `_models_to_add` as module-level `None` globals. That short-circuits
Python's `__getattr__` fallback (which only fires on missing attributes),
so `from tabarena.benchmark.models.model_registry import _models_to_add`
returned `None` instead of building the list.

Move the lazy cache into a module-level `_lazy_state: dict` and remove
the predefined globals. Now first access via `__getattr__` builds and
caches both values; subsequent accesses read from the dict.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
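The failure mode is easy to reproduce: a PEP 562 module `__getattr__` only runs when normal lookup fails, so a predefined `None` global wins. The sketch below builds a buggy and a fixed module side by side on `types.ModuleType` objects; `make_registry_module` and the placeholder builder are hypothetical names used only for illustration.

```python
from __future__ import annotations

import types


def _build_models_to_add() -> list[str]:
    # Placeholder for the expensive derivation (walking per-model info.py files).
    return ["ExampleModel"]


def make_registry_module(name: str, predefine_globals: bool) -> types.ModuleType:
    module = types.ModuleType(name)
    lazy_state: dict[str, object] = {}
    if predefine_globals:
        module._models_to_add = None  # BUG: normal lookup now succeeds and returns None

    def __getattr__(attr: str):
        # PEP 562 fallback: runs only when `attr` is missing from the module dict.
        if attr == "_models_to_add":
            if attr not in lazy_state:
                lazy_state[attr] = _build_models_to_add()
            return lazy_state[attr]
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    module.__getattr__ = __getattr__
    return module
```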
`ConfigSpace` is declared as an optional extra (`search_spaces`) — base
installs don't include it. Pre-refactor, the per-model `generate.py`
files that needed ConfigSpace were only loaded on demand via
`name_to_import_map` lambdas, so the import never fired during a plain
`import tabarena`.

Stage 2's per-model migration moved the same code into `hpo.py`, which
`info.py` imports eagerly so `discover_models()` can build the registry.
That made the top-level `from ConfigSpace import ...` fire on every
`import tabarena`, breaking CI (`ModuleNotFoundError: No module named
'ConfigSpace'`).

Fix: in catboost, lightgbm, xgboost, extra_trees, random_forest, and xrfm
hpo.py, move the `from ConfigSpace import ...` inside the
`generate_configs_*` function body. Module import is now ConfigSpace-free;
the actual search-space construction (which only fires when someone calls
`gen_<key>.generate_all_bag_experiments(...)` or
`generate_configs_<key>(...)`) still requires it as before.

Verified: with ConfigSpace blocked at import, all 6 hpo.py modules load,
`MODEL_REGISTRY` builds (29 entries), and `_models_to_add` resolves
(17 TabArena-custom classes). With ConfigSpace available, the config
generators still produce configs unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
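The pattern reduced to one function: the optional dependency is imported inside the function that needs it, so importing the module stays free of it. This assumes ConfigSpace 0.7 or newer, whose `ConfigurationSpace` accepts a `space=` dict (tuples become numeric ranges, lists become categoricals); the function name and search space are hypothetical.

```python
from __future__ import annotations


def generate_configs_example(num_random_configs: int = 3, seed: int = 0) -> list[dict]:
    """Sample random configs; ConfigSpace is only required when this is called."""
    # Deferred import: importing this module (and hence info.py during
    # discovery) works even without the optional `search_spaces` extra.
    from ConfigSpace import ConfigurationSpace

    space = ConfigurationSpace(
        seed=seed,
        space={
            "learning_rate": (1e-3, 1e-1),      # float range
            "num_leaves": (16, 255),            # integer range
            "boosting_type": ["gbdt", "dart"],  # categorical
        },
    )
    samples = space.sample_configuration(num_random_configs)
    if not isinstance(samples, list):
        samples = [samples]  # some versions return a single Configuration for size 1
    return [dict(config) for config in samples]
```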
Innixma and others added 24 commits May 28, 2026 23:50
Move the 6 S3/R2 transfer modules out of nips2025_utils/artifacts/ and
into a new tabarena/models/_artifacts/ sub-package, co-located with the
MethodMetadata they depend on. Drop the unused AbstractArtifactLoader /
AbstractArtifactUploader bases. Module names drop the redundant
`method_` prefix since the package scope already implies it. No shims:
all real consumers (the lazy imports inside _method_metadata.py and
method_artifact_manager.py) are updated in-place. Includes ruff --fix
cleanups on the touched files.
Tests for tabarena/tabarena/models/<key>/ now live at tst/models/test_<key>.py,
matching the existing flat layout that already houses test_lazy_imports.py,
test_registry.py, and test_utils.py. tst/benchmark/models/ is removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Innixma Innixma force-pushed the refactor_model_directories branch from e0aff69 to a1d33b8 May 28, 2026 23:51
@LennartPurucker LennartPurucker self-requested a review May 29, 2026 07:42

@LennartPurucker LennartPurucker left a comment

Collaborator

Merge as you see fit; I think the general new workflow is great. I'll leave it up to you what to add here and what to add in another PR.

@Innixma Innixma merged commit 8e0e89d into main May 29, 2026
6 checks passed
@LennartPurucker

Collaborator

Big one, let's goooo!

@LennartPurucker LennartPurucker deleted the refactor_model_directories branch June 2, 2026 10:21