
Extremely Large AutoRecLab Workspace During Experiments with Large Datasets #34

@eisenbahnhero

In recent experiments, I observed that the AutoRecLab workspace (“sandbox”) can grow extremely large; in one current experiment it exceeded 400 GB. Once the disk space is exhausted, AutoRecLab terminates and the experiment is aborted. This happens mostly with large datasets, such as the Amazon2018MusicalInstruments dataset.

This growth occurs because AutoRecLab typically trains a model for each node it creates (e.g., via OmniRec) and saves it to a file for later use. The number of stored models therefore grows with the number of nodes, and storage usage increases very quickly.
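To check whether model files really dominate a given workspace, a small helper along these lines could sum disk usage per file suffix. This is only a sketch: the function name `usage_by_suffix` is hypothetical and not part of AutoRecLab, and it assumes the workspace is an ordinary directory tree with one subdirectory per node.

```python
from collections import defaultdict
from pathlib import Path


def usage_by_suffix(workspace: Path) -> dict[str, int]:
    """Sum file sizes in bytes under `workspace`, grouped by file suffix."""
    totals: dict[str, int] = defaultdict(int)
    for path in workspace.rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "<none>".
            totals[path.suffix or "<none>"] += path.stat().st_size
    return dict(totals)
```

Sorting the result by size (for example `sorted(totals.items(), key=lambda kv: kv[1], reverse=True)`) shows which file types take up the most space.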

To address this, I will introduce a flag in config.toml. Without this flag, AutoRecLab behaves as before. When the flag is enabled, a node's full workspace is no longer archived/logged after execution; only the relevant results are kept. In particular, the large model files are deleted, so users can run large experiments even with limited storage capacity.

