@lingbai-kong

Description

Fix the issue where training data is missing, or the trade date is NaT, at random when running OnlineManager.routine with a RollingStrategy.

Motivation and Context

In RollingGen, handler_mod handles the case where the handler's end_time is earlier than the end of the dataset's test segment. However, when RollingGen.gen_following_tasks shifts the current segment to the next prediction window and the expected test end date is later than the current date (i.e. in the last rolling round), the test end date of the newly generated segment is set to None.

Then, when RollingGen calls self._update_task_segs(t, segments), handler_mod calculates the interval between the handler's end_time and the end date of the dataset's test segment as follows:

cal_interval(
            task["dataset"]["kwargs"]["handler"]["kwargs"]["end_time"],
            task["dataset"]["kwargs"]["segments"][rolling_gen.test_key][1],
        )

Because task["dataset"]["kwargs"]["segments"][rolling_gen.test_key][1] is None, cal_interval raises a TypeError, and no code handles this case. As a result, task["dataset"]["kwargs"]["handler"]["kwargs"]["end_time"] keeps its original value, which ultimately leaves the data incomplete in the subsequent steps.
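The failure mode can be reproduced without qlib. The following is only a minimal sketch: CALENDAR, cal_interval, handler_mod_sketch and make_task are hypothetical stand-ins written for illustration, not qlib's actual implementation. It assumes the TypeError is swallowed rather than propagated, and it simplifies the update step to "extend end_time to the test end".

```python
import bisect
from datetime import date

# Hypothetical five-day trading calendar; qlib's real calendar is not used here.
CALENDAR = [date(2025, 11, day) for day in (24, 25, 26, 27, 28)]


def cal_interval(time_point_a, time_point_b):
    """Calendar steps from B to A; negative when A is earlier than B."""
    return bisect.bisect_left(CALENDAR, time_point_a) - bisect.bisect_left(CALENDAR, time_point_b)


def handler_mod_sketch(task, test_key="test"):
    """Extend the handler's end_time to the test end if it falls short of it."""
    handler_kwargs = task["dataset"]["kwargs"]["handler"]["kwargs"]
    test_end = task["dataset"]["kwargs"]["segments"][test_key][1]
    try:
        if cal_interval(handler_kwargs["end_time"], test_end) < 0:
            handler_kwargs["end_time"] = test_end
    except TypeError:
        # Assumption: the error is swallowed, so end_time silently stays stale.
        pass
    return task


def make_task(handler_end, test_end):
    """Build the smallest task dict that has the keys used above."""
    return {
        "dataset": {
            "kwargs": {
                "handler": {"kwargs": {"end_time": handler_end}},
                "segments": {"test": (date(2025, 11, 24), test_end)},
            }
        }
    }


# Bounded test segment: the handler's end_time is extended as intended.
bounded = handler_mod_sketch(make_task(date(2025, 11, 25), date(2025, 11, 27)))
# Open-ended test segment (None): comparing a date with None raises TypeError,
# the update is skipped, and end_time keeps its old value.
open_ended = handler_mod_sketch(make_task(date(2025, 11, 25), None))
```

In the open-ended case the handler still stops at the old end date, which matches the missing-data symptom described above.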

How to fix it?

  • Force an update of the handler's end_time when the end date of the dataset's test segment is None.
  • Please let me know if there is a better solution.
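One way to picture the forced update is the sketch below. It is not the code in this PR: force_update_end_time is a hypothetical helper, and the description does not state which value end_time is forced to. The sketch assumes an open-ended test segment should make the handler open-ended as well (end_time set to None).

```python
import copy


def force_update_end_time(task, test_key="test"):
    """Return a copy of `task` whose handler end_time follows an open-ended test segment."""
    task = copy.deepcopy(task)  # leave the caller's task template untouched
    dataset_kwargs = task["dataset"]["kwargs"]
    test_end = dataset_kwargs["segments"][test_key][1]
    if test_end is None:
        # Assumption: "no test end" means "load everything available",
        # so the handler's end_time is made open-ended too.
        dataset_kwargs["handler"]["kwargs"]["end_time"] = None
    return task


# Task shaped like the last rolling round: the test segment has no end date.
last_round = {
    "dataset": {
        "kwargs": {
            "handler": {"kwargs": {"end_time": "2025-11-25"}},
            "segments": {"test": ("2025-11-26", None)},
        }
    }
}
fixed = force_update_end_time(last_round)
```

Checking for None explicitly, before any interval calculation, keeps the open-ended case out of the exception path entirely.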

How Has This Been Tested?

  • Pass the test by running: pytest qlib/tests/test_all_pipeline.py from the parent directory of qlib.
  • If you are adding a new feature, test on your own test scripts.
    Run this script to reproduce the problem. Note that the dataset version is 20251206.
#!/usr/bin/env python3
"""
Test program for OnlineManager

This program tests the add_strategy and routine methods of OnlineManager.
"""

import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import pandas as pd
from typing import List, Dict
from qlib.workflow.online.manager import OnlineManager
from qlib.workflow.online.strategy import OnlineStrategy, RollingStrategy
from qlib.workflow.task.gen import RollingGen
from qlib.model.trainer import TrainerR
from qlib.workflow.recorder import Recorder
import qlib
qlib.init(provider_uri="~/.qlib/tushare_data/cn_data", region='cn')
def test_online_manager():
    """
    Test OnlineManager's add_strategy and routine methods
    """
    print("=== Testing OnlineManager ===")
    ###################################
    # online model
    ###################################

    online_segments = {
        "train": ("2025-05-22", "2025-09-09"),
        "valid": ("2025-09-10", "2025-10-14"),
        "test": ("2025-10-15", "2025-11-25"),
    }
    online_data_handler_config = {
        "start_time": online_segments["train"][0],
        "end_time": online_segments["test"][1],
        "fit_start_time": online_segments["train"][0],
        "fit_end_time": online_segments["train"][1],
        "instruments": 'csi300',
        "drop_raw": True
    }
    task = {
        "model": {
            "class": "LGBModel",
            "module_path": "qlib.contrib.model.gbdt",
            "kwargs": {
                "loss": "mse",
                "colsample_bytree": 0.8879,
                "learning_rate": 0.0421,
                "subsample": 0.8789,
                "lambda_l1": 205.6999,
                "lambda_l2": 580.9768,
                "max_depth": 8,
                "num_leaves": 210,
                "num_threads": 10,
                "verbosity": 2,
            }
        },
        "dataset": {
            "class": "DatasetH",
            "module_path": "qlib.data.dataset",
            "kwargs": {
                "handler": {
                    "class": "Alpha158",
                    "module_path": "qlib.contrib.data.handler",
                    "kwargs": online_data_handler_config,
                },
                "segments": online_segments,
            },
        },
        "record": [
            {
                "class": "SignalRecord",
                "module_path": "qlib.workflow.record_temp",
                "kwargs": {"dataset": "<DATASET>", "model": "<MODEL>"},
            },
            {"class": "SigAnaRecord", "module_path": "qlib.workflow.record_temp"},
        ],
        "strategy": {
            "rolling_step": 30
        }
    }
    strategy = RollingStrategy(
        'test',
        task,
        RollingGen(step=task["strategy"]["rolling_step"], rtype=RollingGen.ROLL_SD),
    )
    print("Creating OnlineManager...")
    manager = OnlineManager(
        strategies=[],
        trainer=TrainerR()
    )
    
    print(f"Initial strategies count: {len(manager.strategies)}")
    
    # Test add_strategy method
    print("\n=== Testing add_strategy ===")    
    manager.add_strategy([strategy])
    print(f"Strategies count after add_strategy: {len(manager.strategies)}")
    
    # Test routine method
    print("\n=== Testing routine ===")
    test_time = pd.Timestamp("2025-12-06")
    manager.routine(cur_time=test_time, signal_kwargs={"over_write": True})
    
    print("\n=== Test completed successfully! ===")


if __name__ == "__main__":
    test_online_manager()

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
  2. Your own tests:
  • before
image
  • fixed
image

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation
