[ENH] expose encoder scaling parameters and add inverse_scaling utility in TimeSeriesDataSet #2280

Open
cngmid wants to merge 11 commits into sktime:main from cngmid:enh-encoder-scaling
@cngmid cngmid commented May 11, 2026

Contributor

Fixes PR #2271

[ENH] Encoder scaling parameters and inverse-scaling in TimeSeriesDataSet

Note: This PR currently includes only the core implementation.
I will add tests and a tutorial notebook once the maintainers confirm the API and naming.
This avoids rewriting tests/tutorials if design changes are requested.

Summary

This PR introduces a set of enhancements to TimeSeriesDataSet that make encoder/decoder scaling fully transparent and invertible. The goal is to improve interpretability, debugging, and downstream analysis by exposing scaling parameters and providing a clean inverse‑scaling utility.

These changes are fully backward‑compatible and optional for users who do not need custom scaling.


Motivation

TimeSeriesDataSet applies scaling internally (e.g., EncoderNormalizer, StandardScaler, or custom normalizers), but the resulting scale parameters are not currently exposed in a structured way. This makes it difficult to:

  • inspect model inputs in their original units

  • debug model behavior

  • compare predictions to raw data

  • export interpretable results

  • mix sklearn scalers with internal normalizers

This PR addresses these limitations by exposing encoder scaling parameters and providing a robust inverse‑scaling mechanism.


What This PR Adds

1. Public access to encoder scaling parameters

The dataset now exposes:

  • x_scale (n_scalers, 2) — scale values used internally, and

  • x_scale_idx (n_scalers,) — mapping from feature index → scale index.

The corresponding dataloader exposes:

  • encoder_scale (n_batch, n_scalers, 2) — scale values used internally, and

  • encoder_scale_idx (n_batch, n_scalers) — mapping from feature index → scale index.

These attributes allow users to inspect exactly how each feature was scaled.
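As a rough illustration of how such parameters could be used, here is a minimal NumPy sketch of inverting the scaling for a single dataset item. It is not the PR's implementation. It assumes that each row of x_scale holds a (center, scale) pair, that x_scale_idx has one entry per continuous feature pointing at a row of x_scale, and that -1 marks an unscaled feature. The helper name inverse_scale_item is hypothetical.

```python
import numpy as np


def inverse_scale_item(x_cont, x_scale, x_scale_idx):
    """Undo (x - center) / scale for one sample (hypothetical helper).

    x_cont:      (n_timesteps, n_features) scaled values
    x_scale:     (n_scalers, 2), rows ASSUMED to be (center, scale)
    x_scale_idx: (n_features,), row of x_scale per feature;
                 -1 is ASSUMED to mean "feature was not scaled"
    """
    x_cont = np.asarray(x_cont, dtype=float)
    x_scale = np.asarray(x_scale, dtype=float)
    x_scale_idx = np.asarray(x_scale_idx)

    out = x_cont.copy()
    scaled = x_scale_idx >= 0  # boolean mask over features
    center = x_scale[x_scale_idx[scaled], 0]
    scale = x_scale[x_scale_idx[scaled], 1]
    # Broadcast the per-feature parameters over the time axis.
    out[:, scaled] = x_cont[:, scaled] * scale + center
    return out


# Feature 0 was standardized with center=12, scale=2; feature 1 is unscaled.
raw = np.array([[10.0, 5.0], [14.0, 7.0]])
scaled_item = np.array([[-1.0, 5.0], [1.0, 7.0]])
restored = inverse_scale_item(scaled_item, [[12.0, 2.0]], [0, -1])
```

Unscaled features pass through untouched, so the same call works for mixed scaler configurations.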

2. A general inverse‑scaling utility

When users define mixed scaling strategies:

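from sklearn.preprocessing import StandardScaler

from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import EncoderNormalizer

# One entry per continuous feature; None leaves the feature unscaled.
scalers = {
    "feature_a": StandardScaler(),
    "feature_b": None,                 # unscaled
    "feature_c": EncoderNormalizer(),  # pytorch-forecasting internal scaler
}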

dataset = TimeSeriesDataSet(
    data=df,
    time_idx="time_idx",
    target="y",
    group_ids=["series"],
    max_encoder_length=24,
    max_prediction_length=12,
    time_varying_known_reals=["feature_a", "feature_b", "feature_c"],
    scalers=scalers,
)

a new method:

dataset.inverse_transform(x, target_scale, scale_idx, scale)

# OR (TBD)
dataset.inverse_transform(x)

supports:

  • StandardScaler

  • EncoderNormalizer

  • unscaled features

  • per‑sample scale parameters

  • both dataset items and dataloader batches

This makes it easy to reconstruct original values for visualization, debugging, or exporting predictions.
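To make the "per-sample scale parameters" case concrete, the following NumPy sketch shows one way a batch could be inverted when every sample carries its own parameters. It is an illustration under assumptions, not the method added by this PR: rows of encoder_scale are taken to be (center, scale) pairs, encoder_scale_idx is taken to have one entry per feature, and -1 is taken to mean unscaled. The name inverse_scale_batch is hypothetical.

```python
import numpy as np


def inverse_scale_batch(encoder_cont, encoder_scale, encoder_scale_idx):
    """Invert per-sample scaling for a batch (hypothetical helper).

    encoder_cont:      (n_batch, n_timesteps, n_features) scaled values
    encoder_scale:     (n_batch, n_scalers, 2), ASSUMED (center, scale) rows
    encoder_scale_idx: (n_batch, n_features); -1 ASSUMED to mean unscaled
    """
    encoder_cont = np.asarray(encoder_cont, dtype=float)
    encoder_scale = np.asarray(encoder_scale, dtype=float)
    idx = np.asarray(encoder_scale_idx)

    scaled = idx >= 0                    # (n_batch, n_features)
    safe_idx = np.where(scaled, idx, 0)  # placeholder row for unscaled features
    batch_rows = np.arange(encoder_cont.shape[0])[:, None]
    # Gather each sample's own parameters: (n_batch, n_features, 2).
    params = encoder_scale[batch_rows, safe_idx]
    # Unscaled features get the identity transform (center=0, scale=1).
    center = np.where(scaled, params[..., 0], 0.0)
    scale = np.where(scaled, params[..., 1], 1.0)
    # Insert a time axis so the parameters broadcast over timesteps.
    return encoder_cont * scale[:, None, :] + center[:, None, :]


# Two samples with different parameters for feature 0; feature 1 is unscaled.
raw_batch = np.array([[[12.0, 1.0], [8.0, 2.0]], [[105.0, 3.0], [95.0, 4.0]]])
scaled_batch = np.array([[[1.0, 1.0], [-1.0, 2.0]], [[1.0, 3.0], [-1.0, 4.0]]])
scale_params = np.array([[[10.0, 2.0]], [[100.0, 5.0]]])
scale_index = np.array([[0, -1], [0, -1]])
restored_batch = inverse_scale_batch(scaled_batch, scale_params, scale_index)
```

Gathering the parameters per sample keeps the operation vectorized, which matters because an encoder-based normalizer can produce a different center and scale for every sample in the batch.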

3. A tutorial notebook

A new example notebook demonstrates:

  • creating a dataset with mixed scalers

  • inspecting encoder scaling parameters

  • inverse‑transforming dataset items

  • inverse‑transforming dataloader batches

  • validating reconstruction accuracy

  • visual comparison of original vs. reconstructed values
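The reconstruction check listed above could be sketched as follows. This is an assumption-laden example rather than the notebook's actual code: it standardizes synthetic data with NumPy, stores the scaled values as float32 (on the assumption that model inputs are single precision), and compares the round trip with a tolerance instead of exact equality. The helper reconstruction_error is hypothetical.

```python
import numpy as np


def reconstruction_error(original, reconstructed):
    """Largest absolute deviation between two arrays (hypothetical helper)."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    return float(np.max(np.abs(original - reconstructed)))


rng = np.random.default_rng(0)
raw = rng.normal(loc=50.0, scale=10.0, size=(24, 3))  # synthetic data

# Standardize per feature, then store as float32 like a model input would be.
center = raw.mean(axis=0)
scale = raw.std(axis=0)
scaled32 = ((raw - center) / scale).astype(np.float32)

# Invert the scaling and measure the round-trip error.
restored = scaled32.astype(np.float64) * scale + center
err = reconstruction_error(raw, restored)
is_close = np.allclose(raw, restored, rtol=1e-5, atol=1e-4)
```

Because of the float32 rounding, a tolerance-based comparison such as np.allclose is a safer check than testing for exact equality.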


Example Usage

Inverse‑transforming a dataset item

x, _ = dataset[0]  # index directly instead of materializing the whole dataset

x_cont_original = dataset.inverse_transform(
    x['x_cont'],
    x['target_scale'],
    x['x_scale_idx'],
    x['x_scale']
)

# OR (TBD)
x_cont_original = dataset.inverse_transform(x)

Inverse‑transforming a batch

batch = next(iter(dataset.to_dataloader(batch_size=32)))
x, _ = batch

encoder_cont_original = dataset.inverse_transform(
    x['encoder_cont'],
    x['target_scale'],
    x['encoder_scale_idx'],
    x['encoder_scale']
)

# OR (TBD)
encoder_cont_original = dataset.inverse_transform(x)

Backward Compatibility

Existing behavior is unchanged for users who do not pass custom scalers.

  • All new functionality is opt‑in.

  • No breaking API changes.


Next Steps

I’m happy to refine the API, add more tests, or adjust the tutorial based on maintainer feedback.
Thanks for reviewing — looking forward to iterating on this together.

cngmid commented May 11, 2026

Contributor Author

Hi @andersendsa and @phoeenniixx — the initial implementation is ready for API review.

This Draft PR currently includes only the core changes to TimeSeriesDataSet.
Once the API and naming are confirmed, I will add:

  • tests
  • documentation updates
  • a tutorial notebook (as requested)

Looking forward to your feedback!

@phoeenniixx phoeenniixx left a comment

Member

Thanks! I think the code-quality check is failing. Please use pre-commit to fix it; the other tests will then run, which will help us with the review as well.

codecov Bot commented May 11, 2026


Codecov Report

❌ Patch coverage is 97.82609% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@124bd8f). Learn more about missing BASE report.

Files with missing lines | Patch % | Lines
pytorch_forecasting/data/timeseries/_timeseries.py | 97.82% | 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2280   +/-   ##
=======================================
  Coverage        ?   87.19%           
=======================================
  Files           ?      167           
  Lines           ?     9796           
  Branches        ?        0           
=======================================
  Hits            ?     8542           
  Misses          ?     1254           
  Partials        ?        0           
Flag Coverage Δ
cpu 87.19% <97.82%> (?)
pytest 87.19% <97.82%> (?)


@cngmid cngmid requested a review from phoeenniixx May 12, 2026 13:16
@cngmid cngmid marked this pull request as ready for review May 18, 2026 08:14
cngmid commented May 18, 2026

Contributor Author

Hi, all requested changes have been implemented, including a test function and a notebook added to docs/source/tutorials, and the branch is up to date.
Please let me know if anything else is needed.

cngmid commented May 18, 2026

Contributor Author

All requested changes have been implemented, including the multi‑target fix, updated tests, and the revised tutorial.
The branch is up to date and ready for another review.
Please let me know if anything else is needed.

@cngmid cngmid changed the title ENH: expose encoder scaling parameters and add inverse_scaling utility in TimeSeriesDataSet [ENH] expose encoder scaling parameters and add inverse_scaling utility in TimeSeriesDataSet May 19, 2026
cngmid commented Jun 2, 2026

Contributor Author

@phoeenniixx
Do you need anything else from me? I see that merging is blocked.

cngmid commented Jun 3, 2026

Contributor Author

Hi @phoeenniixx — I’ve re‑requested your review.
All requested changes are implemented and all checks are passing.
Could you please update your earlier “Changes requested” review when you have a moment?

@phoeenniixx phoeenniixx left a comment

Member

Thanks! Sorry for the late reply.

Pinging @fkiraly and @agobbifbk for their review. It looks good to me!

2 participants