
Follow-up to #135: Add dataset validation checks and config file for reproducibility #196

@vignathi123-vi

Follow-up to #135

PR #136 resolved the core dataset setup issues. During the discussion on #135,
I suggested two additional improvements that the maintainer agreed would further
improve robustness and reproducibility.

Proposed Improvements

  1. Dataset Validation Script (validate_dataset.py)
    Before training begins, a lightweight validation step should verify:
  • HR and LR image counts match
  • Filenames are consistent across HR and LR folders
  • Image shapes are as expected for the scale factor (e.g. HR dimensions equal LR dimensions multiplied by the scale)

This prevents subtle misalignment errors that won't raise immediate exceptions
but silently corrupt training.
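
To make the three checks concrete, here is a minimal sketch of what the core of validate_dataset.py could look like. It is not the project's actual code: the function names `validate_pairs` and `read_sizes` are hypothetical, and it assumes that HR and LR files share identical filenames and that each HR image is exactly `scale` times the LR image in both dimensions. If the repository uses a different naming scheme (for example a scale suffix on LR files), the name matching would need to change.

```python
from pathlib import Path

Size = tuple[int, int]  # (width, height)


def validate_pairs(hr_sizes: dict[str, Size],
                   lr_sizes: dict[str, Size],
                   scale: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []

    # Check 1: HR and LR image counts match.
    if len(hr_sizes) != len(lr_sizes):
        errors.append(
            f"count mismatch: {len(hr_sizes)} HR vs {len(lr_sizes)} LR")

    # Check 2: filenames are consistent across both folders
    # (assumption: paired files have identical names).
    for name in sorted(hr_sizes.keys() - lr_sizes.keys()):
        errors.append(f"{name}: missing from LR folder")
    for name in sorted(lr_sizes.keys() - hr_sizes.keys()):
        errors.append(f"{name}: missing from HR folder")

    # Check 3: HR size equals LR size times the scale factor.
    for name in sorted(hr_sizes.keys() & lr_sizes.keys()):
        lr_w, lr_h = lr_sizes[name]
        expected = (lr_w * scale, lr_h * scale)
        if hr_sizes[name] != expected:
            errors.append(
                f"{name}: HR size {hr_sizes[name]} != expected {expected}")

    return errors


def read_sizes(folder: str) -> dict[str, Size]:
    """Map filename -> (width, height) for images in a folder.

    Uses Pillow (third-party), imported lazily so the pure
    validation logic above has no extra dependencies.
    """
    from PIL import Image

    sizes = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in {".png", ".jpg", ".jpeg"}:
            with Image.open(path) as img:
                sizes[path.name] = img.size
    return sizes
```

A training script could call `validate_pairs(read_sizes(hr_dir), read_sizes(lr_dir), scale)` before building the dataset and abort if the returned list is non-empty. Keeping the comparison separate from the file reading also makes the checks easy to unit test without real images.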

  2. Configuration File (config.yaml)
    Dataset-related parameters like scale factor and patch size are currently
    implicit/hardcoded. Externalizing them into a config file would:
  • Make experiments more transparent
  • Improve reproducibility across different setups
  • Allow contributors to change parameters without touching core training logic
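
As a rough illustration of the config idea, the sketch below shows one way the dataset parameters could be loaded and checked. Everything here is an assumption for discussion: the `DatasetConfig` class, the key names (`hr_dir`, `lr_dir`, `scale`, `patch_size`), the default values, and the top-level `dataset` section are hypothetical and not taken from the repository. Reading the file relies on PyYAML's `yaml.safe_load`, imported lazily.

```python
from dataclasses import dataclass, fields

# Hypothetical config.yaml layout assumed by this sketch:
#
#   dataset:
#     hr_dir: data/HR
#     lr_dir: data/LR
#     scale: 4
#     patch_size: 96


@dataclass(frozen=True)
class DatasetConfig:
    hr_dir: str
    lr_dir: str
    scale: int = 4        # illustrative default, not the project's
    patch_size: int = 96  # illustrative default, not the project's

    def __post_init__(self):
        # Fail early on values that cannot be valid.
        if self.scale < 1:
            raise ValueError(f"scale must be >= 1, got {self.scale}")
        if self.patch_size < 1:
            raise ValueError(
                f"patch_size must be >= 1, got {self.patch_size}")

    @classmethod
    def from_dict(cls, raw: dict) -> "DatasetConfig":
        # Reject unknown keys so a typo in the YAML is not silently ignored.
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        return cls(**raw)


def load_config(path: str) -> DatasetConfig:
    """Read the 'dataset' section of a YAML file (requires PyYAML)."""
    import yaml

    with open(path, encoding="utf-8") as f:
        return DatasetConfig.from_dict(yaml.safe_load(f)["dataset"])
```

With something like this, the training code would receive a `DatasetConfig` object instead of reading hardcoded constants, and the same `scale` value could be passed to the validation step so both stay in sync.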

Reference

I'd like to work on this
I'm a GSoC 2026 applicant and happy to submit a PR implementing both of these.
Can I be assigned?
