Follow-up to #135: Add dataset validation checks and config file for reproducibility #196

**Follow-up to #135**

PR #136 resolved the core dataset setup issues. In the discussion on #135,
I suggested two further changes that the maintainer agreed would improve
robustness and reproducibility.

**Proposed Improvements**

1. Dataset Validation Script (`validate_dataset.py`)
Before training begins, a lightweight validation step should verify:
- HR and LR image counts match
- Filenames are consistent across HR and LR folders
- Image shapes are as expected: each HR image's dimensions should equal the matching LR image's dimensions multiplied by the scale factor

This prevents subtle misalignment errors that won't raise immediate exceptions 
but silently corrupt training.
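
A minimal sketch of what `validate_dataset.py` could look like, covering the three checks above. The function name `validate_dataset`, the `read_size` parameter, and the assumption that HR and LR files share identical filenames are all illustrative and not taken from the repository. If the dataset uses suffixed LR names such as `0001x4.png`, the pairing logic would need adjusting. The default size reader assumes Pillow is installed.

```python
from pathlib import Path
from typing import Callable, List, Tuple

Size = Tuple[int, int]  # (width, height)


def _pillow_size(path: Path) -> Size:
    """Default size reader; assumes Pillow is available (imported lazily)."""
    from PIL import Image

    with Image.open(path) as img:
        return img.size


def validate_dataset(
    hr_dir,
    lr_dir,
    scale: int,
    read_size: Callable[[Path], Size] = _pillow_size,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    # Hypothetical layout: flat folders where HR and LR share the same filenames.
    hr = {p.name: p for p in Path(hr_dir).iterdir() if p.is_file()}
    lr = {p.name: p for p in Path(lr_dir).iterdir() if p.is_file()}
    problems: List[str] = []

    # Check 1: HR and LR image counts match.
    if len(hr) != len(lr):
        problems.append(f"count mismatch: {len(hr)} HR vs {len(lr)} LR")

    # Check 2: every file has a counterpart in the other folder.
    for name in sorted(hr.keys() - lr.keys()):
        problems.append(f"missing LR counterpart: {name}")
    for name in sorted(lr.keys() - hr.keys()):
        problems.append(f"missing HR counterpart: {name}")

    # Check 3: HR dimensions equal LR dimensions times the scale factor.
    for name in sorted(hr.keys() & lr.keys()):
        hr_w, hr_h = read_size(hr[name])
        lr_w, lr_h = read_size(lr[name])
        if (hr_w, hr_h) != (lr_w * scale, lr_h * scale):
            problems.append(
                f"shape mismatch for {name}: HR {hr_w}x{hr_h}, "
                f"LR {lr_w}x{lr_h}, scale {scale}"
            )
    return problems
```

Returning a list of problems rather than raising on the first one lets a single run report every misaligned pair. A caller could print the list and exit with a non-zero status before training starts.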

2. Configuration File (`config.yaml`)
Dataset-related parameters like scale factor and patch size are currently 
implicit/hardcoded. Externalizing them into a config file would:
- Make experiments more transparent
- Improve reproducibility across different setups
- Allow contributors to change parameters without touching core training logic
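
As a rough illustration of the config idea, the sketch below maps a `dataset` section of `config.yaml` onto a typed object. The key names (`hr_dir`, `lr_dir`, `scale`, `patch_size`), the `DatasetConfig` class, and both loader functions are hypothetical; the actual parameter set would follow whatever the training code uses today. Reading the file assumes PyYAML is installed.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping

# Hypothetical config.yaml layout assumed by this sketch:
#
#   dataset:
#     hr_dir: data/HR
#     lr_dir: data/LR
#     scale: 4
#     patch_size: 48


@dataclass(frozen=True)
class DatasetConfig:
    hr_dir: str
    lr_dir: str
    scale: int
    patch_size: int

    def __post_init__(self) -> None:
        # Reject values that could never describe a valid dataset.
        if self.scale < 1:
            raise ValueError(f"scale must be >= 1, got {self.scale}")
        if self.patch_size < 1:
            raise ValueError(f"patch_size must be >= 1, got {self.patch_size}")


def load_dataset_config(raw: Mapping[str, Any]) -> DatasetConfig:
    """Build a DatasetConfig from an already-parsed config mapping."""
    section = dict(raw.get("dataset") or {})
    expected = {f.name for f in fields(DatasetConfig)}
    missing = expected - section.keys()
    unknown = section.keys() - expected
    if missing or unknown:
        raise KeyError(
            f"dataset config keys: missing={sorted(missing)}, unknown={sorted(unknown)}"
        )
    return DatasetConfig(**section)


def load_config_file(path: str) -> DatasetConfig:
    """Read a YAML file from disk; assumes PyYAML is installed."""
    import yaml  # third-party, imported lazily

    with open(path, "r", encoding="utf-8") as handle:
        return load_dataset_config(yaml.safe_load(handle) or {})
```

Failing on missing or unknown keys means a typo in the config is reported at startup instead of silently falling back to a default.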
 
**Reference**

- Closes follow-up from #135
- Maintainer acknowledged these suggestions in the issue thread

**I'd like to work on this**

I'm a GSoC 2026 applicant and happy to submit a PR implementing both of these. 
Can I be assigned?
