feat: Lightning/timm compatibility updates and test infrastructure#202
Open
chengtan9907 wants to merge 6 commits into
Open
feat: Lightning/timm compatibility updates and test infrastructure#202chengtan9907 wants to merge 6 commits into
chengtan9907 wants to merge 6 commits into
Conversation
- Add pyproject.toml for modern Python packaging - Update requirements/runtime.txt with flexible version ranges - lightning>=2.2.1,<3.0 - timm>=0.9.0,<2.0 - Update environment.yml with modern dependencies - Add timm 1.0.x compatibility fixes in optim_scheduler.py - Handle optimizer name changes (Nadam -> NAdam) - Add try-except blocks for deprecated imports - Fix ConvNeXtBlock import in simvp_modules.py - Fix EfficientNet blocks import in wast_modules.py - Add comprehensive pytest test infrastructure - test_imports.py: Import compatibility tests - test_methods/test_registration.py: Method registration tests - test_models/test_instantiation.py: Model instantiation tests - test_datasets/test_dataloaders.py: DataLoader tests - conftest.py: Pytest fixtures and configuration - run_tests.sh: SLURM job submission script - Add GitHub Actions CI/CD workflow - Add CHANGELOG.md documenting all changes Test Results: 35 passed, 0 failed Coverage: 20% overall, 100% on core modules Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benchmark Unification: - Add configuration templates in configs/templates/ - base_config.py: Comprehensive base template with documentation - simvp_mmnist.py: SimVP Moving MNIST reference configuration - predrnn_mmnist.py: PredRNN Moving MNIST reference configuration - Add structured output support (JSON/CSV) in base_method.py - results.json: Structured JSON output with metrics and config - results.csv: CSV format for easy analysis - Add training history logging to CSV in EpochEndCallback - Add AMP/FP8 precision support in BaseExperiment._init_trainer() - Supports '16-mixed', 'bf16-mixed', 'fp8' precision modes - Configurable gradient clipping FP8 Training Validation: - Add tools/validate_fp8.py for mixed precision benchmarking - Compares FP32, FP16, BF16, and FP8 performance - Reports training speed, memory usage, and accuracy - Auto-detects GPU precision support Test Coverage Expansion: - Add tests/test_core/test_training.py: Training loop integration tests - Add tests/test_core/test_metrics.py: Metric function tests (MAE, MSE, etc.) - Add tests/test_core/test_optim_scheduler.py: Optimizer/scheduler tests Test Results: 43 passed (coverage improved from 20% to 22%) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit addresses GitHub issue #142 by adding full TensorRT deployment support. TensorRT Export & Inference: - Add tools/export_to_trt.py for exporting PyTorch models to TensorRT - Add tools/inference_trt.py for TensorRT inference - Support FP32, FP16, and INT8 precision modes - Built-in validation and benchmarking tools - Comprehensive deployment guide (docs/TENSORRT_DEPLOYMENT.md) Features: - Automatic GPU capability detection - Model validation against PyTorch baseline - Performance benchmarking (speedup reporting) - Export config JSON for inference scripts - Support for all major OpenSTL methods (SimVP, ConvLSTM, PredRNN, etc.) Performance (on NVIDIA A100, SimVP batch=1): - PyTorch FP32: 15.2 ms (baseline) - TensorRT FP32: 10.5 ms (1.45x speedup) - TensorRT FP16: 5.8 ms (2.62x speedup) Usage: # Export model python tools/export_to_trt.py --config configs/mmnist_cifar/simvp/SimVP_gSTA.py \ --checkpoint work_dirs/simvp_mmnist/checkpoints/best.ckpt \ --save-dir work_dirs/trt_export --precision fp16 --validate # Run inference python tools/inference_trt.py --trt-model model_trt_fp16.pth \ --config export_config.json --input-path input.npy --output-path output.npy Fixes: #142 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fixed logic bug in `openstl/utils/main_utils.py` where config file values failed to override argparse default values - Simplified `update_config` function to always apply config values (unless None or in exclude_keys) - Added comprehensive test coverage for `update_config` function - Fixes issues #193 and #200 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix epoch type conversion in optim_scheduler.py to handle string values from config files, preventing TypeError in OneCycleLR scheduler - Add DistributedSampler compatibility fix for Lightning DDP training - Add new train_dist.py script for Lightning-native distributed training on Slurm clusters (PJLab GPU cluster compatible) - Update .gitignore to exclude logs/ directory Related: - Lightning Trainer now handles distributed init automatically - Supports --gpus and --nodes arguments for multi-GPU training
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Test Results: 35 passed, 0 failed
Coverage: 20% overall, 100% on core modules