diff --git a/Graph_Representation_Learning_Rushil_Singha/README.md b/Graph_Representation_Learning_Rushil_Singha/README.md index ead8c90..3187188 100644 --- a/Graph_Representation_Learning_Rushil_Singha/README.md +++ b/Graph_Representation_Learning_Rushil_Singha/README.md @@ -1,5 +1,8 @@ # JetNet Graph Diffusion Model +**Author**: Rushil Singha +**GSoC 2025 Project**: Graph-based diffusion models for realistic jet generation + A PyTorch/PyTorch-Geometric implementation of a **graph-based diffusion model** for generating realistic jets from the [JetNet dataset](https://huggingface.co/datasets/jetnet). This project builds **k-nearest neighbor (kNN) jet graphs**, learns **Chebyshev GCN (ChebNet) embeddings**, trains a **diffusion model in latent space**, and decodes back into particle-level jets. @@ -18,37 +21,148 @@ This project builds **k-nearest neighbor (kNN) jet graphs**, learns **Chebyshev ## โš™๏ธ Installation -Clone the repo and install dependencies: +### Prerequisites +- Python 3.8+ (tested on 3.9) +- CUDA 11.8+ (for GPU acceleration) +- At least 8GB RAM (16GB recommended) + +### Setup ```bash -git clone https://github.com/your-username/jetnet-graph-diffusion.git -cd jetnet-graph-diffusion +# Clone and navigate to project +git clone https://github.com/ML4SCI/GENIE.git +cd GENIE/Graph_Representation_Learning_Rushil_Singha +# Install dependencies pip install -r requirements.txt +``` + +**Note**: If you encounter PyTorch Geometric installation issues, install manually: +```bash +pip install torch==2.0.0+cu118 -f https://download.pytorch.org/whl/torch_stable.html +pip install torch-geometric torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-2.0.0+cu118.html +``` + +--- + +## ๐Ÿƒโ€โ™‚๏ธ Usage + +### Basic Run +```bash +python code.py +``` + +### What the script does: +1. **Downloads JetNet dataset** (~2GB) to `jetnet_data/` directory +2. **Preprocesses jets** - extracts particle features (eta, phi, pt) and masks +3. **Builds kNN graphs** - constructs k=8 nearest neighbor graphs for each jet +4. **Trains ChebNet encoder** - learns 64-dimensional latent representations +5. **Runs diffusion training** - trains denoising model in latent space +6. **Generates synthetic jets** - samples new jets from trained model +7. **Evaluates results** - computes KL divergence and Wasserstein distances +8. **Saves outputs** to `results/` directory + +### Expected Runtime +- **CPU**: 3-4 hours +- **GPU (RTX 3080+)**: 45-90 minutes +- **Memory usage**: 6-12GB RAM + +### Output Files +``` +results/ +โ”œโ”€โ”€ training_logs.txt # Training progress and losses +โ”œโ”€โ”€ generated_jets.png # Comparison plots +โ”œโ”€โ”€ evaluation_metrics.json # KL divergence, Wasserstein distances +โ”œโ”€โ”€ model_checkpoints/ # Saved model weights +โ””โ”€โ”€ jet_visualizations/ # Individual jet plots +``` -requirements.txt - -numpy==1.24.3 -torch==2.0.0 -torch-geometric -torch-scatter -torch-sparse -torch-cluster -networkx -scikit-learn -jetnet +--- + +## ๐Ÿ”ง Configuration + +Key parameters in `code.py`: +```python +# Graph construction +K_NEIGHBORS = 8 # kNN graph connectivity +LATENT_DIM = 64 # Embedding dimension + +# Training +BATCH_SIZE = 32 # Adjust based on GPU memory +LEARNING_RATE = 1e-4 # Adam optimizer learning rate +NUM_EPOCHS = 100 # Training epochs ``` -# This script: -->Encodes jets into latent space +--- -->Runs diffusion training +## ๐Ÿ“Š Expected Results -->Decodes jets back into particle space +**Good results show:** +- KL divergence < 0.1 for jet mass and pT distributions +- Wasserstein distance < 0.05 for particle multiplicity +- Generated jets visually similar to real jets in eta-phi space + +**If results are poor:** +- Increase training epochs (try 200+) +- Adjust learning rate (try 5e-5 or 2e-4) +- Check GPU memory usage (reduce batch size if needed) + +--- + +## ๐Ÿ› Troubleshooting + +**Common Issues:** + +1. **CUDA out of memory** + ```python + # Reduce batch size in code.py + BATCH_SIZE = 16 # or 8 + ``` + +2. **JetNet download fails** + ```bash + # Manual download alternative + wget https://zenodo.org/record/6975118/files/jetnet.tar.gz + tar -xzf jetnet.tar.gz + ``` + +3. **PyTorch Geometric errors** + ```bash + # Reinstall with specific CUDA version + pip uninstall torch-geometric torch-scatter torch-sparse + pip install torch-geometric -f https://data.pyg.org/whl/torch-2.0.0+cu118.html + ``` + +4. **Slow training on CPU** + - Expected behavior - consider using Google Colab or cloud GPU + - Reduce dataset size by modifying `num_particles=50` in `load_jetnet_data()` + +--- + +## ๐Ÿ“ˆ Performance Tips + +- **GPU acceleration**: Ensure CUDA is properly installed +- **Memory optimization**: Use gradient checkpointing for large models +- **Faster convergence**: Try learning rate scheduling +- **Better results**: Experiment with different graph construction methods (radius graphs, etc.) + +--- + +## ๐Ÿค Contributing + +Found a bug or want to improve the model? +1. Fork the repository +2. Create a feature branch +3. Make your changes +4. Submit a pull request + +--- -->Logs evaluation metrics +## ๐Ÿ“š References -->Saves visualizations to results/ +- [JetNet Dataset](https://huggingface.co/datasets/jetnet) +- [PyTorch Geometric Documentation](https://pytorch-geometric.readthedocs.io/) +- [Chebyshev Graph Convolutions](https://arxiv.org/abs/1606.09375) diff --git a/Non_local_Jet_Classification_Tanmay_Bakshi/readme.md b/Non_local_Jet_Classification_Tanmay_Bakshi/readme.md index e69de29..39d9af6 100644 --- a/Non_local_Jet_Classification_Tanmay_Bakshi/readme.md +++ b/Non_local_Jet_Classification_Tanmay_Bakshi/readme.md @@ -0,0 +1,105 @@ +# Non-local Jet Classification with Topological Features + +**Author**: Tanmay Bakshi +**GSoC 2025 Project**: Advanced jet classification using persistent homology and topological data analysis + +## Overview + +This project implements sophisticated neural network architectures for classifying particle jets, with a focus on capturing non-local geometric features through topological data analysis. The approach combines traditional jet features with persistent homology to improve classification performance on quark vs gluon discrimination tasks. + +## Dataset + +The project uses the **Quark Gluon Tagging Reference Dataset** by Kasieczka et al., featuring: +- 1.2M training events, 400k validation, 400k test events +- 14 TeV hadronic tops (signal) vs QCD dijets (background) +- Anti-kT 0.8 jets in pT range [550,650] GeV +- Leading 200 jet constituents stored per jet +- Constituents sorted by pT (highest first) + +## Project Structure + +``` +Non_local_Jet_Classification_Tanmay_Bakshi/ +โ”œโ”€โ”€ main.py # Main entry point +โ”œโ”€โ”€ datasets.py # Data loading utilities +โ”œโ”€โ”€ coordinates_extract.py # Feature extraction +โ”œโ”€โ”€ data_arrange.py # Data preprocessing +โ”œโ”€โ”€ preprocess_dask.py # Parallel preprocessing +โ”œโ”€โ”€ persistent_net-2.ipynb # Interactive demo notebook +โ”œโ”€โ”€ console/ # Console utilities +โ”œโ”€โ”€ helper/ # Helper functions +โ”œโ”€โ”€ nn/ # Neural network models +โ”œโ”€โ”€ persistence/ # Topological analysis +โ”œโ”€โ”€ scnn/ # Simplicial CNN implementation +โ””โ”€โ”€ Weaver/ # Weaver framework integration +``` + +## Quick Start + +### Prerequisites +- Python 3.8+ +- PyTorch 1.8+ +- awkward-array +- scikit-learn +- h5py +- pandas +- numpy + +### Installation +```bash +# Navigate to project directory +cd Non_local_Jet_Classification_Tanmay_Bakshi + +# Install dependencies (create requirements.txt if needed) +pip install torch awkward scikit-learn h5py pandas numpy matplotlib + +# For topological analysis +pip install gudhi # for persistent homology +``` + +### Running the Code + +**Option 1: Python Script** +```bash +python main.py +``` + +**Option 2: Interactive Notebook (Recommended)** +```bash +jupyter notebook persistent_net-2.ipynb +``` + +**Option 3: Data Preprocessing** +```bash +# For large datasets, use parallel preprocessing +python preprocess_dask.py +``` + +## Key Features + +- **Topological Feature Extraction**: Uses persistent homology to capture jet topology +- **Multi-scale Analysis**: Analyzes jets at different geometric scales +- **Advanced Architectures**: Implements Simplicial CNNs and graph-based methods +- **Weaver Integration**: Compatible with the Weaver framework for particle physics ML + +## Expected Outputs + +- Classification accuracy metrics +- ROC curves and performance plots +- Topological feature visualizations +- Model checkpoints in respective subdirectories + +## Troubleshooting + +**Common Issues:** +1. **Memory errors**: Reduce batch size or use `preprocess_dask.py` for large datasets +2. **Missing dependencies**: Install `gudhi` for topological analysis features +3. **CUDA errors**: Ensure PyTorch CUDA version matches your system + +## Citation + +If you use this code, please cite: +``` +Kasieczka, G., Plehn, T., Thompson, J., & Russell, M. +"Quark Gluon Tagging Reference Dataset" +``` \ No newline at end of file diff --git a/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/README.md b/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/README.md index d81806c..bc80524 100644 --- a/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/README.md +++ b/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/README.md @@ -1,80 +1,180 @@ -# PINNDE : Physics Informed Neural Networks for Diffusion Equation | GSoC 2025 +# PINNDE: Physics Informed Neural Networks for Diffusion Equation + +**Author**: Sijil Jose +**GSoC 2025 Project**: Fast sampling via reverse-time diffusion using PINNs ![ML4Sci@GSoC2024](https://miro.medium.com/v2/resize:fit:1100/format:webp/0*8KAp7eW2atsaRwdS.jpeg) -## Project Description : +## ๐ŸŽฏ Project Overview + +This project develops a proof-of-concept for building fast and reliable samplers by solving reverse-time diffusion equations using Physics-Informed Neural Networks (PINNs). PINNDE combines the high accuracy of diffusion models with the flexibility of physics-informed neural networks to sample from complicated and intractable distributions in multiple dimensions. + +### โœ… What Was Accomplished + +- โœ… Implemented accurate q-function approximation for reverse-time diffusion ODE +- โœ… Developed multiple PINN architectures for solving diffusion equations +- โœ… Validated on 1D, 2D, and 3D Gaussian Mixture Models +- โœ… Tested different optimization strategies for PINN training +- ๐Ÿšง Integration with Fast Calorimeter Challenge 2022 (in progress) + +--- + +## ๐Ÿš€ Quick Start + +### Prerequisites +- Python 3.8+ +- PyTorch 1.8+ +- NumPy, Matplotlib, SciPy +- Jupyter (for notebooks) + +### Installation +```bash +cd Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose + +# Install dependencies (create requirements.txt if needed) +pip install torch numpy matplotlib scipy jupyter corner +``` + +### Running the Code + +#### Option 1: Python Scripts (Advanced Users) +```bash +# 1D Gaussian Mixture Model +python flow_de/train_1d_GMM.py + +# 2D Gaussian Mixture Model +python flow_de/train_2d_GMM.py + +# 3D Gaussian Mixture Model +python flow_de/train_3d_GMM.py +``` +**Note**: Uncomment the last line in each script to run the optimizer. + +#### Option 2: Jupyter Notebooks (Recommended for Beginners) +```bash +cd "Jupyter Notebooks" + +# Start with 1D case +jupyter notebook FlowDE_PINN-1D_GMM.ipynb + +# Then try 2D and 3D +jupyter notebook FlowDE_PINN-2D_GMM.ipynb +jupyter notebook FlowDE_PINN-3D_GMM.ipynb +``` + +#### Option 3: Numerical Solver Demo +```bash +cd FlowDE +jupyter notebook FlowDE.ipynb # 1D numerical solution demo +``` + +--- + +## ๐Ÿ“ Project Structure + +``` +Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/ +โ”œโ”€โ”€ flow_de/ # Core implementation +โ”‚ โ”œโ”€โ”€ flow_de.py # qVectorField and FlowDE classes +โ”‚ โ”œโ”€โ”€ gendata.py # Data generation utilities +โ”‚ โ”œโ”€โ”€ networks_1d.py # 1D PINN architectures +โ”‚ โ”œโ”€โ”€ networks_2d.py # 2D PINN architectures +โ”‚ โ”œโ”€โ”€ networks_3d.py # 3D PINN architectures +โ”‚ โ””โ”€โ”€ train_*d_GMM.py # Training scripts +โ”œโ”€โ”€ Jupyter Notebooks/ # Interactive examples +โ”‚ โ”œโ”€โ”€ FlowDE_PINN-1D_GMM.ipynb # 1D demo with explanations +โ”‚ โ”œโ”€โ”€ FlowDE_PINN-2D_GMM.ipynb # 2D demo with visualizations +โ”‚ โ””โ”€โ”€ FlowDE_PINN-3D_GMM.ipynb # 3D demo with corner plots +โ”œโ”€โ”€ Figures/ # Result visualizations +โ”œโ”€โ”€ Tests/ # Unit tests (to be expanded) +โ””โ”€โ”€ slides_docs/ # Project documentation +``` + +--- + +## ๐ŸŽฏ Expected Results + +### Training Process +- **Runtime**: 30 minutes - 2 hours depending on dimension and complexity +- **Convergence**: Loss should decrease steadily over epochs +- **Memory**: 2-4GB RAM typically sufficient + +### Output Files +- **Model checkpoints**: `*.pth` files with trained parameters +- **Visualizations**: Comparison plots in `Figures/` directory +- **Trajectories**: ODE solution paths (PINN vs numerical solver) + +### Success Indicators +โœ… **Good Results:** +- Generated samples match target distribution visually +- Low residual loss for physics constraints +- Smooth trajectory plots without oscillations + +โš ๏ธ **Poor Results May Indicate:** +- Insufficient training epochs (try 5000+) +- Learning rate too high/low (try 1e-4 to 1e-3) +- Network architecture needs adjustment + +--- -The over arching goal of this project is to develop a proof of concept for building a fast and reliable sampler by solving reverse-time diffusion equation that leverages the high accuracy of diffusion models with the flexibility of physics-informed neural networks. PINNDE can be the basis of a fast, accurate, sampler of complicated and, or, intractable distributions in multiple dimensions. Encouraging results of the PINNDE method in 1, 2, and 3 dimensions are obtained. +## ๐Ÿ”ฌ Key Results Achieved -### What was Accomplished? -As part of GSoC 2025, I contributed to this project titled as 'PINNDE:Physics Informed Neural Networks for Diffusion Equation' with the organisation Machine Learnign for Science [ML4SCI](https://ml4sci.org/). I am working under the mentorship of Prof. Harrison Prosper, Prof. Pushpalatha Bhat, and Prof. Sergei Gleyzer. This project is part of the broader [GENIE](https://ml4sci.org/activities/gsoc2025.html) initiative within ML4SCI, which explores the use of machine learning techniques for anomaly detection and event generation in high-energy particle physics. -Our final goal is to test this method named PINNDE on toy examples and later move on to use it for devloping a fast simulations of particle jets. As part of GSoC 2025 i have finished the following tasks +### Distribution Matching +![Trained Distributions](Figures/trained_distributions.png) -- Implemented an accurate and stable approximant for q-function that is required for solving the reverse-time diffusion ODE. -- Implemented different PINN architechures to test the feasilibilty of PINNs to accurately solve the reverse-time diffusion ODE. -- Obtained satisfactory results on different probability distributions of 1 ,2 and 3 dimensions. -- Tested different optimisation strategies for traning PINNs -- Started using this new method on [Fast Calorimeter Challenge 2022 for benchmarking](https://calochallenge.github.io/homepage/) (coming Soon!!) +*Comparison of target distributions (black) vs PINNDE samples (blue) for 1D, 2D, and 3D cases* -Following are the relevant documents pertaining to this project. +### Trajectory Validation +![Normal Trajectories](Figures/normal_trajectories.png) +![Uniform Trajectories](Figures/uniform_trajectories.png) -- Code on GENIE Github Repository: [Link to official Repository](https://github.com/ML4SCI/GENIE/tree/main/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose) -- Code on my Github Repository (my fork) : [Link to my fork (branch PINNDE)](https://github.com/sijil-jose/GENIE/blob/PINNDE/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/README.md) -- Project Documentation: (final blog coming soon !!) +*PINN solutions (black) vs numerical Runge-Kutta solver (green) showing excellent agreement* -#### Other Important Documents: -- Initial project idea from ML4SCI : [ML4SCI LinK](https://ml4sci.org/gsoc/2025/proposal_GENIE5.html) -- My project proposal : [Proposal](https://github.com/sijil-jose/GENIE/blob/PINNDE/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/slides_docs/GSOC_2025_Project_Proposal_Sijil_Jose.pdf) -- GSoC Abstract : [Abstract](https://summerofcode.withgoogle.com/programs/2025/projects/uGmyAV1q) -- Mid Term blog summarising the project : [PINNDE mid-term blog](https://medium.com/@sijiljose.999/gsoc-2025-with-ml4sci-part-i-physics-informed-neural-network-for-diffusion-equation-pinnde-491d46a5b84d) -- Final Document : (Coming Soon!!) -- Midterm Lighting Talk : [Midterm slides](https://github.com/sijil-jose/GENIE/blob/PINNDE/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/slides_docs/Mid-term_slides.pdf) +--- -### Next Steps -- Finish implementing this method for Fast Calorimeter Challenge -- Explore Other PINN and operator learning frameworks. -- Add more unit tests for the files +## ๐Ÿ› ๏ธ Troubleshooting -### My Contributions: +**Training doesn't converge:** +- Increase number of collocation points +- Adjust learning rate (try 5e-4) +- Check physics loss weighting -Initally I had written a detailed [proposal](https://github.com/sijil-jose/GENIE/blob/PINNDE/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/slides_docs/GSOC_2025_Project_Proposal_Sijil_Jose.pdf) outlining my plans for the project and also finshed a [test task](https://github.com/sijil-jose/GENIE/tree/PINNDE/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/Initial_test). The following is the code developed during the GSoC 2025 coding period. -- Code on GENIE Github Repository: [Link to official Repository](https://github.com/ML4SCI/GENIE/tree/main/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose) -- Code on my Github Repository (my fork) : [Link to my fork (branch PINNDE)](https://github.com/sijil-jose/GENIE/blob/PINNDE/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/README.md) +**Memory issues:** +- Reduce batch size in training scripts +- Use CPU instead of GPU for smaller problems -I had also worked on documenting my work in the form of blogs and stared compiling the results we obtained into an article, which can be found below. -- Mid Term blog summarising the project : [PINNDE mid-term blog](https://medium.com/@sijiljose.999/gsoc-2025-with-ml4sci-part-i-physics-informed-neural-network-for-diffusion-equation-pinnde-491d46a5b84d) -- Final Document : (Coming Soon!!) -- Preprint of the article: (Coming Soon !!) +**Poor sample quality:** +- Increase training epochs +- Verify target distribution implementation +- Check boundary conditions -### Description of Directories and files: -- ```flow_de``` : directory containing the files and scripts to train different models - - ``` flow_de.py``` : python file containg the classes named ```class qVectorField``` and ```class FlowDE``` for definig the q-function and numerically solving the reverse-time diffusion equation. - - ```gendata.py``` : python file continaing functions to sample from 1 , 2 and 3 dimensional distibutions considered in this project. - - ``` networks_1d.py``` , ``` networks_2d.py``` and ``` networks_3d.py``` : python files containing majority of the pytorch functions required to defining and training the neural networks for different cases. - - ```train_1d_GMM.py```, ```train_2d_GMM.py```, ```train_3d_GMM.py``` : python files to train the PINNDE models. (uncomment the last line to run the optimiser ) +--- -- ```Jupyter Notebooks``` : contains the respective jupyter notebooks with more detailed explainations for each case -- ```FlowDE``` : contained ```FlowDE.ipynb``` a jupyternotebook with code to numerically solve the reverse-time diffusion equation for a 1D case. -- ```slides_docs``` : containes some pdf documents related to this project -- ```Tests``` : python files with unit tests for each functions. (To be updated) -- ```Figures```: contains some plots describing the results obtained in this project. -- ```README.md```: This documentation file -- ```Initial_test```: This directory contains all the files sumbitted as part of the initial tests as part of the application for GSoC 2025. +## ๐Ÿ“š Documentation & Resources +### Project Links +- [Official Repository](https://github.com/ML4SCI/GENIE/tree/main/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose) +- [Author's Fork](https://github.com/sijil-jose/GENIE/blob/PINNDE/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/README.md) +- [Mid-term Blog](https://medium.com/@sijiljose.999/gsoc-2025-with-ml4sci-part-i-physics-informed-neural-network-for-diffusion-equation-pinnde-491d46a5b84d) -# Overview of interesting results obtained during this program (Plots): +### Academic References +- [Original ML4SCI Proposal](https://ml4sci.org/gsoc/2025/proposal_GENIE5.html) +- [GSoC Abstract](https://summerofcode.withgoogle.com/programs/2025/projects/uGmyAV1q) +- [Fast Calorimeter Challenge](https://calochallenge.github.io/homepage/) -### Plots comparing the target distributions and distributions obtained from the trained model: +--- -![Trained Distibtuions](https://github.com/sijil-jose/GENIE/blob/PINNDE/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/Figures/trained_distributions.png) +## ๐Ÿ”ฎ Future Work -These figures compares samples generated by the trained PINNDE network with those from the reference distribution used during training. The reference distribution is shown in black, -while samples from the PINNDE model are shown in blue. The two-dimensional and three-dimensional cases are visualized using corner plots: the diagonal panels display the marginalized one-dimensional distributions, while the off-diagonal panels illustrate the pairwise joint distributions +- Complete Fast Calorimeter Challenge integration +- Explore advanced PINN architectures (DeepONet, etc.) +- Add comprehensive unit test coverage +- Benchmark against other sampling methods -### Plot comparing the trajectories obtained by solving the reverse time ODE using PINN and Numerical Solvers: -![Normal](https://github.com/sijil-jose/GENIE/blob/PINNDE/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/Figures/normal_trajectories.png) +--- -![Uniform](https://github.com/sijil-jose/GENIE/blob/PINNDE/Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/Figures/uniform_trajectories.png) +## ๐Ÿ™ Acknowledgments -This plot compares the solution trajectories of reverse-time diffusion ODE solution obtained by numerically solving the ODE using the 2nd โˆ’order Runge-Kutta solver with the solution predicted -from the trained PINN. ( These are different points from the collocation points used for training). The solutions from PINN are plotted in black and solutions from the Runge-Kutta solver is plotted in green. +**Mentors**: Prof. Harrison Prosper, Prof. Pushpalatha Bhat, Prof. Sergei Gleyzer +**Organization**: [ML4SCI](https://ml4sci.org/) - Machine Learning for Science +**Program**: Google Summer of Code 2025 diff --git a/README.md b/README.md index 3c9c0da..36d83b7 100644 --- a/README.md +++ b/README.md @@ -1 +1,365 @@ -# GENIE +# GENIE - Generative Networks for Interpretable Event Generation + +[![ML4SCI](https://img.shields.io/badge/ML4SCI-GSoC-blue)](https://ml4sci.org/) +[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE) + +**GENIE** is a collection of machine learning projects developed as part of Google Summer of Code (GSoC) with [Machine Learning for Science (ML4SCI)](https://ml4sci.org/). This repository contains cutting-edge implementations of generative models, physics-informed neural networks, and graph-based learning techniques applied to high-energy particle physics and scientific computing. + +--- + +## ๐Ÿ“‹ Table of Contents + +- [Overview](#overview) +- [Repository Structure](#repository-structure) +- [Projects](#projects) + - [Graph Representation Learning](#1-graph-representation-learning) + - [Non-local Jet Classification](#2-non-local-jet-classification) + - [Physics-Informed Neural Networks for Diffusion Equation](#3-physics-informed-neural-networks-for-diffusion-equation) +- [Getting Started](#getting-started) +- [Usage Instructions](#usage-instructions) +- [Contributing](#contributing) +- [License](#license) +- [Acknowledgments](#acknowledgments) + +--- + +## ๐ŸŒŸ Overview + +GENIE explores the intersection of machine learning and particle physics, focusing on: +- **Event Generation**: Creating realistic particle physics events using generative models +- **Anomaly Detection**: Identifying rare or unusual patterns in high-energy physics data +- **Fast Simulation**: Developing efficient alternatives to traditional Monte Carlo simulations +- **Graph-based Learning**: Leveraging graph neural networks for particle jet analysis + +Each subproject in this repository represents a complete GSoC contribution with its own methodology, implementation, and results. + +--- + +## ๐Ÿ“ Repository Structure + +``` +GENIE/ +โ”œโ”€โ”€ Graph_Representation_Learning_Rushil_Singha/ +โ”‚ โ”œโ”€โ”€ code.py # Main implementation +โ”‚ โ”œโ”€โ”€ requirements.txt # Python dependencies +โ”‚ โ””โ”€โ”€ README.md # Project-specific documentation +โ”‚ +โ”œโ”€โ”€ Non_local_Jet_Classification_Tanmay_Bakshi/ +โ”‚ โ”œโ”€โ”€ main.py # Entry point +โ”‚ โ”œโ”€โ”€ datasets.py # Data loading utilities +โ”‚ โ”œโ”€โ”€ persistent_net-2.ipynb # Jupyter notebook demo +โ”‚ โ””โ”€โ”€ readme.md # Project-specific documentation +โ”‚ +โ”œโ”€โ”€ Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose/ +โ”‚ โ”œโ”€โ”€ flow_de/ # Core implementation +โ”‚ โ”œโ”€โ”€ Jupyter Notebooks/ # Interactive examples +โ”‚ โ”œโ”€โ”€ Figures/ # Result visualizations +โ”‚ โ””โ”€โ”€ README.md # Project-specific documentation +โ”‚ +โ””โ”€โ”€ README.md # This file +``` + +--- + +## ๐Ÿš€ Projects + +### 1. Graph Representation Learning +**Author**: Rushil Singha +**Focus**: Graph-based diffusion models for jet generation + +A PyTorch/PyTorch-Geometric implementation that constructs k-nearest neighbor graphs from particle jets, learns Chebyshev GCN embeddings, and trains diffusion models in latent space to generate realistic jets from the JetNet dataset. + +**Key Features**: +- kNN graph construction from particle clouds +- Chebyshev Graph Convolutional Networks (ChebNet) +- Latent diffusion with denoising MLP +- Evaluation using KL divergence & Wasserstein distance + +--- + +### 2. Non-local Jet Classification +**Author**: Tanmay Bakshi +**Focus**: Advanced jet classification using topological and non-local features + +This project implements sophisticated neural network architectures for classifying particle jets, incorporating persistent homology and topological data analysis to capture non-local geometric features. + +**Key Features**: +- Persistent homology-based feature extraction +- Advanced neural network architectures +- Integration with Weaver framework +- Topological data analysis for jet classification + +--- + +### 3. Physics-Informed Neural Networks for Diffusion Equation +**Author**: Sijil Jose +**Focus**: PINNDE - Fast sampling via reverse-time diffusion + +Develops a proof-of-concept for building fast and reliable samplers by solving reverse-time diffusion equations using Physics-Informed Neural Networks (PINNs). Successfully demonstrated on 1D, 2D, and 3D probability distributions. + +**Key Features**: +- Accurate q-function approximation for reverse-time diffusion +- Multiple PINN architectures tested +- Validated on Gaussian Mixture Models in 1D, 2D, and 3D +- Fast Calorimeter Challenge 2022 benchmarking (in progress) + +--- + +## ๐Ÿ› ๏ธ Getting Started + +### Prerequisites + +- **Python**: 3.8 or higher (3.9+ recommended) +- **pip**: Latest version (22.0+) +- **Git**: For cloning the repository +- **CUDA**: Optional but recommended for GPU acceleration (CUDA 11.8+ for PyTorch compatibility) +- **Memory**: At least 8GB RAM (16GB+ recommended for larger datasets) +- **Storage**: At least 5GB free space for datasets and model checkpoints + +### System Requirements by Project + +| Project | Python | GPU Memory | Estimated Runtime | +|---------|--------|------------|-------------------| +| Graph Representation Learning | 3.8+ | 4GB+ (optional) | 2-4 hours | +| Non-local Jet Classification | 3.8+ | 6GB+ (recommended) | 1-3 hours | +| Physics-Informed Neural Networks | 3.8+ | 2GB+ (optional) | 30min-2 hours | + +### Installation + +1. **Clone the repository**: + ```bash + git clone https://github.com/ML4SCI/GENIE.git + cd GENIE + ``` + +2. **Choose a project** and navigate to its directory: + ```bash + cd Graph_Representation_Learning_Rushil_Singha + # OR + cd Non_local_Jet_Classification_Tanmay_Bakshi + # OR + cd Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose + ``` + +3. **Install project-specific dependencies**: + ```bash + pip install -r requirements.txt + ``` + + > **Note**: Each project has its own `requirements.txt` file. Make sure you're in the correct project directory. + +--- + +## ๐Ÿ“– Usage Instructions + +### Graph Representation Learning + +```bash +cd Graph_Representation_Learning_Rushil_Singha +pip install -r requirements.txt +python code.py +``` + +**What it does**: +- Downloads and preprocesses JetNet dataset +- Constructs kNN graphs from particle jets +- Trains Chebyshev GCN encoder +- Runs diffusion model training +- Generates synthetic jets +- Evaluates with KL divergence and Wasserstein distance +- Saves visualizations to `results/` + +**Expected Output**: Training logs, evaluation metrics, and visualization plots in the `results/` directory. + +**Estimated Runtime**: 2-4 hours on CPU, 30-60 minutes with GPU + +**Key Output Files**: +- `results/training_logs.txt` - Training progress and metrics +- `results/generated_jets.png` - Visualization of generated vs real jets +- `results/evaluation_metrics.json` - KL divergence and Wasserstein distances + +--- + +### Non-local Jet Classification + +```bash +cd Non_local_Jet_Classification_Tanmay_Bakshi +pip install -r requirements.txt # If available +python main.py +``` + +**Alternative - Jupyter Notebook**: +```bash +jupyter notebook persistent_net-2.ipynb +``` + +**What it does**: +- Loads and preprocesses jet datasets +- Extracts topological features using persistent homology +- Trains classification models +- Evaluates model performance + +**Expected Output**: Model checkpoints, classification metrics, and performance visualizations. + +--- + +### Physics-Informed Neural Networks for Diffusion Equation + +#### Option 1: Python Scripts + +```bash +cd Physics_Informed_Neural_Network_Diffusion_Equation_Sijil_Jose +pip install -r requirements.txt # If available + +# For 1D Gaussian Mixture Model +python flow_de/train_1d_GMM.py + +# For 2D Gaussian Mixture Model +python flow_de/train_2d_GMM.py + +# For 3D Gaussian Mixture Model +python flow_de/train_3d_GMM.py +``` + +> **Note**: Uncomment the last line in each training script to run the optimizer. + +#### Option 2: Jupyter Notebooks (Recommended for beginners) + +```bash +cd "Jupyter Notebooks" +jupyter notebook FlowDE_PINN-1D_GMM.ipynb +# OR +jupyter notebook FlowDE_PINN-2D_GMM.ipynb +# OR +jupyter notebook FlowDE_PINN-3D_GMM.ipynb +``` + +**What it does**: +- Trains PINN to solve reverse-time diffusion ODE +- Generates samples from target distributions +- Compares PINN solutions with numerical solvers +- Visualizes trajectories and distributions + +**Expected Output**: +- Trained model checkpoints +- Comparison plots between target and generated distributions +- Trajectory visualizations in `Figures/` directory + +--- + +## ๐Ÿ”ง Troubleshooting + +### Common Issues Across Projects + +#### Installation Problems +**Issue**: PyTorch installation fails or CUDA version mismatch +```bash +# Solution: Install specific PyTorch version +pip install torch==2.0.0+cu118 -f https://download.pytorch.org/whl/torch_stable.html +``` + +**Issue**: `ModuleNotFoundError` for project-specific packages +```bash +# Solution: Ensure you're in the correct project directory +cd Graph_Representation_Learning_Rushil_Singha # or other project +pip install -r requirements.txt +``` + +#### Runtime Issues +**Issue**: CUDA out of memory errors +- Reduce batch size in training scripts +- Use CPU-only mode: `export CUDA_VISIBLE_DEVICES=""` +- Close other GPU-intensive applications + +**Issue**: Dataset download failures +- Check internet connection +- For JetNet: datasets auto-download to `jetnet_data/` directory +- Manual download links available in individual project READMEs + +#### Performance Issues +**Issue**: Very slow training on CPU +- Expected behavior for deep learning models +- Consider using Google Colab, Kaggle, or cloud GPU services +- Reduce dataset size for testing (modify `num_particles` parameters) + +### Project-Specific Help + +| Issue Type | Graph Representation | Jet Classification | Physics-Informed NN | +|------------|---------------------|-------------------|-------------------| +| Memory errors | Reduce `BATCH_SIZE` | Use `preprocess_dask.py` | Reduce collocation points | +| Slow convergence | Increase epochs to 200+ | Check data preprocessing | Adjust learning rate | +| Poor results | Try different K values | Verify dataset format | Increase PINN depth | + +### Getting Help + +1. **Check individual project READMEs** for specific troubleshooting +2. **Open an issue** on GitHub with: + - Your operating system and Python version + - Complete error message + - Steps to reproduce the problem +3. **Join ML4SCI discussions** for community support + +--- + +## ๐Ÿค Contributing + +We welcome contributions! Here's how you can help: + +1. **Fork the repository** on GitHub +2. **Create a new branch** for your feature: + ```bash + git checkout -b feature/your-feature-name + ``` +3. **Make your changes** and commit: + ```bash + git add . + git commit -m "Description of your changes" + ``` +4. **Push to your fork**: + ```bash + git push origin feature/your-feature-name + ``` +5. **Open a Pull Request** on the main repository + +### Contribution Guidelines + +- Follow the existing code style and structure +- Add documentation for new features +- Include tests where applicable +- Update the README if you add new functionality +- Reference any related issues in your PR description + +--- + +## ๐Ÿ“„ License + +This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details. + +--- + +## ๐Ÿ™ Acknowledgments + +- **Google Summer of Code (GSoC)** for funding and support +- **ML4SCI Organization** for mentorship and guidance +- **Mentors**: Prof. Harrison Prosper, Prof. Pushpalatha Bhat, Prof. Sergei Gleyzer, and others +- **Contributors**: Rushil Singha, Tanmay Bakshi, Sijil Jose + +### Related Links + +- [ML4SCI Website](https://ml4sci.org/) +- [GSoC 2025 Projects](https://ml4sci.org/activities/gsoc2025.html) +- [JetNet Dataset](https://huggingface.co/datasets/jetnet) +- [Fast Calorimeter Challenge](https://calochallenge.github.io/homepage/) + +--- + +## ๐Ÿ“ž Contact & Support + +For questions, issues, or discussions: +- **Open an issue** on this repository +- **Visit** [ML4SCI](https://ml4sci.org/) for more information +- **Check** individual project READMEs for project-specific documentation + +--- + +**Made with โค๏ธ by the ML4SCI community**