GPU #85

Open
mailhexu wants to merge 20 commits into main from gpu
@mailhexu mailhexu commented Apr 2, 2026

Option to use GPU acceleration with JAX.

- Add --spin-conf CLI option and spin_conf TOML parameter for specifying magnetic moments
- Refactor: extract prepare_magnon_from_params to reduce code duplication
- Fix create_plot_script to write to correct output directory
- Add magnon_theory.md documentation
- Add examples directory with scripts and config files
- Add comprehensive tests for magnon functionality
- Set default=True for --no-Jiso, --no-Jani, --no-DMI, --no-SIA CLI args
  to ensure all interactions are enabled by default
- Remove path prepending for spin_conf_file and uz_file; paths are now
  relative to current working directory, not TB2J results path
- Add combined J tensor output in exchange.out showing J = Jiso*I + DMI + Jani
- Document combined tensor formula and provide verified example in docs
- Fix type hints for Optional[str] in MagnonParameters
- Create MAEGreenGPU class using JAX for GPU acceleration
- JAX is an optional dependency (lazy loading)
- Implements GPU-accelerated:
  * Green's function computation with vmap
  * Spinor matrix rotation
  * Parallel angle computation
- Add use_gpu option to siesta interface
- Add --use_gpu CLI argument to siesta2J
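
As a rough illustration of the combined tensor J = Jiso*I + DMI + Jani mentioned in the list above, the sketch below builds the 3x3 matrix with NumPy. The helper names `dmi_matrix` and `combined_tensor` are hypothetical, the numbers are made up, and the antisymmetric form assumes the DMI term is written as D . (S_i x S_j); the sign and ordering convention actually used in exchange.out should be checked against the docs.

```python
import numpy as np

def dmi_matrix(D):
    """Antisymmetric 3x3 matrix A such that S_i^T A S_j == D . (S_i x S_j)."""
    Dx, Dy, Dz = D
    return np.array([[0.0,  Dz, -Dy],
                     [-Dz, 0.0,  Dx],
                     [ Dy, -Dx, 0.0]])

def combined_tensor(Jiso, D, Jani):
    """J = Jiso * I + DMI (antisymmetric part) + Jani (symmetric anisotropic part)."""
    return Jiso * np.eye(3) + dmi_matrix(D) + np.asarray(Jani)

# Illustrative values only; they are not taken from any TB2J output.
Jiso = -26.0
D = np.array([0.1, -0.2, 0.3])
Jani = np.array([[0.05,  0.01,  0.00],
                 [0.01, -0.02,  0.00],
                 [0.00,  0.00, -0.03]])
J = combined_tensor(Jiso, D, Jani)
```

With this form, `S_i @ J @ S_j` reproduces the sum of the isotropic, DMI, and anisotropic terms for any pair of spin vectors.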

The API is fully compatible with MAEGreen.
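
For readers unfamiliar with the "optional dependency (lazy loading)" pattern, a minimal sketch follows. It is not the code in this PR: `lazy_import` and `GPUBackend` are hypothetical names, and the only assumption is that JAX should be imported on first use rather than at module import time.

```python
import importlib

def lazy_import(name, hint=""):
    """Import a module on demand and raise a readable error if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(f"{name} is required for this feature. {hint}") from exc

class GPUBackend:
    """Hypothetical holder that defers importing JAX until it is first needed."""

    def __init__(self):
        self._jax = None  # nothing is imported at construction time

    @property
    def jax(self):
        if self._jax is None:
            self._jax = lazy_import("jax", "Install JAX to use --use_gpu.")
        return self._jax
```

Constructing `GPUBackend()` never touches JAX, so CPU-only installations keep working; the import (and any error) happens only when `.jax` is accessed.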
- Create ExchangeNCLGPU class using JAX for GPU acceleration
- JAX is an optional dependency (lazy loading)
- Implements GPU-accelerated:
  * Pauli block decomposition
  * A tensor computation with einsum
  * Vectorized operations over R vectors and atom pairs
  * Orbital-resolved A tensor computation
- Same API as ExchangeNCL with use_gpu parameter

Features:
- Add --use_gpu flag to enable GPU acceleration (opt-in, no auto-detection)
- GPU-accelerated eigenvalue/eigenvector computation using Cholesky decomposition
- GPU-accelerated Green's function and A-tensor computation
- JIT-compiled kernels for Pauli decomposition and tensor contractions
- Support for non-orthogonal basis (overlap matrix S)
- Separate GreenGPU.py module that inherits from TBGreen
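
The Cholesky route for the non-orthogonal basis can be sketched as follows. This is an illustration under stated assumptions, not the contents of jax_utils.py: it uses NumPy's batched `linalg` routines in place of `jax.vmap`, the function names `eigh_generalized` and `green` are hypothetical, and an explicit inverse of the Cholesky factor is used for brevity where a triangular solve would be preferable.

```python
import numpy as np

def eigh_generalized(H, S):
    """Solve H c = e S c for a batch of Hermitian H and positive-definite S."""
    L = np.linalg.cholesky(S)                      # S = L L^H
    Linv = np.linalg.inv(L)
    Linv_H = np.conj(np.swapaxes(Linv, -1, -2))    # L^{-H}
    H_std = Linv @ H @ Linv_H                      # standard Hermitian problem
    evals, Y = np.linalg.eigh(H_std)
    C = Linv_H @ Y                                 # back-transform, C^H S C = I
    return evals, C

def green(evals, C, energy):
    """G(E) = C diag(1 / (E - e)) C^H, which equals (E S - H)^{-1}."""
    scaled = C / (energy - evals)[..., None, :]    # scale each eigenvector column
    return scaled @ np.conj(np.swapaxes(C, -1, -2))

# Random test matrices for a batch of 4 "k-points" with 5 orbitals each.
rng = np.random.default_rng(0)
nk, n = 4, 5
A = rng.normal(size=(nk, n, n)) + 1j * rng.normal(size=(nk, n, n))
B = rng.normal(size=(nk, n, n)) + 1j * rng.normal(size=(nk, n, n))
H = A + np.conj(np.swapaxes(A, -1, -2))               # Hermitian
S = B @ np.conj(np.swapaxes(B, -1, -2)) + n * np.eye(n)  # positive definite
evals, C = eigh_generalized(H, S)
G = green(evals, C, 0.3 + 0.05j)
```

Because the eigenvectors are S-orthonormal, the Green's function at every energy point reuses the same `evals` and `C`, which is why the eigenvalue preparation step dominates the timings quoted below.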

Performance improvements (9x9x9 k-mesh, 50 energy points):
- Eigenvalue preparation: 22s → 5.6s (4x speedup)
- Total: 34s → 22s (1.5x speedup)
- Results match CPU version (J_iso ≈ -26.22 meV)

Key changes:
- TB2J/GreenGPU.py: New GPU-accelerated TBGreen class
- TB2J/gpu/: New module with JAX-based GPU implementations
  - jax_utils.py: Eigenvalue computation, array utilities
  - exchange_ncl_gpu.py: GPU ExchangeNCL implementation
  - exchange_pert2_gpu.py: GPU perturbation implementation
  - mae_green_gpu.py: GPU MAE calculation
- TB2J/green.py: Add use_gpu parameter, delegate to TBGreenGPU
- TB2J/exchange.py: Integrate GPU exchange class
- TB2J/exchange_params.py: Add use_gpu parameter handling
- TB2J/interfaces/siesta_interface.py: Pass use_gpu to exchange calculation
- TB2J/scripts/*.py: Add --use_gpu CLI flag

Deprecated:
- sisl_wrapper.py moved to deprecated/ (use HamiltonIO instead)
- exchangeGPU.py removed (replaced by gpu/ module)

The y-component of Pauli decomposition had the wrong sign:
- Wrong: (M01 - M10) * (-0.5j)
- Correct: (M01 - M10) * 0.5j

This matches the CPU version in TB2J/pauli.py and fixes the DMI
y-component sign difference.
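
The sign can be checked independently of either implementation. The sketch below assumes an interleaved spin basis (orbital 1 up, orbital 1 down, orbital 2 up, ...) and uses a hypothetical helper `pauli_decompose`; it decomposes M = M0 ⊗ I + Mx ⊗ σx + My ⊗ σy + Mz ⊗ σz and recovers the components from the four spin blocks.

```python
import numpy as np

def pauli_decompose(M):
    """Split a spinor matrix (interleaved spin basis assumed) into M0, Mx, My, Mz."""
    M00, M01 = M[..., ::2, ::2], M[..., ::2, 1::2]
    M10, M11 = M[..., 1::2, ::2], M[..., 1::2, 1::2]
    M0 = 0.5 * (M00 + M11)
    Mx = 0.5 * (M01 + M10)
    My = 0.5j * (M01 - M10)   # corrected sign; -0.5j would return -My
    Mz = 0.5 * (M00 - M11)
    return M0, Mx, My, Mz

# Build a spinor matrix from known components and recover them.
sigma = [np.eye(2),
         np.array([[0, 1], [1, 0]]),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]])]
rng = np.random.default_rng(1)
parts = [rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)) for _ in range(4)]
M = sum(np.kron(p, s) for p, s in zip(parts, sigma))  # spin is the fast index
recovered = pauli_decompose(M)
```

Since σy has -i in the upper-right block and +i in the lower-left, M01 - M10 = -2i My, so My = 0.5j * (M01 - M10).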
- Combined H(k), S(k) and eigenvalue decomposition into single JIT-compiled pipeline
- Uses jax.vmap for batched eigenvalue computation
- Optimized Cholesky decomposition for generalized eigenvalue problem
- Caching of JIT-compiled functions to avoid recompilation overhead

Performance improvements:
- Eigenvalue preparation: 3.8s15s + 0.63s 0.62s (6.2x with JIT) -> 3.46s
- Total: 8.33s -> 6.05s
- Created ExchangeCL2GPU class for GPU-accelerated collinear calculations
- Uses JAX for tensor operations with JIT compilation
- Added GPU support to Manager class for Wannier90 interface
- Updated siesta_interface.py to use GPU for collinear calculations
- Both collinear and non-collinear calculations now support GPU acceleration

- Removed /2 division in parse_ham to match TB2J_spinphon behavior
- Removed H + H.conj().T symmetrization in gen_ham
- Fixed merge_tbmodels_spin to use interleaved basis order
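
To make "interleaved basis order" concrete, here is a small sketch of merging spin-up and spin-down matrices so that the spin index runs fastest (orbital 1 up, orbital 1 down, orbital 2 up, ...). The function `merge_spin_interleaved` is hypothetical and only illustrates the ordering; the real merge_tbmodels_spin operates on tight-binding model objects and may differ in detail.

```python
import numpy as np

def merge_spin_interleaved(H_up, H_dn):
    """Place spin-up blocks on even indices and spin-down blocks on odd indices."""
    n = H_up.shape[-1]
    H = np.zeros(H_up.shape[:-2] + (2 * n, 2 * n), dtype=complex)
    H[..., ::2, ::2] = H_up      # up-up block
    H[..., 1::2, 1::2] = H_dn    # down-down block
    return H                     # up-down blocks stay zero (collinear input)

H_up = np.arange(4, dtype=float).reshape(2, 2)
H_dn = 10.0 + np.arange(4, dtype=float).reshape(2, 2)
H_merged = merge_spin_interleaved(H_up, H_dn)
```

A blocked order (all up orbitals first, then all down) would instead fill `H[:n, :n]` and `H[n:, n:]`; mixing the two conventions silently scrambles the spin blocks downstream.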

The Wannier90 _hr.dat file stores H(R) for all R vectors.
The previous code incorrectly divided by 2 and then symmetrized,
giving incorrect results. Now the behavior matches
TB2J_spinphon which gives correct exchange parameters.
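
The point about storing all R vectors can be illustrated with a toy model. The sketch below is not the parse_ham or gen_ham code: `ham_k` is a hypothetical helper, the hopping matrices are random, and Wannier90 degeneracy weights are ignored. It assumes only that both R and -R are present with H(-R) = H(R)^†, in which case the plain Fourier sum is already Hermitian.

```python
import numpy as np

def ham_k(HR, Rvecs, kpt):
    """H(k) = sum_R H(R) exp(2 pi i k . R), with k in reduced coordinates."""
    phases = np.exp(2j * np.pi * (Rvecs @ kpt))
    return np.einsum("r,rij->ij", phases, HR)

# Toy H(R): an on-site block plus hoppings along +/-x and +/-y.
rng = np.random.default_rng(2)
n = 3
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H0 = A + A.conj().T                               # on-site block is Hermitian
Tx = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Ty = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

Rvecs = np.array([[0, 0, 0], [1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]])
HR = np.stack([H0, Tx, Tx.conj().T, Ty, Ty.conj().T])  # H(-R) = H(R)^dagger

Hk = ham_k(HR, Rvecs, np.array([0.1, 0.2, 0.0]))
```

Here `Hk` equals its own conjugate transpose, so an additional H + H.conj().T step would double the matrix rather than repair it.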