Research code for the paper: "Gradient-based Optimization for mRNA Sequence Design"
Publication: Li Hongmin, Goro Terai, Takumi Otagaki, Kiyoshi Asai. bioRxiv 2025.10.22.683691; doi: https://doi.org/10.1101/2025.10.22.683691
This repository contains the complete implementation of the ID3 (Iterative Deep Learning-based Design) framework for optimizing mRNA sequences while maintaining biological constraints. The framework implements 12 optimization variants combining three constraint mechanisms with four optimization modes.
- 12 Optimization Variants: 3 constraint mechanisms × 4 optimization modes
- Constraints: Codon Profile Constraint, Amino Matching Softmax, Lagrangian Multiplier
- Modes: Deterministic/Stochastic × Soft/Hard
- DeepRaccess Integration: RNA accessibility prediction for ribosome binding
- CAI Optimization: Codon Adaptation Index for translation efficiency
- GPU Support: CUDA acceleration for faster optimization
- Python 3.8 or higher
- PyTorch 1.9 or higher
- CUDA-compatible GPU (optional, CPU supported)
# Clone repository
git clone https://github.com/Li-Hongmin/ID3.git
cd ID3
# Install dependencies
pip install -r requirements.txt
# Run demo - DeepRaccess will be set up automatically
bash run_demo.sh
That's it! The demo automatically detects and installs DeepRaccess on first run.
# Default: O15263 protein
bash run_demo.sh
# Different protein
bash run_demo.sh P04637
# Results saved to examples/demo_<timestamp>/
# Includes: optimized sequence, trajectory data, and visualizations
The demo automatically:
- ✅ Checks and installs DeepRaccess if needed
- ✅ Runs 1000-iteration mRNA optimization with Amino Matching Softmax constraint
- ✅ Generates publication-quality evolution figures
- ✅ Saves all results to the examples/ directory
For systematic experiments (research/paper reproduction):
# Quick test (5 iterations, 1 seed)
python run_unified_experiment.py --preset quick-test
# Full 12x12 experiments (1000 iterations, 12 seeds) - Accessibility only
python run_unified_experiment.py --preset full-12x12
# Full experiments with CAI optimization
python run_unified_experiment.py --preset full-12x12-cai-penalty
# Custom experiment
python run_unified_experiment.py \
--proteins O15263,P04637 \
--constraints lagrangian,ams,cpc \
--variants 00,01,10,11 \
--iterations 1000 \
--seeds 12 \
--enable-cai \
--device cpu
Results saved to results/ directory with detailed metrics and trajectories.
run_demo.sh - Quick case study demo
- ✅ One-click complete workflow
- ✅ Automatic DeepRaccess setup
- ✅ Single protein optimization with visualization
- ✅ Results saved to the examples/ directory
- ✅ Perfect for quick demonstrations
run_unified_experiment.py - Research-grade experiments
- ✅ Batch experiments (multiple proteins/constraints/variants)
- ✅ Multiple random seeds for statistical analysis
- ✅ 12 optimization variants (3 constraints × 4 modes)
- ✅ Detailed result tracking and analysis
- ✅ Used for paper results
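The 12 variants are the Cartesian product of the three constraint mechanisms and the four optimization modes. As a minimal sketch, the grid can be enumerated as below. The constraint names are the values accepted by `--constraints` and the mode labels are the ones this README uses; `variant_grid` is a hypothetical helper, not part of the repository, and how the pairs map onto the `--variants` codes `00,01,10,11` is not assumed here.

```python
from itertools import product

# Names taken from this README: --constraints values and the four mode labels.
CONSTRAINTS = ["lagrangian", "ams", "cpc"]
MODES = ["det_soft", "det_hard", "sto_soft", "sto_hard"]


def variant_grid(constraints=CONSTRAINTS, modes=MODES):
    """Return every (constraint, mode) pair as a list of tuples (hypothetical helper)."""
    return list(product(constraints, modes))


grid = variant_grid()
for constraint, mode in grid:
    print(f"{constraint:<12}{mode}")
print(len(grid), "variants")
```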
Both tools optimize:
- Amino acid constraints (3 mechanisms: Lagrangian, Amino Matching Softmax, Codon Profile Constraint)
- CAI optimization (Codon Adaptation Index)
- RNA accessibility (DeepRaccess prediction)
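For readers unfamiliar with CAI, the standard definition is the geometric mean of the relative adaptiveness w of each codon, where w is the codon's usage divided by the usage of the most frequent synonymous codon. The sketch below illustrates that definition only; the usage counts are hypothetical toy values, and the function names are not taken from the `id3/cai/` module.

```python
import math

# Hypothetical toy usage counts for two amino acids; real reference data
# would come from a codon usage table (for example under data/codon_references/).
USAGE = {
    "L": {"CUG": 40.0, "CUC": 20.0, "UUA": 8.0},
    "K": {"AAG": 32.0, "AAA": 24.0},
}


def relative_adaptiveness(usage):
    """Map each codon to w = count / max count among its synonymous codons."""
    weights = {}
    for codons in usage.values():
        best = max(codons.values())
        for codon, count in codons.items():
            weights[codon] = count / best
    return weights


def cai(rna, weights):
    """Geometric mean of w over the codons of an in-frame RNA sequence."""
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    return math.exp(sum(math.log(weights[c]) for c in codons) / len(codons))


W = relative_adaptiveness(USAGE)
print(round(cai("CUGAAG", W), 3))  # both codons are the most frequent choices
print(round(cai("UUAAAA", W), 3))  # both codons are rarer synonymous choices
```

Both sequences encode the same two amino acids (Leu, Lys), so the difference in CAI comes purely from codon choice.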
ID3/
├── run_demo.sh # One-click case study demo
├── demo.py # Main demo script
├── run_unified_experiment.py # Research experiment framework
├── README.md # This file
├── requirements.txt # Python dependencies
│
├── scripts/ # Auxiliary scripts
│ ├── evolution_figure.py # Visualization generator
│ ├── setup_deepraccess.sh # DeepRaccess installer
│ └── README.md # Scripts documentation
│
├── id3/ # Source code
│ ├── constraints/ # Constraint mechanisms
│ ├── optimizers/ # Optimization engines
│ ├── cai/ # CAI module
│ └── utils/ # Utility functions
│
├── data/ # Data files
│ ├── proteins/ # Test protein sequences (.fasta.txt)
│ ├── codon_references/ # CAI reference data
│ └── utr_templates/ # UTR templates
│
└── examples/ # Demo results (with visualizations)
└── demo_20251031_233130/ # Example: 1000-iter optimization
# Quick demo (1000 iterations)
bash run_demo.sh
# Different protein
bash run_demo.sh P04637
# Research-grade experiments (customizable iterations)
python run_unified_experiment.py --preset quick-test
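python run_unified_experiment.py --preset full-12x12
Python API:
import sys
sys.path.insert(0, 'src')
from id3.constraints.lagrangian import LagrangianConstraint
# Example protein sequence (the same one used in the demo.py commands)
protein_sequence = "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEG"
# Create constraint (access-only)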
constraint = LagrangianConstraint(
protein_sequence,
enable_cai=False
)
# Generate RNA sequence
result = constraint.forward(alpha=0.5, beta=0.5)
rna_seq = result['discrete_sequence']
# With CAI optimization
constraint_cai = LagrangianConstraint(
protein_sequence,
enable_cai=True,
cai_target=0.8,
cai_lambda=0.1
)
result = constraint_cai.forward(alpha=0.5, beta=0.5)
rna_seq = result['discrete_sequence']
cai_value = result['cai_metadata']['final_cai']
The ID3 framework provides 3 constraint mechanisms to ensure RNA sequences encode the correct amino acids. All 3 mechanisms support joint optimization with DeepRaccess.
Lagrangian Multiplier
- Method: Soft penalty-based optimization with adaptive λ
- Formula: L = f_accessibility + λ·C_amino + λ_CAI·L_CAI
- Advantages: Flexible penalty adjustment, stable optimization
- Usage: demo.py --constraint lagrangian (default)
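The Lagrangian objective L = f_accessibility + λ·C_amino + λ_CAI·L_CAI can be sketched in a few lines. The loss composition follows the formula as stated. The rule for adapting λ is not specified in this README, so the dual-ascent step in `update_lambda` is an assumption, and both function names and the per-iteration numbers are hypothetical.

```python
def lagrangian_loss(f_accessibility, c_amino, l_cai, lam, lam_cai):
    """L = f_accessibility + lam * C_amino + lam_CAI * L_CAI."""
    return f_accessibility + lam * c_amino + lam_cai * l_cai


def update_lambda(lam, c_amino, step=0.5):
    """Dual-ascent step (an assumption): grow lam while the constraint is violated."""
    return max(0.0, lam + step * c_amino)


lam, lam_cai = 1.0, 0.1
# Hypothetical per-iteration values: (f_accessibility, C_amino, L_CAI).
history = [(2.0, 0.8, 0.30), (1.6, 0.4, 0.20), (1.3, 0.0, 0.10)]
for f_acc, c_amino, l_cai in history:
    total = lagrangian_loss(f_acc, c_amino, l_cai, lam, lam_cai)
    print(f"lambda={lam:.2f}  loss={total:.3f}")
    lam = update_lambda(lam, c_amino)
```

Under this assumed rule, λ keeps increasing while the amino acid constraint C_amino is positive and stops changing once the constraint is satisfied.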
Amino Matching Softmax
- Method: Softmax-based amino acid probability matching
- Advantages: Differentiable, enforces constraints naturally
- Usage: demo.py --constraint amino_matching, or run_demo.sh (which uses it by default)
Codon Profile Constraint
- Method: Maintains codon usage distribution from initial sequence
- Advantages: Preserves codon usage patterns
- Usage: demo.py --constraint codon_profile
Key Insight: All constraint mechanisms output soft probability distributions (rna_sequence) that can be used for gradient-based optimization with DeepRaccess. The gradient flows through:
Constraint → Soft Probabilities → DeepRaccess → Accessibility Loss → Backprop
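To make the gradient path concrete, here is a toy sketch in pure Python. It is not the repository's implementation: a fixed per-codon score stands in for the DeepRaccess accessibility loss (a hypothetical surrogate), and the gradient of the expected score with respect to the softmax logits is written out by hand instead of using PyTorch autograd.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(logits, scores):
    """Expected score under the soft distribution and its gradient w.r.t. the logits."""
    probs = softmax(logits)
    loss = sum(p * s for p, s in zip(probs, scores))
    # d(loss)/d(z_j) = p_j * (s_j - loss) for a softmax-weighted sum.
    grad = [p * (s - loss) for p, s in zip(probs, scores)]
    return loss, grad


# One amino acid position (leucine) with three synonymous codons.
codons = ["CUG", "CUC", "UUA"]
scores = [0.9, 0.5, 0.2]   # hypothetical surrogate loss per codon (lower is better)
logits = [0.0, 0.0, 0.0]   # start from a uniform soft distribution

losses = []
for _ in range(200):
    loss, grad = loss_and_grad(logits, scores)
    losses.append(loss)
    logits = [z - 1.0 * g for z, g in zip(logits, grad)]  # gradient descent step

probs = softmax(logits)
best = codons[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

Because the soft distribution only ranges over synonymous codons, every point along the optimization still encodes the same amino acid; the gradient only shifts probability mass between valid choices.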
- det_soft: Deterministic gradient descent with soft constraints
- det_hard: Deterministic gradient descent with hard constraints
- sto_soft: Stochastic sampling with soft constraints
- sto_hard: Stochastic sampling with hard constraints
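One common way to realize a Deterministic/Stochastic × Soft/Hard grid is with two switches: Gumbel noise on the logits for the stochastic modes, and an argmax one-hot projection for the hard modes. This README does not spell out how the four modes are implemented, so the sketch below is an assumption for illustration; `codon_distribution` is a hypothetical function.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def codon_distribution(logits, stochastic, hard, rng):
    """Return a distribution over codons for one position.

    stochastic: perturb the logits with Gumbel(0, 1) noise before the softmax.
    hard: collapse the result to a one-hot vector at its argmax.
    """
    if stochastic:
        # Gumbel noise is -log(-log(U)) with U drawn from (0, 1).
        logits = [
            z - math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))
            for z in logits
        ]
    probs = softmax(logits)
    if hard:
        top = probs.index(max(probs))
        probs = [1.0 if i == top else 0.0 for i in range(len(probs))]
    return probs


# (stochastic, hard) switches for each mode label used in this README.
MODES = {
    "det_soft": (False, False),
    "det_hard": (False, True),
    "sto_soft": (True, False),
    "sto_hard": (True, True),
}

rng = random.Random(0)
logits = [1.0, 0.5, -0.2]
for name, (stochastic, hard) in MODES.items():
    probs = codon_distribution(logits, stochastic, hard, rng)
    print(name, [round(p, 3) for p in probs])
```

In an autograd framework such as PyTorch, the hard modes would typically pair the one-hot forward pass with a straight-through gradient from the soft distribution, since argmax itself has no useful gradient.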
# Accessibility-only optimization
python demo.py --protein MSKGEELFTGVVPILVELDGDVNGHKFSVSGEG
# With CAI optimization
python demo.py --protein MSKGEELFTGVVPILVELDGDVNGHKFSVSGEG --enable-cai
# Protein from a FASTA file, with custom CAI settings and an output path
python demo.py --protein-file data/proteins/P04637.fasta.txt \
--enable-cai \
--cai-target 0.9 \
--cai-lambda 0.2 \
--output result.fasta
# Lagrangian Multiplier
python demo.py --constraint lagrangian
# Amino Matching Softmax
python demo.py --constraint amino_matching
# Codon Profile Constraint
python demo.py --constraint codon_profile
If you use this code in your research, please cite:
@article{li2025gradient,
title={Gradient-based Optimization for mRNA Sequence Design},
author={Li, Hongmin and Terai, Goro and Otagaki, Takumi and Asai, Kiyoshi},
journal={bioRxiv},
year={2025},
doi={10.1101/2025.10.22.683691},
url={https://doi.org/10.1101/2025.10.22.683691}
}
This work is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
- Academic Use: ✅ Freely permitted
- Commercial Use: ❌ Prohibited without permission
- Attribution: ✅ Required in all publications
For commercial licensing inquiries: [email protected]
See LICENSE-SUMMARY.md for detailed terms.
- Research Questions: [email protected]
- Bug Reports: GitHub Issues
- Commercial Licensing: [email protected]
- DeepRaccess model: https://github.com/hmdlab/DeepRaccess
- University of Tokyo
Version: 1.0.0
Last Updated: January 15, 2025
Maintained by: University of Tokyo