diff --git a/README.md b/README.md
index d61f9bf256..f2c949d889 100644
--- a/README.md
+++ b/README.md
@@ -13,14 +13,11 @@
# Overview
> [!IMPORTANT]
-> **Note:** SkyRL is undergoing a repo reorganization into the [`skyrl/`](./skyrl) folder, which unifies the skyrl libraries below into a single package. The existing packages below are fully functional but will be migrated to new paths shortly. For full [Tinker API](https://docs.skyrl.ai/docs/tinker/overview) support please use the `skyrl/` folder. See the [Tinker Quickstart docs](https://docs.skyrl.ai/docs/tinker/quickstart) to get started. See issue: https://github.com/NovaSky-AI/SkyRL/issues/1145
+> **Note:** SkyRL has reorganized the previous `skyrl-train` and `skyrl-tx` packages into the unified [`skyrl/`](./skyrl) package. For full [Tinker API](https://docs.skyrl.ai/docs/tinker/overview) support, use the `skyrl/` package. See the [Tinker Quickstart docs](https://docs.skyrl.ai/docs/tinker/quickstart) to get started. See issue: https://github.com/NovaSky-AI/SkyRL/issues/1145
SkyRL is a full-stack RL library that provides the following components:
-- [skyrl](./skyrl): Our new unified library for RL on your own hardware, with support for the [Tinker API](https://docs.skyrl.ai/docs/tinker/overview). `skyrl` combines our previous work:
-
- * [`skyrl-train`](./skyrl-train): A modular, performant training framework for RL.
- * [`skyrl-tx`](./skyrl-tx): A cross-platform library implementing a backend for the [Tinker API](https://docs.skyrl.ai/docs/tinker/overview), with a unified engine for training and inference.
+- [skyrl](./skyrl): Our unified library for RL on your own hardware, with support for the [Tinker API](https://docs.skyrl.ai/docs/tinker/overview). The training stack lives in [`skyrl/train`](./skyrl/train) and [`skyrl/backends/skyrl_train`](./skyrl/backends/skyrl_train), while the Tinker/TX stack lives in [`skyrl/tinker`](./skyrl/tinker), [`skyrl/tx`](./skyrl/tx), and [`skyrl/backends`](./skyrl/backends).
- [`skyrl-agent`](./skyrl-agent): Our agent layer for training long-horizon, real-world agents. To reproduce the [SkyRL-v0](https://novasky-ai.notion.site/skyrl-v0) results exactly, check out commit a0d50c482436af7fac8caffa4533616a78431d66.
- [`skyrl-gym`](./skyrl-gym): Our gymnasium of tool-use tasks, including a library of math, coding, search and SQL environments implemented in the Gymnasium API.
diff --git a/skyrl-train/README.md b/skyrl-train/README.md
deleted file mode 100644
index bcc599e4b6..0000000000
--- a/skyrl-train/README.md
+++ /dev/null
@@ -1,124 +0,0 @@
-# SkyRL-Train: A modular, performant RL framework for post-training LLMs
-
-
-[](https://novasky-ai.github.io/) [](https://github.com/NovaSky-AI/SkyRL) [](https://x.com/NovaSkyAI) [](https://huggingface.co/NovaSky-AI) [](https://discord.gg/RBAjeWSA) [](https://docs.skyrl.ai/docs/)
-
-
-
-> [!IMPORTANT]
-> **Note:** SkyRL is undergoing a repo reorganization into the `SkyRL/skyrl` folder, which unifies the skyrl libraries (`skyrl-train`, `skyrl-tx`) into a single package. The code that was previously in the `skyrl-train` package can now be found in `skyrl/{backends/, train/, utils/}`. See issue: https://github.com/NovaSky-AI/SkyRL/issues/1145
-
-
-# Overview
-
-With a focus on modularity, `skyrl-train` makes it easy to prototype new training algorithms, environments, and execution plans without compromising usability or speed.
-
-`skyrl-train` is **for users who want to modify anything:**
-
-- **Quickly develop new environments** without modifying or understanding the training code.
-- **Modify the training execution plan** such as model placement, colocation or disaggregation of training and generation, and async RL.
-- **Implement custom trajectory generation** specific to your use case, such as custom sampling methods, tree search, etc.
-- … make any other flexible modifications to the RL workflow!
-
-
-## Key Features
-The `skyrl-train` package supports:
-- PPO and GRPO
-- Training Backends: FSDP, FSDP2, and [Megatron](https://docs.skyrl.ai/docs/examples/megatron)
-- Inference backends: vLLM, SGLang, and any custom OpenAI API-compatible endpoint that exposes a method to perform weight sync
-- Ulysses sequence parallelism for long-context training
-- [Colocated or disaggregated](https://docs.skyrl.ai/docs/configuration/placement) training and generation (including on heterogeneous hardware)
-- Synchronous RL, [async one-off pipelining](https://docs.skyrl.ai/docs/tutorials/one_step_off_async), or [fully async RL with in-flight weight updates](https://docs.skyrl.ai/docs/tutorials/fully_async)
-- Simple batched rollouts or asynchronous rollouts for multi-turn conversations
-- Weight sync via NCCL, gloo, or checkpoint-and-load
-- Integration with `skyrl-gym`, [verifiers](https://github.com/NovaSky-AI/SkyRL/tree/main/examples/train_integrations/verifiers), [OpenEnv](https://github.com/NovaSky-AI/SkyRL/tree/main/examples/train_integrations/openenv), [Harbor/Terminal-Bench](https://github.com/NovaSky-AI/SkyRL/tree/main/examples/train_integrations/harbor), and more!
-- Sequence packing and Flash Attention 2
-- Algorithmic support for RLOO, REINFORCE, GSPO, CISPO, SAPO
-- Step-wise training for fully on-policy multi-turn RL
-- 5D Parallelism support for MoE models with the [Megatron backend](https://docs.skyrl.ai/docs/examples/megatron)
-
-## Documentation
-
-Find documentation at: [docs.skyrl.ai/docs/](https://docs.skyrl.ai/docs/)
-
-## Quick Start
-
-A quick start guide for installation and your first training run is provided below.
-
-### Requirements
-
-The only requirements are:
-
-- CUDA version 12.8
-- [uv](https://docs.astral.sh/uv/)
-
-If you're running on an existing Ray cluster, make sure to use Ray 2.51.1 and Python 3.12. If not, proceed with the installation instructions below.
-
-
-First, clone the repository:
-
-```bash
-git clone --recurse-submodules https://github.com/NovaSky-AI/SkyRL
-cd SkyRL/
-```
-
-Then, create a new virtual environment and install the dependencies:
-
-```bash
-# creates a venv at .venv/
-uv sync --extra fsdp
-source .venv/bin/activate
-```
-
-Then, prepare the dataset:
-
-```bash
-uv run -- python examples/train/gsm8k/gsm8k_dataset.py
-```
-
-Finally, before training, make sure to configure Ray to use `uv`:
-
-```bash
-export RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook
-# or add to your .bashrc
-# echo 'export RAY_RUNTIME_ENV_HOOK=ray._private.runtime_env.uv_runtime_env_hook.hook' >> ~/.bashrc
-```
-
-You should now be able to run our example script (assumes at least 4 GPUs):
-
-```bash
-export WANDB_API_KEY=
-bash examples/train/gsm8k/run_gsm8k.sh
-```
-
-For detailed installation instructions, as well as more examples, please refer to our [documentation](https://docs.skyrl.ai/docs/).
-
-## Training on a new task or environment
-
-To implement a new task or environment using the SkyRL-Gym interface, please see our [Walkthrough Docs](https://docs.skyrl.ai/docs/tutorials/new_env).
-
-If you don't want to use the SkyRL-Gym interface, or you have an existing task or agentic pipeline implementation and just want to train with it on top of SkyRL, we recommend you create a simple custom [`Generator`](/skyrl/train/generators/base.py), which requires implementing a single method, `generate()`. We have one example of a custom Generator at [`SkyRLGymGenerator`](/skyrl/train/generators/skyrl_gym_generator.py) which executes environments written in the SkyRL-Gym interface. We are working to provide more example integrations of agent harnesses -- please reach out if you'd like yours to be one of them!
-
-## Reproducing SkyRL-SQL
-We also test SkyRL by reproducing our prior release [SkyRL-SQL](https://novasky-ai.notion.site/skyrl-sql), which enabled efficient Multi-Turn RL for Text2SQL.
-You can find the wandb report [here](https://wandb.ai/sky-posttraining-uc-berkeley/skyrl-sql/reports/SkyRL-SQL---VmlldzoxMzM0MTAyMw), and a detailed walkthrough of the reproduction in our [documentation](https://docs.skyrl.ai/docs/examples/multi_turn_text2sql).
-
-# Acknowledgement
-
-This work is done at [**Berkeley Sky Computing Lab**](https://sky.cs.berkeley.edu/) in collaboration with [**Anyscale**](https://www.anyscale.com/), with generous compute support from [**Anyscale**](https://www.anyscale.com/), [**Databricks**](https://www.databricks.com/), [**NVIDIA**](https://developer.nvidia.com/brev), [**Lambda Labs**](https://lambda.ai/), and [**AMD**](https://www.amd.com/en).
-
-We adopt many lessons and code from several great projects such as [veRL](https://github.com/volcengine/verl), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [Search-R1](https://github.com/PeterGriffinJin/Search-R1), [OpenReasonerZero](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero), and [NeMo-RL](https://github.com/NVIDIA-NeMo/RL). We appreciate each of these teams and their contributions to open-source research!
-
-
-
-# Citation
-
-If you find the work in `skyrl-train` helpful, please consider citing:
-```bibtex
-@misc{griggs2025skrylv01,
- title={Evolving SkyRL into a Highly-Modular RL Framework},
- author={Tyler Griggs and Sumanth Hegde and Eric Tang and Shu Liu and Shiyi Cao and Dacheng Li and Charlie Ruan and Philipp Moritz and Kourosh Hakhamaneshi and Richard Liaw and Akshay Malik and Matei Zaharia and Joseph E. Gonzalez and Ion Stoica},
- year={2025},
- note={Notion Blog}
-}
-```
diff --git a/skyrl-tx/README.md b/skyrl-tx/README.md
deleted file mode 100644
index 91e777c996..0000000000
--- a/skyrl-tx/README.md
+++ /dev/null
@@ -1,320 +0,0 @@
-
-
-SkyRL tx is an open-source library that implements a backend for the [Tinker API](https://thinkingmachines.ai/tinker/), allowing you to set up your own Tinker-like service running on your own hardware. It provides a unified interface for both training and inference, enabling seamless online learning, cost-effective multi-tenancy through LoRA, and simplified ML infrastructure.
-
-> [!IMPORTANT]
-> **Note:** SkyRL is undergoing a repo reorganization into the [`skyrl/`](../skyrl) folder, which unifies the skyrl libraries into a single package. The code that was previously in the `skyrl-tx` folder can now be found in `skyrl/{backends, tinker, tx, utils}`.
-
-## Key Features
-
-- **Unified Training & Inference** - Single engine for forward passes, backward passes, and sampling
-- **Multi-User LoRA Support** - Efficient GPU sharing across users with individual adapters
-- **SFT & RL Support** - Supervised fine-tuning and reinforcement learning with PPO and custom loss functions
-- **Multi-Node Training** - FSDP and tensor parallelism for distributed training
-- **Multiple Model Architectures** - Support for Qwen3 (dense & MoE), Llama 3, and DeepSeek V3
-- **External Inference Engine** - Optional vLLM integration for optimized inference
-- **Production Ready** - PostgreSQL support, cloud storage checkpoints, and database migrations
-
-## Architecture
-
-SkyRL tx consists of four main components:
-
-```
-┌─────────────────────────────────────────────────────────────────┐
-│                         REST API Server                         │
-│                  (FastAPI - handles requests)                   │
-└─────────────────────────────────────────────────────────────────┘
-                                 │
-                                 ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                            Database                             │
-│            (SQLite/PostgreSQL - metadata, job queue)            │
-└─────────────────────────────────────────────────────────────────┘
-                                 │
-                                 ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                             Engine                              │
-│          (Scheduling & batching across users/adapters)          │
-└─────────────────────────────────────────────────────────────────┘
-                                 │
-                                 ▼
-┌─────────────────────────────────────────────────────────────────┐
-│                             Worker                              │
-│         (Model execution, forward/backward, optimizer)          │
-└─────────────────────────────────────────────────────────────────┘
-```
-
-## Quick Start
-
-### Installation
-
-```bash
-git clone https://github.com/NovaSky-AI/SkyRL
-cd SkyRL/
-
-# For GPU
-uv run --extra gpu --extra tinker -m skyrl.tinker.api --base-model
-
-# For TPU
-uv run --extra tpu --extra tinker -m skyrl.tinker.api --base-model
-```
-
-### Basic Training Example (Pig Latin)
-
-Start the server:
-
-```bash
-uv run --extra gpu --extra tinker -m skyrl.tinker.api --base-model "Qwen/Qwen3-0.6B"
-```
-
-Run a simple training loop:
-
-```python
-import tinker
-import numpy as np
-from tinker import types
-
-# Connect to the local server
-service_client = tinker.ServiceClient(base_url="http://localhost:8000", api_key="tml-dummy")
-training_client = service_client.create_lora_training_client(base_model="Qwen/Qwen3-0.6B")
-tokenizer = training_client.get_tokenizer()
-
-# Training examples
-examples = [
- {"input": "banana split", "output": "anana-bay plit-say"},
- {"input": "quantum physics", "output": "uantum-qay ysics-phay"},
- {"input": "coding wizard", "output": "oding-cay izard-way"},
-]
-
-def process_example(example, tokenizer):
- prompt = f"English: {example['input']}\nPig Latin:"
- prompt_tokens = tokenizer.encode(prompt, add_special_tokens=True)
- completion_tokens = tokenizer.encode(f" {example['output']}\n\n", add_special_tokens=False)
-
- tokens = prompt_tokens + completion_tokens
- weights = [0] * len(prompt_tokens) + [1] * len(completion_tokens)
-
- return types.Datum(
- model_input=types.ModelInput.from_ints(tokens=tokens[:-1]),
- loss_fn_inputs=dict(weights=weights[1:], target_tokens=tokens[1:])
- )
-
-processed = [process_example(ex, tokenizer) for ex in examples]
-
-# Training loop
-for _ in range(6):
- fwdbwd = training_client.forward_backward(processed, "cross_entropy").result()
- training_client.optim_step(types.AdamParams(learning_rate=1e-4)).result()
-
- logprobs = np.concatenate([o['logprobs'].tolist() for o in fwdbwd.loss_fn_outputs])
- weights = np.concatenate([e.loss_fn_inputs['weights'].tolist() for e in processed])
- print(f"Loss: {-np.dot(logprobs, weights) / weights.sum():.4f}")
-```
-
-### Sampling
-
-```python
-# After training, create a sampling client
-sampling_client = training_client.save_weights_and_get_sampling_client(name='my-model')
-
-# Sample from the model
-prompt = types.ModelInput.from_ints(tokenizer.encode("English: coffee break\nPig Latin:"))
-params = types.SamplingParams(max_tokens=20, temperature=0.0)
-result = sampling_client.sample(prompt=prompt, sampling_params=params, num_samples=8).result()
-
-for i, seq in enumerate(result.sequences):
- print(f"{i}: {tokenizer.decode(seq.tokens)}")
-```
-
-## Usage Examples
-
-### Dense Model Training (Qwen3-8B on 8×H100)
-
-```bash
-# Start the server
-uv run --extra gpu --extra tinker -m skyrl.tinker.api \
- --base-model Qwen/Qwen3-8B \
- --backend-config '{"max_lora_adapters": 2, "max_lora_rank": 1, "tensor_parallel_size": 8, "train_micro_batch_size": 1}'
-
-# Run training (using tinker-cookbook)
-export TINKER_API_KEY="tml-dummy"
-uv run --with wandb --with tinker sl_loop.py \
- base_url=http://localhost:8000 \
- model_name=Qwen/Qwen3-8B lora_rank=1 train_on_what=LAST_ASSISTANT_MESSAGE
-```
-
-### MoE Model Training (Qwen/Qwen3-30B-A3B)
-
-```bash
-# Start the server
-uv run --extra gpu --extra tinker -m skyrl.tinker.api \
- --base-model Qwen/Qwen3-30B-A3B \
- --backend-config '{"max_lora_adapters": 2, "max_lora_rank": 1, "expert_parallel_size": 8, "train_micro_batch_size": 1, "shard_attention_heads": false}'
-
-# Run training (using tinker-cookbook)
-export TINKER_API_KEY="tml-dummy"
-uv run --with wandb --with tinker sl_loop.py \
- base_url=http://localhost:8000 \
- model_name=Qwen/Qwen3-30B-A3B lora_rank=1 max_length=512 train_on_what=LAST_ASSISTANT_MESSAGE
-```
-
-### Reinforcement Learning (Qwen/Qwen3-8B)
-
-```bash
-# Start server
-uv run --extra gpu --extra tinker -m skyrl.tinker.api \
- --base-model Qwen/Qwen3-8B \
- --backend-config '{"max_lora_adapters": 3, "max_lora_rank": 1, "tensor_parallel_size": 8, "train_micro_batch_size": 8, "sample_max_num_sequences": 256}' > out.log
-
-# Run RL loop
-uv run --with wandb --with tinker rl_loop.py \
- base_url=http://localhost:8000 \
- model_name="Qwen/Qwen3-8B" \
- lora_rank=1 max_length=1024
-```
-
-### Running the `search_tool` example
-
-First, follow the instructions in [the search_tool recipe](https://github.com/thinking-machines-lab/tinker-cookbook/blob/main/tinker_cookbook/recipes/search_tool/README.md)
-to download the data and set up chroma. You can then use the following commands to train the model:
-
-```bash
-# Start server
-uv run --extra gpu --extra tinker -m skyrl.tinker.api \
- --port 8001 \
- --base-model Qwen/Qwen3-4B-Instruct-2507 \
- --backend-config '{"max_lora_adapters": 3, "max_lora_rank": 32, "tensor_parallel_size": 8, "train_micro_batch_size": 1, "sample_max_num_sequences": 128}' > out.log
-
-# Run RL loop
-export TINKER_API_KEY="tml-dummy"
-export GOOGLE_API_KEY="..." # Replace with your Google API Key
-export WANDB_API_KEY="..." # Replace with your WandB API Key
-uv run --extra vector-search --extra wandb python -m tinker_cookbook.recipes.search_tool.train \
- base_url=http://localhost:8001 \
- model_name=Qwen/Qwen3-4B-Instruct-2507 \
- behavior_if_log_dir_exists=delete \
- wandb_project=search-r1-skyrl-tx
-```
-
-### Multi-Node Training
-
-```bash
-# Node 0 (coordinator + API server)
-CUDA_VISIBLE_DEVICES=0,1,2,3 uv run --extra gpu --extra tinker -m skyrl.tinker.api \
- --base-model Qwen/Qwen3-8B \
- --backend-config '{
- "max_lora_adapters": 3,
- "max_lora_rank": 1,
- "tensor_parallel_size": 4,
- "fully_sharded_data_parallel_size": 2,
- "train_micro_batch_size": 8,
- "sample_max_num_sequences": 256,
- "coordinator_address": "node0:7777",
- "num_processes": 2
- }' > out.log
-
-# Node 1 (worker)
-CUDA_VISIBLE_DEVICES=4,5,6,7 uv run --extra jax --extra gpu --extra tinker -m skyrl.backends.jax \
- --coordinator-address "node0:7777" \
- --num-processes 2 \
- --process-id 1
-```
-
-### With External vLLM Inference
-
-```bash
-# Start vLLM
-VLLM_ALLOW_RUNTIME_LORA_UPDATING=True \
-VLLM_PLUGINS=lora_filesystem_resolver \
-VLLM_LORA_RESOLVER_CACHE_DIR=/tmp/lora_models/ \
-CUDA_VISIBLE_DEVICES=4,5,6,7 uv run --with vllm vllm serve Qwen/Qwen3-4B \
- --tensor-parallel-size 4 --port 7999 --enable-lora
-
-# Start SkyRL tx with external inference
-CUDA_VISIBLE_DEVICES=0,1,2,3 uv run --extra gpu --extra tinker -m skyrl.tinker.api \
- --base-model Qwen/Qwen3-4B \
- --external-inference-url "http://0.0.0.0:7999" \
- --backend-config '{"max_lora_adapters": 3, "max_lora_rank": 1, "tensor_parallel_size": 4, "train_micro_batch_size": 8}' > out.log
-```
-
-## Supported Features
-
-| Feature | Status |
-|---------|--------|
-| Qwen3 Dense Models | ✅ |
-| Qwen3 MoE Models | ✅ |
-| Llama 3 Models | ✅ |
-| DeepSeek V3 Models | ✅ |
-| Multi-User LoRA | ✅ |
-| LoRA (all layers) | ✅ |
-| Forward/Backward | ✅ |
-| Sampling | ✅ |
-| Gradient Accumulation | ✅ |
-| Gradient Checkpointing | ✅ |
-| JIT Compilation | ✅ |
-| Tensor Parallelism | ✅ |
-| Expert Parallelism | ✅ |
-| FSDP | ✅ |
-| Multi-Node | ✅ |
-| PostgreSQL | ✅ |
-| Cloud Storage Checkpoints | ✅ |
-| Custom Loss Functions | ✅ |
-| External Inference (vLLM) | ✅ |
-| Local Model Loading | ✅ |
-
-## Roadmap
-
-- **Performance** - Expert parallelism, context parallelism, optimized kernels
-- **Models** - More architectures, PyTorch model definitions via torchax
-- **API Coverage** - Full Tinker API compatibility
-- **Operations** - Dashboard/frontend, improved logging and metrics
-- **Integration** - SkyRL-train Tinkerification
-
-## Contributing
-
-We welcome contributions! The project is early and hackable; now is a great time to get involved.
-
-**Ways to contribute:**
-- Try examples from the [Tinker documentation](https://tinker-docs.thinkingmachines.ai/) or [cookbook](https://github.com/thinking-machines-lab/tinker-cookbook)
-- Fix issues or implement features from our [issue tracker](https://github.com/NovaSky-AI/SkyRL/issues?q=is%3Aissue%20state%3Aopen%20label%3Atx)
-- Improve documentation
-- Add support for more models
-- Performance optimizations
-
-## Resources
-
-- **[Ray Summit Talk](https://www.youtube.com/watch?v=_JLnESEu2gw)** - SkyRL tx: A unified training and inference engine
-- **[Slides](https://docs.google.com/presentation/d/1g-u8zxz7FsnlQXXShBVoqjUJhS48c6rxkJJJn0sj78A/)** - Presentation slides
-- **[Tinker Documentation](https://tinker-docs.thinkingmachines.ai/)** - Official Tinker API docs
-- **[Tinker Cookbook](https://github.com/thinking-machines-lab/tinker-cookbook)** - Example recipes
-
-## Blog Posts
-
-- **[Introducing SkyRL tx](https://novasky-ai.notion.site/skyrl-tx)**
-- **[SkyRL tx v0.0.2](https://novasky-ai.notion.site/skyrl-tx-v002)**
-- **[SkyRL tx v0.0.3](https://novasky-ai.notion.site/skyrl-tx-003)**
-- **[SkyRL tx v0.1.0](https://novasky-ai.notion.site/skyrl-tx-v010)**
-- **[SkyRL tx v0.2.0](https://novasky-ai.notion.site/skyrl-tx-v02)**
-- **[SkyRL tx v0.2.1](https://novasky-ai.notion.site/skyrl-tx-v021)**
-- **[SkyRL tx v0.3.0](https://novasky-ai.notion.site/skyrl-tx-v030)**
-
-## Contact
-
-- **Slack**: [#skyrl-tx](https://skyrl.slack.com/archives/C09K1JGNPJS)
-- **GitHub**: [NovaSky-AI/SkyRL/skyrl-tx](https://github.com/NovaSky-AI/SkyRL/tree/main/skyrl-tx/README.md)
-- **Twitter/X**: [@NovaSkyAI](https://x.com/NovaSkyAI)
-
-## License
-
-See [LICENSE](LICENSE) for details.