2 changes: 2 additions & 0 deletions README.md
@@ -1,5 +1,7 @@
# PufferDrive

[![Unit Tests](https://github.com/Emerge-Lab/PufferDrive/actions/workflows/utest.yml/badge.svg)](https://github.com/Emerge-Lab/PufferDrive/actions/workflows/utest.yml)

<img align="left" style="width:260px" src="https://github.com/Emerge-Lab/PufferDrive/blob/main/pufferlib/resources/drive/pufferdrive_20fps_long.gif" width="288px">

**PufferDrive is a fast and friendly driving simulator to train and test RL-based models.**
24 changes: 24 additions & 0 deletions docs/src/interact-with-agents.md
@@ -18,6 +18,30 @@ then launch:

This will run `demo()` with an existing model checkpoint.

## Arguments & Configuration

The `drive` tool supports CLI arguments similar to the visualizer's for controlling the environment and rendering. It also reads `pufferlib/config/ocean/drive.ini` for default environment settings.

### Command Line Arguments

| Argument | Description | Default |
| :--- | :--- | :--- |
| `--map-name <path>` | Path to the map binary file (e.g., `resources/drive/binaries/training/map_000.bin`). If omitted, picks a random map out of `num_maps` from `map_dir` in `drive.ini`. | Random |
| `--policy-name <path>` | Path to the policy weights file (`.bin`). | `resources/drive/puffer_drive_weights.bin` |
| `--view <mode>` | Selects which views to render: `agent`, `topdown`, or `both`. | `both` |
| `--frame-skip <n>` | Renders every Nth frame to speed up simulation (framerate remains 30fps). | `1` |
| `--num-maps <n>` | Overrides the number of maps to sample from if `--map-name` is not set. | `drive.ini` value |

### Visualization Flags

| Flag | Description |
| :--- | :--- |
| `--show-grid` | Draws the underlying nav-graph/grid on the map. |
| `--obs-only` | Hides objects not currently visible to the agent's sensors (fog of war). |
| `--lasers` | Visualizes the raycast sensor lines from the agent. |
| `--log-trajectories` | Draws the ground-truth "human" expert trajectories as green lines. |
| `--zoom-in` | Zooms the camera mainly on the active region rather than the full map bounds. |
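Putting a few of these together, a typical invocation might look like the following (the `./drive` binary name and the map path are assumptions; adjust to your build):

```bash
./drive --map-name resources/drive/binaries/training/map_000.bin \
    --view agent --frame-skip 2 --lasers --log-trajectories
```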

### Controls

**General:**
21 changes: 19 additions & 2 deletions docs/src/simulator.md
@@ -22,11 +22,28 @@ A high-performance autonomous driving simulator in C with Python bindings.

- `control_vehicles`: Only vehicles
- `control_agents`: All agent types (vehicles, cyclists, pedestrians)
- `control_wosac`: WOSAC evaluation mode (controls all valid agents ignoring expert flag and start to goal distance)
- `control_sdc_only`: Self-driving car only

> [!NOTE]
> `control_vehicles` filters out agents marked as "expert" and those too close to their goal (<2m). For full WOMD evaluation, use `control_wosac`.

> [!IMPORTANT]
> **Agent Dynamics:** The simulator supports three types of agents:
> 1. **Policy-Controlled:** Stepped by your model's actions.
> 2. **Experts:** Stepped using ground-truth log trajectories.
> 3. **Static:** Remain frozen in place.
>
> In the simulator, agents not selected for policy control are treated as **Static** by default. To make them follow their **Expert trajectories**, set `mark_as_expert=true` for those agents in the JSON files. This is especially important for `control_sdc_only`, so that the environment behaves realistically around the policy-controlled agent.

### Init modes

- **`create_all_valid`** (Default): Initializes every valid agent present in the map file. This includes policy-controlled agents, experts (if marked), and static agents.

- **`create_only_controlled`**: Initializes **only** the agents that are directly controlled by the policy.

> [!NOTE]
> In `create_only_controlled` mode, the environment will contain **no static or expert agents**. Only the policy-controlled agents will exist.
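As a sketch, the two settings above might be combined in `pufferlib/config/ocean/drive.ini` like this (the exact key names, in particular `init_mode`, are assumptions; check the shipped config for the actual spelling):

```ini
[env]
; Control all valid agents, ignoring expert flags (full WOSAC evaluation)
control_mode = control_wosac
; Initialize every valid agent: policy-controlled, experts (if marked), and static
init_mode = create_all_valid
```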

### Goal behaviors

12 changes: 6 additions & 6 deletions docs/src/train.md
@@ -1,14 +1,14 @@
# Training

## Basic training

Launch a training run with Weights & Biases logging:

```bash
puffer train puffer_drive --wandb --wandb-project "pufferdrive"
```

## Environment configurations

**Default configuration (Waymo maps)**

@@ -33,11 +33,11 @@ resample_frequency = 100000 # No resampling needed (there are only a few Carla maps)
termination_mode = 0 # 0: terminate at episode_length, 1: terminate after all agents reset

# Map settings
map_dir = "resources/drive/binaries"
num_maps = 2 # Number of Carla maps you're training on
```

This should give a good starting point. With these settings, you'll need about 2-3 billion steps to get an agent that reaches most of its goals (>95%) and has a combined collision/off-road rate of about 3% per 300-step episode in Towns 1 and 2, which can be found [here](https://github.com/Emerge-Lab/PufferDrive/tree/2.0/data_utils/carla/carla_data). Before launching your experiment, run `drive.py` on the folder containing the Carla towns to process them into binaries, then make sure `map_dir` above points to these binaries.
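Assuming the Carla towns live in `data_utils/carla/carla_data` and `drive.py` accepts the folder as its argument (treat the exact invocation as illustrative), the workflow might look like:

```bash
# Convert the raw Carla towns to PufferDrive binaries
python drive.py data_utils/carla/carla_data

# With map_dir in drive.ini pointing at the generated binaries, launch training
puffer train puffer_drive --wandb --wandb-project "pufferdrive"
```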

> [!NOTE]
> The default training hyperparameters work well for both configurations and typically don't need adjustment.
38 changes: 36 additions & 2 deletions docs/src/visualizer.md
@@ -25,8 +25,8 @@ bash scripts/build_ocean.sh visualize local

If you need to force a rebuild, remove the cached binary first (`rm ./visualize`).

## Rendering a Video
Launch the visualizer with a virtual display and export an `.mp4` for a given scenario binary:

```bash
xvfb-run -s "-screen 0 1280x720x24" ./visualize
@@ -43,3 +43,37 @@ puffer render puffer_drive
```

This mode parallelizes rendering based on `vec.num_workers`.

## Arguments & Configuration

The `visualize` tool supports several CLI arguments to control the rendering output. It also reads `pufferlib/config/ocean/drive.ini` for default environment settings (for more details on these settings, see [Configuration](simulator.md#configuration)).

### Command Line Arguments

| Argument | Description | Default |
| :--- | :--- | :--- |
| `--map-name <path>` | Path to the map binary file (e.g., `resources/drive/binaries/training/map_000.bin`). If omitted, picks a random map out of `num_maps` from `map_dir` in `drive.ini`. | Random |
| `--policy-name <path>` | Path to the policy weights file (`.bin`). | `resources/drive/puffer_drive_weights.bin` |
| `--view <mode>` | Selects which views to render: `agent`, `topdown`, or `both`. | `both` |
| `--output-agent <path>` | Output filename for agent view video. | `<policy>_agent.mp4` |
| `--output-topdown <path>` | Output filename for top-down view video. | `<policy>_topdown.mp4` |
| `--frame-skip <n>` | Renders every Nth frame to speed up generation (framerate remains 30fps). | `1` |
| `--num-maps <n>` | Overrides the number of maps to sample from if `--map-name` is not set. | `drive.ini` value |

### Visualization Flags

| Flag | Description |
| :--- | :--- |
| `--show-grid` | Draws the underlying nav-graph/grid on the map. |
| `--obs-only` | Hides objects not currently visible to the agent's sensors (fog of war). |
| `--lasers` | Visualizes the raycast sensor lines from the agent. |
| `--log-trajectories` | Draws the ground-truth "human" expert trajectories as green lines. |
| `--zoom-in` | Zooms the camera mainly on the active region rather than the full map bounds. |
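Combining these with the headless setup above, a full render command might look like the following (the map path and output filename are illustrative):

```bash
xvfb-run -s "-screen 0 1280x720x24" ./visualize \
    --map-name resources/drive/binaries/training/map_000.bin \
    --view topdown --output-topdown map_000_topdown.mp4 \
    --lasers --zoom-in
```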

### Key `drive.ini` Settings
The visualizer initializes the environment using `pufferlib/config/ocean/drive.ini`. Important settings include:

- `[env] dynamics_model`: `classic` or `jerk`. Must match the trained policy.
- `[env] episode_length`: Duration of the playback; defaults to 91 if set to 0.
- `[env] control_mode`: Determines which agents are active (`control_vehicles` vs `control_sdc_only`).
- `[env] goal_behavior`: Defines agent behavior upon reaching goals (respawn vs stop).
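A minimal `[env]` block matching the settings above might look like this (the values shown are examples, not recommendations, and the `goal_behavior` value names are assumptions):

```ini
[env]
; Must match the dynamics model the policy was trained with
dynamics_model = classic
; 0 falls back to the default playback length of 91 steps
episode_length = 0
; Which agents the policy controls
control_mode = control_vehicles
; What agents do on reaching their goal (e.g., respawn or stop)
goal_behavior = respawn
```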
26 changes: 7 additions & 19 deletions docs/src/wosac.md
@@ -55,30 +55,18 @@ We provide baselines on a small curated dataset from the WOMD validation set wit

| Method | Realism meta-score | Kinematic metrics | Interactive metrics | Map-based metrics | minADE | ADE |
|--------|-------------------|-------------------|---------------------|-------------------|--------|------|
| Ground-truth (UB) | 0.8179 | 0.6070 | 0.9590 | 0.8722 | 0 | 0 |
| Self-play RL agent | 0.6750 | 0.2798 | 0.7966 | 0.7811 | 10.8057 | 11.4108 |
| [SMART-tiny-CLSFT](https://arxiv.org/abs/2412.05334) | 0.7818 | 0.5200 | 0.8914 | 0.8378 | 1.1236 | 3.1231 |
| Random | 0.4459 | 0.0506 | 0.7843 | 0.4704 | 23.5936 | 25.0097 |

*Table: WOSAC baselines in PufferDrive on 229 selected clean held-out validation scenarios.*




- **Random agent:** Following the [WOSAC 2023 paper](https://arxiv.org/abs/2305.12032), the random agent samples future trajectories by independently sampling (x, y, θ) at each timestep from a Gaussian distribution in the AV coordinate frame `(mu=1.0, sigma=0.1)`, producing uncorrelated random motion over the horizon of 80 steps.
- **Goal-conditioned self-play RL agent:** An agent trained with self-play RL to reach designated end points ("goals") without colliding or going off-road. The baseline can be reproduced using the default settings in the `drive.ini` file with the Waymo dataset. We also open-source the weights of this policy; see `pufferlib/resources/drive/puffer_drive_weights` (`.bin` and `.pt`).
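The random baseline is simple enough to sketch in a few lines of NumPy; the function name and array layout below are illustrative, not part of the PufferDrive API:

```python
import numpy as np

def random_agent_trajectory(horizon=80, mu=1.0, sigma=0.1, seed=0):
    """WOSAC-style random baseline: each timestep's (x, y, theta) is drawn
    independently from N(mu, sigma) in the AV coordinate frame, so successive
    states are uncorrelated over the 80-step horizon."""
    rng = np.random.default_rng(seed)
    return rng.normal(mu, sigma, size=(horizon, 3))

traj = random_agent_trajectory()
print(traj.shape)  # (80, 3): one (x, y, theta) sample per step
```

Because each state is sampled independently rather than integrated from actions, the resulting motion is physically implausible, which is exactly why this baseline scores poorly on the kinematic metrics above.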


> ✏️ Download the dataset from [Hugging Face](https://huggingface.co/datasets/daphne-cornelisse/pufferdrive_wosac_val_clean) to reproduce these results or benchmark your policy.

## Evaluating trajectories

5 changes: 5 additions & 0 deletions docs/theme/extra.css
@@ -463,3 +463,8 @@ blockquote {
margin: 1rem 0;
border-radius: 0 8px 8px 0;
}

/* Fix table visibility - remove alternating row colors */
table tr:nth-child(2n) {
background-color: transparent !important;
}
14 changes: 12 additions & 2 deletions pufferlib/config/ocean/drive.ini
@@ -177,8 +177,14 @@ render_map = none
eval_interval = 1000
; Path to dataset used for evaluation
map_dir = "resources/drive/binaries/training"
; Evaluation will run on the first num_maps maps in the map_dir directory
num_maps = 20
; Number of scenarios whose metrics are computed per batch
wosac_batch_size = 32
; Stop evaluation once this many distinct scenarios have been seen (needed because scenarios are sampled with replacement)
wosac_target_scenarios = 64
; Total pool of scenarios to sample from (the evaluation analogue of num_maps in training)
wosac_scenario_pool_size = 1000
; Max batches, used as a timeout to prevent an infinite loop
wosac_max_batches = 100
backend = PufferEnv
; WOSAC (Waymo Open Sim Agents Challenge) evaluation settings
; If True, enables evaluation on realism metrics each time we save a checkpoint
Expand All @@ -198,10 +204,14 @@ wosac_goal_radius = 2.0
wosac_sanity_check = False
; Only return aggregate results across all scenes
wosac_aggregate_results = True
; Evaluation mode: "policy", "ground_truth"
wosac_eval_mode = "policy"
; If True, enable human replay evaluation (pair policy-controlled agent with human replays)
human_replay_eval = False
; Control only the self-driving car
human_replay_control_mode = "control_sdc_only"
; Number of agents for human replay evaluation (the number of scenarios equals the number of agents)
human_replay_num_agents = 16

[render]
; Mode to render a bunch of maps with a given policy
86 changes: 31 additions & 55 deletions pufferlib/ocean/benchmark/evaluate_imported_trajectories.py
@@ -1,49 +1,30 @@
import sys
import pickle
import numpy as np
from scipy.spatial import cKDTree
import pufferlib.pufferl as pufferl
from pufferlib.ocean.benchmark.evaluator import WOSACEvaluator


def align_trajectories(simulated, ground_truth):
# Idea is to use the (scenario_id, id) pair to reindex simulated_trajectories in order to align it with GT
gt_scenario_ids = ground_truth["scenario_id"][:, 0]
sim_scenario_ids = simulated["scenario_id"][:, 0, 0]

gt_ids = ground_truth["id"][:, 0]
sim_ids = simulated["id"][:, 0, 0]

lookup = {(s_id, a_id): idx for idx, (s_id, a_id) in enumerate(zip(sim_scenario_ids, sim_ids))}

try:
indices = [lookup[(s, i)] for (s, i) in zip(gt_scenario_ids, gt_ids)]
indices = np.array(indices, dtype=int)
except KeyError:
print("An agent present in the GT is missing in your simulation")
raise

sim_traj = {k: v[indices] for k, v in simulated.items()}



return sim_traj


def check_alignment(simulated, ground_truth, tolerance=1e-4):
@@ -72,8 +72,7 @@ def evaluate_trajectories(simulated_trajectory_file, args):
"""
env_name = "puffer_drive"
args["env"]["map_dir"] = args["eval"]["map_dir"]
args["env"]["num_maps"] = args["eval"]["wosac_num_maps"]
dataset_name = args["env"]["map_dir"].split("/")[-1]

print(f"Running WOSAC realism evaluation with {dataset_name} dataset. \n")
@@ -97,30 +97,26 @@

print(f"Number of scenarios: {len(np.unique(gt_trajectories['scenario_id']))}")
print(f"Number of controlled agents: {num_agents_gt}")
print(f"Number of evaluated agents: {gt_trajectories['is_track_to_predict'].sum()}")

print(f"Loading simulated trajectories from {simulated_trajectory_file}...")
with open(simulated_trajectory_file, "rb") as f:
sim_trajectories = pickle.load(f)

num_agents_sim = sim_trajectories["x"].shape[0]
assert num_agents_sim >= num_agents_gt, (
"There are fewer agents in your simulation than in the GT, so the computation won't be valid"
)

if num_agents_sim > num_agents_gt:
print("If you are evaluating on a subset of your trajectories, this is fine.")
print("Otherwise, consider changing the value of MAX_AGENTS in drive.h and recompiling.")

sim_trajectories = align_trajectories(sim_trajectories, gt_trajectories)

assert check_alignment(sim_trajectories, gt_trajectories), (
"There might be an issue with the way you generated your data."
)

agent_state = vecenv.driver_env.get_global_agent_state()
road_edge_polylines = vecenv.driver_env.get_road_edge_polylines()
Expand Down