48 commits
3401b7a
add dinov3
jveitchmichaelis Sep 12, 2025
8357539
attempt to gc after eval, suppress warnings
jveitchmichaelis Sep 14, 2025
a5c2705
add eval as callback
jveitchmichaelis Sep 22, 2025
b5aa289
trigger eval from callback only during training
jveitchmichaelis Sep 22, 2025
c7dc9fe
add eval to cli
jveitchmichaelis Sep 22, 2025
0d66fc3
move callbacks, add compress option
jveitchmichaelis Sep 22, 2025
fffcc06
batch eval + ddp tweaks
jveitchmichaelis Sep 24, 2025
d8f1105
attempt to fix barriers
jveitchmichaelis Sep 24, 2025
4a1ccbe
handle empty pred in image callback
jveitchmichaelis Sep 24, 2025
0eebb06
only rank 0 for image callback
jveitchmichaelis Sep 24, 2025
f2bcd8f
push predictions to comet after train
jveitchmichaelis Sep 24, 2025
20b1afd
support batch/upload eval to comet
jveitchmichaelis Sep 24, 2025
dded14c
eval gzip, defaults for cli
jveitchmichaelis Sep 24, 2025
c52793b
fallback to epoch
jveitchmichaelis Sep 24, 2025
e9e5c32
add idempotency to eval + comet upload
jveitchmichaelis Sep 25, 2025
33de05e
fix dataframe mutation bug
jveitchmichaelis Sep 25, 2025
e65fa14
handle empty columns robustly
jveitchmichaelis Sep 25, 2025
ae14323
optimize IoU
jveitchmichaelis Sep 26, 2025
a58b833
optimize sharding
jveitchmichaelis Sep 26, 2025
e0a1e5d
merge eval optim
jveitchmichaelis Sep 26, 2025
1eaaebc
improve parallel script
jveitchmichaelis Sep 26, 2025
0d80a19
fix eval merging
jveitchmichaelis Sep 29, 2025
c5bc3f3
supress many detection warning
jveitchmichaelis Sep 29, 2025
5e1a183
add iou tests
jveitchmichaelis Sep 29, 2025
24c5c67
bump lightning for barrier device id
jveitchmichaelis Oct 8, 2025
c696ab0
resume/fine-tune from ckpt
jveitchmichaelis Oct 14, 2025
4b6e293
backbone handling when loading from ckpt
jveitchmichaelis Oct 14, 2025
f44b1de
docs
jveitchmichaelis Oct 16, 2025
50227e8
correct lightning bump
jveitchmichaelis Oct 16, 2025
6e2ccc4
update model config override
jveitchmichaelis Oct 16, 2025
d570cac
allow ckpt for predict/eval
jveitchmichaelis Oct 20, 2025
11ac993
allow ckpt for predict/eval
jveitchmichaelis Oct 20, 2025
53668e3
preserve model names when finetuning
jveitchmichaelis Oct 20, 2025
5fb8e67
handle ckpts better
jveitchmichaelis Oct 23, 2025
70f5aa2
for this branch, use basenames for simplicity.
jveitchmichaelis Oct 23, 2025
967a44d
improve args for evaluate/predict
jveitchmichaelis Oct 23, 2025
85f95ff
support backbone lr/adamw
jveitchmichaelis Nov 10, 2025
d3814a0
fix shard duplication error
jveitchmichaelis Nov 10, 2025
f1dad19
add weight decay and fix detr box conversion
jveitchmichaelis Nov 10, 2025
ff80632
named lr groups
jveitchmichaelis Nov 10, 2025
0bd7cab
fix detr top-k
jveitchmichaelis Nov 11, 2025
8d5a889
add eos coefficient for detr
jveitchmichaelis Nov 12, 2025
86b05a3
only set unused params for dinov3
jveitchmichaelis Nov 12, 2025
6cfa2ca
switch eos_coefficient -> focal_alpha for detr
jveitchmichaelis Nov 13, 2025
8cc3fa3
test use aux loss
jveitchmichaelis Nov 13, 2025
d79a740
add conditional and vanilla detr for testing
jveitchmichaelis Nov 14, 2025
6895a07
update detr
jveitchmichaelis Nov 17, 2025
8dc5e60
allow mixed precision training
jveitchmichaelis Nov 25, 2025
2 changes: 1 addition & 1 deletion .gitignore
@@ -20,4 +20,4 @@ tests/__pycache__
tests/data/*
.vscode/
*ipynb_checkpoints/
docs/user_guide/deepforestr.md
docs/user_guide/deepforestr.md
53 changes: 39 additions & 14 deletions docs/user_guide/11_training.md
@@ -429,17 +429,27 @@ for tile in tiles_to_predict:

Creating this object usually does not cost much computational time.

#### Training across multiple nodes/GPUs

If you have access to an HPC system or cluster, or simply a powerful desktop with multiple GPUs, you may want to take advantage of them. Fortunately, DeepForest uses Lightning, which handles most of the distributed-processing issues for you. Let's call the number of nodes `N` and the number of GPUs per node `M`. A common setup is a single node with up to `M=8` GPUs, but you may need to split processing between machines, in which case you'd have multiple nodes.

If you're using a job manager like SLURM, you can express the number of GPUs via your job configuration, and the "allowed" device IDs will be passed to Lightning. On a local machine, Lightning will attempt to acquire whatever resources it can, unless you override this by specifying the `devices` argument, which can be a list. **On a managed cluster, do not do this: rely on `"auto"` and let the scheduler inform Lightning which GPUs are available.** The reason is that some clusters cannot isolate GPU devices within jobs the way they can with CPU cores, and you can interfere with other people's jobs if you try to acquire a device that wasn't allocated to you.
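For example, on a local workstation you might pin specific GPUs (a sketch, assuming `create_trainer` forwards these arguments to the Lightning `Trainer`, as in the block below):

```python
# Local machine only: explicitly select GPUs 0 and 1.
# On a managed cluster, leave devices="auto" and let the scheduler decide.
m.create_trainer(accelerator="gpu", devices=[0, 1])
```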

In most cases the only thing you need to set is the training strategy, "DDP" (distributed data parallel). PyTorch's documentation covers DDP in technical detail, but here is a brief summary with some practical tips. When training starts, `N*M` copies of your program are created. In DDP, the training dataset is sharded between these processes, so each epoch is `len(dataset) / batch_size / (N*M)` steps. At the end of each backward pass, the processes synchronize and the gradients are combined (averaged) across all copies, keeping every copy of the model weights identical.
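As a quick sanity check on that arithmetic, here is a sketch (the node, dataset, and batch sizes are invented for illustration):

```python
# Hypothetical setup: 2 nodes (N), 4 GPUs per node (M) -> 8 DDP processes.
n_nodes, gpus_per_node = 2, 4
world_size = n_nodes * gpus_per_node

dataset_size = 16000  # number of training images (illustrative)
batch_size = 4        # batch size per process

# DDP shards the dataset across processes, so one epoch is:
steps_per_epoch = dataset_size // (batch_size * world_size)
print(steps_per_epoch)  # 500
```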

Replace

```python
import torch
from deepforest import main

m = main.deepforest()
m.create_trainer(
    logger=loggers,        # your logger(s), defined elsewhere
    callbacks=callbacks,   # your callbacks, defined elsewhere
    gradient_clip_val=0.5,
    accelerator=config.accelerator,
    strategy="ddp_find_unused_parameters_true"
    if torch.cuda.is_available()
    else "auto",
    devices="auto",
)
m.trainer.fit(m)
```

with

```python
m.trainer = None
from pytorch_lightning import Trainer

trainer = Trainer(
    accelerator="gpu",
    strategy="ddp_find_unused_parameters_true",
    devices=m.config.devices,
    enable_checkpointing=False,
    max_epochs=m.config.train.epochs,
    logger=comet_logger,
)


trainer.fit(m)
```

The added benefit of this is more control over the trainer object.
The downside is that it doesn't align with the `.config` pattern: the user now has to look into the config to create the trainer.
We are open to making this the default pattern in the future and welcome input from users.

#### Visualization during training

Visualizing images during training can be valuable for spotting augmentations that aren't working as you expect, catching label issues, and seeing whether the model is learning anything at all. To make this easy, we provide a Lightning callback that can be used with the trainer: `deepforest.callbacks.ImagesCallback`. You need to provide a directory path where the images will be saved, which can be a temporary path if you don't want to keep them. To use it, create the callback object and pass it to `create_trainer` along with any other callbacks you need.

```python
from deepforest import callbacks

im_callback = callbacks.ImagesCallback(save_dir=tmpdir, every_n_epochs=2)
m.create_trainer(callbacks=[im_callback])
```

The callback will, by default, log images to disk. When training starts, it saves samples from the training and validation datasets (if available). Then, at a user-specified interval (`every_n_epochs`), predictions are logged alongside the ground truth. If you have Comet or TensorBoard loggers (loggers that accept `add_image` or `log_image`), the callback will attempt to log to those as well. Due to auto-discovery behavior in Comet, the callback preferentially logs to TensorBoard if present, to avoid images being pushed to Comet twice. To adjust the number of samples saved, modify `dataset_samples` and `prediction_samples` (set to 0 to disable).
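For example, a sketch that tunes both sample counts (the values here are illustrative):

```python
from deepforest import callbacks

# Save 4 samples from each dataset at startup, and 8 prediction/ground-truth
# pairs every 5 epochs; save_dir can be any writable directory.
im_callback = callbacks.ImagesCallback(
    save_dir="train_viz",
    every_n_epochs=5,
    dataset_samples=4,
    prediction_samples=8,
)
m.create_trainer(callbacks=[im_callback])
```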

#### Training via command line

We provide a basic script to trigger a training run via the CLI. This script is installed as part of the standard DeepForest installation and is called `deepforest train`. We use [Hydra](https://hydra.cc/docs/intro/) for configuration management, and you can pass configuration parameters as command-line arguments as follows:
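The concrete example is truncated in this diff view; as a sketch, a Hydra-style invocation with config overrides might look like the following (the override keys are assumptions based on config fields used elsewhere in this PR, such as `train.epochs`):

```shell
# Hypothetical override keys -- check the packaged Hydra config for real names
deepforest train train.epochs=20 batch_size=4
```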
29 changes: 24 additions & 5 deletions pyproject.toml
@@ -38,6 +38,7 @@ dependencies = [
"h5py",
"huggingface_hub>=0.25.0",
"hydra-core",
"geopandas>=1.0.0",
"matplotlib",
"numpy<2.0",
"omegaconf",
@@ -46,21 +47,38 @@
"pillow>6.2.0",
"psutil",
"pycocotools",
"pytorch-lightning>=2.1.0,<3.0.0",
"pytorch-lightning>=2.5.5,<3.0.0",
"pyyaml>=5.1.0",
"rasterio",
"rtree",
"safetensors<0.6.0",
"shapely>2.0.0",
"setuptools",
"slidingwindow",
"supervision",
"tensorboard",
"timm",
"torch>=2.2.0,<2.3.0",
"torchvision>=0.17.0,<0.18.0",
"torch>=2.7.0",
"torchvision>=0.17.0",
"tqdm",
"transformers",
"transformers>=4.56",
"xmltodict",
"transformers",
"timm>=1.0.15",
"faster-coco-eval>=1.6.7",
"comet-ml>=3.51.0",
]

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = [
{ index = "pytorch-cu128", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
]
torchvision = [
{ index = "pytorch-cu128", marker = "sys_platform == 'linux' or sys_platform == 'win32'" },
]

[project.urls]
@@ -101,6 +119,7 @@ docs = [

[project.scripts]
deepforest = "deepforest.scripts.cli:main"
deepforest-evaluate = "deepforest.scripts.evaluate:main"

[build-system]
requires = ["setuptools>=61.0", "wheel"]
196 changes: 99 additions & 97 deletions src/deepforest/IoU.py
@@ -2,122 +2,124 @@
IoU Module, with help from https://github.com/SpaceNetChallenge/utilities/blob/spacenetV3/spacenetutilities/evalTools.py
"""

import geopandas as gpd
import numpy as np
import pandas as pd
import shapely
from scipy.optimize import linear_sum_assignment
from shapely import STRtree


def _overlap_all(test_polys: "gpd.GeoDataFrame", truth_polys: "gpd.GeoDataFrame"):
"""Computes intersection and union areas for all polygons in the test/truth
dataframes.


def _overlap_all(test_polys, truth_polys, rtree_index):
"""Find area of overlap among all sets of ground truth and prediction."""
results = []
for _index, row in test_polys.iterrows():
result = _overlap_(
test_poly=row, truth_polys=truth_polys, rtree_index=rtree_index
Return NumPy arrays:
intersections : (n_truth, n_pred) intersection areas
unions : (n_truth, n_pred) union areas
truth_ids : (n_truth,) truth index values (order matches rows of areas/unions)
pred_ids : (n_pred,) prediction index values (order matches cols of areas/unions)
"""
# geometry arrays
pred_geoms = np.asarray(test_polys.geometry.values, dtype=object)
truth_geoms = np.asarray(truth_polys.geometry.values, dtype=object)

pred_ids = test_polys.index.to_numpy()
truth_ids = truth_polys.index.to_numpy()

n_pred = pred_geoms.size
n_truth = truth_geoms.size

# empty cases
if n_pred == 0 or n_truth == 0:
return (
np.zeros((n_truth, n_pred), dtype=float),
np.zeros((n_truth, n_pred), dtype=float),
truth_ids,
pred_ids,
)
results.append(result)
results = pd.concat(results, ignore_index=True)

return results
# spatial index on truth
tree = STRtree(truth_geoms)
p_idx, t_idx = tree.query(pred_geoms, predicate="intersects") # shape (2, M)

intersections = np.zeros((n_truth, n_pred), dtype=float)
unions = np.zeros((n_truth, n_pred), dtype=float)

def _iou_(test_poly, truth_poly):
"""Intersection over union."""
intersection_result = test_poly.intersection(truth_poly.geometry)
intersection_area = intersection_result.area
union_area = test_poly.union(truth_poly.geometry).area
return intersection_area / union_area
if p_idx.size:
inter = shapely.intersection(truth_geoms[t_idx], pred_geoms[p_idx])
uni = shapely.union(truth_geoms[t_idx], pred_geoms[p_idx])
intersections[t_idx, p_idx] = shapely.area(inter)
unions[t_idx, p_idx] = shapely.area(uni)

return intersections, unions, truth_ids, pred_ids

def compute_IoU(ground_truth: "gpd.GeoDataFrame", submission: "gpd.GeoDataFrame"):
    # Compute truth <> prediction overlaps
    intersections, unions, truth_ids, pred_ids = _overlap_all(
        test_polys=submission, truth_polys=ground_truth
    )

    # Cost matrix is the intersection area
    matrix = intersections

    if matrix.size == 0:
        # No matches, early exit
        return pd.DataFrame(
            {
                "prediction_id": pd.Series(dtype="float64"),
                "truth_id": pd.Series(dtype=truth_ids.dtype),
                "IoU": pd.Series(dtype="float64"),
                "score": pd.Series(dtype="float64"),
                "geometry": pd.Series(dtype=object),
            }
        )

    # Linear sum assignment + match lookup
    row_ind, col_ind = linear_sum_assignment(matrix, maximize=True)
    match_for_truth = dict(zip(row_ind, col_ind, strict=False))

    # Score lookup
    pred_scores = submission["score"].to_dict() if "score" in submission.columns else {}

    # IoU matrix
    with np.errstate(divide="ignore", invalid="ignore"):
        iou_mat = np.divide(
            intersections,
            unions,
            out=np.zeros_like(intersections, dtype=float),
            where=unions > 0,
        )

    # build rows for every truth element (unmatched => None, IoU 0)
    records = []
    for t_idx, truth_id in enumerate(truth_ids):
        # If we matched this truth box
        if t_idx in match_for_truth:
            # Look up matching prediction and corresponding IoU and score
            p_idx = match_for_truth[t_idx]
            pred_id = pred_ids[p_idx]
            iou = float(iou_mat[t_idx, p_idx])
            score = pred_scores.get(pred_id, None)
        else:
            pred_id = None
            iou = 0.0
            score = None
        records.append(
            {
                "prediction_id": pred_id,
                "truth_id": truth_id,
                "IoU": iou,
                "score": score,
            }
        )

    # Output dataframe
    iou_df = pd.DataFrame.from_records(records)
    iou_df = iou_df.merge(
        ground_truth.assign(truth_id=truth_ids)[["truth_id", "geometry"]],
        on="truth_id",
        how="left",
    )
    return iou_df
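
For reference, a minimal usage sketch of the refactored `compute_IoU` (the toy geometries and scores below are invented for illustration):

```python
import geopandas as gpd
from shapely.geometry import box

from deepforest.IoU import compute_IoU

# Two ground-truth boxes and two overlapping predictions (toy data)
ground_truth = gpd.GeoDataFrame(geometry=[box(0, 0, 10, 10), box(20, 20, 30, 30)])
submission = gpd.GeoDataFrame(
    {"score": [0.9, 0.8]},
    geometry=[box(1, 1, 11, 11), box(19, 19, 29, 29)],
)

# One row per ground-truth box: matched prediction_id, IoU, score, geometry
iou_df = compute_IoU(ground_truth, submission)
print(iou_df[["prediction_id", "truth_id", "IoU", "score"]])
```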