@SuMayaBee
Contributor

Description

Allow users to pass image_path and label arguments to read_file() when reading shapefiles that don't have these columns.

Problem

When users have a shapefile without image_path or label columns, read_file() crashes with:

ValueError: No image_path column found in shapefile, please specify rgb path

Solution

  1. Pass image_path argument to shapefile_to_annotations() - Modified utilities.py to forward the image_path argument when reading shapefiles.

  2. Add warning when image_path is passed - Users see a warning confirming the value will be assigned to every row.

  3. Allow label argument - Removed blocking error for missing label column. Now defaults to "Unknown" if not provided.

  4. Write tests for shapefiles - Added 4 tests covering all scenarios.

  5. Add documentation example - Added example in docs/user_guide/01_Reading_data.md with argument table showing required vs optional parameters.
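The column-filling behaviour described in steps 1 to 3 can be sketched roughly as follows. This is a simplified illustration, not the code in utilities.py: the helper name `fill_annotation_columns` is hypothetical, rows are plain dicts standing in for GeoDataFrame rows, and the warning text is an assumption.

```python
import warnings


def fill_annotation_columns(rows, image_path=None, label=None):
    """Hypothetical helper (not DeepForest code): fill image_path and label per row."""
    # Without an argument, every row must already carry an image_path value.
    if image_path is None and not all("image_path" in row for row in rows):
        raise ValueError(
            "No image_path column found in shapefile, please specify rgb path"
        )
    if image_path is not None:
        # Assumed wording; the PR only states that a warning is shown.
        warnings.warn(
            f"image_path argument provided; assigning {image_path!r} to every row",
            UserWarning,
            stacklevel=2,
        )
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        if image_path is not None:
            new_row["image_path"] = image_path  # argument overrides the column
        if label is not None:
            new_row["label"] = label  # argument overrides the column
        else:
            new_row.setdefault("label", "Unknown")  # default when nothing is given
        filled.append(new_row)
    return filled
```

Calling `fill_annotation_columns([{"geometry": "..."}], image_path="image.tif")` would emit one warning and return rows labelled "Unknown", matching scenario 1 in the first table below.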

Before & After

For shapefiles without image_path and label columns:

| # | Scenario | Before | After |
|---|----------|--------|-------|
| 1 | Pass only image_path argument | ValueError | ✅ Works, label defaults to "Unknown" |
| 2 | Pass both image_path and label arguments | ValueError | ✅ Works |

For shapefiles with image_path and label columns:

| # | Scenario | Before | After |
|---|----------|--------|-------|
| 1 | No arguments passed | ✅ Works | ✅ Works (reads from columns) |
| 2 | Pass image_path argument | ✅ Works | ✅ Works (uses argument, overrides column) |
| 3 | Pass label argument | ✅ Works | ✅ Works (uses argument, overrides column) |
| 4 | Pass both arguments | ✅ Works | ✅ Works (uses arguments, overrides columns) |

Usage Improvement

Previously (workaround):

import os
import geopandas as gpd
from deepforest.utilities import read_file

raster_path = "/path/to/image.tif"
gdf = gpd.read_file("/path/to/annotations.shp")
gdf["image_path"] = os.path.basename(raster_path)
gdf["label"] = "Tree"
ground_truth = read_file(gdf, root_dir=os.path.dirname(raster_path))

This required two extra library imports (os and geopandas).

Now:

ground_truth = read_file("/path/to/annotations.shp", image_path="/path/to/image.tif", label="Tree")

No extra library imports needed. Here, root_dir is optional when image_path is a full path. label is optional too and defaults to "Unknown" when not provided.

Related Issue(s)

Closes #997

@codecov

codecov bot commented Dec 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.91%. Comparing base (0ab23a3) to head (140606f).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1242      +/-   ##
==========================================
+ Coverage   87.73%   87.91%   +0.17%     
==========================================
  Files          20       20              
  Lines        2716     2715       -1     
==========================================
+ Hits         2383     2387       +4     
+ Misses        333      328       -5     
| Flag | Coverage Δ |
|------|------------|
| unittests | 87.91% <100.00%> (+0.17%) ⬆️ |


@jveitchmichaelis
Collaborator

Thanks for the contribution. A couple of comments on the tests: there is quite a lot of duplicated code, so it would probably be better to use pytest fixtures for some of the input data.

Please could you also include the AI assistance declaration from the PR template? (you can see an example here)
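As an illustration of the fixture suggestion above, shared input data can live in one pytest fixture while the scenarios are parametrized. This is only a sketch: `read_annotations` is a hypothetical stand-in for the function under test (it is not DeepForest's read_file), and the rows are plain dicts rather than a real shapefile.

```python
import pytest


def read_annotations(rows, image_path=None, label=None):
    """Hypothetical stand-in for the function under test; not DeepForest's read_file."""
    return [
        {
            **row,
            "image_path": image_path or row.get("image_path"),
            "label": label or row.get("label", "Unknown"),
        }
        for row in rows
    ]


@pytest.fixture
def rows_without_columns():
    # Shared input: annotations lacking both image_path and label.
    return [{"geometry": "POLYGON A"}, {"geometry": "POLYGON B"}]


@pytest.mark.parametrize(
    "label, expected",
    [(None, "Unknown"), ("Tree", "Tree")],
)
def test_label_argument(rows_without_columns, label, expected):
    # One test body covers both the default-label and explicit-label scenarios.
    result = read_annotations(rows_without_columns, image_path="image.tif", label=label)
    assert all(row["label"] == expected for row in result)
    assert all(row["image_path"] == "image.tif" for row in result)
```

With this layout, adding a new scenario means adding one tuple to the parameter list instead of copying a whole test function.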


Development

Successfully merging this pull request may close these issues.

Improve deepforest.utilities.read_file for reading a .shp that doesn't have a image_path column
