Full disclosure: I used Claude to help me track this down, but I have gone through several passes to reduce the noise and remove hallucinations. I'm going to keep using 0.18 in the near term, as I'm also blocked on some navmesh crates updating, but I'll save my work in progress on a branch in case you'd like me to do any more investigation. Hopefully this is helpful.
Bevy version and features
- 0.19.0 (release, crates.io)
- Non-default features:

```toml
bevy = { version = "0.19", default-features = false, features = [
    "3d", "bevy_dev_tools", "bevy_remote", "exr", "jpeg", "multi_threaded", "ui",
] }
```
Relevant system information
- Rust: 1.96.0 (stable)
- OS: macOS (Darwin 25.5.0)

```text
SystemInfo { os: "macOS 26.5.1", kernel: "25.5.0", cpu: "Apple M4 Pro", core_count: "14", memory: "48.0 GiB" }
AdapterInfo { name: "Apple M4 Pro", vendor: 0, device: 0, device_type: IntegratedGpu, device_pci_bus_id: "", driver: "", driver_info: "", backend: Metal, subgroup_min_size: 4, subgroup_max_size: 64, transient_saves_memory: true }
```
What you did
A 3D app that renders most content (units, foliage, weapons) through custom GPU instancing, so the number of Bevy-managed mesh instances (terrain chunks, weapon parts, projectiles) stays under 256, with meshes moving every frame. The 3D camera carries `NoIndirectDrawing`, so gameplay runs in `CpuCulling` mode without issue.

The panic fires deterministically when opening the pause menu, which spawns a UI `Camera2d` that lacks `NoIndirectDrawing`. `extract_meshes_for_gpu_building` derives `any_gpu_culling = !gpu_culling_query.is_empty()`, where `gpu_culling_query: Extract<Query<(), (With<Camera>, Without<NoIndirectDrawing>)>>` (mesh.rs:1958, 1967):
- During play, the only camera has `NoIndirectDrawing` → `any_gpu_culling = false` → the mesh instance queue is `CpuCulling` → `mesh_culling_data_buffer` is unused and empty, even though `current_input_buffer` already holds N (< 256) instances.
- The menu `Camera2d` has no `NoIndirectDrawing` → `any_gpu_culling = true` → `RenderMeshInstanceGpuQueue::init(true)` switches to `GpuCulling` (mesh.rs:1294).
- On that frame, `collect_meshes_for_gpu_building` calls `mesh_culling_data_buffer.grow(current_input_buffer.len() = N)` for the first time, with N < 256. The off-by-one described below sizes its `dirty_pages` to zero, and the subsequent fast-path `set` for a moving mesh panics.
Both buffers are monotonic (never cleared or truncated), and `grow` early-returns unless `new_len > old_len`, so the bad resize happens on the first `GpuCulling` frame, not in steady state.
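To make the sequence of events concrete, here is a toy, std-only model of the mode switch. It is not Bevy code: `ToyCamera`, `CullingMode`, `culling_mode` and `run_frame` are hypothetical stand-ins for the `gpu_culling_query` check and the `grow` call in `collect_meshes_for_gpu_building`, and the model tracks only the culling buffer's length. It shows that while every camera has `NoIndirectDrawing` the buffer is never grown, so the first `grow(N)` lands on the frame the menu camera appears.

```rust
/// Hypothetical stand-in for a camera entity; only the flag that matters here.
struct ToyCamera {
    no_indirect_drawing: bool,
}

#[derive(Debug, PartialEq)]
enum CullingMode {
    CpuCulling,
    GpuCulling,
}

/// Mirrors `any_gpu_culling = !gpu_culling_query.is_empty()`: one camera
/// without `NoIndirectDrawing` is enough to switch the queue to GpuCulling.
fn culling_mode(cameras: &[ToyCamera]) -> CullingMode {
    if cameras.iter().any(|c| !c.no_indirect_drawing) {
        CullingMode::GpuCulling
    } else {
        CullingMode::CpuCulling
    }
}

/// Models only the length of the culling-data buffer. `grow` is reached only in
/// GpuCulling mode and early-returns unless the new length is larger.
/// Returns `Some(new_len)` on the frame that actually resizes the buffer.
fn run_frame(cameras: &[ToyCamera], instance_count: u32, culling_len: &mut u32) -> Option<u32> {
    if culling_mode(cameras) == CullingMode::GpuCulling && instance_count > *culling_len {
        *culling_len = instance_count;
        Some(instance_count)
    } else {
        None
    }
}

fn main() {
    let play = [ToyCamera { no_indirect_drawing: true }];
    let menu = [
        ToyCamera { no_indirect_drawing: true },  // 3D camera
        ToyCamera { no_indirect_drawing: false }, // UI Camera2d from the pause menu
    ];
    let mut culling_len = 0;

    // Gameplay: CpuCulling, so the culling buffer is never grown.
    for _ in 0..3 {
        assert_eq!(run_frame(&play, 100, &mut culling_len), None);
    }
    // Pause menu opens: the first GpuCulling frame grows the buffer from 0 to 100.
    assert_eq!(run_frame(&menu, 100, &mut culling_len), Some(100));
    // With the same instance count, later frames early-return.
    assert_eq!(run_frame(&menu, 100, &mut culling_len), None);
}
```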
Minimal repro: one camera with GPU culling (no `NoIndirectDrawing`) and a 3D scene with fewer than 256 mesh instances, at least one of which moves (so the collection fast path runs).
What went wrong
Panic from the parallel mesh-collection task pool:
```text
thread 'Compute Task Pool (3)' panicked at
bevy_render-0.19.0/src/render_resource/sparse_buffer_vec.rs:589:25:
index out of bounds: the len is 0 but the index is 0
   2: core::panicking::panic_bounds_check
   3: AtomicSparseBufferVec<T>::set                      // -> note_changed_index
   4: bevy_pbr::render::mesh::collect_meshes_for_gpu_building::{{closure}}::{{closure}}
        at bevy_pbr-0.19.0/src/render/mesh.rs:2603:38     // mesh_culling_data_buffer.set(...)
   ...
  45: bevy_pbr::render::mesh::collect_meshes_for_gpu_building
        at bevy_pbr-0.19.0/src/render/mesh.rs:2472:32     // ComputeTaskPool scope
```
Root cause
`MeshCullingDataBuffer` is an `AtomicSparseBufferVec<MeshCullingData>` with `page_size_log2 = 8` → page size = 256 elements (mesh.rs:650, 1639-1647). Each frame, `collect_meshes_for_gpu_building` grows it to the instance count (mesh.rs:2442):

```rust
mesh_culling_data_buffer.grow(current_input_buffer.len() as u32);
```
`AtomicSparseBufferVec::grow` sizes `dirty_pages` from the floored page index of `new_len` instead of the number of pages needed to hold `new_len` elements (sparse_buffer_vec.rs, `grow` ~612-650):

```rust
self.values.resize_with(new_len as usize, T::Blob::default); // values -> new_len
let new_page_count = self.index_to_page(new_len);             // = new_len / 256 (floored!)
self.dirty_pages.resize_with(
    (new_page_count as usize).div_ceil(PAGES_PER_DIRTY_WORD as usize),
    || AtomicU64::new(u64::MAX),
);

fn index_to_page(&self, index: u32) -> u32 { index / self.page_size() } // floor
```
For `0 < new_len < 256`, `new_page_count = 0` → `dirty_pages` is resized to length 0, while `values` keeps `new_len` elements (all in page 0, which needs one dirty word). `push` computes this correctly (`index_to_page(index) / PAGES_PER_DIRTY_WORD + 1`); `grow` does not. `set` then writes `values[index]` without trouble but panics in `note_changed_index` (sparse_buffer_vec.rs:564-590):
```rust
pub fn set(&self, index: u32, value: T) {
    value.write_to_blob(&self.values[index as usize]); // ok: values has the index
    self.note_changed_index(index);                     // panics below
}

fn note_changed_index(&self, index: u32) {
    let page = self.index_to_page(index);
    let page_word = page / PAGES_PER_DIRTY_WORD;
    self.dirty_pages[page_word as usize].fetch_or(...); // :589 dirty_pages empty -> panic
}
```
The panic is at line 589 (the `dirty_pages` access), not at the `values[index]` write on line 565, which shows that `values` contains the index: the inconsistency is purely `values` (populated) vs `dirty_pages` (empty).
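A minimal, std-only model of the failure, assuming `PAGES_PER_DIRTY_WORD = 64` (one bit per page in an `AtomicU64`; that value is an assumption, not taken from the source). `ToySparseVec` is a hypothetical simplification of `AtomicSparseBufferVec` that stores `u32`s instead of blobs, but its `grow_buggy`, `set` and `note_changed_index` follow the excerpts above. Growing an empty buffer to 100 elements leaves `dirty_pages` empty, and the first `set` panics with the same index-out-of-bounds error.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

// 256-element pages (page_size_log2 = 8, from the report).
const PAGE_SIZE: u32 = 256;
// ASSUMPTION: one dirty bit per page in a 64-bit word.
const PAGES_PER_DIRTY_WORD: u32 = 64;

/// Hypothetical, simplified stand-in for `AtomicSparseBufferVec`:
/// `u32` values instead of blobs, plus the dirty-page bitmap.
struct ToySparseVec {
    values: Vec<AtomicU32>,
    dirty_pages: Vec<AtomicU64>,
}

impl ToySparseVec {
    fn new() -> Self {
        Self { values: Vec::new(), dirty_pages: Vec::new() }
    }

    fn index_to_page(index: u32) -> u32 {
        index / PAGE_SIZE // floor, as in the excerpt
    }

    /// Same sizing as the reported `grow`: the page count is the floored
    /// page index of `new_len`, so any `new_len < 256` yields 0 dirty words.
    fn grow_buggy(&mut self, new_len: u32) {
        if new_len <= self.values.len() as u32 {
            return; // early-return unless new_len > old_len
        }
        self.values.resize_with(new_len as usize, || AtomicU32::new(0));
        let new_page_count = Self::index_to_page(new_len);
        self.dirty_pages.resize_with(
            new_page_count.div_ceil(PAGES_PER_DIRTY_WORD) as usize,
            || AtomicU64::new(u64::MAX),
        );
    }

    fn set(&self, index: u32, value: u32) {
        self.values[index as usize].store(value, Ordering::Relaxed); // in bounds
        self.note_changed_index(index); // may index past the end of `dirty_pages`
    }

    fn note_changed_index(&self, index: u32) {
        let page = Self::index_to_page(index);
        let page_word = page / PAGES_PER_DIRTY_WORD;
        let bit = page % PAGES_PER_DIRTY_WORD;
        self.dirty_pages[page_word as usize].fetch_or(1u64 << bit, Ordering::Relaxed);
    }
}

fn main() {
    let mut buf = ToySparseVec::new();
    buf.grow_buggy(100); // first GpuCulling frame with N = 100 < 256
    assert_eq!(buf.values.len(), 100);
    assert_eq!(buf.dirty_pages.len(), 0);

    // values[0] exists but dirty_pages[0] does not, so `set` panics with
    // "index out of bounds: the len is 0 but the index is 0".
    let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| buf.set(0, 42)));
    assert!(result.is_err());
}
```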
Additional information
Possible fix
In `AtomicSparseBufferVec::grow`, size `dirty_pages` from the number of pages needed to hold `new_len` elements:

```rust
let new_page_count = new_len.div_ceil(self.page_size()); // not index_to_page(new_len)
```
(`truncate` has the same floor-based count but is only called with `len = 0`, where it is harmless. The real defect is `grow` sizing `dirty_pages` below what `values` requires.)
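A quick arithmetic check of the proposed fix, again assuming 64 pages per dirty word. The helper names are hypothetical. `dirty_words_required` counts the dirty words that `note_changed_index` would index for elements `0..len`; the `div_ceil`-based count matches it for every length checked, while the floored count falls short for `0 < len < 256` and, under the 64-pages-per-word assumption, also for the 255 lengths just past each 16384-element (64-page) boundary.

```rust
// 256-element pages (from the report).
const PAGE_SIZE: u32 = 256;
// ASSUMPTION: one dirty bit per page in a 64-bit word.
const PAGES_PER_DIRTY_WORD: u32 = 64;

/// Dirty words allocated by the current `grow` (floored page index of `new_len`).
fn dirty_words_buggy(new_len: u32) -> u32 {
    (new_len / PAGE_SIZE).div_ceil(PAGES_PER_DIRTY_WORD)
}

/// Dirty words allocated with the proposed fix (pages needed to hold `new_len`).
fn dirty_words_fixed(new_len: u32) -> u32 {
    new_len.div_ceil(PAGE_SIZE).div_ceil(PAGES_PER_DIRTY_WORD)
}

/// Dirty words that `note_changed_index` touches for indices `0..len`:
/// the word of the last index, plus one.
fn dirty_words_required(len: u32) -> u32 {
    if len == 0 {
        0
    } else {
        (len - 1) / PAGE_SIZE / PAGES_PER_DIRTY_WORD + 1
    }
}

fn main() {
    // The fixed count matches what `set` needs for every length checked.
    for len in 0..=70_000 {
        assert_eq!(dirty_words_fixed(len), dirty_words_required(len));
    }
    // The floored count falls short for 0 < len < 256 ...
    assert_eq!(dirty_words_buggy(100), 0);
    assert_eq!(dirty_words_required(100), 1);
    // ... and just past a 64-page boundary (64 * 256 = 16384).
    assert_eq!(dirty_words_buggy(16_385), 1);
    assert_eq!(dirty_words_required(16_385), 2);

    let short = (0..=70_000u32)
        .filter(|&len| dirty_words_buggy(len) < dirty_words_required(len))
        .count();
    println!("{short} lengths in 0..=70000 under-allocate dirty_pages");
}
```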
Workarounds
- Put `NoIndirectDrawing` on every camera, including UI `Camera2d`s. This keeps `any_gpu_culling = false` (`CpuCulling`), so `MeshCullingDataBuffer` is never used. Clean and consistent. (The inverse, no `NoIndirectDrawing` anywhere, i.e. always `GpuCulling`, does not help, since the bug is in the `GpuCulling` path itself.)
- `PbrPlugin { use_gpu_instance_buffer_builder: false }` also avoids this path, but the per-view batch-set type is still chosen from the hardware-detected `GpuPreprocessingMode` (`Culling`), not from that flag. This produces an inconsistent state that logs "Dynamic uniform batch sets should be used when GPU preprocessing is off" and breaks real material draws. It is only appropriate when the device lacks GPU preprocessing.
Likely-related code / history
AtomicSparseBufferVec)