Skip to content

ENH: add missing metadata to DCMTKImageIO#6217

Open
zivy wants to merge 6 commits intoInsightSoftwareConsortium:mainfrom
zivy:enhanceDCMTKIOFunctionality
Open

ENH: add missing metadata to DCMTKImageIO#6217
zivy wants to merge 6 commits intoInsightSoftwareConsortium:mainfrom
zivy:enhanceDCMTKIOFunctionality

Conversation

@zivy
Copy link
Copy Markdown
Member

@zivy zivy commented May 5, 2026

The DCMTKImageIO was not updating the metadata dictionary resulting in empty metadata when reading DICOM files. This commit adds the missing code to update the metadata dictionary with the DICOM tags read from the file.

Note that tags that are part of a DICOM sequence are skipped due to the flat nature of the metadata dictionary. Encoding DICOM sequences requires supporting a hierarchical structure in the metadata dictionary which is not straightforward. This omission is consistent with the GDCMImageIO behavior. Possibly address in the future.

Inspired by PR #6189
Tool: GitHub copilot (claude sonnet 4.6)

@zivy zivy requested a review from blowekamp May 5, 2026 20:07
@github-actions github-actions Bot added type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Enhancement Improvement of existing methods or implementation type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct area:IO Issues affecting the IO module labels May 5, 2026
@zivy zivy requested a review from hjmjohnson May 5, 2026 20:07
@greptile-apps

This comment was marked as resolved.

Comment thread Modules/IO/DCMTK/test/itkDCMTKImageIOMetadataTest.cxx Outdated
Comment thread Modules/IO/DCMTK/src/itkDCMTKImageIO.cxx Outdated
@blowekamp
Copy link
Copy Markdown
Member

Very Nice!

It would be nice for the test to be a GTest. I am sure I created some solution to reference a data file from a hard coded GTest some place.

Is there any way were could have converage to testing for more VR types to ensure there are handled correctly?
https://dicom.nema.org/dicom/2013/output/chtml/part05/sect_6.2.html

zivy and others added 3 commits May 5, 2026 16:53
The DCMTKImageIO was not updating the metadata dictionary resulting
in empty metadata when reading DICOM files. This commit adds the missing
code to update the metadata dictionary with the DICOM tags read from
the file.

Note that tags that are part of a DICOM sequence are skipped due to
the flat nature of the metadata dictionary. Encoding DICOM sequences
requires supporting a heirarchical structure in the metadata dictionary
which is not straightforward. This omission is consistent with the
GDCMImageIO behavior. Possibly address in the future.
DICOM Supplements 172/173 added the OD, OL, and OV bulk-data VRs.
Add them to the binary-VR check in DCMTKImageIO so values reach the
base64 encoder instead of falling through to the text getOFStringArray
branch (which silently drops them).
Tag (0008,0021) is SeriesDate; StudyDate is (0008,0020).  Test
assertion was already against the right tag number — only the
comment was misleading.
@hjmjohnson hjmjohnson force-pushed the enhanceDCMTKIOFunctionality branch from b27d12c to 17a89db Compare May 5, 2026 22:00
@hjmjohnson
Copy link
Copy Markdown
Member

hjmjohnson commented May 5, 2026

@zivy — pushed two small follow-up commits to your branch via maintainerCanModify (rebased onto current main first):

  • f5a977cf BUG: cover EVR_OD / EVR_OL / EVR_OV in the binary-VR base64 path (Greptile P2 #2).
  • 17a89dbd STYLE: correct the (0008,0021) → SeriesDate comment in the metadata test (Greptile P2 #1).

Both Greptile threads now resolved.

@blowekamp — left the test as a CTest for now. The clean GTest conversion needs a way to bake the DATA{...}-resolved fixture path in at compile time (since GTest drivers don't take command-line args). I attempted ExternalData_Expand_Arguments + target_compile_definitions and it works in principle, but the staged-path timing is fragile and would benefit from being rolled out as a separate creategoogletestdriver enhancement that all DICOM tests can use. Happy to file a follow-up issue if you'd like — for now the GTest conversion is deferred. The "more VR types" coverage you mentioned needs new DICOM fixture files (we don't have OD/OL/OV samples in ITKData).

hjmjohnson added 2 commits May 5, 2026 17:23
The accessor returns the raw mutable DcmDataset *, so a const-qualified
method would let callers mutate dataset state through a const reference.
Drop the const to match the returned-pointer mutability.
Sequences are nested datasets; getUint8Array() does not produce their
content, so they were silently dropped from the dictionary.  Match
GDCMImageIO and skip them explicitly.
Copy link
Copy Markdown
Member

@hjmjohnson hjmjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM I'll let @zivy or @blowekamp merge after reviewing followup refinements.

@blowekamp I think we should delay making gtest conversion for tests that need arguments.

@zivy
Copy link
Copy Markdown
Member Author

zivy commented May 5, 2026

@hjmjohnson appreciate the corrections. Will merge once all the tests complete and pass.

@blowekamp
Copy link
Copy Markdown
Member

@blowekamp I think we should delay making gtest conversion for tests that need arguments.

Agreed.

@blowekamp — left the test as a CTest for now. The clean GTest conversion needs a way to bake the DATA{...}-

I thought I had added a case of this in ITK, but atlas I can't find it. For reference this is how it is done in SimpleITK

ExternalData_Expand_Argument in CMakeList.txt but tests have been written from the ground up to prefer generated data over files.

A DataFinder or TestHarness or Fixture class with method to get directories, and input file names, which is configured with paths via a Cmake in file: https://github.com/SimpleITK/SimpleITK/blob/694b3410ddf9cf401ef75cfe9f0a7a7ef21133a6/Testing/Unit/CMakeLists.txt#L459-L475

Incorporating this type of structure into a ITK Gtest fixture that is generated/configured in the creategoogletstdriver may be a reasonable way to move forward, in another issue.

IsImageFile(const std::string & filename);

DcmDataset *
GetDataset()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be exposing a DCMTK in the public interface violating encapsulation principles. This needs a closer look.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blowekamp — agreed, fixed in 1d42617.

GetDataset() was consumed at exactly one site (the new metadata loop in itkDCMTKImageIO.cxx); there was no external consumer. The cleanest remediation is to move the loop into DCMTKFileReader itself so DcmDataset * never leaves the reader. The public surface of DCMTKFileReader is now ITK-only for the new method:

// itkDCMTKFileReader.h — public, ITK-only
void
PopulateMetaDataDictionary(MetaDataDictionary & dict) const;

The full iteration loop, binary-VR base64 encoding, sequence-skip, and pixel-data-skip rules now live in itkDCMTKFileReader.cxx co-located with the other DCMTK-aware element accessors (GetElementLO, GetElementDS, GetElementFD, …). DCMTKImageIO::ReadImageInformation() collapses to one line:

reader.PopulateMetaDataDictionary(this->GetMetaDataDictionary());

DCMTK headers (dcelem.h) and itksys/Base64.h move out of itkDCMTKImageIO.cxx along with the loop. No behavioral change. Net diff: 3 files changed, 74 insertions, 71 deletions.

The pre-existing DCMTK leakage on AddDictEntry(DcmDictEntry *), SetDcmItem(DcmItem *), etc. on this header is a separate (much larger) cleanup — happy to do a follow-up split into itkDCMTKFileReader.h (ITK-only) + itkDCMTKFileReaderInternal.h (DCMTK-aware, in src/) if you'd like, but it's out of scope for this PR.

@blowekamp
Copy link
Copy Markdown
Member

blowekamp commented May 6, 2026

This was merged too quickly for me to review changes.

@zivy
Copy link
Copy Markdown
Member Author

zivy commented May 6, 2026

@blowekamp while you commented that the PR was "merged too quickly", it doesn't seem to be merged. For some reason @hjmjohnson closed it via an unrelated GDCM PR, #6218.

Before I address @blowekamp's valid comment on exposing DCMTK in public interface, I'd like to understand what's going on.

@hjmjohnson, please confirm that this PR was indeed closed by mistake or let us know why it was closed.

@hjmjohnson hjmjohnson reopened this May 6, 2026
@hjmjohnson
Copy link
Copy Markdown
Member

Sorry, I did not intend to close this issue.

Move the metadata-population loop into
DCMTKFileReader::PopulateMetaDataDictionary so DcmDataset never escapes
the reader. Per @blowekamp review.
@hjmjohnson hjmjohnson force-pushed the enhanceDCMTKIOFunctionality branch from 1d42617 to 3943066 Compare May 6, 2026 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:IO Issues affecting the IO module type:Enhancement Improvement of existing methods or implementation type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants