ENH: add missing metadata to DCMTKImageIO#6217
ENH: add missing metadata to DCMTKImageIO#6217zivy wants to merge 6 commits intoInsightSoftwareConsortium:mainfrom
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
|
Very Nice! It would be nice for the test to be a GTest. I am sure I created some solution to reference a data file from a hard coded GTest some place. Is there any way were could have converage to testing for more VR types to ensure there are handled correctly? |
The DCMTKImageIO was not updating the metadata dictionary resulting in empty metadata when reading DICOM files. This commit adds the missing code to update the metadata dictionary with the DICOM tags read from the file. Note that tags that are part of a DICOM sequence are skipped due to the flat nature of the metadata dictionary. Encoding DICOM sequences requires supporting a heirarchical structure in the metadata dictionary which is not straightforward. This omission is consistent with the GDCMImageIO behavior. Possibly address in the future.
DICOM Supplements 172/173 added the OD, OL, and OV bulk-data VRs. Add them to the binary-VR check in DCMTKImageIO so values reach the base64 encoder instead of falling through to the text getOFStringArray branch (which silently drops them).
Tag (0008,0021) is SeriesDate; StudyDate is (0008,0020). Test assertion was already against the right tag number — only the comment was misleading.
b27d12c to
17a89db
Compare
|
@zivy — pushed two small follow-up commits to your branch via
Both Greptile threads now resolved. @blowekamp — left the test as a CTest for now. The clean GTest conversion needs a way to bake the |
The accessor returns the raw mutable DcmDataset *, so a const-qualified method would let callers mutate dataset state through a const reference. Drop the const to match the returned-pointer mutability.
Sequences are nested datasets; getUint8Array() does not produce their content, so they were silently dropped from the dictionary. Match GDCMImageIO and skip them explicitly.
hjmjohnson
left a comment
There was a problem hiding this comment.
LGTM I'll let @zivy or @blowekamp merge after reviewing followup refinements.
@blowekamp I think we should delay making gtest conversion for tests that need arguments.
|
@hjmjohnson appreciate the corrections. Will merge once all the tests complete and pass. |
Agreed.
I thought I had added a case of this in ITK, but atlas I can't find it. For reference this is how it is done in SimpleITK ExternalData_Expand_Argument in CMakeList.txt but tests have been written from the ground up to prefer generated data over files. A DataFinder or TestHarness or Fixture class with method to get directories, and input file names, which is configured with paths via a Cmake in file: https://github.com/SimpleITK/SimpleITK/blob/694b3410ddf9cf401ef75cfe9f0a7a7ef21133a6/Testing/Unit/CMakeLists.txt#L459-L475 Incorporating this type of structure into a ITK Gtest fixture that is generated/configured in the creategoogletstdriver may be a reasonable way to move forward, in another issue. |
| IsImageFile(const std::string & filename); | ||
|
|
||
| DcmDataset * | ||
| GetDataset() |
There was a problem hiding this comment.
This appears to be exposing a DCMTK in the public interface violating encapsulation principles. This needs a closer look.
There was a problem hiding this comment.
@blowekamp — agreed, fixed in 1d42617.
GetDataset() was consumed at exactly one site (the new metadata loop in itkDCMTKImageIO.cxx); there was no external consumer. The cleanest remediation is to move the loop into DCMTKFileReader itself so DcmDataset * never leaves the reader. The public surface of DCMTKFileReader is now ITK-only for the new method:
// itkDCMTKFileReader.h — public, ITK-only
void
PopulateMetaDataDictionary(MetaDataDictionary & dict) const;The full iteration loop, binary-VR base64 encoding, sequence-skip, and pixel-data-skip rules now live in itkDCMTKFileReader.cxx co-located with the other DCMTK-aware element accessors (GetElementLO, GetElementDS, GetElementFD, …). DCMTKImageIO::ReadImageInformation() collapses to one line:
reader.PopulateMetaDataDictionary(this->GetMetaDataDictionary());DCMTK headers (dcelem.h) and itksys/Base64.h move out of itkDCMTKImageIO.cxx along with the loop. No behavioral change. Net diff: 3 files changed, 74 insertions, 71 deletions.
The pre-existing DCMTK leakage on AddDictEntry(DcmDictEntry *), SetDcmItem(DcmItem *), etc. on this header is a separate (much larger) cleanup — happy to do a follow-up split into itkDCMTKFileReader.h (ITK-only) + itkDCMTKFileReaderInternal.h (DCMTK-aware, in src/) if you'd like, but it's out of scope for this PR.
|
This was merged too quickly for me to review changes. |
|
@blowekamp while you commented that the PR was "merged too quickly", it doesn't seem to be merged. For some reason @hjmjohnson closed it via an unrelated GDCM PR, #6218. Before I address @blowekamp's valid comment on exposing DCMTK in public interface, I'd like to understand what's going on. @hjmjohnson, please confirm that this PR was indeed closed by mistake or let us know why it was closed. |
|
Sorry, I did not intend to close this issue. |
Move the metadata-population loop into DCMTKFileReader::PopulateMetaDataDictionary so DcmDataset never escapes the reader. Per @blowekamp review.
1d42617 to
3943066
Compare
The DCMTKImageIO was not updating the metadata dictionary resulting in empty metadata when reading DICOM files. This commit adds the missing code to update the metadata dictionary with the DICOM tags read from the file.
Note that tags that are part of a DICOM sequence are skipped due to the flat nature of the metadata dictionary. Encoding DICOM sequences requires supporting a hierarchical structure in the metadata dictionary which is not straightforward. This omission is consistent with the GDCMImageIO behavior. Possibly address in the future.
Inspired by PR #6189
Tool: GitHub copilot (claude sonnet 4.6)