dsr2xml - Man Page
Convert DICOM SR file and data set to XML
Synopsis
dsr2xml [options] dsrfile-in [xmlfile-out]
Description
The dsr2xml utility converts the contents of a DICOM Structured Reporting (SR) document (file format or raw data set) to XML (Extensible Markup Language). The XML Schema dsr2xml.xsd does not yet follow any standard format. However, the dsr2xml application might be enhanced in this aspect in the future (e.g. by supporting HL7/CDA - Clinical Document Architecture).
If dsr2xml reads a raw data set (DICOM data without a file format meta-header) it will attempt to guess the transfer syntax by examining the first few bytes of the file. It is not always possible to correctly guess the transfer syntax and it is better to convert a data set to a file format whenever possible (using the dcmconv utility). It is also possible to use the -f and -t[ieb] options to force dsr2xml to read a dataset with a particular transfer syntax.
Parameters
dsrfile-in DICOM SR input filename to be converted xmlfile-out XML output filename (default: stdout)
Options
general options
-h --help print this help text and exit --version print version information and exit --arguments print expanded command line arguments -q --quiet quiet mode, print no warnings and errors -v --verbose verbose mode, print processing details -d --debug debug mode, print debug information -ll --log-level [l]evel: string constant (fatal, error, warn, info, debug, trace) use level l for the logger -lc --log-config [f]ilename: string use config file f for the logger
input options
input file format: +f --read-file read file format or data set (default) +fo --read-file-only read file format only -f --read-dataset read data set without file meta information input transfer syntax: -t= --read-xfer-auto use TS recognition (default) -td --read-xfer-detect ignore TS specified in the file meta header -te --read-xfer-little read with explicit VR little endian TS -tb --read-xfer-big read with explicit VR big endian TS -ti --read-xfer-implicit read with implicit VR little endian TS
processing options
error handling: -Er --unknown-relationship accept unknown/missing relationship type -Ev --invalid-item-value accept invalid content item value (e.g. violation of VR or VM definition) -Ec --ignore-constraints ignore relationship content constraints -Ee --ignore-item-errors do not abort on content item errors, just warn (e.g. missing value type specific attributes) -Ei --skip-invalid-items skip invalid content items (including sub-tree) -Dv --disable-vr-checker disable check for VR-conformant string values specific character set: +Cr --charset-require require declaration of extended charset (default) +Ca --charset-assume [c]harset: string assume charset c if no extended charset declared +Cc --charset-check-all check all data elements with string values (default: only PN, LO, LT, SH, ST, UC and UT) # this option is only used for the mapping to an appropriate # XML character encoding, but not for the conversion to UTF-8 +U8 --convert-to-utf8 convert all element values that are affected by Specific Character Set (0008,0005) to UTF-8 # requires support from an underlying character encoding library # (see output of --version on which one is available)
output options
encoding: +Ea --attr-all encode everything as XML attribute (shortcut for +Ec, +Er, +Ev and +Et) +Ec --attr-code encode code value, coding scheme designator and coding scheme version as XML attribute +Er --attr-relationship encode relationship type as XML attribute +Ev --attr-value-type encode value type as XML attribute +Et --attr-template-id encode template id as XML attribute +Ee --template-envelope template element encloses content items (requires +Wt, implies +Et) XML structure: +Xs --add-schema-reference add reference to XML Schema "dsr2xml.xsd" (not with +Ea, +Ec, +Er, +Ev, +Et, +Ee, +We) +Xn --use-xml-namespace add XML namespace declaration to root element writing: +We --write-empty-tags write all tags even if their value is empty +Wi --write-item-id always write item identifier +Wt --write-template-id write template identification information
Notes
DICOM Conformance
The dsr2xml utility supports the following SOP Classes:
SpectaclePrescriptionReportStorage 1.2.840.10008.5.1.4.1.1.78.6 MacularGridThicknessAndVolumeReportStorage 1.2.840.10008.5.1.4.1.1.79.1 BasicTextSRStorage 1.2.840.10008.5.1.4.1.1.88.11 EnhancedSRStorage 1.2.840.10008.5.1.4.1.1.88.22 ComprehensiveSRStorage 1.2.840.10008.5.1.4.1.1.88.33 Comprehensive3DSRStorage 1.2.840.10008.5.1.4.1.1.88.34 ProcedureLogStorage 1.2.840.10008.5.1.4.1.1.88.40 MammographyCADSRStorage 1.2.840.10008.5.1.4.1.1.88.50 KeyObjectSelectionDocumentStorage 1.2.840.10008.5.1.4.1.1.88.59 ChestCADSRStorage 1.2.840.10008.5.1.4.1.1.88.65 XRayRadiationDoseSRStorage 1.2.840.10008.5.1.4.1.1.88.67 RadiopharmaceuticalRadiationDoseSRStorage 1.2.840.10008.5.1.4.1.1.88.68 ColonCADSRStorage 1.2.840.10008.5.1.4.1.1.88.69 ImplantationPlanSRDocumentStorage 1.2.840.10008.5.1.4.1.1.88.70 AcquisitionContextSRStorage 1.2.840.10008.5.1.4.1.1.88.71 SimplifiedAdultEchoSRStorage 1.2.840.10008.5.1.4.1.1.88.72 PatientRadiationDoseSRStorage 1.2.840.10008.5.1.4.1.1.88.73 PlannedImagingAgentAdministrationSRStorage 1.2.840.10008.5.1.4.1.1.88.74 PerformedImagingAgentAdministrationSRStorage 1.2.840.10008.5.1.4.1.1.88.75
Please note that currently only mandatory and some optional attributes are supported.
Character Encoding
The XML encoding is determined automatically from the DICOM attribute (0008,0005) 'Specific Character Set' using the following mapping:
ASCII (ISO_IR 6) => "UTF-8" UTF-8 "ISO_IR 192" => "UTF-8" ISO Latin 1 "ISO_IR 100" => "ISO-8859-1" ISO Latin 2 "ISO_IR 101" => "ISO-8859-2" ISO Latin 3 "ISO_IR 109" => "ISO-8859-3" ISO Latin 4 "ISO_IR 110" => "ISO-8859-4" ISO Latin 5 "ISO_IR 148" => "ISO-8859-9" Cyrillic "ISO_IR 144" => "ISO-8859-5" Arabic "ISO_IR 127" => "ISO-8859-6" Greek "ISO_IR 126" => "ISO-8859-7" Hebrew "ISO_IR 138" => "ISO-8859-8" Thai "ISO_IR 166" => "TIS-620" Japanese "ISO 2022 IR 13ISO 2022 IR 87" => "ISO-2022-JP" Korean "ISO 2022 IR 6ISO 2022 IR 149" => "ISO-2022-KR" Chinese "ISO 2022 IR 6ISO 2022 IR 58" => "ISO-2022-CN" Chinese "GB18030" => "GB18030" Chinese "GBK" => "GBK"
If this DICOM attribute is missing in the input file, although needed, option --charset-assume can be used to specify an appropriate character set manually (using one of the DICOM defined terms). For reasons of backward compatibility with previous versions of this tool, the following terms are also supported and mapped automatically to the associated DICOM defined terms: latin-1, latin-2, latin-3, latin-4, latin-5, cyrillic, arabic, greek, hebrew.
Option --convert-to-utf8 can be used to convert the DICOM file or data set to UTF-8 encoding prior to the conversion to XML format.
If no mapping is defined and option --convert-to-utf8 is not used, non-ASCII characters and those below #32 are stored as '&#nnn;' where 'nnn' refers to the numeric character code. This might lead to invalid character entity references (such as '' for ESC) and will cause most XML parsers to reject the document.
Error Handling
Please be careful with the processing options --unknown-relationship, --invalid-item-value, --ignore-constraints, --ignore-item-errors and --skip-invalid-items since they disable certain validation checks on the DICOM SR input file and, therefore, might result in non-standard conformant output. However, there might be reasons for using one or more of these options, e.g. in order to read and process an incorrectly encoded SR document.
Limitations
The XML Schema dsr2xml.xsd does not support all variations of the dsr2xml output format. However, the default output format (plus option --use-xml-namespace) should work.
Logging
The level of logging output of the various command line tools and underlying libraries can be specified by the user. By default, only errors and warnings are written to the standard error stream. Using option --verbose also informational messages like processing details are reported. Option --debug can be used to get more details on the internal activity, e.g. for debugging purposes. Other logging levels can be selected using option --log-level. In --quiet mode only fatal errors are reported. In such very severe error events, the application will usually terminate. For more details on the different logging levels, see documentation of module 'oflog'.
In case the logging output should be written to file (optionally with logfile rotation), to syslog (Unix) or the event log (Windows) option --log-config can be used. This configuration file also allows for directing only certain messages to a particular output stream and for filtering certain messages based on the module or application where they are generated. An example configuration file is provided in <etcdir>/logger.cfg.
Command Line
All command line tools use the following notation for parameters: square brackets enclose optional values (0-1), three trailing dots indicate that multiple values are allowed (1-n), a combination of both means 0 to n values.
Command line options are distinguished from parameters by a leading '+' or '-' sign, respectively. Usually, order and position of command line options are arbitrary (i.e. they can appear anywhere). However, if options are mutually exclusive the rightmost appearance is used. This behavior conforms to the standard evaluation rules of common Unix shells.
In addition, one or more command files can be specified using an '@' sign as a prefix to the filename (e.g. @command.txt). Such a command argument is replaced by the content of the corresponding text file (multiple whitespaces are treated as a single separator unless they appear between two quotation marks) prior to any further evaluation. Please note that a command file cannot contain another command file. This simple but effective approach allows one to summarize common combinations of options/parameters and avoids longish and confusing command lines (an example is provided in file <datadir>/dumppat.txt).
Environment
The dsr2xml utility will attempt to load DICOM data dictionaries specified in the DCMDICTPATH environment variable. By default, i.e. if the DCMDICTPATH environment variable is not set, the file <datadir>/dicom.dic will be loaded unless the dictionary is built into the application (default for Windows).
The default behavior should be preferred and the DCMDICTPATH environment variable only used when alternative data dictionaries are required. The DCMDICTPATH environment variable has the same format as the Unix shell PATH variable in that a colon (':') separates entries. On Windows systems, a semicolon (';') is used as a separator. The data dictionary code will attempt to load each file specified in the DCMDICTPATH environment variable. It is an error if no data dictionary can be loaded.
Files
<datadir>/dsr2xml.xsd - XML Schema file
See Also
Copyright
Copyright (C) 2000-2022 by OFFIS e.V., Escherweg 2, 26121 Oldenburg, Germany.