...

CDISC Controlled Terminology (CT) maintains a codelist of units of measurement (codelist code C71620, short name UNIT). It is used to represent values for unit variables in various domains, such as demographics (AGEU), concomitant medications (CMDOSU), lab (LBORRESU, LBSTRESU), and vital signs (VSORRESU, VSSTRESU). Note: AGEU is a codelist subset of the UNIT superset.

The Regenstrief Institute at the Indiana University School of Medicine develops the Unified Code for Units of Measure (UCUM). It is a code system of units intended to be unambiguous to both humans and machines. It has many applications in life sciences, such as EHR, EDI, and HL7 electronic messaging. Logical Observation Identifiers Names and Codes (LOINC) is another code system that incorporates UCUM.

Even though many terms are identical between CDISC and UCUM, there are differences. For example, millimeter of mercury, a unit commonly used to measure blood pressure, is mmHg in CDISC but mm[Hg] in UCUM. Here are a few additional examples showing differing values:

...

  • All CDISC CT can be retrieved from the NCIt browser
  • NCIt contains biomedical knowledge from multiple sources, e.g., CDISC, UCUM, and SNOMED
  • Relationships between sources are maintained, where applicable

...

Although this is a good start, manual lookup via the NCIt browser would be too tedious to be useful. Further research showed that the NCI Center for Biomedical Informatics and Information Technology (CBIIT) publishes the NCIt in OWL/RDF format on a regular basis.

With an OWL/RDF file at our disposal, SPARQL is the tool to do some graph-based data analyses.

The following is a snippet from the Thesaurus OWL/RDF file, showing how it represents metadata for the term millimeter of mercury:

...

With the above sample depicting the model, the following is a SPARQL query for obtaining a list of objects having a UCUM mapping:

...

Incidentally, all CDISC CT packages are also available in OWL/RDF. The goal is to reduce the UCUM query above to only the entries found on the UNIT (C71620) codelist. Continuing with the flow, this is a snippet from the CDISC CT OWL/RDF for the same term, millimeter of mercury:

Millimeter of Mercury from sdtm-terminology.owl
<CodeList OID="CL.C71620.UNIT" Name="Unit" DataType="text" nciodm:ExtCodeID="C71620" nciodm:CodeListExtensible="Yes">
    <Description>
        <TranslatedText xml:lang="en">Terminology codelist used for units within CDISC.</TranslatedText>
    </Description>
    <EnumeratedItem CodedValue="mmHg" nciodm:ExtCodeID="C49670">
        <nciodm:CDISCSynonym>Millimeter of Mercury</nciodm:CDISCSynonym>
        <nciodm:CDISCDefinition>A unit of pressure equal to 0.001316 atmosphere and equal to the pressure indicated by one millimeter rise of mercury in a barometer at the Earth's surface. (NCI)</nciodm:CDISCDefinition>
        <nciodm:PreferredTerm>Millimeter of Mercury</nciodm:PreferredTerm>
    </EnumeratedItem>
    <nciodm:CDISCSubmissionValue>UNIT</nciodm:CDISCSubmissionValue>
    <nciodm:CDISCSynonym>Unit</nciodm:CDISCSynonym>
    <nciodm:PreferredTerm>CDISC SDTM Unit of Measure Terminology</nciodm:PreferredTerm>
</CodeList>
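
As an aside for readers without a SPARQL environment, the structure of the snippet above can also be read with a general-purpose XML parser. The following is a minimal Python sketch, not the method used in this demonstration. It assumes an abridged copy of the snippet and, because the snippet does not show which namespace URI the nciodm prefix is bound to, it declares a placeholder URI (NCIODM, hypothetical) so the text parses. The function name codelist_terms is also illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: the snippet does not show the namespace URI behind the "nciodm"
# prefix, so a placeholder URI is declared here purely to make the XML parseable.
NCIODM = "urn:example:nciodm"  # hypothetical namespace URI

# Abridged copy of the snippet above, with the placeholder namespace declared.
SNIPPET = f"""
<CodeList xmlns:nciodm="{NCIODM}" OID="CL.C71620.UNIT" Name="Unit" DataType="text"
          nciodm:ExtCodeID="C71620" nciodm:CodeListExtensible="Yes">
    <EnumeratedItem CodedValue="mmHg" nciodm:ExtCodeID="C49670">
        <nciodm:CDISCSynonym>Millimeter of Mercury</nciodm:CDISCSynonym>
        <nciodm:PreferredTerm>Millimeter of Mercury</nciodm:PreferredTerm>
    </EnumeratedItem>
    <nciodm:CDISCSubmissionValue>UNIT</nciodm:CDISCSubmissionValue>
</CodeList>
""".strip()


def codelist_terms(xml_text):
    """Return (codelist c-code, [(term c-code, submission value), ...])."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced attribute names as "{uri}localname".
    ext_code = f"{{{NCIODM}}}ExtCodeID"
    terms = [
        (item.get(ext_code), item.get("CodedValue"))
        for item in root.iter("EnumeratedItem")
    ]
    return root.get(ext_code), terms


codelist_code, terms = codelist_terms(SNIPPET)
print(codelist_code, terms)
```

The c-code on each EnumeratedItem (here C49670) is the key that later links a submission value to its UCUM counterpart.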

Unlike the rather esoteric Thesaurus OWL/RDF, the CDISC CT one is very straightforward and readable. With that, here is a SPARQL query to extract information such as submission values and their c-codes from the UNIT codelist:

...

The two result sets can be linked via the individual term's c-code. Therefore, the final query is a combination of the two SPARQL queries above, with a slight adjustment to make the nested queries work efficiently. It yields 143 mappings for 130 unique terms.

  • cdisc_ct_ucum.rq - Final SPARQL text that extracts UCUM information from the Thesaurus and subsets it to the UNIT codelist in CDISC CT
  • cdisc_ct_ucum.txt - Result set in tab-delimited format
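
The linking step can be pictured outside SPARQL as an inner join on c-code. The Python sketch below is illustrative only and does not reproduce the attached query. The row layouts, the function name join_on_ccode, and the C00000 entry are assumptions made for the example. Only the C49670 row (mmHg in CDISC, mm[Hg] in UCUM) comes from the text.

```python
from collections import defaultdict

# Rows as (c-code, UCUM value), standing in for the Thesaurus query result.
# C49670 / mm[Hg] comes from the article; C00000 is a made-up placeholder for a
# concept that has a UCUM mapping but is not on the UNIT codelist.
thesaurus_ucum = [
    ("C49670", "mm[Hg]"),
    ("C00000", "hypothetical-ucum"),
]

# Rows as (c-code, submission value), standing in for the UNIT codelist result.
unit_codelist = [
    ("C49670", "mmHg"),
]


def join_on_ccode(ucum_rows, unit_rows):
    """Inner-join the two result sets on c-code.

    A term may carry more than one UCUM mapping, so each c-code keeps a list.
    """
    ucum_by_code = defaultdict(list)
    for code, ucum in ucum_rows:
        ucum_by_code[code].append(ucum)
    return [
        (code, value, ucum)
        for code, value in unit_rows
        for ucum in ucum_by_code.get(code, [])
    ]


mappings = join_on_ccode(thesaurus_ucum, unit_codelist)
print(mappings)
```

Keeping a list per c-code allows for terms with several UCUM values, which is consistent with the result having more mappings (143) than unique terms (130).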

...

NCI EVS actively maintains a rich repository of terminology and biomedical ontology. Its OWL/RDF offering enables scalable IT solutions to search, link, and combine intricate biomedical concepts. This demonstration illustrates one application of semantic web technology. SPARQL made it easy to analyze over 2,160,000 triples (2,100,000 from the Thesaurus, 60,000 from CDISC CT for SDTM). The more UCUM entries NCI EVS curates, the more mappings will become available.

End Notes

  1. SPARQL as specified by W3C: http://www.w3.org/2009/sparql/wiki/Main_Page
  2. All SPARQL queries and OWL/RDF files were processed using TopQuadrant TopBraid Composer FE Version 4.4.0.
  3. These file versions are used in this demonstration: NCI Thesaurus 14.10d and CDISC CT 2014-09-26.
  4. URL to download NCI Thesaurus OWL/RDF: http://cbiit.nci.nih.gov/evs-download/thesaurus-downloads
  5. URL to download CDISC CT OWL/RDF for SDTM: http://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.OWL.zip