...

As a test run, I selected a small paragraph from the work-in-progress COVID-19 Interim User Guide. [3] The text is a data collection example for the disease's signs and symptoms:

Data collection may include questions about groups of symptoms, such as

  • GI symptoms (nausea, vomiting, diarrhea)
  • Cough (non-productive, productive, or haemoptisis)

The next step was to extract named entities by running the signs and symptoms text through the BC5CDR biomedical model. A named entity is a span of text labeled with the type of thing it names. For BC5CDR, the entity types are DISEASE and CHEMICAL. This process is often referred to as named entity recognition (NER). These are the results:

...
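The extraction step described above could be sketched roughly as follows. This is a minimal illustration, not necessarily the code used for this test run: it assumes scispaCy's `en_ner_bc5cdr_md` package is the installed BC5CDR model, the helper name `extract_entities` is hypothetical, and the sample paragraph has been joined into a single sentence for convenience.

```python
def extract_entities(nlp, text):
    """Return (entity text, entity label) pairs found by the pipeline `nlp`."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def main():
    # Sample paragraph from the guide, flattened into one sentence (assumption).
    text = (
        "Data collection may include questions about groups of symptoms, such as "
        "GI symptoms (nausea, vomiting, diarrhea) and "
        "cough (non-productive, productive, or haemoptisis)."
    )
    try:
        import spacy  # imported lazily so the helper works without spaCy installed

        # Assumed model package name for the BC5CDR model.
        nlp = spacy.load("en_ner_bc5cdr_md")
    except (ImportError, OSError) as exc:
        print(f"spaCy or the BC5CDR model is not available, skipping demo: {exc}")
        return
    for entity, label in extract_entities(nlp, text):
        print(f"{entity}\t{label}")


if __name__ == "__main__":
    main()
```

The helper only relies on `doc.ents`, so any spaCy pipeline with an NER component could be passed in place of the BC5CDR model.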

Notice how the table above does not include any UMLS concept for the named entity haemoptisis. After some online searching, I was surprised to find that this is due to a typographical error. After correcting it to "hemoptysis," a hit appears in the results, as follows:

| Entity | CUI | Name | Definition | Score |
|---|---|---|---|---|
| hemoptysis | C0019079 | Hemoptysis | Expectoration or spitting of blood originating from any part of the RESPIRATORY TRACT, usually from hemorrhage in the lung parenchyma (PULMONARY ALVEOLI) and the BRONCHIAL ARTERIES. | 1.0 |
| hemoptysis | C0030424 | Paragonimiasis | Infection with TREMATODA of the genus PARAGONIMUS. | 0.7546218633651733 |

These CUIs can also be looked up in the NCI Metathesaurus. This is the URL template: https://ncim.nci.nih.gov/ncimbrowser/ConceptReport.jsp?dictionary=NCI%20Metathesaurus&code={CUI}
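As a small illustration, the template can be filled in programmatically. The function name `ncim_url` and the format check (a "C" followed by seven digits) are assumptions made for this sketch; the check may need loosening if other identifier shapes are in use.

```python
import re

# URL template as given in the text; {CUI} is the placeholder to fill in.
NCIM_TEMPLATE = (
    "https://ncim.nci.nih.gov/ncimbrowser/ConceptReport.jsp"
    "?dictionary=NCI%20Metathesaurus&code={CUI}"
)

# Assumed CUI shape: the letter C followed by seven digits, e.g. C0019079.
CUI_PATTERN = re.compile(r"C\d{7}")


def ncim_url(cui):
    """Build the NCI Metathesaurus concept report URL for a CUI."""
    if not CUI_PATTERN.fullmatch(cui):
        raise ValueError(f"Unexpected CUI format: {cui!r}")
    return NCIM_TEMPLATE.format(CUI=cui)


if __name__ == "__main__":
    for cui in ("C0019079", "C0030424"):
        print(ncim_url(cui))
```

Rejecting malformed identifiers early keeps broken links out of any generated report.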

Visualization

spaCy includes built-in visualizers to display part-of-speech tags and syntactic dependencies. The following graphic is the rendition of the text described above:

...
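A rendition like the one described above can be produced with spaCy's `displacy` module. The sketch below is hedged: the model name `en_core_sci_sm` is an assumption (any pipeline with a tagger and parser should work), the helper name `render_dependencies` is hypothetical, and the sample sentence is a shortened form of the guide text.

```python
def render_dependencies(doc):
    """Return SVG markup showing POS tags and dependency arcs for `doc`."""
    from spacy import displacy  # lazy import: only needed when rendering

    # jupyter=False makes displacy return the markup instead of displaying it.
    return displacy.render(doc, style="dep", jupyter=False)


def main(model_name="en_core_sci_sm"):
    try:
        import spacy

        # Assumed model; it must include a tagger and a dependency parser.
        nlp = spacy.load(model_name)
    except (ImportError, OSError) as exc:
        print(f"spaCy or model {model_name!r} is not available, skipping demo: {exc}")
        return
    doc = nlp("Cough may be non-productive, productive, or haemoptisis.")
    svg = render_dependencies(doc)
    print(f"Rendered {len(svg)} characters of SVG markup")


if __name__ == "__main__":
    main()
```

The returned string can be written to an `.svg` file or embedded in a wiki page; inside a notebook, calling `displacy.render` without `jupyter=False` displays the graphic inline.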

At this point, there seem to be many NLP opportunities and applications in standards development. Linkage to UMLS will allow team members to ensure semantic meaning by referencing the curated definitions. Quality should also improve: as demonstrated above, detecting the spelling error was an unintended benefit. I can certainly see its utility in CDISC 360's biomedical concept authoring. Named entities can be used as keywords or tags in Example Collection.

...