...

Notice that the table above does not include any UMLS concept for the named entity haemoptisis. After some online searching, I was surprised to find that this is due to a typographical error. After correcting it to "hemoptysis", a hit appears in the output, as follows:

...
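
The check described above, where an entity with no UMLS match turned out to be a misspelling, can be approximated in a few lines. This is a minimal sketch, not the method used in this post: it assumes a small hypothetical term list (KNOWN_TERMS) standing in for UMLS concept names, and it uses Python's standard difflib module to suggest a near match for an unlinked entity. The function name suggest_spelling and the cutoff value are illustrative assumptions.

```python
from difflib import get_close_matches

# Hypothetical stand-in for UMLS concept names; a real check would draw
# candidate terms from the linker's knowledge base instead.
KNOWN_TERMS = ["hemoptysis", "nausea", "vomiting", "diarrhea", "cough"]


def suggest_spelling(entity_text, vocabulary=KNOWN_TERMS, cutoff=0.8):
    """Return the closest vocabulary term for an unlinked entity, or None."""
    matches = get_close_matches(entity_text.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(suggest_spelling("haemoptisis"))  # prints: hemoptysis
```

A string-similarity check like this only flags candidates; a person still has to confirm that the suggested term is the intended one.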

We discussed named entity recognition earlier; the recognized entities can be displayed as follows:

Example Code

import scispacy
import spacy
from scispacy.umls_linking import UmlsEntityLinker
from spacy import displacy

nlp = spacy.load("en_ner_bc5cdr_md")

linker = UmlsEntityLinker(resolve_abbreviations=True)
nlp.add_pipe(linker)

text = """
Data collection may include questions about groups of symptoms, such as
  GI symptoms (nausea, vomiting, diarrhea)
  Cough (non-productive, productive, or haemoptisis)
"""

doc = nlp(text)

entities = doc.ents
for entity in entities:
    print(entity.text, entity.start_char, entity.end_char, entity.label_)

    for umls_ent in entity._.umls_ents:
        # each candidate is a (CUI, similarity score) tuple
        concept_id, score = umls_ent

        print(f"Name: {entity}")
        print(f"CUI: {concept_id}, Score: {score}")
        print(linker.umls.cui_to_entity[concept_id])
        print()

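colors = {
    'CHEMICAL': 'lightpink',
    'DISEASE': '#ffd8a8',  # light orange
}

# Show the named entities. displacy.serve blocks until interrupted (Ctrl+C),
# after which the dependency parse is served.
displacy.serve(doc, style="ent", host="127.0.0.1", options={'colors': colors})
displacy.serve(doc, style="dep", host="127.0.0.1")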

The Road Ahead

At this point, there appear to be many NLP opportunities and applications in standards development. Linking to UMLS will allow team members to confirm semantic meaning by referencing curated definitions. Quality should also improve: as demonstrated above, the linkage exposed a spelling error, an unintended but welcome side effect. I can certainly see its utility in CDISC 360's biomedical concept authoring. Named entities can also be used as keywords or tags in the Example Collection.
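
As a rough illustration of the tagging idea, the sketch below groups recognized entities into de-duplicated tag lists per label. It is an assumption about how such tags might be built, not an existing CDISC tool: the function name entities_to_tags is hypothetical, and the input is a plain list of (text, label) pairs shaped like the entity.text and entity.label_ values printed in the example code.

```python
from collections import defaultdict


def entities_to_tags(entities):
    """Group (text, label) pairs into a sorted, de-duplicated tag list per label."""
    grouped = defaultdict(set)
    for text, label in entities:
        # Normalize case and whitespace so "Nausea" and "nausea" become one tag
        grouped[label].add(text.strip().lower())
    return {label: sorted(terms) for label, terms in grouped.items()}


if __name__ == "__main__":
    # Hypothetical pairs, shaped like (entity.text, entity.label_) from doc.ents
    sample = [("nausea", "DISEASE"), ("Vomiting", "DISEASE"), ("Nausea", "DISEASE")]
    print(entities_to_tags(sample))
```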

...