...
Notice that the table above does not include any UMLS concept for the named entity haemoptisis. After some online searches, it came as another surprise to me that this is due to a typographical error. After correcting it to "hemoptysis," a hit appears in the results, as follows:
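Why would a misspelling yield no concept at all rather than a lower-scoring one? As far as I understand, scispacy's linker compares each mention with UMLS names by character 3-gram similarity (TF-IDF weighted, with approximate nearest neighbours) and drops candidates that score below a threshold. The minimal sketch below illustrates that idea with plain, unweighted trigram cosine similarity. It uses a tiny hypothetical vocabulary (`VOCAB`, taken from the symptoms in the text) and an assumed threshold of 0.7. The helper names are mine; this is not scispacy's code, and its scores will not match the linker's.

```python
from collections import Counter
from math import sqrt

# Illustrative sketch only: plain trigram cosine, not scispacy's TF-IDF + ANN linker.
# VOCAB is a hypothetical stand-in for UMLS concept names.
VOCAB = ["hemoptysis", "cough", "nausea", "vomiting", "diarrhea"]


def char_trigrams(term: str) -> Counter:
    """Count the character 3-grams of a lower-cased term padded with spaces."""
    padded = f" {term.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_match(mention: str, vocabulary: list, threshold: float = 0.7):
    """Return (term, rounded score) for the closest term, or None below the threshold."""
    query = char_trigrams(mention)
    term, score = max(
        ((t, cosine(query, char_trigrams(t))) for t in vocabulary),
        key=lambda pair: pair[1],
    )
    return (term, round(score, 3)) if score >= threshold else None


for mention in ["hemoptysis", "haemoptisis"]:
    print(mention, best_match(mention, VOCAB))
print("haemoptisis @ 0.4:", best_match("haemoptisis", VOCAB, threshold=0.4))
```

Here "haemoptisis" shares only 5 of its 11 trigrams with "hemoptysis" (score about 0.477), so it is rejected at 0.7 but suggested at 0.4. In this toy setting, a mention whose best match falls just under the threshold is a reasonable hint to check its spelling.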
...
We discussed named entity recognition earlier. The following code runs it together with UMLS linking and displays the results:
```python
import scispacy
import spacy
from scispacy.umls_linking import UmlsEntityLinker
from spacy import displacy

# This uses the spaCy 2.x / scispacy 0.2.x API. With spaCy 3.x the linker is
# added by name instead, via nlp.add_pipe("scispacy_linker", config={...}).
nlp = spacy.load("en_ner_bc5cdr_md")
linker = UmlsEntityLinker(resolve_abbreviations=True)
nlp.add_pipe(linker)

text = """
Data collection may include questions about groups of symptoms, such as
GI symptoms (nausea, vomiting, diarrhea)
Cough (non-productive, productive, or haemoptisis)
"""
doc = nlp(text)

for entity in doc.ents:
    print(entity.text, entity.start_char, entity.end_char, entity.label_)
    # each UMLS candidate is a (concept_id, score) tuple
    for concept_id, score in entity._.umls_ents:
        print(f"Name: {entity}")
        print(f"CUI: {concept_id}, Score {score}")
        print(linker.umls.cui_to_entity[concept_id])
        print()

# "lightorange" is not a CSS color name, so DISEASE uses "orange"
colors = {
    "CHEMICAL": "lightpink",
    "DISEASE": "orange",
}
# show NER; serve() blocks until stopped (Ctrl+C), after which the dependency view is served
displacy.serve(doc, style="ent", host="127.0.0.1", options={"colors": colors})
displacy.serve(doc, style="dep", host="127.0.0.1")
```
The code is also available as a CDISC Bitbucket snippet: https://bitbucket.cdisc.org/snippets/9817e4728bb1465ea8c338685ea454eb.js
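For readers curious about what the `ent` view produces: displaCy highlights each entity's character span (the `start_char`/`end_char` values printed by the loop) and labels it, using the background color for that label from `options["colors"]`. The sketch below, with a hypothetical `render_entities` helper, approximates that idea in plain Python so it runs without a model. It is an illustration of the technique, not displaCy's actual HTML template, and the gray fallback color is my own choice.

```python
from html import escape


# Hypothetical helper: an approximation of displaCy's "ent" view, not its real template.
def render_entities(text, ents, colors, default_color="#ddd"):
    """Wrap each (start_char, end_char, label) span of `text` in a colored <mark>."""
    parts, cursor = [], 0
    for start, end, label in sorted(ents):
        parts.append(escape(text[cursor:start]))  # plain text before the entity
        color = colors.get(label, default_color)  # unknown labels fall back to gray
        parts.append(
            f'<mark style="background: {color}">'
            f"{escape(text[start:end])} <b>{escape(label)}</b></mark>"
        )
        cursor = end
    parts.append(escape(text[cursor:]))  # trailing text after the last entity
    return "".join(parts)


sentence = "Cough (non-productive, productive, or hemoptysis)"
start = sentence.index("hemoptysis")
html = render_entities(
    sentence,
    [(start, start + len("hemoptysis"), "DISEASE")],
    colors={"CHEMICAL": "lightpink", "DISEASE": "orange"},
)
print(html)
```

Escaping the surrounding text keeps characters such as `<` in clinical notes from being interpreted as markup.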
At this point, there seem to be many NLP opportunities and applications in standards development. Linkage to UMLS allows team members to confirm semantic meaning by referencing the curated definition. Quality can also improve: as demonstrated above, detecting the spelling error was an unintended benefit. I can certainly see its utility in CDISC 360's biomedical concept authoring. Named entities can be used as keywords or tags in Example Collection.
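To make the keyword/tag idea concrete: if each example's named entities are stored with the concept ID the linker returned, a simple inverted index groups examples by concept rather than by spelling. So if two spellings link to the same CUI, they land under one tag. The sketch below assumes a hypothetical input shape and uses placeholder concept IDs (`CUI_HEMOPTYSIS`, `CUI_NAUSEA`) rather than real UMLS CUIs. `build_tag_index` is illustrative only and is not part of Example Collection or scispacy.

```python
from collections import defaultdict

# Hypothetical input: example ID -> list of (entity text, concept ID) pairs.
# The concept IDs are placeholders, not real UMLS CUIs.
linked_examples = {
    "EX1": [("hemoptysis", "CUI_HEMOPTYSIS")],
    "EX2": [("haemoptysis", "CUI_HEMOPTYSIS"), ("nausea", "CUI_NAUSEA")],
}


def build_tag_index(examples):
    """Map each concept ID to the sorted IDs of the examples that mention it."""
    index = defaultdict(set)
    for example_id, entities in examples.items():
        for _entity_text, concept_id in entities:
            index[concept_id].add(example_id)
    return {concept_id: sorted(ids) for concept_id, ids in index.items()}


print(build_tag_index(linked_examples))
```

Keying on the concept ID instead of the entity text is the design choice here: a plain keyword search for "hemoptysis" would miss EX2, whereas the concept tag returns both examples.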
...