...
We discussed named entity recognition (NER), which can be run, linked to UMLS, and displayed with scispaCy as follows:
```python
import scispacy
import spacy
from scispacy.umls_linking import UmlsEntityLinker
from spacy import displacy

# scispaCy NER model trained on the BC5CDR corpus (CHEMICAL and DISEASE labels)
nlp = spacy.load("en_ner_bc5cdr_md")

# Link recognized entities to UMLS concepts
# (spaCy 2.x-style add_pipe, which takes the component object directly)
linker = UmlsEntityLinker(resolve_abbreviations=True)
nlp.add_pipe(linker)

text = """
Data collection may include questions about groups of symptoms, such as
GI symptoms (nausea, vomiting, diarrhea)
Cough (non-productive, productive, or haemoptisis)
"""

doc = nlp(text)

for entity in doc.ents:
    # Entity text, character offsets, and NER label
    print(entity.text, entity.start_char, entity.end_char, entity.label_)
    # Each UMLS candidate is a (CUI, score) tuple
    for concept_id, score in entity._.umls_ents:
        print(f"Name: {entity}")
        print(f"CUI: {concept_id}, Score: {score}")
        # Full UMLS record for the candidate concept
        print(linker.umls.cui_to_entity[concept_id])
        print()

# Highlight color per entity label (any valid CSS color name works)
colors = {
    "CHEMICAL": "lightpink",
    "DISEASE": "orange",
}

# Show NER; displacy.serve blocks until the server is stopped (Ctrl+C)
displacy.serve(doc, style="ent", host="127.0.0.1", options={"colors": colors})
# Show the dependency parse once the first server has been stopped
displacy.serve(doc, style="dep", host="127.0.0.1")
```
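The loop above prints every candidate the linker returns for each entity. When only one concept per entity is wanted, a small helper can keep the highest-scoring (CUI, score) pair above a cutoff. This is a minimal sketch under stated assumptions: `best_candidate`, the 0.8 cutoff, and the placeholder CUIs are hypothetical and not part of scispaCy.

```python
from typing import List, Optional, Tuple

# A linker candidate: (UMLS concept ID, similarity score)
Candidate = Tuple[str, float]


def best_candidate(candidates: List[Candidate], min_score: float = 0.8) -> Optional[Candidate]:
    """Return the highest-scoring candidate at or above min_score, or None.

    Hypothetical helper: the input mirrors the (CUI, score) tuples
    in entity._.umls_ents; min_score is an illustrative cutoff.
    """
    eligible = [c for c in candidates if c[1] >= min_score]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c[1])


# Placeholder CUIs and scores, for illustration only
example = [("C-PLACEHOLDER-1", 0.74), ("C-PLACEHOLDER-2", 0.91), ("C-PLACEHOLDER-3", 0.85)]
print(best_candidate(example))        # ('C-PLACEHOLDER-2', 0.91)
print(best_candidate(example, 0.95))  # None
```

Inside the original script, this would be called as `best_candidate(entity._.umls_ents)` in place of the inner loop.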
At this point, there seem to be many NLP opportunities and applications in standards development. Linking entities to UMLS lets team members confirm the intended meaning of a term by referencing its curated definition. Quality should also improve: as I demonstrated, detecting the spelling error was an unintended benefit. I can certainly see its utility in biomedical concept authoring for CDISC 360. Named entities can be used as keywords or tags in Example Collection.
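One plausible reason a misspelling can still be linked is that, as I understand scispaCy's documentation, its linker searches concept aliases by character 3-gram overlap rather than by whole words. The input above spells haemoptysis as "haemoptisis", which still shares most of its 3-grams with the correct spelling. The sketch below illustrates the idea with plain 3-gram count vectors and cosine similarity; it omits the TF-IDF weighting and approximate nearest-neighbour index a real linker would use, and the function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(term: str, n: int = 3) -> Counter:
    """Count character n-grams, padding with spaces so word edges count too."""
    padded = f" {term.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(x: str, y: str) -> float:
    """Character 3-gram cosine similarity between two terms."""
    return cosine(char_ngrams(x), char_ngrams(y))


misspelled = "haemoptisis"
for candidate in ["haemoptysis", "hemoptysis", "diarrhea"]:
    print(candidate, round(similarity(misspelled, candidate), 2))
```

In this toy comparison the misspelling scores highest against "haemoptysis" and shares no 3-grams with "diarrhea"; a linker built on this idea would then return the CUI of the best-matching alias.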
...