UmlsBERT: Augmenting Contextual Embeddings with a Clinical Metathesaurus

UmlsBERT

UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus

General info

This is the code that was used of the paper : UmlsBERT: Augmenting Contextual Embeddings with a Clinical Metathesaurus (NAACL 2021).

In this work, we introduced UmlsBERT, a contextual embedding model capable of integrating domain knowledge during pre-training. It was trained on biomedical corpora and uses the Unified Medical Language System (UMLS) clinical metathesaurus in two ways:

  • We proposed a new multi-label loss function for the pre-training of the Masked Language Modelling (Masked LM) task of UmlsBERT that considers the connections between medical words using the CUI attribute of UMLS.

cuiumls_updated