Link to original content: https://pubmed.ncbi.nlm.nih.gov/33250149/
Artif Intell Med. 2020 Nov:110:101977.
doi: 10.1016/j.artmed.2020.101977. Epub 2020 Nov 1.

Prediction of breast cancer distant recurrence using natural language processing and knowledge-guided convolutional neural network

Hanyin Wang et al. Artif Intell Med. 2020 Nov.

Abstract

Distant recurrence of breast cancer results in high lifetime risks and low 5-year survival rates. Early prediction of distant recurrence could facilitate intervention and improve patients' quality of life. In this study, we designed an EHR-based predictive model to estimate the probability of distant recurrence in breast cancer patients. We studied the pathology reports and progress notes of 6,447 patients who were diagnosed with breast cancer at Northwestern Memorial Hospital between 2001 and 2015. Clinical notes were mapped to Concept Unique Identifiers (CUIs) using natural language processing tools. Bag-of-words and pre-trained embeddings were employed to vectorize word and CUI sequences. These features, integrated with clinical features from structured data, were fed into conventional machine learning classifiers and a Knowledge-guided Convolutional Neural Network (K-CNN). The best configuration of our model yielded an AUC of 0.888 and an F1-score of 0.5. Our work provides an automated method to predict breast cancer distant recurrence using natural language processing and deep learning approaches. We expect that better predictive performance could be achieved through advanced feature engineering.
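The abstract reports both an AUC and an F1-score. As a minimal illustration of how these two metrics are computed and why they can diverge, the sketch below uses made-up labels and scores (not data from the study). The function names and the 0.5 decision threshold are assumptions chosen for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, scores, threshold=0.5):
    """F1 = harmonic mean of precision and recall at a fixed threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, imbalanced example (hypothetical values): 6 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, scores)
print(f"AUC = {auc:.3f}, F1 = {f1:.3f}")
```

In this toy case every positive outranks every negative, so the AUC is 1.0, yet two negatives cross the threshold and the F1-score is about 0.67. This only shows that the two metrics answer different questions; it does not explain the figures reported in the study.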

Keywords: Breast cancer; Distant recurrence; Entity embeddings; Knowledge-guided convolutional neural network; Word embeddings.

Figures

Figure 1:
Diagram of the workflow. Processing steps are in the diamond boxes; narratives, concepts, and features are in the rectangular boxes. Two major types of configurations are employed in this study: conventional machine learning classifiers and a knowledge-guided convolutional neural network (K-CNN). Features are built from free-text progress notes and pathology reports, as well as structured clinical data. Word vectors and Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) are generated from clinical notes using natural language processing (NLP) techniques. Based on prior knowledge, a subset of disease-related CUIs is extracted. Different combinations of word vectors, CUIs, a subset of CUIs, and structured clinical data are fed into various machine learning classifiers for distant recurrence prediction. In parallel, we generate word embeddings and CUI embeddings using pre-trained embedding dictionaries. The embeddings, integrated with structured clinical data, are used for training and evaluating the K-CNN configuration on breast cancer distant recurrence prediction.
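The conventional-classifier branch of this workflow combines bag-of-words counts over words and CUIs with structured clinical data. A minimal sketch of that feature construction follows. The tokens, the CUI strings (placeholders, not real UMLS identifiers), the two structured features, and all function names are hypothetical; the study's actual preprocessing is not described in this record.

```python
def build_vocab(docs):
    """Map each distinct token to a column index (sorted for determinism)."""
    return {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}


def bow_vector(tokens, vocab):
    """Count how often each vocabulary token occurs in one document."""
    vec = [0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


# Hypothetical inputs for two patients.
note_words = [["invasive", "ductal", "carcinoma"],
              ["no", "evidence", "of", "carcinoma"]]
note_cuis = [["C0000001", "C0000002"],   # placeholder CUIs
             ["C0000001"]]
structured = [[1.0, 0.0],                # placeholder clinical features
              [0.0, 1.0]]

word_vocab = build_vocab(note_words)
cui_vocab = build_vocab(note_cuis)

# One feature row per patient: word counts + CUI counts + structured data.
features = [
    bow_vector(words, word_vocab) + bow_vector(cuis, cui_vocab) + clinical
    for words, cuis, clinical in zip(note_words, note_cuis, structured)
]
print(features[0])
```

Each row can then be passed to any standard classifier. Dropping one of the three parts from the concatenation gives the different feature combinations the caption mentions.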
Figure 2:
Knowledge-guided Convolutional Neural Network. Pre-trained word embeddings and CUI embeddings are first passed to a 1-dimensional (1-D) convolutional layer, followed by a max-pooling layer to select the highest value of each word or CUI embedding. Then, those selected values are concatenated with seven structured clinical features. A fully connected hidden layer is further used, followed by a dropout and ReLU activation layer. Finally, another fully connected layer with a softmax function is used to yield the probability of distant recurrence. Five different combinations of features are implemented.
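The layer sequence in this caption can be sketched as a forward pass in NumPy. This is an illustrative reading, not the authors' implementation: it assumes separate convolution branches for words and CUIs, max-pooling over the sequence axis per filter, random untrained weights, and arbitrary sizes (embedding dimensions, filter count, kernel width, hidden units). Only the seven structured features and the two-class softmax output come from the caption.

```python
import numpy as np


def conv1d(x, kernels, bias):
    """Valid 1-D convolution along the sequence axis.
    x: (seq_len, emb_dim); kernels: (n_filters, width, emb_dim).
    Returns an array of shape (seq_len - width + 1, n_filters)."""
    width = kernels.shape[1]
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])
    return np.tensordot(windows, kernels, axes=([1, 2], [1, 2])) + bias


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def kcnn_forward(word_emb, cui_emb, clinical, p):
    """Inference-time forward pass; dropout is the identity at inference."""
    word_feat = conv1d(word_emb, p["Ww"], p["bw"]).max(axis=0)  # max-pooling
    cui_feat = conv1d(cui_emb, p["Wc"], p["bc"]).max(axis=0)
    joint = np.concatenate([word_feat, cui_feat, clinical])
    hidden = np.maximum(joint @ p["W1"] + p["b1"], 0.0)         # ReLU
    return softmax(hidden @ p["W2"] + p["b2"])


rng = np.random.default_rng(0)
n_filters, width, hidden_units = 4, 3, 5   # assumed sizes
word_dim, cui_dim, n_clinical = 8, 6, 7    # 7 structured features per caption

params = {
    "Ww": rng.normal(size=(n_filters, width, word_dim)), "bw": np.zeros(n_filters),
    "Wc": rng.normal(size=(n_filters, width, cui_dim)), "bc": np.zeros(n_filters),
    "W1": rng.normal(size=(2 * n_filters + n_clinical, hidden_units)),
    "b1": np.zeros(hidden_units),
    "W2": rng.normal(size=(hidden_units, 2)), "b2": np.zeros(2),
}

word_emb = rng.normal(size=(20, word_dim))  # 20 word tokens (stand-in embeddings)
cui_emb = rng.normal(size=(12, cui_dim))    # 12 CUI tokens (stand-in embeddings)
clinical = rng.normal(size=n_clinical)

probs = kcnn_forward(word_emb, cui_emb, clinical, params)
print(probs)  # two class probabilities that sum to 1
```

A trainable version would normally be written in a deep learning framework so that dropout and backpropagation are handled automatically; the NumPy form is used here only to make each step of the caption explicit.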
