TY - JOUR
T1 - Extracting medical events from clinical records using conditional random fields and parameter tuning for hidden Markov models
AU - Fócil-Arias, Carolina
AU - Sidorov, Grigori
AU - Gelbukh, Alexander
AU - Arce, Fernando
N1 - Publisher Copyright:
© 2018 - IOS Press and the authors. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Recently, the extraction of clinical events from unstructured medical texts has attracted much attention from the research community. Machine learning approaches are popular for this task because they solve the sequence tagging problem effectively. It has previously been suggested that simple features, such as word unigrams, part-of-speech tags, and chunk tags, among others, are sufficient for this task. We show that more careful preprocessing and feature selection can significantly improve the results. We used a conditional random field classifier with more linguistically oriented features and outperformed the current state-of-the-art approaches. We also show that the popular and much simpler Viterbi algorithm (a hidden Markov model-based classification algorithm) can produce competitive results when its parameters are tuned using specific optimization techniques. We evaluate these algorithms on the task of extracting medical events from the corpus developed for SemEval shared Task 12: Clinical TempEval (Temporal Evaluation) 2016, namely, on its two subtasks: (i) event detection and (ii) event classification based on contextual modality.
AB - Recently, the extraction of clinical events from unstructured medical texts has attracted much attention from the research community. Machine learning approaches are popular for this task because they solve the sequence tagging problem effectively. It has previously been suggested that simple features, such as word unigrams, part-of-speech tags, and chunk tags, among others, are sufficient for this task. We show that more careful preprocessing and feature selection can significantly improve the results. We used a conditional random field classifier with more linguistically oriented features and outperformed the current state-of-the-art approaches. We also show that the popular and much simpler Viterbi algorithm (a hidden Markov model-based classification algorithm) can produce competitive results when its parameters are tuned using specific optimization techniques. We evaluate these algorithms on the task of extracting medical events from the corpus developed for SemEval shared Task 12: Clinical TempEval (Temporal Evaluation) 2016, namely, on its two subtasks: (i) event detection and (ii) event classification based on contextual modality.
KW - Clinical reports
KW - Conditional random field
KW - Feature selection
KW - Machine learning
KW - Medical information extraction
KW - Natural language processing
KW - Viterbi algorithm
UR - http://www.scopus.com/inward/record.url?scp=85063505823&partnerID=8YFLogxK
U2 - 10.3233/JIFS-169479
DO - 10.3233/JIFS-169479
M3 - Article
SN - 1064-1246
VL - 34
SP - 2935
EP - 2947
JO - Journal of Intelligent and Fuzzy Systems
JF - Journal of Intelligent and Fuzzy Systems
IS - 5
ER -