TY - JOUR
T1 - Improving pattern classification of DNA microarray data by using PCA and Logistic Regression
AU - Ocampo-Vega, Ricardo
AU - Sanchez-Ante, Gildardo
AU - De Luna, Marco A.
AU - Vega, Roberto
AU - Falcón-Morales, Luis E.
AU - Sossa, Humberto
N1 - Publisher Copyright:
© 2016 - IOS Press and the authors. All rights reserved.
PY - 2016/7/13
Y1 - 2016/7/13
N2 - DNA microarrays are a technology that can be used to diagnose cancer and other diseases. To automate the analysis of such data, pattern recognition and machine learning algorithms can be applied. However, the curse of dimensionality is unavoidable: there are very few samples for training and many attributes in each sample. Because the predictive accuracy of supervised classifiers degrades in the presence of irrelevant and redundant features, a dimensionality reduction step is essential. The main idea is to retain only the genes that are most influential in classifying the disease. In this paper, a new methodology based on Principal Component Analysis and Logistic Regression is proposed. Our method enables the selection of particular genes that are relevant for classification. Experiments were run using eight different classifiers on two benchmark datasets: Leukemia and Lymphoma. The results show that our method not only reduces the number of required attributes but also increases classification accuracy by more than 10% in all the cases we tested.
AB - DNA microarrays are a technology that can be used to diagnose cancer and other diseases. To automate the analysis of such data, pattern recognition and machine learning algorithms can be applied. However, the curse of dimensionality is unavoidable: there are very few samples for training and many attributes in each sample. Because the predictive accuracy of supervised classifiers degrades in the presence of irrelevant and redundant features, a dimensionality reduction step is essential. The main idea is to retain only the genes that are most influential in classifying the disease. In this paper, a new methodology based on Principal Component Analysis and Logistic Regression is proposed. Our method enables the selection of particular genes that are relevant for classification. Experiments were run using eight different classifiers on two benchmark datasets: Leukemia and Lymphoma. The results show that our method not only reduces the number of required attributes but also increases classification accuracy by more than 10% in all the cases we tested.
KW - DNA microarray
KW - Feature reduction
KW - Logistic regression
KW - Principal Component Analysis
UR - http://www.scopus.com/inward/record.url?scp=84983666641&partnerID=8YFLogxK
U2 - 10.3233/IDA-160845
DO - 10.3233/IDA-160845
M3 - Conference article
AN - SCOPUS:84983666641
SN - 1088-467X
VL - 20
SP - S53
EP - S67
JO - Intelligent Data Analysis
JF - Intelligent Data Analysis
IS - s1
T2 - 19th Iberoamerican Congress on Pattern Recognition, CIARP 2014
Y2 - 2 November 2014 through 5 November 2014
ER -