TY - GEN
T1 - Feature selection to detect botnets using machine learning algorithms
AU - Alejandre, Francisco Villegas
AU - Cortés, Nareli Cruz
AU - Anaya, Eleazar Aguirre
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/4/3
Y1 - 2017/4/3
AB - This paper presents a novel feature selection method for detecting botnets during their Command and Control (C&C) phase. A major problem is that researchers have proposed features based on their own expertise, but there is no method for evaluating those features, and some of them may yield a lower detection rate than others. To address this, we find the feature set, based on botnet connections during the C&C phase, that maximizes the detection rate for these botnets. A Genetic Algorithm (GA) was used to select the set of features that gives the highest detection rate. The machine learning algorithm C4.5 was used to classify connections as belonging to a botnet or not. The datasets used in this paper were extracted from the ISOT and ISCX repositories. Tests were run to find the best parameters for the GA and for C4.5. We also performed experiments to obtain the best set of features for each botnet analyzed (specific) and for each type of botnet (general). The results, presented at the end of the paper, show a considerable reduction in the number of features and a higher detection rate than the related work.
KW - Botnet
KW - Feature selection
KW - Machine learning
KW - Malware detection
UR - http://www.scopus.com/inward/record.url?scp=85018939145&partnerID=8YFLogxK
U2 - 10.1109/CONIELECOMP.2017.7891834
DO - 10.1109/CONIELECOMP.2017.7891834
M3 - Conference contribution
AN - SCOPUS:85018939145
T3 - 2017 International Conference on Electronics, Communications and Computers, CONIELECOMP 2017
BT - 2017 International Conference on Electronics, Communications and Computers, CONIELECOMP 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th International Conference on Electronics, Communications and Computers, CONIELECOMP 2017
Y2 - 22 February 2017 through 24 February 2017
ER -