Anomaly-Based Intrusion Detection: Feature Selection and Normalization Influence to the Machine Learning Models Accuracy

  • Danijela Protić MSc, Center for Applied Mathematics and Electronics, Belgrade, Serbia
  • Miomir Stanković

Abstract

An anomaly-based intrusion detection system detects intrusions into a computer network using a reference model that must be able to identify the network's normal behavior and flag whatever deviates from it. In this process, network traffic is classified into two groups by assigning different labels to normal and malicious behavior. The main disadvantage of an anomaly-based intrusion detection system is the need to learn the difference between normal and abnormal behavior. Another disadvantage is the complexity of the datasets that simulate realistic network traffic. Feature selection and normalization can be used to reduce data complexity and decrease processing runtime by selecting a better feature space. This paper presents the results of testing the influence of feature selection and instance normalization on the classification performance of k-nearest neighbor, weighted k-nearest neighbor, support vector machine, and decision tree models on 10 days of records from the Kyoto 2006+ dataset. The data was pre-processed to remove all categorical features from the dataset; the resulting subset contained 17 features. Features containing instances that could not be normalized into the range [-1, 1] were also removed, leaving a subset of nine features. The feature ‘Label’ divides network traffic into two classes: normal (1) and malicious (0). Accuracy was the performance metric used to evaluate the models. The proposed method resulted in very high accuracy, with the decision tree giving the highest values for non-normalized data and k-nearest neighbor giving the highest values for normalized data.

Keywords: feature selection, normalization, k-NN, weighted k-NN, SVM, decision tree, Kyoto 2006+
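
The pre-processing and classification steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names, the toy records, and the choice of inverse-distance weighting for the weighted k-NN variant are all assumptions made for illustration, and the values are not drawn from Kyoto 2006+. Only k-NN and weighted k-NN are shown; the SVM and decision tree models are omitted for brevity.

```python
import math
from collections import defaultdict


def select_numeric_features(rows):
    """Drop categorical columns: keep indices whose values are all numeric."""
    keep = [
        j for j in range(len(rows[0]))
        if all(isinstance(r[j], (int, float)) and not isinstance(r[j], bool)
               for r in rows)
    ]
    return [[r[j] for j in keep] for r in rows], keep


def fit_min_max(rows):
    """Record the (min, max) of every feature column."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_to_unit_range(rows, bounds):
    """Map each feature linearly so its fitted min becomes -1 and max becomes 1."""
    return [
        [0.0 if hi == lo else 2.0 * (x - lo) / (hi - lo) - 1.0
         for x, (lo, hi) in zip(r, bounds)]
        for r in rows
    ]


def knn_predict(train_x, train_y, query, k=3, weighted=False):
    """Vote among the k nearest neighbours (Euclidean distance).

    With weighted=True each vote counts 1/distance (an assumed weighting
    scheme; the abstract does not say which one the paper uses).
    """
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = defaultdict(float)
    for d, y in nearest:
        votes[y] += 1.0 / (d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy records: (duration, service, bytes); label 1 = normal, 0 = malicious.
records = [
    [0.5, "http", 300], [0.7, "http", 350], [0.6, "dns", 280],
    [9.0, "smtp", 9000], [8.5, "other", 8700], [9.5, "other", 9100],
]
labels = [1, 1, 1, 0, 0, 0]

numeric, kept = select_numeric_features(records)   # drops the "service" column
bounds = fit_min_max(numeric)
scaled = scale_to_unit_range(numeric, bounds)

# Leave-one-out evaluation of both k-NN variants on the normalized data.
for weighted in (False, True):
    preds = [
        knn_predict(scaled[:i] + scaled[i + 1:], labels[:i] + labels[i + 1:],
                    scaled[i], k=3, weighted=weighted)
        for i in range(len(scaled))
    ]
    print("weighted" if weighted else "plain", accuracy(labels, preds))
```

Comparing normalized with non-normalized data, as the paper does, would amount to running the same loop on `numeric` instead of `scaled`.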
Published
Dec 29, 2018
How to Cite
PROTIĆ, Danijela; STANKOVIĆ, Miomir. Anomaly-Based Intrusion Detection: Feature Selection and Normalization Influence to the Machine Learning Models Accuracy. European Journal of Formal Sciences and Engineering, [S.l.], v. 1, n. 3, p. 43-48, dec. 2018. ISSN 2601-8683. Available at: <http://journals.euser.org/index.php/ejfe/article/view/4054>. Date accessed: 24 apr. 2019.