An intrusion detection scheme for identifying known and unknown web attacks (I-WEB)

WRAP_Theses_Kamarudin_2018.pdf - Submitted Version


Abstract

Using a large number of features increases a system's computational effort when processing large volumes of network traffic. In practice, there is little point in using every feature, since redundant or irrelevant features degrade detection performance. Meanwhile, statistical approaches are widely used in Anomaly Based Detection Systems (ABDS). These techniques do not require any prior knowledge of attack traffic, an advantage that has attracted many researchers to them. Nevertheless, their performance remains unsatisfactory because they produce high false detection rates. In recent years, the demand for data mining (DM) techniques in anomaly detection has increased significantly. Even though this approach can distinguish normal from attack behaviour effectively, its performance (in terms of true positives, true negatives, false positives and false negatives) has still not improved as much as expected. Moreover, the need to re-initiate the whole learning procedure, even for attack traffic that has previously been detected, appears to contribute to poor system performance.

This study aims to improve the detection of normal and abnormal traffic by determining the prominent features and recognising outlier data points more precisely. To achieve this objective, the study proposes a novel Intrusion Detection Scheme for Identifying Known and Unknown Web Attacks (I-WEB), which combines several strategies and methods. The proposed I-WEB is divided into three phases: pre-processing, anomaly detection and post-processing. In the pre-processing phase, the strengths of the filter and wrapper procedures are combined to select the optimal set of features. Correlation-based Feature Selection (CFS) is proposed for the filter, whereas the Random Forest (RF) classifier is chosen to evaluate feature subsets in the wrapper. In the anomaly detection phase, statistical analysis is used to formulate a normal profile and to calculate a normality score for every traffic instance. The threshold is defined using Euclidean Distance (ED) alongside the Chebyshev Inequality Theorem (CIT), with the aim of improving the attack recognition rate by accurately eliminating the set of outlier data points. To improve attack identification and reduce the misclassification rate of traffic first detected by the statistical analysis, ensemble learning with a boosting classifier is proposed. This method uses LogitBoost as the meta-classifier and RF as the base classifier. Furthermore, verified attack traffic detected by the ensemble learning is extracted and computed as signatures, which are then stored in the signature library for future identification. This helps to reduce detection time, since similar traffic behaviour does not have to be re-processed in future.
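The thresholding idea in the anomaly detection phase can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the exact formulation, so the sketch assumes that the normal profile is the centroid of normal traffic feature vectors, that the normality score is the Euclidean distance to that centroid, and that the threshold is the mean distance plus k standard deviations, with k chosen from Chebyshev's inequality, P(|X - mu| >= k*sigma) <= 1/k^2. The function names, the `tail_prob` parameter and the sample data are all hypothetical.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_profile(normal_rows):
    """Assumed normal profile: centroid of normal traffic, plus the mean
    and population standard deviation of distances to that centroid."""
    n = len(normal_rows)
    dims = len(normal_rows[0])
    centroid = [sum(row[i] for row in normal_rows) / n for i in range(dims)]
    dists = [euclidean(row, centroid) for row in normal_rows]
    return centroid, statistics.fmean(dists), statistics.pstdev(dists)


def chebyshev_threshold(mean_dist, std_dist, tail_prob=0.05):
    """Chebyshev bounds the chance of falling k or more standard deviations
    from the mean by 1/k^2, so k = 1/sqrt(tail_prob) caps that chance at
    tail_prob for any distribution with finite variance."""
    k = 1.0 / math.sqrt(tail_prob)
    return mean_dist + k * std_dist


def is_outlier(row, centroid, threshold):
    """Flag traffic whose distance from the profile exceeds the threshold."""
    return euclidean(row, centroid) > threshold


# Hypothetical two-feature traffic records, for illustration only.
normal = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.0]]
centroid, mu, sigma = build_profile(normal)
limit = chebyshev_threshold(mu, sigma, tail_prob=0.05)

print(is_outlier([1.05, 2.0], centroid, limit))  # close to the profile
print(is_outlier([9.0, 9.0], centroid, limit))   # far from the profile
```

Chebyshev's inequality makes no assumption about the shape of the distance distribution, which is one plausible reason to prefer it over a Gaussian-based cut-off. The cost is a more conservative (wider) threshold for the same tail probability.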

Item Type: Thesis [via Doctoral College] (PhD)
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Library of Congress Subject Headings (LCSH): Intrusion detection systems (Computer security) -- Statistical methods, Computer networks -- Security measures -- Software, Hacking -- Prevention
Official Date: April 2018
Dates: April 2018 (Event: UNSPECIFIED)
Institution: University of Warwick
Theses Department: Warwick Manufacturing Group
Thesis Type: PhD
Publication Status: Unpublished
Supervisor(s)/Advisor: Maple, Carsten ; Watson, Tim ; Safa Sohrabi, Nader
Extent: xviii, 168 leaves : illustrations, charts
Language: eng
URI: https://wrap.warwick.ac.uk/103911/
