Abstract:
Security is synonymous with safety and, as such, is a very important issue when dealing with life and property. Over the years, various information security measures have been employed to combat information security threats, but as these measures are developed, new and more sophisticated threats emerge. A Honeypot system distracts prospective attackers by luring them into a seemingly vulnerable network, while capturing their mode of operation and other information that can then be used to create adequate preventive measures. This thesis offers a methodical approach to the development of a Honeypot-based Intrusion Detection system using machine learning techniques. It first provides detailed background on Network Security and security threats, as well as several intrusion detection techniques. It also provides detailed information on Honeypots: the various types that have been developed, their strengths and limitations, and their modes of operation.
The Bayesian approach to Intrusion Detection was adapted, and the Naïve Bayes model was used to develop the architectural framework for this research. The observation space was a real-life network environment in which low-interaction vulnerabilities were arbitrarily introduced over a period of five months and web server log data was collected.
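For orientation, the standard Naïve Bayes decision rule can be written as follows. This is the textbook form, not necessarily the exact formulation used in the thesis; the symbols $c$ (a class label such as attack or normal), $x_1, \dots, x_n$ (discretized features of a log record), and $\hat{c}$ (the predicted class) are notation assumed here for illustration.

```latex
\hat{c} \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The product form follows from the model's assumption that the features are conditionally independent given the class.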
Data mining techniques were applied to the data. Platforms for Data Cleaning and Data Discretization were designed and implemented in an MS Windows 7 environment, using the C# and C++ programming languages in Visual Studio 2012. Class labels were extracted for each month using a program written in C++. The problem of an imbalanced dataset was handled by removing redundant or duplicate data. The Bayesian Information Criterion model was implemented in C++, and the percentage accuracy was obtained for each month. The data order used for the data set was one-step. A prototype experiment was carried out on the same data set using a two-step data order. The two results were compared, and the percentage accuracy was observed to be higher with the one-step data order.
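The cleaning and discretization steps described above can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual code: the function names `remove_duplicates` and `discretize` are hypothetical, duplicates are assumed to be exact string matches between log records, and equal-width binning is assumed as the discretization scheme.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: drop exact duplicate log records,
// keeping the first occurrence of each and preserving order.
std::vector<std::string> remove_duplicates(const std::vector<std::string>& records) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> unique;
    for (const auto& record : records) {
        // insert() reports whether the record was newly added.
        if (seen.insert(record).second) {
            unique.push_back(record);
        }
    }
    return unique;
}

// Hypothetical helper: equal-width discretization (an assumed scheme).
// Maps a value in [lo, hi] to a bin index in 0..bins-1; values outside
// the range are clamped to the nearest edge.
std::size_t discretize(double value, double lo, double hi, std::size_t bins) {
    if (bins == 0 || hi <= lo) {
        return 0;  // degenerate range: fall back to a single bin
    }
    double clamped = std::min(std::max(value, lo), hi);
    auto index = static_cast<std::size_t>((clamped - lo) / (hi - lo) * bins);
    return std::min(index, bins - 1);  // value == hi lands in the last bin
}
```

For example, under these assumptions a numeric field such as response size could be mapped to one of a small number of bins before it is passed to the classifier.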
The research establishes a Honeypot-based Network Intrusion Detection system, implemented using real-life web server log data, that correctly classifies the data with high true-positive rates. Hence, it can assist Information Technology professionals in preventing fraudulent and unauthorized access to critical enterprise information.