| dc.description.abstract |
With the growing use of cloud and distributed computing, the threats to data and
information technology (IT) in such environments have also increased, making information leakage
a growing challenge. Information Leakage Prevention (ILP) is a broad term covering activities
that range from identifying and discovering sensitive data to restricting it and preventing it from leaving
an organization. Numerous cases have been reported in which sensitive files, such as confidential reports and
private customer and staff documents, were mistakenly sent by email or leaked through unprotected
USB sticks and mobile devices. Most existing ILP work is simulated on the Linux operating system
rather than on Windows, the operating system with the largest number
of users. To identify and react to insider threats by monitoring for and detecting anomalous
behaviour, this work focused on mobile agent-based ILP systems combined with machine learning for
document-type classification and deep-content analysis, as these approaches are better suited to the
full range of strategies available to corporate executives for preventing data
leakage and information loss. Document files were divided into fragments, and each fragment was
handled using different classification algorithms. The file features obtained from the Binary
Frequency Distribution (BFD) were reduced with the Sequential Forward Selection (SFS) and
Sequential Floating Forward Selection (SFFS) algorithms to increase speed and accuracy. The
reduced feature set was fed to three machine learning algorithms: Naïve Bayes, k-Nearest Neighbor
(kNN), and Support Vector Machines (SVM). The algorithms were applied to 21 file types (.apk,
.bin, .bmp, .class, .css, .dll, .doc, .exe, .frm, .htm, .java, .jpg, .js, .mdb, .mp3, .pdf, .php,
.png, .ppt, .txt, and .xls), all of which were correctly detected. The results show substantial
classification accuracy across all file types: the precision values for Naïve Bayes, kNN, and SVM
were 87%, 90.5%, and 99.2%, respectively. SVM achieved the highest accuracy, with an average true
positive rate (TPR) of 0.992 and a false positive (FP) rate
of 0.001. The primary benefit of an agent-based information leakage detection and prevention system
lies in the ability to modify and add detection capabilities, modularize them, and use them
at the discretion of a central control mechanism. |
en_US |
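The abstract names Binary Frequency Distribution (BFD) features computed over file fragments but does not spell out the computation. The Python sketch below assumes BFD means the relative frequency of each of the 256 possible byte values in a fragment, and that fragments are consecutive fixed-size slices of a file. The function names and the 512-byte fragment size are hypothetical choices for illustration, not parameters taken from the thesis.

```python
from collections import Counter

FRAGMENT_SIZE = 512  # hypothetical fragment size in bytes (not stated in the abstract)


def split_into_fragments(data: bytes, size: int = FRAGMENT_SIZE) -> list[bytes]:
    """Split a file's bytes into consecutive fixed-size fragments; the last may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def byte_frequency_distribution(fragment: bytes) -> list[float]:
    """Return a 256-element vector: the relative frequency of each byte value 0..255.

    Assumption: counts are normalised by fragment length; other normalisations exist.
    """
    if not fragment:
        return [0.0] * 256
    counts = Counter(fragment)  # iterating over bytes yields ints in 0..255
    total = len(fragment)
    return [counts[b] / total for b in range(256)]


if __name__ == "__main__":
    data = b"%PDF-1.4\n" + b"A" * 1000  # toy input standing in for a real document
    fragments = split_into_fragments(data)
    features = [byte_frequency_distribution(f) for f in fragments]
    print(len(fragments), len(features[0]))  # 2 256
```

Each fragment yields one 256-dimensional vector, which is the kind of input a feature-selection step such as SFS or SFFS would then reduce.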
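SFS and SFFS are wrapper feature-selection methods. SFS greedily adds the feature that most improves a scoring criterion; SFFS (the floating search of Pudil et al., 1994) follows each addition with conditional removals whenever dropping a feature beats the best subset of that smaller size found so far. Below is a minimal Python sketch, assuming a caller-supplied `criterion` that scores a subset of features (in practice, for example, cross-validated classifier accuracy on the selected BFD columns). The names `sfs`, `sffs` and `criterion` are hypothetical, not the thesis's code.

```python
from typing import Callable, Hashable, Sequence

# Hypothetical scoring function over a feature subset: higher is better.
Criterion = Callable[[frozenset], float]


def sfs(features: Sequence[Hashable], criterion: Criterion, k: int) -> list:
    """Sequential Forward Selection: greedily add the best feature until k are chosen."""
    selected: list = []
    while len(selected) < min(k, len(features)):
        candidates = [f for f in features if f not in selected]
        best = max(candidates, key=lambda f: criterion(frozenset(selected) | {f}))
        selected.append(best)
    return selected


def sffs(features: Sequence[Hashable], criterion: Criterion, k: int) -> frozenset:
    """Sequential Floating Forward Selection: SFS steps plus conditional exclusion."""
    k = min(k, len(features))
    if k <= 0:
        return frozenset()
    current: frozenset = frozenset()
    best_score: dict[int, float] = {}  # best criterion value seen for each subset size
    best_set: dict[int, frozenset] = {}
    while len(current) < k:
        # Inclusion: one SFS step from the current subset.
        candidates = [f for f in features if f not in current]
        add = max(candidates, key=lambda f: criterion(current | {f}))
        current = current | {add}
        score = criterion(current)
        size = len(current)
        if size not in best_score or score > best_score[size]:
            best_score[size], best_set[size] = score, current
        # Conditional exclusion: keep removing the least useful feature while the
        # reduced subset beats the best subset of that size recorded so far.
        while len(current) > 2:
            drop = max(current, key=lambda f: criterion(current - {f}))
            reduced = current - {drop}
            reduced_score = criterion(reduced)
            if reduced_score <= best_score[len(reduced)]:
                break
            current = reduced
            best_score[len(current)], best_set[len(current)] = reduced_score, current
    return best_set[k]
```

On a toy criterion where two features are strong only in combination, SFS can commit early to a feature it never revisits, while SFFS can drop that feature later and end at a higher-scoring subset; that backtracking is the usual reason to prefer the floating variant, at the cost of more criterion evaluations.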
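Of the three classifiers named in the abstract, k-Nearest Neighbor is the simplest to show in a few lines. The sketch below is a plain majority-vote kNN over feature vectors (for example, reduced BFD vectors). Euclidean distance, the default k = 3 and the name `knn_predict` are assumptions for illustration; the abstract does not state which distance or value of k the thesis used.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[str],
    x: Sequence[float],
    k: int = 3,
) -> str:
    """Predict a label for x by majority vote among its k nearest training vectors."""
    # Euclidean distance from x to every training vector, paired with that vector's label,
    # sorted nearest first.
    distances = sorted(
        (math.dist(v, x), label) for v, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest vectors; Counter.most_common breaks ties in favour
    # of the label encountered first, i.e. the one with the nearer neighbour.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

Because kNN stores every training vector and compares against all of them at prediction time, reducing the 256 BFD dimensions beforehand directly cuts its cost, which fits the abstract's stated aim of using SFS and SFFS to increase speed.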
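The abstract reports precision, an average true positive rate (TPR) and a false positive (FP) rate. For a multi-class problem such as file-type identification, these are commonly computed one-vs-rest for each class and then averaged. The sketch below weights each class by its number of true instances; that weighting is an assumption (a plain macro average is the other common choice), and the function names are hypothetical.

```python
from collections import Counter


def one_vs_rest_rates(y_true, y_pred):
    """Per-class (precision, TPR, FP rate), treating each class one-vs-rest."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    rates = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0  # TP / (TP + FP)
        tpr = tp / (tp + fn) if tp + fn else 0.0        # TP / (TP + FN), i.e. recall
        fpr = fp / (fp + tn) if fp + tn else 0.0        # FP / (FP + TN)
        rates[c] = (precision, tpr, fpr)
    return rates


def weighted_average(rates, y_true):
    """Average each metric over classes, weighted by each class's true-instance count."""
    support = Counter(y_true)
    total = len(y_true)
    return tuple(
        sum(rates[c][i] * support[c] for c in rates) / total for i in range(3)
    )
```

With this weighting the averaged TPR equals overall accuracy, because summing support × recall over classes counts every correct prediction exactly once.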