Abstract:
The Fourth Industrial Revolution is driven by cloud computing, which increases productivity and efficiency. Despite its benefits of cost-effectiveness, on-demand service and scalability, cloud computing faces many challenges, including security, performance and fault tolerance. Few related works have addressed fault tolerance in real-time cloud systems, where congestion and delay increase consumers’ waiting time and service providers’ cost. This research develops a reactive mitigation model that combines two fault tolerance techniques, checkpointing and replication, to overcome these challenges. The developed model has four layers: client/user, controller, fault tolerance (hybrid) and virtual machine. In the user layer, a user submits a request R with a quality of service parameter S via cloud clients to the controller layer, which houses the service manager (Sm) and the scheduler module.
A virtual machine (VM) scheduler obtains the list of stable VMs that can handle the user’s request by accessing the database, and performs reliability checks on the VMs based on their probability of failure. The model is then simulated on a cloud simulator platform to ascertain zero system-level execution failure using replication and checkpointing in the fault tolerance module. Simulations are run with varying numbers of VMs. In the first simulation, with three VMs (VMi, i = 0, 1, 2), VM0 executes fourteen jobs, VM1 executes seven jobs and VM2 executes four jobs, with an average execution time of 10.0236 seconds and an average computational cost of 32.1969. The second simulation, at 50 MIPS (Million Instructions Per Second) with ten VMs (VMi, i = 0, 1, …, 9), 25 checkpoints and 50 requests, records 41 successful executions and 10 failed executions, a success rate of 80.39% and a failure rate of 19.61%. Its average execution time is 13.4344 seconds and its average computational cost is 42.3052. The third simulation is carried out at 100 MIPS with 20 VMs (VMi, i = 0, 1, …, 19), 25 checkpoints and 100 requests. The average execution time is 14.1034 seconds and the average computational cost is 44.1830. The percentage of successful executions is 80.20%, while that of failed executions is 19.80%. The fourth simulation is set to 1000 MIPS with fifty VMs (VMi, i = 0, 1, …, 49), 50 checkpoints and 1000 requests. The average execution time is 15.7763 seconds and the average computational cost is 49.1742. A total of 798 tasks are executed successfully, while 203 fail. The percentages of successful and failed executions are 79.72% and 20.28% respectively.
The functionalities of the developed model are compared with those of the existing system under the same simulation conditions. The results show a relative improvement over the existing system. Furthermore, it is observed that addressing failures in real-time cloud systems reduces failure rates, response time and computational cost. The simulations show that the developed system reacts to faults by transferring requests to stable VMs and resuming task execution from the last checkpoint, thereby reducing waiting time during task execution. The developed model works efficiently under the various simulation conditions tested.
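The reactive behaviour described above (select stable VMs, checkpoint during execution, and on a fault transfer the request to another VM and resume from the last checkpoint) can be illustrated with a minimal Python sketch. This is not the simulator used in this research: the names `VM`, `stable_vms` and `run_with_failover`, the reliability threshold, and the deterministic `fail_at_step` fault injection are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VM:
    name: str
    failure_probability: float          # assumed per-VM estimate in [0, 1]
    fail_at_step: Optional[int] = None  # hypothetical fault injection for the demo


def stable_vms(vms: List[VM], threshold: float) -> List[VM]:
    """Keep VMs whose failure probability is at or below the threshold,
    ordered from most to least reliable (an assumed reliability check)."""
    ok = [vm for vm in vms if vm.failure_probability <= threshold]
    return sorted(ok, key=lambda vm: vm.failure_probability)


def run_with_failover(total_steps: int, vms: List[VM],
                      checkpoint_interval: int) -> Tuple[bool, List[tuple]]:
    """Run a task of `total_steps` steps, saving a checkpoint every
    `checkpoint_interval` steps. On a fault, move to the next VM (the
    replica) and resume from the last saved checkpoint."""
    checkpoint = 0
    log = []
    for vm in vms:
        step = checkpoint                      # resume from last checkpoint
        log.append((vm.name, "start", step))
        failed = False
        while step < total_steps:
            if vm.fail_at_step is not None and step == vm.fail_at_step:
                log.append((vm.name, "fault", step))
                failed = True
                break
            step += 1                          # one unit of work completed
            if step % checkpoint_interval == 0:
                checkpoint = step              # persist progress
        if not failed:
            log.append((vm.name, "done", step))
            return True, log
    return False, log                          # every candidate VM failed


vms = [VM("VM0", 0.05, fail_at_step=7), VM("VM1", 0.10), VM("VM2", 0.60)]
candidates = stable_vms(vms, threshold=0.2)    # VM2 is filtered out
success, log = run_with_failover(10, candidates, checkpoint_interval=5)
for entry in log:
    print(entry)
```

In this example VM0 faults at step 7, so the task restarts on VM1 from the checkpoint taken at step 5 rather than from step 0; only two steps of work are repeated, which is the source of the reduced waiting time that checkpointing is intended to provide.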