Universiti Teknologi Malaysia Institutional Repository

Low latency fast data computation scheme for map reduce based clusters

Shabbir, Aisha (2020) Low latency fast data computation scheme for map reduce based clusters. PhD thesis, Universiti Teknologi Malaysia, Faculty of Engineering - School of Computing.

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

MapReduce-based clusters are an emerging paradigm for big data analytics that scales up and speeds up the classification, investigation, and processing of huge, complex data sets. One fundamental issue in processing data on MapReduce clusters is dealing with resource heterogeneity, especially when tasks have data interdependencies. Second, MapReduce runs a job in several phases, and intermediate data traffic and its migration time become a major bottleneck for jobs that produce large volumes of intermediate data in the shuffle phase. Third, the factors behind straggling must be identified and monitored, because straggling causes unnecessary delays and seriously constrains the overall performance of the system. This research therefore aims to provide a low latency fast data computation scheme that introduces three algorithms to handle interdependent task computation on heterogeneous resources, to reduce intermediate data traffic and its migration time, and to monitor and model job straggling factors. First, a Low Latency and Computational Cost based Tasks Scheduling (LLCC-TS) algorithm has been developed for interdependent tasks on heterogeneous resources; it takes task priority into account to provide cost-effective resource utilization and a reduced makespan. Second, an Aggregation and Partition based Accelerated Intermediate Data Migration (APAIDM) algorithm has been presented to reduce intermediate data traffic and data migration time in the shuffle phase by using aggregators and a custom partitioner. Third, a MapReduce Total Execution Time Prediction (MTETP) scheme has been produced using a machine learning technique (linear regression); it includes the factors that affect job computation time in order to monitor job straggling and minimize latency.
The LLCC-TS algorithm achieves performance improvement rates of 66.13%, 22.23%, 43.53%, and 44.74% over the FIFO, improved max-min, SJF, and MOS algorithms, respectively, for the makespan of scheduling interdependent tasks. The APAIDM algorithm achieves performance improvements of 66.62% and 48.4% in data migration time over the basic hash and conventional aggregation algorithms, respectively. Moreover, the MTETP technique predicts the total job execution time with 20.42% better accuracy than the improved HP technique. Thus, the combination of the three algorithms provides a low latency fast data computation scheme for MapReduce-based clusters.
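
The general idea behind using aggregators and a custom partitioner in the shuffle phase can be illustrated with a small, self-contained Python sketch. This is not the thesis's APAIDM algorithm, whose details are not given in the abstract. The word-count workload, the function names, and the CRC32-based partitioner are all assumptions chosen for illustration. The sketch only shows that summing values per key on the map side shrinks the number of records that must migrate to reducers without changing the final result.

```python
from collections import Counter, defaultdict
import zlib


def map_words(split):
    """Map phase: emit one (word, 1) pair per word in the input split."""
    return [(word, 1) for word in split.split()]


def aggregate(pairs):
    """Map-side aggregation: sum values per key before the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Hypothetical custom partitioner: stable CRC32 hash modulo reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Shuffle phase: route every intermediate pair to its reducer bucket."""
    buckets = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)].append((key, value))
    return buckets


def reduce_counts(buckets):
    """Reduce phase: sum the values received for each key."""
    result = Counter()
    for pairs in buckets.values():
        for key, value in pairs:
            result[key] += value
    return dict(result)


# Two toy input splits (illustrative data only).
splits = ["a b a c a", "b b a c c"]

raw = [map_words(s) for s in splits]        # no aggregation
aggregated = [aggregate(p) for p in raw]    # with map-side aggregation

# Number of intermediate records that would cross the network.
raw_records = sum(len(p) for p in raw)
aggregated_records = sum(len(p) for p in aggregated)

counts_raw = reduce_counts(shuffle(raw, num_reducers=2))
counts_aggregated = reduce_counts(shuffle(aggregated, num_reducers=2))

print(raw_records, aggregated_records, counts_aggregated)
```

In this toy run the aggregated map output contains fewer intermediate records than the raw map output, while both paths produce the same word counts. In a real cluster the partitioner would also be the place to balance load across reducers, which this sketch does not attempt.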

Item Type: Thesis (PhD)
Uncontrolled Keywords: MapReduce, algorithm, heterogeneous resources, migration time
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 98237
Deposited By: Yanti Mohd Shah
Deposited On: 23 Nov 2022 08:06
Last Modified: 23 Nov 2022 08:06
