MapReduce Overview and Its Disadvantages


MapReduce Overview

MapReduce is a distributed computing model proposed by Google. It was originally applied mainly in the search field to process massive data sets. Apache Hadoop includes an open source implementation of the model for general-purpose distributed data computing. A MapReduce (MR) job consists of two stages: Map and Reduce. Users only need to implement two functions, map() and reduce(), and the framework takes care of distributing the work, which greatly simplifies the development of distributed, concurrent processing programs.
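To make the two-function model concrete, here is a minimal word-count sketch in plain Python. It is not the Hadoop API: `run_mapreduce` is a hypothetical single-process stand-in for the framework, and `map_fn` and `reduce_fn` are illustrative names for the two functions a user would supply.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: combine all values that were emitted for one key."""
    return word, sum(counts)


def run_mapreduce(lines, map_fn, reduce_fn):
    """Hypothetical single-process stand-in for the framework."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)  # shuffle: group values by key
    # Call reduce once per key, in sorted key order.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


result = run_mapreduce(["the quick fox", "the lazy dog"], map_fn, reduce_fn)
print(result)  # {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

In a real cluster the map calls, the grouping step, and the reduce calls would run on different machines; the sketch only shows the division of labor between user code and framework.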

MapReduce Disadvantages

I) Not good at real-time calculation
MapReduce is a batch processing model and cannot return results within milliseconds or seconds.


II) Not good at stream computing
The input data of stream computing arrives dynamically, whereas the input data set of a MapReduce job is static and cannot change while the job runs. This follows from the design of MapReduce itself, which requires a static data source.

III) Not good at DAG (directed acyclic graph) computing
When multiple applications depend on one another, the input of each application is the output of the previous one. MapReduce can handle this case, but the output of every MapReduce job is written to disk, which causes a large amount of disk I/O and results in very low performance.

IV) Intermediate output is written to disk
MapReduce processes data in RAM but writes the intermediate output to disk.
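The DAG and disk-output points above can be illustrated with a small sketch of two chained jobs. This is plain Python under stated assumptions, not Hadoop code: `run_job`, the file names, and the two example jobs are hypothetical. The point it shows is that the second job can only start from a file that the first job has fully written to disk.

```python
import json
import os
import tempfile
from collections import defaultdict


def run_job(input_path, output_path, map_fn, reduce_fn):
    """Hypothetical job runner: read input from disk, write output to disk."""
    groups = defaultdict(list)
    with open(input_path) as src:
        for line in src:
            for key, value in map_fn(line.rstrip("\n")):
                groups[key].append(value)
    with open(output_path, "w") as dst:
        for key in sorted(groups):
            dst.write(json.dumps([key, reduce_fn(key, groups[key])]) + "\n")


# Job 1: count how often each word occurs.
def count_map(line):
    for word in line.split():
        yield word, 1


def count_reduce(word, ones):
    return sum(ones)


# Job 2: group words by their count, reading the output of job 1.
def freq_map(line):
    word, count = json.loads(line)
    yield count, word


def freq_reduce(count, words):
    return sorted(words)


workdir = tempfile.mkdtemp()
raw = os.path.join(workdir, "input.txt")
stage1 = os.path.join(workdir, "stage1.jsonl")  # materialized between the jobs
stage2 = os.path.join(workdir, "stage2.jsonl")

with open(raw, "w") as f:
    f.write("a b a\nc b a\n")

run_job(raw, stage1, count_map, count_reduce)    # full write to disk
run_job(stage1, stage2, freq_map, freq_reduce)   # full read back from disk

with open(stage2) as f:
    final = [json.loads(line) for line in f]
print(final)  # [[1, ['c']], [2, ['b']], [3, ['a']]]
```

Every additional stage in the chain adds one more full write and one more full read of the intermediate data, which is where the disk I/O cost described above comes from.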
