
Hadoop Combiners

Suppose you computed a mean with a Mapper/Reducer like this:
1.    Your mapper may have gone through the records and, for each one, output a key/value pair that looked like: (day of week, value).
2.    For each day of the week, your reducer kept a running total of the value as well as a count of the number of records.
3.    You divided the total value by the number of records to get the mean.
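
The three steps above can be sketched in plain Python. This is only a local simulation under stated assumptions, not Hadoop's API: the function names `mapper` and `reducer`, the record layout, and the sample data are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

def mapper(records):
    """Step 1: emit one (day_of_week, value) pair per input record."""
    for day, value in records:
        yield day, value

def reducer(pairs):
    """Steps 2 and 3: keep a running total and count per day, then divide."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, value in pairs:
        totals[day] += value
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in totals}

# Hypothetical sample records: (day of week, value).
records = [("Mon", 10.0), ("Tue", 4.0), ("Mon", 20.0), ("Tue", 6.0), ("Mon", 30.0)]
means = reducer(mapper(records))
print(means)
```

Note that every record the mapper sees becomes one pair handed to the reducer, so the amount of data passed along grows with the size of the input.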

But there’s a problem here. That second step means every pair the mappers emit has to reach a reducer, which involves moving a lot of data around your network. What if we could do some of the reduction locally before sending the data to the reducers?
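
As a rough illustration of local reduction, the sketch below simulates a combiner in plain Python. It is not Hadoop code: `combine`, `reduce_partials`, the two "nodes", and the sample data are hypothetical. One detail matters for a mean in particular: the local step should emit a partial (sum, count) pair per key rather than a local mean, because averaging local means gives the wrong answer when the nodes hold different numbers of records.

```python
from collections import defaultdict

def combine(local_pairs):
    """Collapse one node's (day, value) pairs into one (day, (sum, count)) per day."""
    partial = defaultdict(lambda: [0.0, 0])
    for day, value in local_pairs:
        partial[day][0] += value
        partial[day][1] += 1
    return [(day, (s, c)) for day, (s, c) in partial.items()]

def reduce_partials(partials):
    """Merge the partial sums and counts from all nodes, then divide once."""
    totals = defaultdict(lambda: [0.0, 0])
    for day, (s, c) in partials:
        totals[day][0] += s
        totals[day][1] += c
    return {day: s / c for day, (s, c) in totals.items()}

# Hypothetical mapper output sitting on two separate nodes.
node_a = [("Mon", 10.0), ("Mon", 20.0), ("Tue", 4.0)]
node_b = [("Mon", 30.0), ("Tue", 6.0)]

# Only the combined partials would need to cross the network.
shuffled = combine(node_a) + combine(node_b)
means = reduce_partials(shuffled)
print(len(shuffled), means)
```

In this toy run, four partial records are shuffled instead of five raw ones; with real data the saving scales with how many records share a key on each node.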
