University of Warwick
Publications service & WRAP

Mammoth : gearing Hadoop towards memory-intensive MapReduce applications

Shi, Xuanhua, Chen, Ming, He, Ligang, Xie, Xu, Lu, Lu, Jin, Hai, Chen, Yong and Wu, Song (2015) Mammoth : gearing Hadoop towards memory-intensive MapReduce applications. IEEE Transactions on Parallel and Distributed Systems, 26 (8). pp. 2300-2315. doi:10.1109/TPDS.2014.2345068 ISSN 1045-9219.

PDF: WRAP_He_0584410-cs-161214-2014-tpds-mammoth_.pdf - Accepted Version (1729Kb)
Official URL: http://dx.doi.org/10.1109/TPDS.2014.2345068

Abstract

The MapReduce platform has recently been widely used for large-scale data processing and analysis. It works well when the cluster hardware is well configured. However, our survey indicates that common hardware configurations in small and medium-size enterprises may not be suitable for such tasks. The situation is more challenging for memory-constrained systems, in which memory is the bottleneck resource relative to CPU power and therefore does not meet the needs of large-scale data processing. According to our survey, the traditional high performance computing (HPC) system is one example of such a memory-constrained system. In this paper, we present Mammoth, a new MapReduce system that aims to improve MapReduce performance through global memory management. In Mammoth, we design a novel rule-based heuristic that prioritizes memory allocation and revocation among execution units (mapper, shuffler, reducer, etc.) so as to maximize the overall benefit to the Map/Reduce job when scheduling each memory unit. We have also developed a multi-threaded execution engine, which is based on Hadoop but runs in a single JVM on a node. In the execution engine, we have implemented the memory scheduling algorithm to realize global memory management, on top of which we further developed techniques such as sequential disk access, multi-cache, and shuffling from memory, and solved the problem of full garbage collection in the JVM. We have conducted extensive experiments comparing Mammoth against the native Hadoop platform. The results show that Mammoth can reduce job execution time by more than 40% in typical cases, without requiring any modification of Hadoop programs. When a system is short of memory, Mammoth can improve performance by up to 5.19 times, as observed for I/O-intensive applications such as PageRank. Given the growing importance of large-scale data processing and analysis and the proven success of the MapReduce platform, the Mammoth system has promising potential for impact.
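
To make the idea of global memory management concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. The abstract does not state the actual rules, so the priority table, the class name GlobalMemoryPool, and the revocation policy below are all assumptions chosen only to show how one shared pool might allocate memory to execution units and revoke it from lower-priority ones.

```python
from dataclasses import dataclass, field

# Hypothetical priorities (higher = more important). The paper's real
# rule-based heuristic is not described in the abstract.
PRIORITY = {"reducer": 1, "shuffler": 2, "mapper": 3}


@dataclass
class GlobalMemoryPool:
    """One memory pool shared by all execution units in a single process."""

    capacity: int                              # total memory units on the node
    held: dict = field(default_factory=dict)   # unit name -> memory units held

    def free(self) -> int:
        """Memory units not currently held by any execution unit."""
        return self.capacity - sum(self.held.values())

    def allocate(self, unit: str, amount: int) -> int:
        """Grant up to `amount` units, revoking from lower-priority holders if short."""
        shortfall = amount - self.free()
        if shortfall > 0:
            # Candidate victims: holders with strictly lower priority,
            # lowest priority first.
            victims = sorted(
                (u for u in self.held if PRIORITY[u] < PRIORITY[unit]),
                key=lambda u: PRIORITY[u],
            )
            for victim in victims:
                if shortfall <= 0:
                    break
                taken = min(self.held[victim], shortfall)
                # A real system would have to spill the victim's data to disk here.
                self.held[victim] -= taken
                shortfall -= taken
        granted = min(amount, self.free())
        self.held[unit] = self.held.get(unit, 0) + granted
        return granted
```

In this sketch a request is never granted more than the pool can supply, and a unit can only take memory away from units ranked below it; a request that cannot be met is partially granted or granted nothing rather than blocking.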

Item Type: Journal Article
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): Electronic data processing, Memory management (Computer science)
Journal or Publication Title: IEEE Transactions on Parallel and Distributed Systems
Publisher: IEEE
ISSN: 1045-9219
Official Date: 1 August 2015
Dates:
Date            Event
1 August 2015   Published
31 July 2014    Available
Volume: 26
Number: 8
Page Range: pp. 2300-2315
DOI: 10.1109/TPDS.2014.2345068
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Restricted or Subscription Access
Date of first compliant deposit: 28 December 2015
Date of first compliant Open Access: 28 December 2015
Funder: Guo jia zi ran ke xue ji jin wei yuan hui (China) [National Natural Science Foundation of China] (NSFC), National Science and Technology Pillar Program, MOE-Intel Special Research Fund of Information Technology, Chinese Universities Scientific Fund
Grant number: 61370104 (NSFC), 61133008 (NSFC), 2012BAH14F02 (Pillar), MOE-INTEL-2012-01
Embodied As: 1
