Performance optimization for managing massive numbers of small files in distributed file systems
Fu, Songling, He, Ligang, Huang, Chenlin, Liao, Xiangke and Li, Kenli (2015) Performance optimization for managing massive numbers of small files in distributed file systems. IEEE Transactions on Parallel and Distributed Systems, 26 (12). pp. 3433-3448. doi:10.1109/TPDS.2014.2377720 ISSN 1045-9219.
PDF: WRAP_He_0584410-cs-161214-2014-tpds-iflatlfs.pdf (Accepted Version, 1097 KB)
Official URL: http://dx.doi.org/10.1109/TPDS.2014.2377720
Abstract
The processing of massive numbers of small files is a challenge in the design of distributed file systems. Currently, the combined-block-storage approach is prevalent. However, this approach relies on traditional file systems such as ExtFS, which can be inefficient when accessing small files located randomly on the disk. This paper focuses on optimizing the performance of data servers in accessing massive numbers of small files. We present a Flat Lightweight File System (iFlatLFS) to manage small files, based on a simple metadata scheme and a flat storage architecture. iFlatLFS is designed to replace the traditional file system on data servers and can be deployed underneath distributed file systems that store massive numbers of small files. iFlatLFS greatly simplifies the original data access procedure. The new metadata scheme proposed in this paper occupies only a fraction of the space required by metadata in traditional file systems. We have implemented iFlatLFS in CentOS 5.5 and integrated it into an open-source Distributed File System (DFS), the Taobao FileSystem (TFS), which was developed by Alibaba, a top B2C service provider in China, and manages over 28.6 billion small photos. We have conducted extensive experiments to verify the performance of iFlatLFS. The results show that when the file size ranges from 1 KB to 64 KB, iFlatLFS is faster than Ext4 by 48% and 54% on average for random reads and random writes, respectively, in the DFS environment. Moreover, after integration into TFS, iFlatLFS-based TFS is faster than the existing Ext4-based TFS by 45% and 49% on average for random read access and hybrid access (a mix of read and write accesses), respectively.
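The abstract describes iFlatLFS only as "a simple metadata scheme and a flat storage architecture". As a rough illustration of why such a design can shorten the small-file access path, the Python sketch below appends small files to one large data file (standing in for a raw disk region) and locates each file through a flat index keyed by file ID, so a read is one index lookup plus one positioned read. The `FlatStore` class, the 20-byte record layout, and all other names are hypothetical; this is not the authors' implementation or iFlatLFS's on-disk format, and it uses the Unix-only `os.pread`/`os.pwrite`.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size metadata record: file_id (u64), offset (u64), length (u32).
# Illustrative assumption only, not iFlatLFS's actual metadata format.
META = struct.Struct("<QQI")


class FlatStore:
    """Toy flat store: small files are appended to one large data file
    (standing in for a raw partition) and located via a flat index."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        self.end = os.fstat(self.fd).st_size  # next free byte in the data area
        self.index = {}  # file_id -> (offset, length), kept in memory

    def write(self, file_id, data):
        offset = self.end
        os.pwrite(self.fd, data, offset)  # positioned write, no seek needed
        self.end += len(data)
        self.index[file_id] = (offset, len(data))
        # Return the serialized metadata record that would be persisted.
        return META.pack(file_id, offset, len(data))

    def read(self, file_id):
        # One in-memory index lookup plus one positioned read; there is no
        # directory traversal or per-file inode lookup.
        offset, length = self.index[file_id]
        return os.pread(self.fd, length, offset)

    def close(self):
        os.close(self.fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = FlatStore(os.path.join(d, "blocks.dat"))
        store.write(1, b"abc")
        record = store.write(2, b"hello")
        print(store.read(2), META.unpack(record))
        store.close()
```

The fixed-size record is the design point being illustrated: every file costs the same small, known amount of metadata, in contrast to the directory entries and inodes a hierarchical file system maintains per file.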
| Item Type: | Journal Article |
|---|---|
| Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software |
| Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science |
| Library of Congress Subject Headings (LCSH): | File processing (Computer science) |
| Journal or Publication Title: | IEEE Transactions on Parallel and Distributed Systems |
| Publisher: | IEEE |
| ISSN: | 1045-9219 |
| Official Date: | December 2015 |
| Volume: | 26 |
| Number: | 12 |
| Page Range: | pp. 3433-3448 |
| DOI: | 10.1109/TPDS.2014.2377720 |
| Status: | Peer Reviewed |
| Publication Status: | Published |
| Access rights to Published version: | Restricted or Subscription Access |
| Date of first compliant deposit: | 26 April 2016 |
| Date of first compliant Open Access: | 26 April 2016 |
| Funder: | Important National Science & Technology Specific Projects of China (HGJ), Leverhulme Trust (LT) |
| Grant number: | 2013ZX01040-002 (HGJ), RPG-101 (LT) |
| Embodied As: | 1 |