Posted on 2008-02-15 19:51 ZelluX
Mars: A MapReduce Framework on Graphics Processors
by Bingsheng He @ Hong Kong Univ. of Sci. & Tech.
Naga K. Govindaraju @ Microsoft Corp.
Qiong Luo, Tuyong Wang @ Sina Corp.
Three challenges in implementing the MapReduce framework on the GPU:
First, the synchronization overhead in the run-time system of the framework must be low.
Second, a fine-grained load balancing scheme is required.
Third, the core tasks of MapReduce, including string processing, file manipulation and concurrent reads and writes, are unconventional to GPUs and must be handled efficiently.
Each thread is responsible for a Map or a Reduce task with a small number of key/value pairs as input.
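A rough illustration of this task granularity: the sketch below splits the input pairs into small chunks, one per logical thread. It is plain Python standing in for GPU threads, and `assign_tasks` and `pairs_per_thread` are hypothetical names for this note, not part of Mars.

```python
from typing import List, Tuple

KV = Tuple[str, str]

def assign_tasks(pairs: List[KV], pairs_per_thread: int) -> List[List[KV]]:
    """Split the input into chunks; chunk i is the work of logical thread i."""
    if pairs_per_thread <= 0:
        raise ValueError("pairs_per_thread must be positive")
    return [pairs[i:i + pairs_per_thread]
            for i in range(0, len(pairs), pairs_per_thread)]

# Hypothetical input: 10 key/value pairs, 4 pairs per logical thread.
pairs = [(f"k{i}", f"v{i}") for i in range(10)]
tasks = assign_tasks(pairs, pairs_per_thread=4)
# 10 pairs at 4 per thread gives 3 logical threads; the last one gets 2 pairs.
```

Keeping the chunks small means many threads are available to run, which is what a massively threaded device needs.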
Performance improvement: 1.5-16 times over the CPU-based counterpart
2. Preliminary and Related Work
2.1. Graphics Processors
It is desirable to schedule the tasks between the CPU and the GPU to fully exploit their computation power.
Given a kernel program, the occupancy of the GPU is the ratio of active schedule units to the maximum number of schedule units supported on the GPU.
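The definition above is just a ratio, which can be written down directly. The function name and the numbers in the example are made up for illustration and are not measurements from the paper.

```python
def occupancy(active_units: int, max_units: int) -> float:
    """Ratio of active schedule units to the maximum the GPU supports."""
    if max_units <= 0:
        raise ValueError("max_units must be positive")
    if not 0 <= active_units <= max_units:
        raise ValueError("active_units must lie in [0, max_units]")
    return active_units / max_units

# Hypothetical numbers: 12 of 24 schedule units active.
half = occupancy(12, 24)
```

A value close to 1.0 means the hardware has enough resident work to hide memory latency by switching between schedule units.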
The GPU has a hardware feature called coalesced access to exploit the spatial locality of memory accesses among threads.
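To see why the access pattern matters, here is a deliberately simplified model: count how many aligned memory segments one group of threads touches. The 64-byte segment size, the group size of 16 and the helper name are assumptions for this sketch; the real coalescing rules depend on the GPU generation.

```python
from typing import Iterable

def segments_touched(addresses: Iterable[int], segment_bytes: int = 64) -> int:
    """Count the distinct aligned memory segments hit by one group of threads."""
    return len({addr // segment_bytes for addr in addresses})

THREADS = 16  # hypothetical number of threads accessing memory together
WORD = 4      # bytes per element

# Thread i reads element i: consecutive addresses, good spatial locality.
contiguous = [i * WORD for i in range(THREADS)]
# Thread i reads element i * 16: neighbouring threads are 64 bytes apart.
strided = [i * 16 * WORD for i in range(THREADS)]

coalesced_cost = segments_touched(contiguous)
strided_cost = segments_touched(strided)
```

In this model the contiguous pattern falls into a single segment while the strided one touches a separate segment per thread, which is the locality that coalesced access exploits.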
Map: (k1, v1) -> (k2, v2)*
Reduce: (k2, v2*) -> v3*
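A word count is the usual way to make these two signatures concrete. The sketch below is sequential Python, not GPU code, and `map_fn`, `reduce_fn` and `run_mapreduce` are illustrative names rather than the Mars API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def map_fn(k1: str, v1: str) -> List[Tuple[str, int]]:
    """Map: (k1, v1) -> (k2, v2)*. Emit (word, 1) for each word in the line."""
    return [(word, 1) for word in v1.split()]

def reduce_fn(k2: str, values: List[int]) -> List[int]:
    """Reduce: (k2, v2*) -> v3*. Sum the counts collected for one word."""
    return [sum(values)]

def run_mapreduce(records: Iterable[Tuple[str, str]]) -> Dict[str, List[int]]:
    """Apply map_fn, group intermediate values by k2, then apply reduce_fn."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    return {k2: reduce_fn(k2, vs) for k2, vs in groups.items()}

records = [("line1", "a b a"), ("line2", "b c")]
result = run_mapreduce(records)
```

The grouping step in the middle is what turns the `(k2, v2)*` output of Map into the `(k2, v2*)` input of Reduce.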
3. Design and Implementation
3.1. Design Goals
3.2. System Workflow and Configuration
3.4. Implementation Techniques
Based on this compilation information and the total computation resources on the GPU, we set the number of threads per thread group and the number of thread groups to achieve a high occupancy at run time.
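One way such a choice could work, as a sketch only: take the register count per thread reported by the compiler, try a few group sizes, and keep the one that gives the highest occupancy. The paper does not spell out this procedure here, so the function, the per-multiprocessor view and all resource limits below are assumptions rather than the Mars implementation.

```python
from typing import Optional, Sequence, Tuple

def pick_launch_config(
    regs_per_thread: int,
    total_regs: int,
    max_threads: int,
    max_groups: int,
    candidates: Sequence[int] = (64, 128, 256, 512),
) -> Optional[Tuple[int, int, float]]:
    """Return (threads_per_group, resident_groups, occupancy) with the
    highest occupancy, or None if no candidate fits. Ties keep the
    smaller group size because candidates are tried in order."""
    best = None
    for tpg in candidates:
        groups_by_regs = total_regs // (regs_per_thread * tpg)
        groups_by_threads = max_threads // tpg
        groups = min(groups_by_regs, groups_by_threads, max_groups)
        if groups == 0:
            continue  # this group size does not fit at all
        occ = groups * tpg / max_threads
        if best is None or occ > best[2]:
            best = (tpg, groups, occ)
    return best

# Hypothetical limits for one multiprocessor: 8192 registers,
# at most 768 resident threads and 8 resident thread groups.
light = pick_launch_config(10, total_regs=8192, max_threads=768, max_groups=8)
heavy = pick_launch_config(20, total_regs=8192, max_threads=768, max_groups=8)
```

With these made-up limits, doubling the registers per thread halves the best achievable occupancy, which is why the compilation information has to feed into the launch configuration.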
4.1. Experimental Setup