Latency Hiding for Host to Device Memory Transfers
- Mentors
- Jon Chesterfield, Johannes Doerfert
- Organization
- The LLVM Compiler Infrastructure
Given the increasing number of use cases for massively parallel devices (GPUs), solving the problems they bring has become an important research field. One of the main problems is the long time (latency) it takes to move data from the computer's main memory to the device's memory. Using the LLVM compiler infrastructure, the proposed solution adds new functionality to the existing OpenMP interprocedural optimization pass, OpenMPOpt, so that OpenMP runtime calls that involve host to device memory transfers are split into "issue" and "wait" functions. The "issue" function contains the code necessary to start the transfer from the host to the device asynchronously and returns a handle; the "wait" function blocks on that handle until the transfer completes. The "issue" call is then moved upwards and the "wait" call downwards for as long as it is legal to do so, that is, until moving them further would change the program's behavior. As a result, the instructions between the "issue" and the "wait" execute while the data transfer proceeds separately, which reduces the time the process is blocked waiting for the transfer to finish.
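The issue/wait split described above can be illustrated with a small host-only sketch. It is not the OpenMPOpt implementation and does not call the OpenMP runtime: `std::async` stands in for an asynchronous transfer, and `DeviceBuffer`, `TransferHandle`, `issueTransfer`, `waitTransfer`, `independentHostWork`, `deviceSum` and `runOverlapped` are hypothetical names chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for device memory (a real runtime would manage this).
struct DeviceBuffer {
  std::vector<double> data;
};

// Hypothetical handle returned by the "issue" half of the split call.
using TransferHandle = std::future<void>;

// "Issue": start the host to device copy asynchronously and return a handle.
TransferHandle issueTransfer(const std::vector<double> &host,
                             DeviceBuffer &dev) {
  return std::async(std::launch::async, [&host, &dev] { dev.data = host; });
}

// "Wait": block on the handle until the copy has completed.
void waitTransfer(TransferHandle &handle) {
  assert(handle.valid() && "wait called without a matching issue");
  handle.get();
}

// Host work that neither reads nor writes the buffers being transferred,
// so it may legally run between the issue and the wait.
double independentHostWork(std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    sum += static_cast<double>(i);
  return sum;
}

// Stand-in for a kernel that consumes the transferred data.
double deviceSum(const DeviceBuffer &dev) {
  return std::accumulate(dev.data.begin(), dev.data.end(), 0.0);
}

// Shape of the code after the transformation: the issue is placed as early
// as possible and the wait as late as possible, just before the first use.
double runOverlapped(const std::vector<double> &host, double &hostResult) {
  DeviceBuffer dev;
  TransferHandle handle = issueTransfer(host, dev); // moved upwards
  hostResult = independentHostWork(1000);           // overlaps the transfer
  waitTransfer(handle);                             // moved downwards
  return deviceSum(dev);                            // first use of the data
}
```

In this sketch the wait cannot sink below `deviceSum`, because that call reads the transferred data. The issue could likewise not be hoisted above any code that still writes to `host`. These are the kinds of legality limits the abstract refers to.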