Massively parallel applications are becoming central to modern computing. The scale of today's problems and the amount of data available have made it infeasible to solve a meaningful problem on a single computer. At this scale, it is essential to combine CPUs with accelerator cards such as GPUs into one unified computing resource. HPXCL aims to integrate kernel launches and data transfers seamlessly into HPX's asynchronous execution graph. However, the current CUDA implementation does not deliver the expected performance gains when running on a cluster of computers. This proposal aims to improve the performance of CUDA-based HPXCL applications through an ‘Event’-triggered mechanism. Tests will also be written to verify that the required functionality behaves correctly. In addition, standard benchmark algorithms (e.g., Floyd-Warshall and FFT) will be implemented in HPXCL, OpenCL and CUDA. These implementations will be used to validate and measure the performance of the existing system and to guide its improvement, including the newly added functionality.