Nonlocal models have recently been applied to understand complex, spatially multiscale phenomena in solid mechanics, fluid mechanics, particulate media, directed self-assembly of block copolymers, and other areas. In a nonlocal model, the state at a single point depends on a small set of neighboring points, which suggests data-level parallelism. We use a modern parallelization library, HPX, to develop a parallel solver. The goal of the proposed solution is to distribute the full workload, from mesh partitioning to the computation of fields at mesh nodes. This goal is achieved step by step, starting from a simple sequential implementation of 1D nonlocal diffusion. The 1D solver is then made multithreaded to enable concurrent processing within a single compute node and is extended to the 2D nonlocal diffusion equation. Finally, the 2D solver is made fully distributed to exploit data-level parallelism.
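To make the link between neighbor dependence and data-level parallelism concrete, the following is a minimal sketch of one explicit time step of a 1D nonlocal diffusion model on a uniform mesh. It is an illustration under stated assumptions, not the solver described here: the constant kernel, the forward-Euler update, and the names `step_range`, `step_sequential`, and `step_chunked` are all hypothetical, and the sketch uses only the C++ standard library rather than HPX. Each node reads a window of at most `radius` neighbors on either side from the old field and writes only its own entry of the new field, so disjoint chunks of the mesh can be updated independently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the source): advance nodes [begin, end) of a
// uniform 1D mesh by one forward-Euler step of a simple nonlocal diffusion
// model with a constant kernel (an assumed, illustrative discretization).
// Node i interacts only with nodes at most `radius` cells away, so each
// output entry depends on a small window of the old field `u`.
void step_range(const std::vector<double>& u, std::vector<double>& out,
                std::size_t begin, std::size_t end,
                std::size_t radius, double h, double c, double dt)
{
    assert(begin <= end && end <= u.size() && out.size() == u.size());
    const std::size_t n = u.size();
    for (std::size_t i = begin; i < end; ++i) {
        // Clamp the neighborhood window to the mesh boundaries.
        const std::size_t lo = (i >= radius) ? i - radius : 0;
        const std::size_t hi = (i + radius < n) ? i + radius : n - 1;
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j)
            sum += (u[j] - u[i]) * h;  // the j == i term contributes zero
        out[i] = u[i] + dt * c * sum;
    }
}

// Sequential reference: a single pass over the whole mesh.
std::vector<double> step_sequential(const std::vector<double>& u,
                                    std::size_t radius, double h,
                                    double c, double dt)
{
    std::vector<double> out(u.size());
    step_range(u, out, 0, u.size(), radius, h, c, dt);
    return out;
}

// The same step split into `chunks` disjoint index ranges. The chunks are
// run one after another here; because they only read `u` and write disjoint
// parts of `out`, they could in principle run concurrently.
std::vector<double> step_chunked(const std::vector<double>& u,
                                 std::size_t chunks, std::size_t radius,
                                 double h, double c, double dt)
{
    assert(chunks > 0);
    const std::size_t n = u.size();
    std::vector<double> out(n);
    for (std::size_t k = 0; k < chunks; ++k) {
        const std::size_t begin = k * n / chunks;
        const std::size_t end = (k + 1) * n / chunks;
        step_range(u, out, begin, end, radius, h, c, dt);
    }
    return out;
}
```

Because every chunk reads only the old field and writes only its own slice of the new one, the chunked result matches the sequential one exactly. In a multithreaded or distributed version, each chunk would be assigned to a thread or a compute node, and only the values within `radius` cells of a chunk boundary would need to be exchanged.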