Neuronal and multiscale simulations are computationally demanding because they require large numbers of very similar calculations. A core problem in this domain is to perform detailed single-neuron calculations rapidly, since these are typically the bottleneck in detailed multiscale and network models. The central computation is the solution of a large, almost tridiagonal matrix representing the compartments of a neuron. Individual entries in this matrix require an inner loop to compute the current contributions to each compartment. Optimizing GPU computations for these neuron calculations is a particularly interesting problem, since there is a tradeoff between the cost of memory transfers and the speed of individual GPU cores. Because the number of computations is very large, even a small speedup saves a substantial amount of total run time, so optimizing the GPU code is important.
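
To make the central computation concrete, the following is a minimal serial sketch of a tridiagonal solve using the standard Thomas algorithm. It assumes an unbranched neuron, so the matrix is exactly tridiagonal rather than almost tridiagonal, and it assumes a diagonally dominant matrix so that no pivoting is needed. The function name `solve_tridiagonal` and its argument layout are illustrative choices, not taken from any particular simulator, and the sketch does not represent the GPU implementation.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve A x = rhs for a tridiagonal A with the Thomas algorithm.

    lower[i] is the entry A[i][i-1] (lower[0] is unused),
    diag[i]  is the entry A[i][i],
    upper[i] is the entry A[i][i+1] (upper[n-1] is unused).
    Assumes A is diagonally dominant, so no pivoting is performed.
    """
    n = len(diag)
    d = list(diag)  # working copies so the inputs are not modified
    b = list(rhs)

    # Forward elimination: remove the sub-diagonal, one compartment at a time.
    for i in range(1, n):
        w = lower[i] / d[i - 1]
        d[i] -= w * upper[i - 1]
        b[i] -= w * b[i - 1]

    # Back substitution: solve from the last compartment to the first.
    x = [0.0] * n
    x[n - 1] = b[n - 1] / d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (b[i] - upper[i] * x[i + 1]) / d[i]
    return x
```

Both loops carry a dependency from one compartment to the next, so a single solve is inherently sequential along the cable. The large numbers of very similar calculations noted above are one natural place to look for parallelism.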