Reverse time migration (RTM) in attenuating media should take absorption and dispersion effects into consideration. The latest proposed viscoacoustic wave equation with decoupled fractional Laplacians facilitates separate amplitude compensation and phase correction in -compensated RTM (-RTM). However, intensive computation and enormous storage requirements of -RTM prevent it from being extended into practical application, especially for large-scale 2D or 3D cases. The emerging graphics processing unit (GPU) computing technology, built around a scalable array of multithreaded streaming multiprocessors, presents an opportunity for greatly accelerating -RTM by appropriately exploiting GPUs architectural characteristics. We have developed the cu-RTM, a CUDA-based code package that implements -RTM based on a set of stable and efficient strategies, such as streamed CUDA fast Fourier transform, checkpointing-assisted time-reversal reconstruction, and adaptive stabilization. The cu-RTM code package can run in a multilevel parallelism fashion, either synchronously or asynchronously, to take advantages of all the CPUs and GPUs available, while maintaining impressively good stability and flexibility. We mainly outline the architecture of the cu-RTM code package and some program optimization schemes. The speedup ratio on a single GeForce GTX760 GPU card relative to a single core of Intel Core i5-4460 CPU can reach greater than 80 in a large-scale simulation. The strong scaling property of multi-GPU parallelism is demonstrated by performing -RTM on a Marmousi model with one to six GPU(s) involved. Finally, we further verified the feasibility and efficiency of the cu-RTM on a field data set.