5. Conclusions
In this study, cell-based AMR on unstructured quadrilateral meshes is realized on the GPU. Specifically, we ported and optimized the well-validated numerical method VAS2D for the GPU. Null-memory recycling is added to improve memory utilization, and list processing is parallelized on the GPU with low-frequency atomic operations. This takes a further step toward realizing AMR on GPUs. To the best of our knowledge, this is the first unstructured cell-based AMR algorithm that has been fully implemented on a GPU. The shock diffraction problem is simulated with the solver running on a CPU (Intel Xeon E3-1230 V2) and on two GPUs (GeForce GT9800 and Tesla C2050) for comparison. The simulation results agree with the experimental results, which validates the method as implemented on the GPU.
Non-coalesced memory access is a serious problem: it degrades the performance of the GPU code and is nearly impossible to avoid in cell-based AMR, because neighboring cells of an adaptively refined mesh are generally not stored contiguously. Even so, relative to the serial code on the E3-1230 V2 CPU, the GPU code achieves speedups of 4× on the GT9800 and 15× on the C2050. By configuring a larger L1 cache and using shared-memory-based atomic operations, the optimized code reaches a 20× speedup on the C2050. For the mesh-adapting stage, the parallelized algorithms achieve speedups of 2× on the GT9800 and 18× on the Tesla C2050. Overall, these considerable speedups show that the implementation is successful and demonstrate that running the cell-based AMR method on a GPU, including the mesh-adapting processes, is practical and efficient. The results also indicate that newer GPU architectures significantly benefit fluid dynamics computing, as reflected in the much larger speedups on the Fermi-based C2050 than on the older GT9800.
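The two GPU optimizations summarized above, list processing with low-frequency atomics and shared-memory-based atomic operations under a larger L1 cache, can be illustrated with a common pattern: each thread block counts its entries with a shared-memory atomic, and one thread per block then issues a single global `atomicAdd` to reserve space in the output list. The sketch below is a hypothetical illustration of that pattern, not the VAS2D source. The kernel `compactFlagged` and the wrapper `compactOnGpu` are illustrative names, and the per-cell `flags` array (for example, cells marked for refinement) is an assumption. Shared-memory atomics on 32-bit integers require compute capability 1.2 or later; the C2050 is compute capability 2.0.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical kernel (not the VAS2D source): gathers the indices of cells whose
// flag is nonzero (e.g. cells marked for refinement) into a compact list.
// Each thread claims a slot with a shared-memory atomic; thread 0 then performs a
// single global atomicAdd per block to reserve the block's segment of the list.
__global__ void compactFlagged(const int* flags, int n, int* list, int* listSize)
{
    __shared__ int blockCount;  // flagged cells found in this block
    __shared__ int blockBase;   // start of this block's segment in the global list

    if (threadIdx.x == 0) blockCount = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int localSlot = -1;
    if (i < n && flags[i] != 0) {
        localSlot = atomicAdd(&blockCount, 1);        // shared-memory atomic
    }
    __syncthreads();

    if (threadIdx.x == 0) {
        blockBase = atomicAdd(listSize, blockCount);  // one global atomic per block
    }
    __syncthreads();

    if (localSlot >= 0) list[blockBase + localSlot] = i;
}

// Host wrapper: returns the flagged indices in ascending order. Blocks claim
// their segments in arbitrary order, so the list is sorted on the host for
// reproducibility.
std::vector<int> compactOnGpu(const std::vector<int>& hFlags)
{
    const int n = static_cast<int>(hFlags.size());
    std::vector<int> result;
    if (n == 0) return result;

    int *dFlags = nullptr, *dList = nullptr, *dSize = nullptr;
    cudaMalloc(&dFlags, n * sizeof(int));
    cudaMalloc(&dList, n * sizeof(int));
    cudaMalloc(&dSize, sizeof(int));
    cudaMemcpy(dFlags, hFlags.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dSize, 0, sizeof(int));

    // On Fermi this requests the 48 KB L1 / 16 KB shared-memory split,
    // i.e. a "larger L1 cache" configuration for this kernel.
    cudaFuncSetCacheConfig(compactFlagged, cudaFuncCachePreferL1);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    compactFlagged<<<blocks, threads>>>(dFlags, n, dList, dSize);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;

    int count = 0;
    cudaMemcpy(&count, dSize, sizeof(int), cudaMemcpyDeviceToHost);
    result.resize(count);
    if (count > 0)
        cudaMemcpy(result.data(), dList, count * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dFlags);
    cudaFree(dList);
    cudaFree(dSize);
    std::sort(result.begin(), result.end());
    return result;
}
```

For example, the flags `{0, 1, 0, 1, 1, 0, 0, 1}` yield the list `{1, 3, 4, 7}`. The design choice is that the number of global atomic operations drops from one per flagged cell to one per block, while the per-cell contention moves to fast on-chip shared memory.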