A GPU-accelerated inviscid flow solver on an unstructured quadrilateral grid is developed in the present work. For the first time, cell-based adaptive mesh refinement (AMR) on an unstructured quadrilateral grid is implemented entirely on the GPU, which greatly reduces how often data must be exchanged between GPU and CPU. Specifically, the list operations in the AMR are parallelized with atomic operations, and null memory recycling is implemented to use memory more efficiently. The results obtained on the GPUs agree very well with exact solutions and experimental results from the literature. The parallel code on the older GeForce GT9800 GPU achieves a speedup of 4× over the serial code on an Intel E3-1230 V2 CPU. On the newer Tesla C2050, configuring a larger L1 cache and adopting shared-memory-based atomic operations raise the speedup to 20×. The parallelized cell-based AMR procedures achieve speedups of 2× on the GT9800 and 18× on the Tesla C2050, which demonstrates that running the cell-based AMR method in parallel on the GPU is feasible and efficient. The results also indicate that newer GPU architectures benefit computational fluid dynamics significantly.
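The abstract names two ingredients of the GPU AMR: list operations parallelized with atomic operations, and null memory recycling. It gives no code, so the following is a hypothetical Python sketch, not the VAS2D implementation. CUDA's atomicAdd is emulated by a lock-protected fetch-and-add, worker threads stand in for GPU threads, and the names `CellPool`, `coarsen`, and `refine` are invented for illustration. Coarsening and refinement run as separate phases, so the recycled-slot list is read-only while slots are being handed out.

```python
"""Hypothetical sketch: atomic-counter list building plus null-slot recycling.

Assumptions (not from the paper): cells live in a fixed-capacity pool, a freed
("null") slot holds None, and CUDA atomicAdd is emulated by a lock-protected
fetch-and-add so that the logic can run on a CPU with threads.
"""
import threading
from concurrent.futures import ThreadPoolExecutor


class AtomicCounter:
    """Stand-in for atomicAdd on one int: fetch_add returns the old value."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, delta):
        with self._lock:
            old = self._value
            self._value += delta
            return old

    def load(self):
        with self._lock:
            return self._value


class CellPool:
    """Fixed-capacity cell array; a slot holding None is a null (free) slot."""

    def __init__(self, capacity, n_initial):
        self.cells = [None] * capacity
        for i in range(n_initial):
            self.cells[i] = ("root", i)
        self.high_water = n_initial          # slots >= high_water were never used
        self.recycled = [0] * capacity       # indices of freed slots
        self.n_recycled = AtomicCounter()

    def coarsen(self, slots, workers=4):
        """Phase 1: each worker frees one slot and appends it to the recycled list."""
        def release(slot):
            self.cells[slot] = None
            pos = self.n_recycled.fetch_add(1)   # unique position per worker
            self.recycled[pos] = slot

        with ThreadPoolExecutor(max_workers=workers) as ex:
            list(ex.map(release, slots))

    def refine(self, parents, workers=4):
        """Phase 2: each refined quad claims its 4 child slots with one atomic."""
        n_rec = self.n_recycled.load()           # recycled list is read-only now
        base = self.high_water
        taken = AtomicCounter()

        def split(parent):
            first = taken.fetch_add(4)           # one atomic per parent, not per child
            children = []
            for k in range(first, first + 4):
                # Reuse a null slot first; otherwise map k to a fresh slot.
                slot = self.recycled[k] if k < n_rec else base + (k - n_rec)
                if slot >= len(self.cells):
                    raise MemoryError("cell pool exhausted")
                self.cells[slot] = (parent, k - first)   # (parent slot, child index)
                children.append(slot)
            return children

        with ThreadPoolExecutor(max_workers=workers) as ex:
            result = list(ex.map(split, parents))

        used = taken.load()
        self.high_water = base + max(0, used - n_rec)
        left = max(0, n_rec - used)              # recycled slots nobody consumed
        self.recycled[:left] = self.recycled[n_rec - left:n_rec]
        self.n_recycled = AtomicCounter(left)
        return result


if __name__ == "__main__":
    pool = CellPool(capacity=16, n_initial=4)
    pool.coarsen([1, 3])                         # free slots 1 and 3
    children = pool.refine([0, 2])               # needs 8 child slots
    print(sorted(s for group in children for s in group))
```

Running it prints `[1, 3, 4, 5, 6, 7, 8, 9]`: the two null slots are reused before the pool grows past its previous high-water mark. Reserving all four children of a quadrilateral with a single fetch-and-add keeps the number of atomic operations proportional to the number of refined cells rather than the number of new cells.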
In this study, cell-based AMR on an unstructured quadrilateral mesh is realized on the GPU. Specifically, we implemented and optimized the well-validated numerical method VAS2D on the GPU: null memory recycling is added to improve memory utilization, and list processing is parallelized on the GPU with low-frequency atomic operations. In this way, we have taken a further step toward realizing AMR on the GPU. To the best of our knowledge, ours is the first unstructured cell-based AMR algorithm fully implemented on the GPU. The shock diffraction problem is simulated with the solver running on a CPU (Intel E3-1230 V2) and on GPUs (GeForce GT9800 and Tesla C2050) for comparison. The simulation results are consistent with the experimental results, which validates the GPU implementation. Non-coalesced memory access is a serious problem that degrades the performance of the GPU code and is nearly impossible to avoid in cell-based AMR. Nevertheless, the GPU code still achieves speedups of 4× on the GT9800 and 15× on the C2050 over the serial code on the E3-1230 V2 CPU. With a larger L1 cache configuration and shared-memory-based atomic operations, the optimized code reaches a 20× speedup on the C2050. In the mesh-adaptation part, the parallelized algorithms achieve speedups of 2× on the GT9800 and 18× on the Tesla C2050. Overall, these considerable speedups show that the implementation is successful and that running the cell-based AMR method on the GPU, including the mesh-adaptation processes, is practical and highly efficient. The results also indicate that newer GPU architectures benefit computational fluid dynamics significantly.
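The conclusion credits part of the C2050 gain to shared-memory-based atomic operations and describes list processing with low-frequency atomics, but it does not show the kernel. One common way to get both is block-aggregated atomics. Each thread block first counts its own entries with a counter in shared memory. Then a single global atomic per block reserves a contiguous range of the output list. The Python sketch below is a hypothetical illustration of that pattern, not the paper's code. Threads emulate blocks, a lock-protected counter emulates atomicAdd, and `compact_flagged` is an invented name.

```python
"""Hypothetical sketch of block-aggregated ("shared-memory") atomics for list building.

Not the paper's kernel. Worker threads stand in for GPU thread blocks, a
lock-protected counter stands in for atomicAdd, and a per-block counter stands
in for a __shared__ variable updated with shared-memory atomics.
"""
import threading
from concurrent.futures import ThreadPoolExecutor


class AtomicCounter:
    """Stand-in for atomicAdd on one int: fetch_add returns the old value."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, delta):
        with self._lock:
            old = self._value
            self._value += delta
            return old

    def load(self):
        with self._lock:
            return self._value


def compact_flagged(flags, block_size=4, workers=4):
    """Return (indices i with flags[i] true, number of global atomics issued)."""
    out = [None] * len(flags)            # upper bound, as a GPU buffer would be
    global_count = AtomicCounter()       # list length in global memory
    global_atomics = AtomicCounter()     # instrumentation only

    def run_block(start):
        local = AtomicCounter()          # the block's shared-memory counter
        offsets = []
        # The block's "threads" are visited in order here; on a GPU they run
        # concurrently and each calls atomicAdd on the shared counter.
        for i in range(start, min(start + block_size, len(flags))):
            if flags[i]:
                offsets.append((i, local.fetch_add(1)))
        n = local.load()
        if n == 0:
            return                       # block found nothing: no global atomic
        base = global_count.fetch_add(n)  # one global atomic reserves n slots
        global_atomics.fetch_add(1)
        for i, off in offsets:
            out[base + off] = i

    with ThreadPoolExecutor(max_workers=workers) as ex:
        list(ex.map(run_block, range(0, len(flags), block_size)))
    return out[:global_count.load()], global_atomics.load()


if __name__ == "__main__":
    flags = [i % 3 == 0 for i in range(20)]
    cells, n_atomics = compact_flagged(flags)
    print(sorted(cells), n_atomics)
```

For the 7 flagged cells in the example, the sketch issues 5 global atomics (one per non-empty block) instead of 7. With `block_size=8` it issues 3. Blocks finish in arbitrary order, so only the set of indices is deterministic, not their order in the output list.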