4. Conclusions
In this study, a spatial decomposition method was adopted to implement the GSD simulation on parallel computers. First, performance profiling was conducted to identify the computational hot spots in the serial scheme. Two schemes, employing MPI and OpenMP, were then designed to parallelize the simulation. An efficient tridiagonal solver [17] based on the SPIKE algorithm [20,21] was also incorporated into the parallel implementation to handle the most computationally expensive function. The performance of the two schemes was analyzed and compared. The hybrid scheme has the advantage of ease of implementation and achieves a speedup of 9.7 on 16 HPCC node-processors. The MPI scheme, on the other hand, delivers better efficiency and scalability: the strong-scaling experiment demonstrates a high parallel efficiency of 0.93 on 32 HPCC cores, while the isogranular-scaling experiment shows that the efficiency remains at 0.83 on 64 HPCC cores.

To further improve the speedup for symmetric converging shock simulations, symmetric boundary conditions were developed to reduce the problem size considerably. The results show that, for a dodecagonal converging shock front, a speedup of up to 19.26 can be achieved. Although this study focuses only on converging shock configurations, the parallel schemes can be readily extended to more general shock setups by changing the boundary conditions. In the future, parallelization of the three-dimensional GSD model will be investigated.
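To make the adopted decomposition concrete, the following is a minimal sketch, not the code used in this work, of how a one-dimensional spatial decomposition of the discretized shock front might be organized with MPI for inter-node communication and OpenMP for intra-node threading. The node count, the advance_node kernel, and the buffer layout are illustrative placeholders; the SPIKE-based tridiagonal solve and the boundary conditions discussed above are omitted.

    /* Illustrative sketch (not the authors' implementation): the front is
     * split into contiguous blocks of nodes, one block per MPI rank; each
     * rank exchanges one halo node with its neighbours, and OpenMP threads
     * update the local nodes. Compile with an MPI C compiler and -fopenmp. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N_LOCAL 1024   /* nodes owned by this rank (assumed value) */

    /* placeholder for the per-node front update (hypothetical kernel) */
    static double advance_node(double left, double mid, double right) {
        return 0.25 * left + 0.5 * mid + 0.25 * right;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* local block plus one halo node on each side */
        double *m     = calloc(N_LOCAL + 2, sizeof(double));
        double *m_new = calloc(N_LOCAL + 2, sizeof(double));

        int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
        int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

        for (int step = 0; step < 100; ++step) {
            /* exchange halo nodes with the neighbouring subdomains;
             * physical or symmetric boundary conditions at the domain
             * ends are not modelled here */
            MPI_Sendrecv(&m[1],           1, MPI_DOUBLE, left,  0,
                         &m[N_LOCAL + 1], 1, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&m[N_LOCAL],     1, MPI_DOUBLE, right, 1,
                         &m[0],           1, MPI_DOUBLE, left,  1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            /* OpenMP threads update the nodes of the local block */
            #pragma omp parallel for
            for (int i = 1; i <= N_LOCAL; ++i)
                m_new[i] = advance_node(m[i - 1], m[i], m[i + 1]);

            double *tmp = m; m = m_new; m_new = tmp;   /* swap buffers */
        }

        free(m);
        free(m_new);
        MPI_Finalize();
        return 0;
    }

In this pattern, only the two halo nodes per rank are communicated each step, which is why the pure MPI scheme scales well once the per-rank block is large enough to amortize the exchange cost.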