Exact diagonalization of quantum lattice models on coprocessors

Title

Exact diagonalization of quantum lattice models on coprocessors

Year of publication

2016

Publisher

Elsevier

Related fields

Physics

Related subfields

Quantum physics

Journal

Computer Physics Communications

University

Aalto University School of Science, Finland

Keywords

Tight binding, Hubbard model, exact diagonalization, GPU, CUDA, MIC, Xeon Phi

Abstract

We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
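To make the benchmarked quantity concrete, the following is a minimal pure-Python sketch of the Lanczos iteration. It is not the authors' code. As a stand-in for the many-particle problem it assumes a single particle hopping on a 16-site ring with hopping amplitude `t`; the function names (`ring_matvec`, `lanczos`, `lowest_eigenvalue`), the random start vector, and the breakdown tolerance are all illustrative choices. Each pass through the loop in `lanczos` is one "step" in the sense of the abstract, and its cost is dominated by the call to `matvec`.

```python
import math
import random


def ring_matvec(v, t=1.0):
    """Apply a nearest-neighbor hopping Hamiltonian on a ring to the vector v."""
    n = len(v)
    return [-t * (v[(i - 1) % n] + v[(i + 1) % n]) for i in range(n)]


def lanczos(matvec, v0, max_steps, tol=1e-10):
    """Return the diagonal (alphas) and off-diagonal (betas) of the Lanczos tridiagonal matrix."""
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v)
    beta_prev = 0.0
    alphas, betas = [], []
    for step in range(max_steps):
        w = matvec(v)  # the expensive part: one matrix-vector product per step
        alpha = sum(wi * vi for wi, vi in zip(w, v))
        w = [wi - alpha * vi - beta_prev * pi for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(wi * wi for wi in w))
        if step == max_steps - 1 or beta < tol:
            break  # last step, or the Krylov space is exhausted
        betas.append(beta)
        v_prev, v = v, [wi / beta for wi in w]
        beta_prev = beta
    return alphas, betas


def count_below(alphas, betas, x):
    """Sturm count: number of eigenvalues of the tridiagonal matrix smaller than x."""
    count = 0
    d = 1.0
    for i, a in enumerate(alphas):
        off = betas[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - off / d
        if d == 0.0:
            d = 1e-300  # avoid division by zero in the next iteration
        if d < 0.0:
            count += 1
    return count


def lowest_eigenvalue(alphas, betas, iterations=100):
    """Bisect on the Sturm count inside the Gershgorin bounds to find the smallest eigenvalue."""
    radii = []
    for i in range(len(alphas)):
        left = abs(betas[i - 1]) if i > 0 else 0.0
        right = abs(betas[i]) if i < len(betas) else 0.0
        radii.append(left + right)
    lo = min(a - r for a, r in zip(alphas, radii))
    hi = max(a + r for a, r in zip(alphas, radii))
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


rng = random.Random(0)
start = [rng.random() for _ in range(16)]  # positive entries overlap with the ground state
alphas, betas = lanczos(ring_matvec, start, max_steps=16)
e0 = lowest_eigenvalue(alphas, betas)
print("estimated ground state energy:", e0)
```

For this toy problem the exact lowest energy is -2t, so the printed estimate can be checked against -2 for `t = 1`. In a many-particle calculation the vectors are far longer and `matvec` becomes a sparse matrix-vector product, which is the operation one would offload to a coprocessor.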

6. Conclusions

We have implemented the Lanczos algorithm to compute the ground state energy of a many-particle quantum lattice model on three platforms: a multi-core Intel Xeon CPU, an Intel Xeon Phi coprocessor and an NVIDIA GPU. The CPU and the Xeon Phi were parallelized with OpenMP, and with only one spin species in the model, the MKL library was used to compute the sparse matrix–vector product in the Lanczos algorithm. With two spin species, a custom OpenMP function was used. The GPU was programmed with CUDA. In the single spin species case, we used the cuSPARSE library and with two spin species we used a custom CUDA kernel.

We benchmarked the programs with single and double precision arithmetic in two different lattice geometries: a 1D ring with nearest-neighbor hopping and a checkerboard lattice with hoppings up to the third nearest-neighbor lattice sites. In all cases, the CPU is the fastest of the three platforms when the particle number is very low. With larger particle numbers, the GPU is the fastest, with speedup factors of up to 7.6 compared to the CPU. While the Xeon Phi is never the fastest of the three test platforms, it does outperform the CPU when the particle number is sufficiently high, reaching a speedup of up to 2.5. This is important, since an existing CPU code can be run on the Xeon Phi with practically no coding effort, resulting in an instant performance gain.

All in all, our results indicate that with the current hardware, graphics processors with custom low-level kernels offer the best performance in exactly diagonalizing many-particle quantum lattice models at large system sizes. The Xeon Phi was shown to be a good choice for gaining a significant speedup over an existing multi-core code with very little programming effort.
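The sparse matrix–vector product for the single spin species case on the 1D ring can be illustrated with a small sketch. This is not the authors' implementation and does not call MKL or cuSPARSE. It assumes spinless fermions with nearest-neighbor hopping only, a bitmask occupation basis, and a plain-Python compressed sparse row (CSR) layout. The names `build_basis`, `hop_sign`, `ring_hamiltonian_csr` and `csr_matvec` are hypothetical.

```python
from itertools import combinations
from math import comb


def build_basis(num_sites, num_particles):
    """List all occupation bitmasks with the given particle number, plus a lookup table."""
    states = sorted(
        sum(1 << site for site in occupied)
        for occupied in combinations(range(num_sites), num_particles)
    )
    index = {state: k for k, state in enumerate(states)}
    return states, index


def hop_sign(state, i, j):
    """Fermionic sign of a hop between sites i and j: parity of the particles strictly between them."""
    lo, hi = min(i, j), max(i, j)
    between = (state >> (lo + 1)) & ((1 << (hi - lo - 1)) - 1)
    return -1 if bin(between).count("1") % 2 else 1


def ring_hamiltonian_csr(num_sites, num_particles, t=1.0):
    """Hopping Hamiltonian on a ring (num_sites >= 3) as CSR arrays (indptr, indices, data)."""
    states, index = build_basis(num_sites, num_particles)
    indptr, indices, data = [0], [], []
    for state in states:
        for j in range(num_sites):
            if not (state >> j) & 1:
                continue  # no particle to move from site j
            for i in ((j + 1) % num_sites, (j - 1) % num_sites):
                if (state >> i) & 1:
                    continue  # target site already occupied
                new_state = state ^ (1 << j) ^ (1 << i)
                # The matrix is real and symmetric, so listing the states reachable
                # from `state` gives exactly the nonzero entries of its row.
                indices.append(index[new_state])
                data.append(-t * hop_sign(state, i, j))
        indptr.append(len(indices))
    return indptr, indices, data


def csr_matvec(indptr, indices, data, x):
    """Compute y = H x for a matrix stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


indptr, indices, data = ring_hamiltonian_csr(num_sites=6, num_particles=2)
dim = len(indptr) - 1
print("Hilbert space dimension:", dim, "nonzeros:", len(data))
```

The dimension equals the binomial coefficient of sites and particles, so it grows rapidly towards half filling. The rows in `csr_matvec` are independent of each other, which is why this loop is a natural target for OpenMP threads or GPU thread blocks.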