Free download of the article: Analysis of Network IO in Big Data Environments Based on Docker Containers

Persian title

Analysis of Network IO in Big Data Environments Based on Docker Containers

English title

Analysis of a Network IO Bottleneck in Big Data Environments Based on Docker Containers

Pages (Persian article)

0

Pages (English article)

5

Publication year

2016

Publisher

Elsevier

Format of the English article

PDF

Product code

E2287

Related fields

Computer engineering

Related specializations

Computer networks

Journal

Big Data Research

Keywords

Containers, context switch, Docker, Hadoop, MapReduce

Abstract

We live in a world increasingly driven by data, with more information about individuals, companies and governments available than ever before. Every business is now powered by information technology and generates big data, from which future business intelligence can be extracted. NoSQL [1] and Map-Reduce [2] technologies provide an efficient way to store, organize and process big data using virtualization and Linux Container (a.k.a. container) [3] technologies. Provisioning containers on top of virtual machines is a better model for high resource utilization. However, as more containers share the same CPU, the context switch latency for each container increases significantly. This increase reduces network IO throughput and creates a bottleneck in big data environments. In this paper, we study container networking and the factors that contribute to context switch latency. We evaluate Hadoop benchmarks [5] against the number of containers and virtual machines. We observe a bottleneck: Hadoop [4] cluster throughput does not scale linearly with the number of nodes sharing the same CPU. This bottleneck is due to the virtual network layers, which add a significant delay to the Round Trip Time (RTT) of data packets. Future work can analyze the practical implications of virtual network layers and develop a solution to improve the performance of container-based big data environments.

7. Conclusion

We have analyzed the factors that affect the performance of a Hadoop cluster. We found that network IO throughput is inversely proportional to the number of cluster nodes (sharing the same CPU) on a VM. If the CPU is busy context switching, it adds significant latency to the RTT of the data packets, and a rise in the RTT reduces the throughput of the entire system. Ideally, adding nodes to a Hadoop cluster improves the throughput of the system linearly. Beyond a certain number of cluster nodes running in containers, however, adding nodes to the Hadoop cluster decreases performance because the RTT of the containers' data packets rises. Future work will study the Docker network bridge and develop a solution to improve the network IO throughput of big data environments. We will study the TCP workflow and identify the overhead caused by the virtual network layers. We will then try to build a solution that reduces the RTT of data packets passing through the virtual network layers. The solution would be counter logic in the Docker network bridge, transparent to the containers and virtual machines.
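The link between RTT and throughput stated in the conclusion can be illustrated with the standard window-limited bound for a single TCP connection: a sender can have at most one window of data in flight per round trip, so throughput is at most window size divided by RTT. The sketch below is not taken from the paper. The function name, the 64 KiB window, and the RTT values are assumptions chosen only for illustration.

```python
def max_window_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound, in bits per second, on one window-limited TCP connection.

    At most one window of data can be in flight per round trip, so the
    bound is window size divided by RTT.
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    window = 64 * 1024  # hypothetical 64 KiB window, not a value from the paper
    # Hypothetical RTTs: each doubling stands in for extra delay added by
    # context switching and the virtual network layers.
    for rtt_ms in (0.5, 1, 2, 4, 8):
        bound = max_window_throughput_bps(window, rtt_ms / 1000)
        print(f"RTT {rtt_ms:>4} ms -> at most {bound / 1e6:8.1f} Mbit/s")
```

With a fixed window, each doubling of the RTT halves the bound, which is consistent with the observation that added latency in the virtual network layers lowers the throughput of every connection passing through them.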
