Free paper download: A flexible data analytics framework for big data applications

Persian title
FlexAnalytics: A flexible data analytics framework for big data applications with improved I/O performance
English title
FlexAnalytics: A Flexible Data Analytics Framework for Big Data Applications with I/O Performance Improvement
Pages (Persian paper)
0
Pages (English paper)
10
Year of publication
2014
Publisher
Elsevier
English paper format
PDF
Product code
E419
Fields related to this paper
Computer engineering
Specializations related to this paper
Software engineering and computer programming
Journal
Big Data Research
University
Argonne National Laboratory, USA
Keywords
I/O bottlenecks, in-situ analytics, data preparation, big data, high-end computing

Abstract


Increasingly large-scale applications are generating unprecedented amounts of data. However, the widening gap between computation and I/O capacity on High End Computing (HEC) machines creates a severe bottleneck for data analysis. Instead of moving data from its source to the output storage, in-situ analytics processes output data while simulations are running. However, in-situ data analysis incurs much more contention with the simulations for computing resources. Such contention severely degrades simulation performance on HEC platforms. Since different data processing strategies have different impacts on performance and cost, there is a consequent need for flexibility in where data analytics is placed. In this paper, we explore and analyze several potential data-analytics placement strategies along the I/O path. To find the best strategy for reducing data movement in a given situation, we propose a flexible data analytics (FlexAnalytics) framework. Based on this framework, a FlexAnalytics prototype system is developed for analytics placement. The FlexAnalytics system enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and visualization, as well as large-scale data transfer. Two use cases, scientific data compression and remote visualization, are used in the study to verify the performance of FlexAnalytics. Experimental results demonstrate that the FlexAnalytics framework increases data transfer bandwidth and improves the application's end-to-end transfer performance.


7. Conclusions and future work


In this paper, we studied how to introduce data analytics into HEC to reduce data movement and relieve the severe I/O performance bottleneck. To explore possible solutions, we built a quantitative model to evaluate potential analytics algorithms and placement strategies. Based on this model, we proposed a flexible analytics framework for I/O performance optimization. Its flexible placement strategy combines two algorithms, data compression and visualization query, which can be dynamically selected and switched to achieve the best I/O performance using feature profiling and real-time monitoring of system resource status. Experiments with the real application GKW were conducted on an 80-node, 1280-core cluster and an SGI visualization machine. Analysis of the results shows that the data reduction ratio and the number of available processors are the two most important factors affecting analytics selection. The experiments investigate these two factors in detail. Our future work will therefore focus on improving these two factors.
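
To illustrate why the data reduction ratio and the number of available processors matter for analytics selection, here is a minimal sketch of a placement decision. It is not the paper's quantitative model, which is not reproduced on this page. The `Scenario` fields, the function names, and the assumption that analytics throughput scales linearly with core count are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical inputs for one output step (all names are assumptions)."""
    data_gb: float               # raw output size, in GB
    link_gb_per_s: float         # bandwidth of the I/O path or network
    reduction_ratio: float       # raw size divided by size after analytics
    gb_per_s_per_core: float     # analytics throughput of a single core
    cores: int                   # cores available for analytics


def direct_time(s: Scenario) -> float:
    """Seconds needed to move the raw data with no analytics applied."""
    return s.data_gb / s.link_gb_per_s


def reduced_time(s: Scenario) -> float:
    """Seconds needed to reduce the data first, then move the smaller result.

    Assumes analytics throughput scales linearly with the number of cores.
    """
    process = s.data_gb / (s.gb_per_s_per_core * s.cores)
    transfer = (s.data_gb / s.reduction_ratio) / s.link_gb_per_s
    return process + transfer


def choose(s: Scenario) -> str:
    """Pick whichever option has the lower estimated end-to-end time."""
    if reduced_time(s) < direct_time(s):
        return "reduce-then-move"
    return "move-raw"


if __name__ == "__main__":
    many_cores = Scenario(100.0, 1.0, 4.0, 0.0625, 64)
    few_cores = Scenario(100.0, 1.0, 4.0, 0.0625, 2)
    print(choose(many_cores), reduced_time(many_cores), direct_time(many_cores))
    print(choose(few_cores), reduced_time(few_cores), direct_time(few_cores))
```

Under these assumptions, the same reduction ratio pays off when many cores are free and does not when only a few are available, which is consistent with the two factors the conclusion singles out.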

