Free download of the article: An Efficient Time Optimized Scheme for Progressive Analytics in Big Data

Persian title
طرح زمان کارآمد برای تجزیه و تحلیل پیشرفته بهینه سازی در کلان داده ها
English title
An Efficient Time Optimized Scheme for Progressive Analytics in Big Data
Pages in the Persian article
0
Pages in the English article
11
Publication year
2016
Publisher
Elsevier
English article format
PDF
Product code
E2289
Fields related to this article
Computer Engineering
Specializations related to this article
Software
Journal
Big Data Research
University
Department of Computer Science, University of Thessaly, Greece
Keywords
Big data, continuous queries, progressive analytics, sequential time-optimized models
Abstract


Big data analytics is a key research subject for future data-driven decision-making applications. Because of the large volume of data, progressive analytics can provide an efficient way of querying big data clusters. Each cluster contains only a piece of the examined data. Continuous queries over these data sources require an intelligent mechanism that delivers the final outcome (the query response) in the minimum time with the maximum performance. A Query Controller (QC) is responsible for managing continuous/sequential queries and returning the final outcome to users or applications. In this paper, we propose a mechanism that the QC can adopt to manage the partial results retrieved by a number of processors, each of which is responsible for one cluster. Each processor executes a query over a specific cluster of data. The proposed mechanism adopts two sequential decision-making models for handling the incoming partial results. The first is a finite horizon time-optimized model and the second is an infinite horizon optimally scheduled model. We provide mathematical formulations for solving the discussed problem and present simulation results. Through a large number of experiments, we reveal the advantages of the proposed models and give numerical results comparing them with a deterministic model. These results indicate that the proposed models can efficiently reduce the time required to return the final outcome to the user/application while keeping the quality of the aggregated result at a high level.
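To make the idea of a finite horizon stopping decision more concrete, the following is a minimal sketch of a generic optimal stopping rule computed by backward induction. It is not the model from the paper: the uniform distribution of the aggregated confidence, the fixed per-step waiting cost, and the names `stopping_thresholds` and `first_stopping_step` are all assumptions made for illustration.

```python
def stopping_thresholds(horizon, step_cost):
    """Return one confidence threshold per step of a finite horizon.

    Illustrative assumptions (not taken from the paper): the aggregated
    confidence observed at each step is i.i.d. uniform on [0, 1], waiting
    one more step costs `step_cost`, and a result must be returned at the
    last step at the latest.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # At the last step the controller has to accept whatever has arrived.
    thresholds = [0.0] * horizon
    # Work backwards: stopping is worthwhile once the current confidence
    # is at least the expected value of waiting one more step.
    for t in range(horizon - 2, -1, -1):
        w = thresholds[t + 1]
        # For X ~ U(0, 1) and 0 <= w <= 1, E[max(X, w)] = (1 + w^2) / 2.
        thresholds[t] = max((1.0 + w * w) / 2.0 - step_cost, 0.0)
    return thresholds


def first_stopping_step(confidences, thresholds):
    """Return the index of the first step whose confidence meets its threshold."""
    for step, (confidence, threshold) in enumerate(zip(confidences, thresholds)):
        if confidence >= threshold:
            return step
    raise ValueError("no confidence value met its threshold")
```

Under these assumptions, a horizon of 3 steps and a waiting cost of 0.1 give thresholds of roughly 0.48, 0.40 and 0.0, so the rule becomes less demanding as the deadline approaches. For the confidence sequence 0.3, 0.45, 0.9 it would stop at the second step (index 1).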

Conclusion

5. Conclusions and future work


Progressive analytics can offer many advantages when adopted to manage big data. Such a technique can be very efficient, especially when streams of data are the main scenario. In such cases, data are continually updated and, thus, there is no insight into their form. In this paper, we focus on data parallelism and assume an underlying progressive analytics service. We propose a mechanism for handling responses retrieved by processors querying clusters of data. Each processor adopts a progressive analytics scheme and is responsible for returning early (partial) results and a confidence value to our mechanism. We adopt the principles of Optimal Stopping Theory (OST) and model the behaviour of a Query Controller (QC) responsible for managing multiple queries. We build on top of the processors and provide an intelligent decision-making mechanism. Our aim is to relieve users/applications of the responsibility of monitoring the continuous results retrieved by processors and deciding when it is the right time to stop the process in order to save time and resources. Two models are described: the first assumes a finite horizon scheme while the second considers an infinite horizon setting. A large number of experiments reveal the efficiency of the proposed models. We focus on the throughput of the QC when working in a continuous query scenario and on the quality of the final outcome. Our results reveal a trade-off between throughput and the quality of the final outcome.

Future extensions of our work include the definition of an intelligent scheme for creating plans and the resulting assignments of queries to specific processors. Every query will be assigned to specific processors, probably a subset of the processors available to the QC. For this, we are going to provide specific models for the characteristics of queries and processors. Through this approach, the efficiency of the proposed system will be maximized, as each processor will be selected only for those queries on which its performance is highest. A learning technique will also be adopted to build an intelligent scheme for assigning queries to processors. For this, modelling the underlying data and adopting an algorithm that splits them into appropriate pieces in the most efficient way seem to be imperative.
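The trade-off between throughput and the quality of the final outcome can be illustrated with a small, generic infinite horizon stopping example. This is only a sketch under stated assumptions, not the model evaluated in the paper: the aggregated confidence at each step is assumed to be i.i.d. uniform on [0, 1], each extra step is assumed to cost a fixed `step_cost`, and the function names are hypothetical.

```python
import math
import random


def infinite_horizon_threshold(step_cost):
    """Threshold a that solves E[(X - a)^+] = step_cost for X ~ U(0, 1).

    For the uniform case E[(X - a)^+] = (1 - a)^2 / 2, which gives
    a = 1 - sqrt(2 * step_cost). If waiting costs 0.5 or more per step,
    stopping immediately is best, so the threshold is 0.
    """
    if step_cost >= 0.5:
        return 0.0
    return 1.0 - math.sqrt(2.0 * step_cost)


def simulate(step_cost, runs, seed=0):
    """Return (mean steps until stopping, mean accepted confidence)."""
    rng = random.Random(seed)
    threshold = infinite_horizon_threshold(step_cost)
    total_steps = 0
    total_confidence = 0.0
    for _ in range(runs):
        steps = 1
        confidence = rng.random()
        # Keep waiting for a new aggregated result until the threshold is met.
        while confidence < threshold:
            steps += 1
            confidence = rng.random()
        total_steps += steps
        total_confidence += confidence
    return total_steps / runs, total_confidence / runs
```

In this toy setting, a lower waiting cost raises the threshold, so the accepted confidence goes up while the number of steps per query also goes up, which means fewer queries are answered per unit of time. A higher waiting cost has the opposite effect.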

