Abstract
This paper investigates reactive elasticity in stream processing environments where the performance goal is to analyze large amounts of data with low latency and minimum resources. Working in the context of Apache Storm, we propose an elastic management strategy which modulates the parallelism degree of applications' components while explicitly addressing the hierarchy of execution containers (virtual machines, processes and threads). We show that provisioning the wrong kind of container may lead to performance degradation and propose a solution that provisions the least expensive container (with minimum resources) to increase performance. We describe our monitoring metrics and show how we take into account the specifics of an execution environment. We provide an experimental evaluation with real-world applications which validates the applicability of our approach.
1 INTRODUCTION
Big data is a challenge in various computing system domains. It is present in IoT with the proliferation of connected devices, grows with the increasing scale of high-performance computing systems and is coupled with expanding Internet and social network activity. It is a major topic in the data intelligence business.
There are two major techniques to process big data: batch processing and stream processing. In batch processing, data is first stored in huge databases and is processed later, usually with scalable programming models such as Google’s MapReduce [1]. However, with the ever-growing size of data, the cost of data transfer and storage becomes prohibitive [2], [3]. Moreover, in multiple domains, what is important is not to keep the initial data but to analyze it as fast as possible to produce valuable intelligence [4], [5]. To tackle these issues, stream processing systems put the emphasis on reactivity and analyze data as it is produced. Recent years have seen the emergence of multiple stream processing solutions [6], [7], [8], [9].
7 CONCLUSION
The focus of our paper is on the impact of different execution containers on the performance of an elastic stream processing system. We have explicitly considered the hierarchy of execution containers (machines, processes and threads) and have shown that provisioning them comes at different costs. More importantly, we have shown that provisioning the wrong type of container may decrease performance.