- مبلغ: ۸۶,۰۰۰ تومان
- مبلغ: ۹۱,۰۰۰ تومان
In the last decade, many systems for the extraction of operational statistics from computer network interconnects have been designed and implemented. Those systems generate huge amounts of data of various formats and in various granularities, from packet level to statistics about whole flows. In addition, the complexity of Internet services has increased drastically with the introduction of cloud infrastructures, Content Delivery Networks (CDNs) and mobile Internet usage, and complexity will continue to increase in the future with the rise of Machine-to-Machine communication and ubiquitous wearable devices. Therefore, current and future network monitoring frameworks cannot rely only on information gathered at a single network interconnect, but must consolidate information from various vantage points distributed across the network.In this paper, we present DBStream, a holistic approach to large-scale network monitoring and analysis applications. After a precise system introduction, we show how its Continuous Execution Language (CEL) can be used to automate several data processing and analysis tasks typical for monitoring operational ISP networks. We discuss the performance of DBStream as compared to MapReduce processing engines and show how intelligent job scheduling can increase its performance even further. Furthermore, we show the versatility of DBStream by explaining how it has been integrated to import and process data from two passive network monitoring systems, namely METAWIN and Tstat. Finally, multiple examples of network monitoring applications are given, ranging from simple statistical analysis to more complex traffic classification tasks applying machine learning techniques using the Weka toolkit.
9. Conclusion and future work
In this paper, we presented DBStream, a Data Stream Warehouse (DSW) tailored for, but not limited to, Network Traffic Monitoring and Analysis (NTMA) applications. We have shown, that if instrumented correctly, a PostgreSQL database engine can process large amounts of data in a fast and efficient way. In a performance study, we demonstrated that a single-node instance of DBStream can outperform a cluster of 10 Spark nodes by a factor of 2.6, running the same query workload on the same dataset. The flexibility of DBStream was presented in another application, where it was instrumented to run multiple complex machine learning tasks. The resulting MTRAC approach, based only on the analysis of coarse-grained network descriptors, shows a very high accuracy for the continuous classification of M2M devices in a 3G mobile network. The current design of DBStream is the result of its usage for several NTMA applications and its deployment in a mobile operational network. This experience allowed us to derive useful in sights on how to improve the system to offer increased performance and higher flexibility at the same time. Although current results indicate that DBStream is already very much suited system for typical network monitoring applications, some technical challenges and interesting research questions remain to be solved. For example, we want to investigate the possibility of extending DBStream by replacing the database engine PostgreSQL with either the parallel database system Greenplum , or a MapReduce based large-scale data processing framework like, e.g., Spark . Indeed, this would be a logical extension of the current single machine DBStream architecture to a cluster system, thus enabling scale-out properties found in modern big data processing frameworks.