9. Conclusion and future work
In this paper, we presented DBStream, a Data Stream Warehouse (DSW) tailored for, but not limited to, Network Traffic Monitoring and Analysis (NTMA) applications. We have shown that, if instrumented correctly, a PostgreSQL database engine can process large amounts of data quickly and efficiently. In a performance study, we demonstrated that a single-node instance of DBStream can outperform a cluster of 10 Spark nodes by a factor of 2.6 when running the same query workload on the same dataset. We illustrated the flexibility of DBStream in a second application, where it was instrumented to run multiple complex machine learning tasks. The resulting MTRAC approach, based only on the analysis of coarse-grained network descriptors, achieves very high accuracy in the continuous classification of M2M devices in a 3G mobile network. The current design of DBStream is the result of its use in several NTMA applications and its deployment in an operational mobile network. This experience allowed us to derive useful insights on how to improve the system so that it offers both increased performance and higher flexibility. Although current results indicate that DBStream is already well suited to typical network monitoring applications, some technical challenges and interesting research questions remain open. For example, we want to investigate the possibility of extending DBStream by replacing the PostgreSQL database engine with either the parallel database system Greenplum [48] or a MapReduce-based large-scale data processing framework such as Spark [16]. This would be a logical extension of the current single-machine DBStream architecture to a cluster system, enabling the scale-out properties found in modern big data processing frameworks.