Recently, Google revealed that it has replaced the 10-year-old MapReduce with new systems (e.g., DataFlow) that provide better performance and support more sophisticated applications. At the same time, other new systems, such as Spark, Impala and epiC, are being developed to handle new requirements for big data processing. This shows that big data techniques have been changing rapidly since their emergence. In this paper, we use our experience in developing and maintaining the information security system for Netease as an example to illustrate how such big data systems evolve. In particular, our first version was a Hadoop-based offline detection system, which was soon replaced by a more flexible online streaming system. Our ongoing work is to build a generic real-time analytic system for Netease to handle various jobs such as email spam detection, user pattern mining and game log analysis. The example shows how the requirements of users (e.g., Netease and its clients) affect the design of big data systems and drive the advance of technologies. Based on our experience, we also propose some key design factors and challenges for future big data systems.
6. Conclusions and open problems
In this paper, we use the information security system in Netease as an example to illustrate how a big data system evolves when users' requirements keep changing. We started with an offline Hadoop system and then moved to an online streaming system. Finally, we want to design a generic system that can provide near real-time analytic services for many Netease applications, such as spam detection, game log analysis and social community mining. In our experience, no single solution can address all big data problems, especially when 1) data size keeps increasing; 2) more complex user requirements need to be handled; 3) the emergence of new hardware invalidates the assumptions of the old design; and 4) the old system becomes too complicated to maintain.