Abstract
The data storage system is central to the performance and cost of data mining and Intelligent Transportation Systems (ITS). As the computing power of servers has increased, so have the problems caused by bottlenecks in slower storage protocol interfaces, which restrict data throughput and access to raw data on the physical storage systems. This paper presents a new big data storage architecture that improves the efficiency of data mining and mass surveillance by integrating a distributed, embedded search engine inside each storage drive. By integrating the intrinsic search engine (iSearch) into the core controller chip, some of the work of searching for patterns and keywords takes place inside the drive, freeing up resources of the higher-level host and ultimately the server. Only those drives in which the expected pattern or keywords were detected are analyzed by the higher-level host. Not only does iSearch free up the server for other high-level computing tasks, it also helps preserve the bandwidth of the big data storage interface.
1. Introduction
Disk arrays are used in computer servers and data centers. Cloud-based applications and websites are placing ever greater demands on large storage systems. The Intelligent Transportation System (ITS) is one of the most prominent applications of surveillance and is widely deployed in this fast-growing vertical market [1][2]. Today's databases are growing at exponential rates, and tomorrow's data storage systems will need to grow exponentially to accommodate them. At the same time, cloud computing demands ever higher data processing capability. High-quality information is typically derived by identifying patterns and trends through means such as statistical pattern learning [3][4]. ITS and mass surveillance must process huge files composed of video-based information rather than text files or documents. For example, face recognition technology attempts to detect and recognize a criminal wanted by law enforcement among the thousands of people walking through a railway station or airport [5-9].
5. Conclusion
A real silicon SSD controller chip was designed with embedded iSearch engines and used to build a disk array for a big data storage system. The search engines were distributed across the SSD units, one in each drive. A server could issue primary search tasks to these engines and have them search in parallel inside each drive without transferring data to the server. The preliminary search performed by the iSearch engines allowed the server to carry out the secondary search more accurately and with far less data access to the database. iSearch engines built into each storage drive can therefore accelerate data mining and data analysis in hardware. This research is still in its early stages. The next step is to combine the engines with a real database and its optimized indexes or file systems, which are proven to work well as software methods. The ultimate objective is to find the optimal combination of software and hardware search methods for mining big data.
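The two-stage search flow summarized above can be illustrated with a minimal software sketch. It is a simulation under stated assumptions, not the iSearch hardware or its command interface: the `Drive` class, `primary_stage`, `secondary_stage`, and the sample data are all hypothetical names introduced here, and the in-drive pattern match is approximated by a plain substring test.

```python
from concurrent.futures import ThreadPoolExecutor


class Drive:
    """Hypothetical stand-in for one SSD with an embedded search engine."""

    def __init__(self, drive_id, data):
        self.drive_id = drive_id
        self.data = data  # raw bytes stored on the drive

    def primary_search(self, pattern):
        # Assumption: in hardware this runs inside the drive controller.
        # Here it is simulated with a substring test and returns hit/no hit.
        return pattern in self.data


def primary_stage(drives, pattern):
    """Issue the primary search to all drives in parallel; keep drives with a hit."""
    with ThreadPoolExecutor(max_workers=max(1, len(drives))) as pool:
        hits = list(pool.map(lambda d: d.primary_search(pattern), drives))
    return [d for d, hit in zip(drives, hits) if hit]


def secondary_stage(candidates, pattern):
    """Server-side pass: read only the candidate drives and list match offsets."""
    results = {}
    for d in candidates:
        offsets = []
        start = d.data.find(pattern)
        while start != -1:
            offsets.append(start)
            start = d.data.find(pattern, start + 1)
        results[d.drive_id] = offsets
    return results


# Illustrative data only; not taken from the paper's experiments.
drives = [
    Drive(0, b"traffic camera log: plate ABC123 seen"),
    Drive(1, b"no relevant content here"),
    Drive(2, b"plate ABC123 again, then ABC123 once more"),
]
candidates = primary_stage(drives, b"ABC123")
matches = secondary_stage(candidates, b"ABC123")

# Bytes the server actually reads versus the total stored in the array.
bytes_read = sum(len(d.data) for d in candidates)
bytes_total = sum(len(d.data) for d in drives)
print(matches, bytes_read, bytes_total)
```

In this sketch the server reads only the drives that reported a hit, which mirrors the reduction in data access described above. The actual saving depends on how selective the pattern is across drives.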