Abstract
The continuous growth of versatile urban infrastructure is challenged by the demands of large-scale Big Data processing. Making sense of the voluminous data generated in a smart urban environment for decision making is a challenging task. Big Data analytics is used to extract useful insights from such massive data, and it has attracted significant attention in the context of large-scale data computation and processing. Conventional techniques are not suited to this task because of the sheer volume of data. This paper presents a Hadoop-based architecture for Big Data loading and processing. The proposed architecture is composed of two modules: Big Data loading and Big Data processing. The performance and efficiency of data loading are tested in order to propose a customized methodology for loading Big Data into a distributed processing platform, i.e., Hadoop. To examine data ingestion into Hadoop, data loading is performed repeatedly and compared under different loading decisions. The experimental results are recorded for various attributes, alongside manual and traditional data loading, to highlight the efficiency of the proposed solution. Processing, in turn, is carried out using the YARN cluster management framework, customized with dynamic scheduling. The effectiveness of the proposed solution for processing and computation is also demonstrated in terms of throughput.
5. Conclusion and Future Work
In this paper, a Hadoop-based smart urban data management architecture is proposed to deal with the problems in Big Data analytics. The proposed solution deals in particular with Big Data loading into Hadoop, cluster management, and computation. The scheme comprises two parts: Big Data loading and storage in the Hadoop file system, and Big Data computation and processing. The first part is responsible for transferring and storing the Big Data in Hadoop. The performance and efficiency of data loading are tested using our proposed methodology, through a variety of experiments, for loading Big Data into a distributed processing platform, i.e., Hadoop. In addition, data loading is performed repeatedly and compared under different loading decisions, and the features that influence it are examined. The second part of the research deals with data computation and processing. Unlike the traditional MapReduce architecture, this research uses a YARN-based cluster resource management solution, so that cluster resources are managed separately from the data processing performed by the MapReduce algorithm. YARN is customized with dynamic scheduling. The proposed architecture is tested on the Hadoop framework with reliable datasets, and the results indicate that the proposed solution offers valuable insight into societal development structures, which can help in obtaining better architectures. The proposed solution is intended for offline applications, since Hadoop provides only offline processing. Moreover, the effectiveness of our proposed scheme with regard to throughput is also highlighted in this paper.