5. Conclusion
This paper proposes a data placement policy (DDP) that allocates data blocks so as to improve the data locality of map tasks. The default Hadoop data placement strategy assumes a homogeneous environment. In a homogeneous cluster, this strategy can make full use of the resources of each node. In a heterogeneous environment, however, it produces a load imbalance that incurs additional overhead. The proposed DDP algorithm allocates data blocks according to the different computing capacities of the nodes, thereby improving data locality and reducing the additional overhead, which enhances Hadoop performance. In the experiments, DDP reduced execution time relative to the default Hadoop policy for two types of applications, WordCount and Grep. For WordCount, DDP improved execution time by up to 24.7%, with an average improvement of 14.5%. For Grep, DDP improved execution time by up to 32.1%, with an average improvement of 23.5%. In future work, we will focus on other types of jobs to further improve Hadoop performance.
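The core idea summarized above, allocating blocks in proportion to each node's computing capacity, can be illustrated with a minimal sketch. This is not the DDP implementation evaluated in this paper: the function name allocate_blocks, the node labels, and the capacity ratios are hypothetical, and the largest-remainder rounding is an assumption chosen only to make the block counts integral.

```python
def allocate_blocks(num_blocks, capacities):
    """Split num_blocks across nodes in proportion to their capacities.

    capacities maps a node name to a relative computing capacity
    (hypothetical values; a real system would have to measure them).
    """
    total = sum(capacities.values())
    if total <= 0:
        raise ValueError("total capacity must be positive")

    # Ideal (fractional) share of blocks for each node.
    shares = {node: num_blocks * cap / total for node, cap in capacities.items()}

    # Start from the integer part of each share.
    alloc = {node: int(share) for node, share in shares.items()}

    # Assumed rounding rule: give the leftover blocks to the nodes
    # with the largest fractional remainders.
    leftover = num_blocks - sum(alloc.values())
    by_remainder = sorted(capacities, key=lambda n: shares[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical cluster: node A is three times as fast as node C.
    print(allocate_blocks(12, {"A": 3, "B": 2, "C": 1}))
```

With these assumed ratios, the faster node receives more blocks, so it is more likely to find its map-task input locally instead of fetching blocks from slower nodes.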