Abstract
Resource virtualization is one of the most prominent characteristics of cloud computing. The placement of virtual machines (VMs) on physical machines determines resource utilization efficiency and service quality. In distributed cloud computing in particular, where data centers (DCs) span many geographical areas and are connected by high-speed internet, the placement of the VMs of one large task or one organization focuses on minimizing the distances between the DCs used and the bandwidth consumed between them. This reduces communication latency and improves availability. First, a cluster of data centers must be found to accommodate the requested VMs, with the aim of minimizing the maximum inter-DC distance. In contrast to the existing method, which considers only the distances between data centers, a more efficient clustering-based 2-approximation algorithm is developed that makes full use of the topology and density properties of the cloud network. Simulation shows that the proposed algorithm is especially appropriate for very large-scale problems. Then, the requested VMs must be partitioned among the DCs of the cluster so that expensive inter-DC bandwidth is saved and availability is improved. With the introduction of a half communication model, a novel heuristic algorithm that further reduces the bandwidth used is presented for partitioning the VMs. Its time complexity is reduced by a factor of O(log n) to O(n^2), and it runs 3 times faster than the existing method.
1. Introduction
Cloud computing has gained great popularity in recent years for its efficient resource usage and convenient service access [1, 2]. These competitive strengths are attributable to the introduction of virtualization technology and the distributed networking of the cloud. Under the de facto standard of the virtualization industry, the cores of physical machines (PMs) can be virtualized into a larger number of virtual CPUs (vCPUs) [3]. Virtual machines (VMs) can be placed at the granularity of vCPUs and thus achieve more efficient resource utilization. Distributed networking is also expected to allow VMs to be deployed closer to end users in different geographical locations. A distributed cloud consists of many data centers (DCs), all connected by high-speed internet [4]. In contrast to their counterparts in a centralized cloud, distributed DCs have relatively small capacity because they are sized for the lower traffic of the dispersed areas in which they are located.
6. Conclusions and future work
Drawing on ideas from clustering methods, this paper presents a more efficient algorithm, CBMinDia, for the DC selection problem. CBMinDia retains the 2-approximation property and is better suited to large numbers of DCs or requested VMs. Because the algorithm makes full use of the density and DC capacity information of the network, it prunes DCs that are sub-optimal relative to a reasonably good feasible solution. The computational effort is greatly reduced, and the simulation reveals that the algorithm is most efficient when the DCs are distributed in clusters.
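To make the DC selection problem concrete, the following minimal sketch shows a generic star-based 2-approximation for choosing a set of DCs with small diameter. It is not CBMinDia and includes none of its density- or capacity-based pruning; it only illustrates where a factor-2 guarantee of this kind can come from. The function name `select_dcs`, its parameters, and the assumption that the distances form a metric are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def select_dcs(dist, capacity, demand):
    """Pick DCs whose total capacity covers `demand` with a small diameter.

    dist:     symmetric matrix of inter-DC distances (assumed to be a metric).
    capacity: number of VMs each DC can host (hypothetical input format).
    demand:   number of requested VMs.
    Returns (diameter, sorted DC indices), or None if capacity is insufficient.
    """
    n = len(dist)
    best = None
    for center in range(n):
        # Visit DCs in order of increasing distance from the candidate center.
        order = sorted(range(n), key=lambda j: dist[center][j])
        chosen, total = [], 0
        for j in order:
            if capacity[j] == 0:
                continue  # a DC with no free capacity cannot help
            chosen.append(j)
            total += capacity[j]
            if total >= demand:
                break
        if total < demand:
            return None  # even all DCs together cannot host the request
        # Diameter of the chosen set: the largest pairwise distance.
        diameter = max((dist[a][b] for a, b in combinations(chosen, 2)), default=0)
        if best is None or diameter < best[0]:
            best = (diameter, sorted(chosen))
    return best
```

Every DC picked for a given center lies within the radius of the last DC added, so by the triangle inequality the diameter of the set is at most twice that radius. When the center belongs to an optimal set, that radius is no larger than the optimal diameter, which gives the factor of 2. This sketch evaluates every DC as a center; the pruning described above is what avoids most of that work.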
For the VM partition problem, a slightly more effective algorithm is developed with the introduction of the half communication model (HCM) concept. The algorithm determines an appropriate pair of AOT and AIT values for each VM selection. These values allow the intra-DC traffic to be maximized and the inter-DC traffic to be minimized. More importantly, the concept makes VM selection convenient, requiring only simple vector addition and subtraction on the traffic matrix. Hence the time complexity is reduced by a factor of O(log n) to O(n^2), and the efficiency is improved about 3 times.
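The role of vector addition in keeping VM selection cheap can be illustrated with a simplified greedy partitioner. This is a sketch under stated assumptions, not the paper's heuristic: it does not compute AOT or AIT, and the name `partition_vms`, the `slots` input, and the tie-breaking rule are hypothetical. It keeps one score per unplaced VM (its traffic to the VMs already in the current DC) and updates all scores with a single row addition after each selection, so n selections cost O(n^2) in total.

```python
def partition_vms(traffic, slots):
    """Greedily fill DCs, preferring VMs that talk most to the current group.

    traffic: symmetric n x n matrix of VM-to-VM traffic.
    slots:   number of VMs each selected DC can host (hypothetical input).
    Returns a list mapping each VM index to a DC index.
    """
    n = len(traffic)
    if sum(slots) < n:
        raise ValueError("not enough slots for all VMs")
    assignment = [None] * n
    unplaced = set(range(n))
    for dc, room in enumerate(slots):
        # to_group[u]: traffic between unplaced VM u and VMs already in this DC.
        to_group = [0] * n
        for _ in range(room):
            if not unplaced:
                break
            # O(n) scan for the best candidate; ties go to the lowest index.
            v = max(unplaced, key=lambda u: (to_group[u], -u))
            assignment[v] = dc
            unplaced.remove(v)
            # Adding row v of the traffic matrix refreshes every score in O(n).
            for u in unplaced:
                to_group[u] += traffic[v][u]
    return assignment
```

For example, with four VMs in which pairs (0, 1) and (2, 3) exchange heavy traffic and two DCs with two slots each, the sketch keeps each heavy pair inside one DC, so only the light traffic crosses DC boundaries.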