Download as PDF: This article is a slightly shortened version of my seminar paper. Feel free to download the original PDF version, or the presentation slides.
2. Hybrid Clouds
Public clouds have become increasingly popular in the last few years since they allow almost instant resource provisioning and fast scaling without having to maintain a data center. As one of the first public cloud solutions, Amazon’s EC2 has strongly contributed to this development. However, it is not only the cloud community that is growing rapidly; the number of criticisms is increasing as well. Especially in terms of data security and privacy, but also in other areas (such as availability, vendor lock-in, cost, or interoperability), public clouds carry inherent risks.
An obvious yet nontrivial way to address these issues is to use both private and public delivery models and combine them into a hybrid cloud. This section briefly analyzes the opportunities and obstacles of cloud computing, particularly with regard to hybrid clouds.
2.1. Opportunities
In a hybrid cloud, a company maintains its own private cloud, i.e. a virtualized data center, and can scale out to a public cloud if needed. Moving from a traditional data center to a hybrid cloud approach brings many benefits to businesses.
- Optimal utilization: in typical data centers, only 5% to 20% of the available server resources are actually used (Armbrust et al., 2009). Because servers have to be provisioned for peak loads that can be up to ten times higher than the average load, they are mostly idle and generate unnecessary costs. Hybrid clouds can increase server utilization by scaling out to public resources to handle flash crowds (a short back-of-the-envelope calculation follows after this list).
- Data center consolidation: instead of having to provide capacity for worst-case scenarios, a private cloud only requires resources for the average case. The option to burst out allows server consolidation and hence the reduction of operating costs. In particular, this includes the costs for hardware, power, cooling, maintenance, and administration.
- Risk transfer: while the companies themselves are responsible for keeping their data center and private cloud up and running, the public cloud provider has to ensure a high uptime for its service. Using a hybrid cloud model, “the risk of mis-estimating workload is shifted from the service operator to the cloud vendor” (Armbrust et al., 2009). Most cloud providers offer service level agreements guaranteeing an uptime of more than 99.9% per year, i.e. at most roughly nine hours of downtime per year (e.g. the Amazon EC2 SLA and the Azure SLA).
- Availability: ensuring high availability in the corporate data center is difficult and expensive, because it requires redundancy, backups, and geographic distribution. Especially in companies where IT is not the core business, the expertise in this area is rather limited. In a hybrid cloud environment, the public cloud can scale up or take over operations completely if the company’s data center is unavailable due to failures or DDoS attacks.
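To make the figures above more tangible, here is a small back-of-the-envelope calculation. It is only a sketch: the tenfold peak factor and the 99.9% SLA are the numbers quoted above, while the server counts are made up for illustration.

```python
# Rough numbers for the opportunities listed above (illustrative values only).

avg_load_servers = 10   # servers needed for the average load
peak_factor = 10        # peak load up to 10x the average (Armbrust et al., 2009)

# A traditional data center has to be provisioned for the peak load:
provisioned = avg_load_servers * peak_factor
utilization = avg_load_servers / provisioned
print(f"Utilization when provisioning for peak: {utilization:.0%}")   # 10%

# A hybrid setup keeps only the average-case capacity in-house and
# bursts out to the public cloud for the remaining peak capacity:
burst_capacity = provisioned - avg_load_servers
print(f"Capacity rented on demand during flash crowds: {burst_capacity} servers")

# What a 99.9% uptime SLA allows in downtime per year:
hours_per_year = 365 * 24
max_downtime = hours_per_year * (1 - 0.999)
print(f"Max. downtime under a 99.9% SLA: {max_downtime:.1f} hours/year")  # ~8.8 h
```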
2.2. Challenges and Issues
Even though hybrid clouds offer a compelling value proposition and enable many opportunities, they also face a large number of challenges and issues. Particularly because of its still evolving nature, cloud computing has many unsolved economic and technical problems. The following sections briefly discuss the most important ones. Note: this article only discusses a subset of the known obstacles of cloud computing. Extensive lists can be found in Fraser et al., 2009, Armbrust et al., 2009, and Magnus, 2010.
2.2.1. Cost
One of the most obvious obstacles, and certainly the most important one from the business perspective, is the fact that hybrid cloud infrastructures require both a local data center and additional remote resources from a cloud provider. That is, the often mentioned benefit of cloud computing, namely independence from a data center, does not hold true for hybrid environments. In fact, hybrid cloud infrastructures have to factor in the setup and operating costs for a data center (e.g. hardware, power, cooling, maintenance) as well as the usage-based costs of the cloud provider. Depending on utilization, data center costs, and the costs of the cloud provider, businesses have to decide whether or not moving to the cloud is profitable.
In its technical report Above the Clouds, UC Berkeley proposes a simple model to compare the expected profits of in-house computing with the profits of using public cloud resources (Armbrust et al., 2009). Even though the model is based on very strong assumptions, it identifies the key characteristics that influence the decision:
- Pay separately per resource: most applications do not use the available resources equally, but rather use one of them extensively. Some applications are CPU-intensive, others might be storage- or bandwidth-oriented. Depending on the resource type, an external provider might offer better conditions than the local data center.
- Power, cooling and physical plant costs: depending on how expensive the private data center is, local applications have to factor in the costs for power, cooling and other plant expenses.
- Operations costs: cloud environments have lower hardware operations costs, because data centers are virtualized and the risk of outages can be shifted to external providers. The operations costs of software management, however, stay the same in IaaS environments and decrease as the abstraction level increases (lower costs in SaaS environments).
- Utilization: profits and costs strongly correlate with the degree of data center utilization. While external cloud providers include operations costs in the usage costs, the local data center costs must be set in relation to the utilization.
Depending on the usable capacity of the local data center and the usage costs of the cloud provider, businesses have to decide how many public resources to use. Even though the presented characteristics can help to estimate how much capacity to buy from external providers, infrastructure decisions are not only profit-driven, but also have to consider other factors.
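As a rough illustration of how such a decision could be quantified, the following sketch compares the expected profit of serving demand in-house with the profit of serving it from a public cloud, following the tradeoff described in Armbrust et al., 2009, where the in-house cost per hour has to be divided by the achieved utilization. All numbers are made up for illustration.

```python
def profit_cloud(user_hours, revenue_per_hour, cloud_cost_per_hour):
    """Expected profit when serving all demand from a public cloud."""
    return user_hours * (revenue_per_hour - cloud_cost_per_hour)

def profit_datacenter(user_hours, revenue_per_hour, dc_cost_per_hour, utilization):
    """Expected profit in-house: idle capacity inflates the effective cost,
    so the hourly cost is divided by the achieved utilization."""
    return user_hours * (revenue_per_hour - dc_cost_per_hour / utilization)

# Illustrative numbers only:
demand = 100_000     # user hours per month
revenue = 0.15       # revenue per user hour
cloud_cost = 0.10    # price per instance hour at the public provider
dc_cost = 0.05       # amortized in-house cost per server hour (hardware, power, staff)
utilization = 0.15   # within the typical 5% to 20% range mentioned above

print(profit_cloud(demand, revenue, cloud_cost))                 #  5000.0
print(profit_datacenter(demand, revenue, dc_cost, utilization))  # ~ -18333
```

With these made-up numbers, the low utilization makes the public cloud the more profitable option; with a well-utilized data center the comparison can easily tip the other way.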
2.2.2. Security and Data Confidentiality
One of these factors is the security of data and information. According to studies conducted by Colt in 2009 (cf. Magnus, 2010), over 60% of IT decision makers are still uncertain about cloud security. Robert Biddle, professor at Carleton University in Ottawa, anecdotally describes security in the cloud like this (Fraser et al., 2009):
“Leslie Lamport described a distributed computing system with the following warning: ‘You know you have one when the crash of a computer you’ve never heard of stops you from getting any work done.’ Nowadays there is an additional risk: a machine you’ve never heard of pretends to be another machine you’ve never heard of, and steals all your data.”
Even though his description is provocatively worded, the message holds true for most cloud environments. In fact, the potentially dangerous scenarios in cloud computing are numerous.
While servers in a privately owned data center are physically under the control of the IT department, virtual machines inside a cloud can be located anywhere in the world and are controlled by the cloud provider. Entrusting business-critical machines to an external party not only requires solid service level agreements, but also trust in the provider’s capabilities and fidelity. But the physical location of the servers is not the only issue: more problematic is that the server is no longer part of the company’s network, but of the public Internet. In classical corporate networks, IT components such as logon servers, directory services or file servers are hidden in the inner perimeter and shielded by a firewall. Inside a cloud, these mechanisms for securing the machines do not apply, and other security measures have to be put in place.
In fact, potential attackers do not need to break through the firewall anymore, but can easily access the same network or even the same physical machines by simply renting virtual machines from the cloud provider. While the topology and software of an in-house data center are unknown to outsiders, the cloud reveals its technologies to potential attackers by definition. That is, attackers can analyze the server environment, the network traffic, and possible hypervisor bugs without having to break through the inner perimeter. Even though the VMs on a physical host and their virtual networks (VLANs) are isolated from each other, a bug in the hypervisor can void these security measures. As the only barrier between the guest and the host system, the hypervisor is a single point of attack and its correctness is crucial (Magnus, 2010).
In hybrid cloud environments, companies can avoid these issues at least to a certain extent: the most obvious solution is not to move any sensitive information to the cloud and hence to use public resources only for non-critical calculations and services. Other measures to increase security are, for example, installing a virtual private network (VPN) or disk encryption. However, even with all the suggested measures, the data is still located in a foreign environment, and a company can never be sure about what happens to it.
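One way to keep data unreadable for the provider, complementing the measures mentioned above, is to encrypt it on the company’s side before it leaves the private environment. The following minimal sketch uses the Python cryptography library purely as an illustration of the idea; the library and the upload step are not part of the original discussion, and the key never leaves the local infrastructure.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# The key is generated and stored inside the private cloud / local data center.
key = Fernet.generate_key()
cipher = Fernet(key)

sensitive = b"customer record that must not be readable by the cloud provider"

# Only the ciphertext is handed over to the public cloud (storage, backups, ...).
ciphertext = cipher.encrypt(sensitive)
# upload(ciphertext)  -> placeholder for the actual (provider-specific) upload call

# Data coming back from the cloud is decrypted locally again.
assert cipher.decrypt(ciphertext) == sensitive
```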
2.2.3. Availability
Cloud providers specialize in providing a scalable and fault-tolerant environment with a good quality of service. To ensure a high uptime, they operate high-availability systems and maintain several data centers all over the world.
In most cases, not having to cope with an HA system is beneficial to businesses because it saves hardware and maintenance costs (cf. section 2.1). But when the public cloud system is not operating as expected, companies are at the provider’s mercy. System failures and complete outages are not only an issue with small cloud providers, but also hit global players such as Amazon or Google. Amazon AWS, for instance, was unreachable several times in its young history, e.g. 48 hours in October 2007, two hours in February 2008, and eight hours in July 2008. The last big outage in December 2009 was caused by a power failure in the data center in Northern Virginia and lasted for “several hours” (CNET News, 2009, was at: http://news.cnet.com/8301-1009_3-10413951-83.html, site now defunct, July 2019). Another outage in January 2010 only hit a few high-capacity instances and was caused by a routing device in the same data center (Techtarget, 2010).
In cloud-supported environments, availability is an even more important issue than it is in traditional data centers. Nils Magnus, author at the Linux Magazin, argues that the total availability decreases as the number of distributed components increases. If a system requires all local and remote services to be available in order to operate properly, using public cloud resources can certainly lower the total uptime (Magnus, 2010). However, if different public cloud providers are used as alternatives to each other, high availability can still be maintained. In the opinion of the UC Berkeley researchers, “the only plausible solution to very high-availability is using multiple cloud computing providers” (Armbrust et al., 2009).
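Both claims can be illustrated with a small probability calculation (a sketch with made-up availability figures): if a system needs several independent services to be up at the same time, their availabilities multiply, whereas with redundant providers the system only fails when all of them fail simultaneously.

```python
# Serial dependency: the system is only up if ALL components are up.
components = [0.999, 0.999, 0.995, 0.99]   # availabilities of local and remote services
serial = 1.0
for a in components:
    serial *= a
print(f"All components required: {serial:.4f}")        # ~0.9831 -> availability drops

# Redundancy: the system is down only if ALL providers are down at the same time.
providers = [0.995, 0.995]                 # two independent public cloud providers
all_down = 1.0
for a in providers:
    all_down *= (1 - a)
print(f"At least one provider up: {1 - all_down:.6f}") # 0.999975 -> availability rises
```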
With regard to hybrid clouds, availability strongly depends on how public resources are integrated into the complete system. If the public cloud is only used for cloudbursting, i.e. the local resources are extended at peak times, the use of multiple providers can limit the risk significantly. However, if the cloud resources are interwoven with important business processes, the impact of a public cloud failure is considerably higher. An example of the latter case is the Ruby on Rails platform provider Heroku, which recently had to deal with a crash of approximately 44,000 hosted applications. As a startup company, Heroku is completely dependent on the availability of Amazon’s Elastic Compute Cloud. Due to a problem in EC2, the 22 rented virtual machines vanished and had to be redeployed after Amazon fixed the problem (Techtarget, 2010).
2.2.4. Interoperability
Another often discussed issue of current clouds is the fact that the different cloud systems do not work well together. In fact, the young age of the concept has led to various incompatible systems that are only slowly converging in terms of interoperability. From the hypervisor level up to the application programming interfaces, currently available clouds differ fundamentally.
At the lowest level, clouds consist of interconnected virtualized hosts using hypervisors like Xen, KVM or VMware’s ESX. The hypervisor is responsible for managing the physical hardware and mapping it to several virtual machines. These technologies are developed and maintained by different organizations, and are only compatible to a certain extent. That is, a virtual machine that has been created and deployed for one hypervisor does not necessarily run on a different one. Even though interoperability on the hypervisor level has increased in recent years, there are still different virtual disk and virtual machine file formats. However, the collaborative work of XenSource/Citrix and VMware has pushed the development of common file formats towards standards (e.g. the Open Virtualization Format).
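In practice, such a format mismatch typically has to be resolved by converting the disk image. A minimal sketch, assuming the qemu-img tool is available and using illustrative file names and formats:

```python
import subprocess

# Convert a VMware disk image (vmdk) into the qcow2 format used by KVM/Xen setups.
# Formats and file names are placeholders; qemu-img must be installed on the host.
subprocess.run(
    ["qemu-img", "convert", "-f", "vmdk", "-O", "qcow2", "disk.vmdk", "disk.qcow2"],
    check=True,
)
```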
While many hypervisors already support several VM formats, current infrastructure providers do not exploit this potential interoperability on the next higher level. An example of this compatibility gap is the market’s strongest player, Amazon: instead of using its market strength to establish standard APIs and virtual machine formats, Amazon holds on to its EC2 API and Amazon Machine Image (AMI) format. Other competitors such as GoGrid (was at: gogrid.com, site now defunct, July 2019) or ElasticHosts also use their own APIs and virtual machine formats (cf. GoGrid API, was at: http://wiki.gogrid.com/wiki/index.php/API, site now defunct, July 2019, and ElasticHosts API), so that switching from one cloud hosting provider to another is currently only possible if virtual machines are converted and API calls are adapted.
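Abstraction libraries such as Apache Libcloud try to paper over these API differences by wrapping the incompatible provider interfaces behind a single one. The library is not part of the original discussion and serves only as an illustration; which drivers are available depends on the Libcloud version, and the credentials below are placeholders.

```python
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

def list_servers(provider, *credentials):
    """List the running machines via the same abstract interface,
    no matter which provider is behind it."""
    driver_cls = get_driver(provider)      # e.g. Provider.EC2
    conn = driver_cls(*credentials)
    return [node.name for node in conn.list_nodes()]

# Switching providers is a matter of changing the constant and the credentials;
# the calling code stays the same.
ec2_servers = list_servers(Provider.EC2, "ACCESS_KEY_ID", "SECRET_KEY")
# other_servers = list_servers(Provider.GOGRID, "API_KEY", "SHARED_SECRET")
```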
To close this gap, the Open Cloud Manifesto defines six basic principles that aim at establishing and adopting standards wherever possible. While over 300 companies have already signed the document (including several cloud providers such as VMware, Rackspace, or GoGrid), many big names are missing from the list of supporters: Amazon, Google, Salesforce and Microsoft refused to sign the manifesto. All of them have proprietary cloud software or APIs, and are competitors on the cloud market. They intensively advertise their cloud solutions and try to establish their software as de-facto standards (lock-in). Larry Dignan, Editor in Chief of ZDNet, believes that the providers are in an “API war” and that “it is far too early [for them] to sign off on a manifesto when the cloud is still in its infancy”.
The issue of interoperability is particularly important in hybrid cloud environments because they integrate different cloud solutions. While a completely compatible system would allow exchanging cloud providers and VM images transparently, current hybrid cloud toolkits have to deal with numerous existing incompatibilities. Instead of using a standardized API and file format, toolkits usually implement their own public interface and translate its calls into the respective public cloud API calls. VM image formats are currently completely incompatible and cannot be handled by toolkits like Eucalyptus or OpenNebula. That makes migration between private and public clouds impossible, and hence restricts the flexibility of hybrid clouds significantly. Brian J. Dooley, cloud expert and author at Information Management Online, even thinks that the “vision of the hybrid cloud is, at present, a projection. Currently, interoperability is somewhat limited at various points, including at the virtualization hypervisor level; data transfer also remains problematic, as is integration between applications in separate clouds”.