Over the past several months I have seen a tremendous increase in customer and partner interest in infrastructure architectures that exploit Software Defined Networks (SDN), commodity infrastructure deployments (à la the Open Compute Project, OCP), and an increasing desire for fine-grained distributed tenancies (massive tenant scale). For me, many of these requests are harbingers of emerging IT requirements:
- Maximize CapEx return: maximize the utilization of capital [equipment] (ROI, or maybe even ROIC)
- Minimize OpEx: drive agility and useful business value with a decreasing ratio of expense (Google is rumored to run a 10,000:1 server-to-administrator ratio), while also factoring in other key elements of cost including power, WAN networking, management licenses, rack/stack expenses, etc.
- Value Amplification: the ability to exploit clouds for value creation rather than mere cost arbitrage
Why the Cloud Hype? CapEx Reductions!
From a CapEx perspective, the OCP project helps a company get onto commodity pricing curves that offer both choice (competition on price) and volume (which itself puts natural downward pressure on price). One often overlooked source of advantage is an infrastructure that is suitably “fungible” to allow the maximum number of applications to be fulfilled through composition. Tracking the OpenStack cloud model, there is an increasing trend toward independent composition of compute (Nova), network (Quantum) and storage (Cinder) in order to maximize the workloads that can be deployed and, hopefully, minimize stranded resources.
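To make the composability point concrete, here is a minimal sketch (not from the original post) of provisioning network, storage and compute independently through today’s openstacksdk client; the cloud name, image/flavor IDs and sizes are placeholders.

```python
# Independent composition of compute, network and storage via the OpenStack SDK.
# Cloud name, image/flavor IDs and sizes are illustrative placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")  # hypothetical clouds.yaml entry

# Network (the service the post calls Quantum, later renamed Neutron)
net = conn.network.create_network(name="tenant-net")
subnet = conn.network.create_subnet(
    network_id=net.id, ip_version=4, cidr="10.0.0.0/24")

# Block storage (Cinder): provisioned independently of any server
vol = conn.block_storage.create_volume(size=50, name="tenant-data")

# Compute (Nova): composed with the network created above
server = conn.compute.create_server(
    name="tenant-vm",
    image_id="IMAGE_ID",      # placeholder
    flavor_id="FLAVOR_ID",    # placeholder
    networks=[{"uuid": net.id}],
)
server = conn.compute.wait_for_server(server)

# Finally, compose the independently provisioned volume with the server
conn.compute.create_volume_attachment(server, volume_id=vol.id)
```

The point of the sketch is that each resource is created by its own service and only composed at the end, which is what keeps the underlying pools fungible.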
We’ll talk about some of the critical technology bits later.
There are obviously many other factors that contribute to capital efficiency and utilization, among them how efficiently the storage protects the information, from RAID and erasure coding to de-duplication. Hadoop’s HDFS implementation is notoriously inefficient, keeping 3 copies of each object in order to fully protect the information. This is certainly an area in which EMC (with Isilon’s implementation of HDFS), like other storage vendors, excels.
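Some rough arithmetic (my own illustration, not figures from the post) shows why triple replication is considered wasteful compared with erasure coding:

```python
# Back-of-the-envelope storage overhead: 3x replication vs. erasure coding.
# Figures are illustrative; real systems add metadata and spare capacity.

def usable_fraction_replication(copies: int) -> float:
    """Fraction of raw capacity holding unique data under N-way replication."""
    return 1.0 / copies

def usable_fraction_erasure(data_frags: int, parity_frags: int) -> float:
    """Fraction of raw capacity holding unique data under a (k, m) erasure code."""
    return data_frags / (data_frags + parity_frags)

print(usable_fraction_replication(3))   # 0.33 -> 200% overhead (HDFS default)
print(usable_fraction_erasure(10, 4))   # 0.71 -> ~40% overhead for similar protection
```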
Another factor in efficiency in these emergent environments is multi-tenancy. In the server market we’ve traditionally talked about virtualization as providing the ability to stack multiple independent VMs in a server in a way that allows some resources to be shared, and even oversubscribed. Over-subscription is a technique that takes a finite resource like memory or disk and mediates access to it so that the total offered to the tenants exceeds the real available capacity. We have been doing this for years in storage, where we call it “thin provisioning”: the idea is to manage the addition of resources to better match actual demand rather than contracted demand.
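As a simple illustration of the thin-provisioning arithmetic (all numbers invented):

```python
# Thin provisioning in one function: contracted (logical) capacity can exceed
# physical capacity because tenants rarely consume their full allocation.

def oversubscription_ratio(contracted_gb: list[int], physical_gb: int) -> float:
    return sum(contracted_gb) / physical_gb

tenants = [500, 500, 1000, 2000]      # GB promised to each tenant
physical = 2000                        # GB actually installed

ratio = oversubscription_ratio(tenants, physical)
print(f"{ratio:.1f}x oversubscribed")  # 2.0x: add capacity as real demand grows
```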
In many environments with high Service Level Objectives (SLOs), we need to be more pessimistic with respect to resource provisioning. Infrastructure is increasingly providing the tools needed to deliver substantially higher isolation guarantees, from Intel CPUs that provide hard resource boundaries at the per-core level, to switch infrastructures that support L2 QoS, and even storage strategies in flash and disk that optimize I/O delivery against ever more deterministic boundaries. These capabilities are emerging as critical requirements for cloud providers that anticipate delivering guaranteed performance to their customers.
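As one small, concrete example of hard resource boundaries, the Linux-only sketch below pins a process to a reserved set of cores; the core numbers are placeholders, and a real deployment would combine this with cgroups and scheduler isolation:

```python
# Hard CPU isolation in miniature: pin this process (and its children) to a
# dedicated set of cores so a noisy neighbour cannot steal its cycles.
# Linux-only; core numbers are placeholders for whatever a scheduler reserves.
import os

RESERVED_CORES = {2, 3}            # cores set aside for this tenant's workload

os.sched_setaffinity(0, RESERVED_CORES)   # 0 = the calling process
print("pinned to cores:", sorted(os.sched_getaffinity(0)))
```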
In a recent paper that the EMC team produced, we reasoned that Trusted Multi-Tenancy has a number of key requirements, detailed in a prior post, that are certainly relevant to our discussion of advancing tenant densities “with trust” in more commodity environments.
Reducing the OpEx Target
Operational expenses do tend to dominate many conversations today; we know that power consumption and WAN costs are beginning to eclipse manpower as the top expense categories. In fact, some studies have suggested that data center power infrastructure accounts for ~50% of facilities costs and ~25% of total power costs; obviously power is an issue. I’ll cover this in another post soon, but for the interested, James Hamilton provides an interesting analysis here.
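For a rough sense of the kind of arithmetic Hamilton walks through, here is a toy power-cost calculation; every figure in it is an assumed placeholder, not data from his analysis or this post:

```python
# Toy power-cost arithmetic for a data center. All inputs are assumptions.

it_load_kw = 1000          # power drawn by servers, storage and network gear
pue = 1.5                  # total facility power / IT power (assumed)
price_per_kwh = 0.07       # USD, assumed utility rate

facility_kw = it_load_kw * pue
annual_cost = facility_kw * 24 * 365 * price_per_kwh
overhead_cost = (facility_kw - it_load_kw) * 24 * 365 * price_per_kwh

print(f"annual power bill: ${annual_cost:,.0f}")
print(f"of which cooling/distribution overhead: ${overhead_cost:,.0f}")
```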
I also believe there is a further case for looking differently at workload selection criteria for placement with advantaged cloud providers, dominated by proximity to critical data ingress, use, and egress points. As I’ve reviewed many hybrid plans, they tend to center on Tier 3 and lower applications in terms of what can be pushed offsite: backup, test & dev and other non-critical work. This movement of low-value work to cheaper providers is certainly driven by cost arbitrage, and in some cases makes sense when it’s really about creating elasticity in the existing infrastructure plant to make headroom for higher-priority work. But I’m seeing a new use case develop as the enterprise begins to focus on analyzing customer behavior, sharing information with partners, and even exploiting cloud data sets in social or vertical marketplaces. One can even look at employee mobility and BYOD as an extension, as employee devices enter through 4G networks across the corporate VPN/firewall complex. All of these cases point to information that is generated, interacted with and consumed much closer to the Internet edge (and transitively further from the corporate data center). It is my continued premise that the massive amount of information transacted at the Internet edge generates the need to store, process and deliver information from that location, versus backhauling it into the high-cost enterprise fortress.
An Opportunity at the Edge of the Internet
Service Providers similarly see this trend and are beginning to stand up “trusted clouds” to better address an enterprise’s GRC concerns (removing barriers), but they also recognize the opportunity. As processing and storage costs continue to decrease at a faster rate than transport and routing costs, companies that begin to place information close to the edge, and can act on it there, will benefit from a reduced cost of operation. Traditional Content Delivery Network companies like Akamai have used edge caching for years to decrease WAN transport costs, but the emerging models point to a much richer experience in the provider network: the ability to manage data in the network, and to bring compute to the data, in order to exploit improved availability, location awareness and REDUCED COST.
People and Process
The people and process costs may have the biggest immediate impact on OpEx. The massive success of Converged Infrastructure (CI), such as the VCE Vblock™, certainly points to the value of a homogeneous infrastructure plant for improvements in procurement, management, interoperability, process, and security/compliance, as detailed by ESG here. Today’s Vblocks are exceptionally well suited for low-latency transactional I/O systems for applications like Oracle, SAP, Exchange and VDI, basically anything that a business has traditionally prioritized into its top tiers of service. The reduction in total operating costs can be terrific, and is well detailed elsewhere.
Public Cloud = Commodity Converged Infrastructure at the Edge
What’s exceedingly interesting to me is that web-scale companies like Facebook, Rackspace, Goldman Sachs and (though not part of OCP) eBay have been looking at the notion of a CI platform for their scale-out workloads, and are all focused on data and compute placement throughout the network. These workloads and their web-scale strategies shift the management of non-functional requirements (reliability, availability, scalability, serviceability, manageability) away from the infrastructure and into the application service. In effect, by looking at cheap commodity infrastructure, and at new strategies for control integration which favor out-of-band asynchronous models over in-band synchronous ones, smart applications can run at massive scale, over distributed infrastructure, and with amazing availability. These are the promises of the new public cloud controllers enabled by Cloud Foundry, OpenStack and CloudStack-like control planes (InfoWorld’s Oliver Rist even suggests that OpenStack has become the “new Linux”). This all sounds scary good, right?
Well, here’s the rub: this shift from smart infrastructure to smart services almost always requires a re-evaluation of the architecture and technologies associated with a deployment. A simple review of what Netflix has had to build to manage its availability, scale, performance and cost makes the point: in effect, they built their own PaaS on the AWS cloud. This was no small undertaking and, as Adrian Cockcroft says frequently, it is under continuous improvement against specific goals:
- Fast (to develop, to deploy and, mostly, for the consumer),
- Scalable (eliminate data center constraints, no vertical scaling, elastic),
- Available (robust and available beyond what a data center typically provides; think dial tone and no downtime, with rolling upgrades and the like), and
- Productive (producing agile products with well-structured and layered interfaces).
The net is that the architectural goals Netflix put forward, like those of its web-scale peers, forced it to move to a green-field architecture, with green-field services built to take on these new objectives. Web-scale services are really only possible on a substantially homogeneous platform built to provide composability of fungible units of compute, network and storage scale, enabling the application to elastically [de]provision resources across a distributed cloud fabric with consistency. There is an implied requirement for a resource scheduler that is knowledgeable about locations and availability zones and makes those factors available to the application services to align and control costs and performance against service requirements.
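A toy sketch of what such a location- and cost-aware placement decision might look like (zones, prices and latencies are invented for illustration):

```python
# Given per-zone cost and measured latency to the workload's data, pick the
# cheapest availability zone that still meets the service requirement.
from dataclasses import dataclass

@dataclass
class Zone:
    name: str
    cost_per_hour: float      # $ per instance-hour
    latency_ms: float         # latency to the workload's data

def place(zones: list[Zone], max_latency_ms: float) -> Zone:
    eligible = [z for z in zones if z.latency_ms <= max_latency_ms]
    if not eligible:
        raise RuntimeError("no zone satisfies the latency SLO")
    return min(eligible, key=lambda z: z.cost_per_hour)

zones = [
    Zone("edge-east-1", 0.12, 8.0),
    Zone("core-dc-1",   0.07, 45.0),
    Zone("edge-west-2", 0.10, 9.5),
]
print(place(zones, max_latency_ms=20).name)  # edge-west-2: cheapest zone meeting the SLO
```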
Open Virtual Networks
The emergence of Software Defined Networks, OpenFlow control strategies and fabric models finally makes the transport network a dynamically provisionable and fungible resource. No longer are we constrained by inter-network VLAN strategies that lock us into a single vendor. As Nicira states:
Virtualizing the network “changes the laws of network physics”. Virtual networks allow workload mobility across subnets and availability zones while maintaining L2 adjacency, scalable multi-tenant isolation and the ability to repurpose physical infrastructure on demand. The time it takes to deploy secure applications in the cloud goes from weeks to minutes and the process goes from manual to automatic.
This ability to virtualize the physical network turns a channel or path into an individually [tenant-]controlled resource that can be both dynamic in its topology and elastic in its capacity: a huge step forward in creating an IaaS that is more fungible than ever before. Google has just started talking publicly about its use of OpenFlow for radical savings and efficiency. In fact, Urs Hölzle says that “the idea behind this advance is the most significant change in networking in the entire lifetime of Google”.
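As a toy illustration (not Nicira’s or Google’s implementation) of why an overlay makes the network fungible, the sketch below models tenant segments as VXLAN-style VNIs mapped onto physical hosts; re-homing a workload only edits the mapping while the physical underlay stays untouched:

```python
# Each tenant's virtual L2 segment is identified by an overlay ID (a
# VXLAN-style VNI) and mapped onto whichever physical hosts currently run
# its workloads. All identifiers and values are invented examples.

overlay = {
    # tenant        VNI       hosts currently carrying this virtual segment
    "tenant-a": {"vni": 5001, "hosts": {"rack1-host3", "rack4-host7"}},
    "tenant-b": {"vni": 5002, "hosts": {"rack2-host1"}},
}

def migrate(tenant: str, src: str, dst: str) -> None:
    """Move a workload: L2 adjacency survives because the VNI stays the same."""
    segment = overlay[tenant]
    segment["hosts"].discard(src)
    segment["hosts"].add(dst)

migrate("tenant-a", "rack1-host3", "rack9-host2")
print(overlay["tenant-a"])
```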
Summary
When you do allow substantial freedom in leaving some legacy behind, the economics can be remarkable, and certainly not everything has to be jettisoned. But it should be expected that the application and platform architectures will be substantially different. The new opportunities afforded by these distributed cloud technologies and commodity infrastructure advances certainly entice startups, with little legacy to carry, first, but we must expect enterprises to look to these new architectures as well. Everyone wants to benefit from reductions in capital and operational expenses, from improved service performance, and from getting closer to their partners and customers AND the data behind those interactions.
Oh, and Google just launched IaaS with “Compute Engine”…