28 December 2014

Saving Money by Investing in Performance: A Financial Model

TL;DR

  • We know that improving performance can affect revenue in many situations
  • Performance can also save the business money and reduce costs
  • Simple financial modeling can show why investing in performance makes business and financial sense
  • A copy of my financial model can be found on Google Docs

Overview

I like to make things simple. Ok, if you’ve skimmed ahead you’re already raising an eyebrow. Please bear with me. I promise that I’ll show you another way to easily convince the business to invest in performance.

When the business asks us why we should care about performance we always point them to the research done by the commerce giants. Performance means revenue! Yea!

The benefits to the business don’t have to stop at revenue. Performance can also improve the bottom line. This is especially important in situations where it is harder to prove revenue impacts from performance improvements. Focusing on infrastructure and operational savings can make it easier to convince VIPs to invest in performance (with the added benefit of happier and more productive users).

I’d like to share with you a financial model that I’ve used to show how performance can impact the cost of doing business.

Caveats

As with all things, there are always footnotes, caveats and provisos.

First, this is just another tool to demonstrate why performance matters – a tool that compares before & after. Financial modeling can easily lead you down a rat hole. There are always details that you will need to defend. Your objective should be to show directionality, not absolute position. (Let the financial experts in your organization compute the actual numbers.)

Second, I’m going to use shortcuts and generalities. These are based on my experience owning a business, my years managing Infrastructure and Operations (I&O) and the many conversations I’ve had with other I&O managers. I use these shortcuts to, again, show directionality. Don’t mistake “shortcut” for “inferior”. To the contrary! If anything, using the shortcuts will give conservative numbers.

How it Works; When to Use Financial Modeling

The root premise of this financial model is that improving performance per webpage (or per transaction) is generally accomplished by:

  1. reducing number of requests and round trips
  2. reducing the bytes sent
  3. reducing processing time on the back-end

Number 3 is often tightly connected with numbers 1 and 2. That is, you are either building out more hardware or optimizing what you have. Building out more hardware per interaction doesn’t scale with user growth.

Therefore, this model will work best when you are optimizing backend processes, adding caching layers (back-end, cdn, client) or optimizing user workflows.

In contrast, this model will likely fall down if you try to use it to argue for optimizations such as leveraging the GPU for client rendering or adding WebP support.

Of course, this is just the beginning. There are many other financial models that you should consider, and I’ll leave those for another post. For example:

  • how performance increases sales per user (ARPU)
  • how performance increases user growth (CAGR)

Basic Equation

The financial model I use boils down to a basic equation:

Cash_Flow = Capital_Expenses + Operational_Expenses

That is, how much money do you have to spend to buy new hardware (CapEx), and how much money do you have to spend to keep the hardware working and the electricity flowing (OpEx)?

Each year in the model we will add new hardware, which will increase our operational costs.

Later on I will use the maximum Page Views per second as a proxy for load and compute the peak cash flow per Page View:

(OpEx + CapEx) / Max_PageView

Once we have these three data points projected, it is as simple as comparing different scenarios. Did your improvements slow the rate of new hardware purchased? Reduce the operating costs? What are the projected costs with and without improved performance?

The tricky part is computing the OpEx and CapEx. Here are the equations that I will be using:

CapEx = Number_of_New_Servers * Average_Server_Cost
OpEx = Number_of_Servers * Average_Server_KVA * CoLo_Cost_per_KVA
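As a rough sketch, the equations above could be written as a few small Python functions. The function and parameter names simply mirror the variables in the equations. The `billing_periods` argument is an assumption added here so that a monthly colo price can be turned into a yearly figure; it is not part of the original spreadsheet.

```python
def capex(new_servers: int, average_server_cost: float) -> float:
    """Cash spent this year buying and installing new hardware."""
    return new_servers * average_server_cost


def opex(total_servers: int, average_server_kva: float,
         colo_cost_per_kva: float, billing_periods: int = 12) -> float:
    """Cash spent this year powering and hosting the whole fleet.

    billing_periods is an assumption: 12 turns a monthly price into a
    yearly cost. Pass 1 if your price is already quoted per year.
    """
    return (total_servers * average_server_kva
            * colo_cost_per_kva * billing_periods)


def cash_flow(new_servers: int, total_servers: int,
              average_server_cost: float, average_server_kva: float,
              colo_cost_per_kva: float) -> float:
    """Cash_Flow = CapEx + OpEx for a single year."""
    return (capex(new_servers, average_server_cost)
            + opex(total_servers, average_server_kva, colo_cost_per_kva))


# Hypothetical year: 10 new servers join a fleet that then totals 110.
print(cash_flow(new_servers=10, total_servers=110,
                average_server_cost=5_000, average_server_kva=0.5,
                colo_cost_per_kva=0.50))
```

Note that CapEx only counts servers bought in the year, while OpEx counts every server that is running.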

Avoiding Funny Numbers

You’re probably thinking about a million variables and inputs I should be using in the numbers above. As I mentioned, I’m going to stick with generalizations and avoid all the particulars. One thing I want to stress is that I’m avoiding all the funny numbers – soft costs, contract renegotiations, etc. This will allow you to bypass a 7 week discussion with your procurement team about the true cost of your enterprise agreement.

For these reasons, I will intentionally avoid:

  • Costs of User Productivity (This is the truest form of funny money. Users and staff will be just as productive with the time they have available. You will not get this money back – but you might be able to invest the time in other activities.)

  • Software Licensing costs (the true cost of an enterprise agreement could power an improbability drive – calculating the savings will require the power from a small star)

  • Revenue from selling old hardware

Calculating Capital Expenses (CapEx)

Capital Expenditure is the easiest item to calculate.

We want to make sure we are just capturing the cost of procuring the hardware and getting it installed. Once procured, it is a “Sunk Cost” and can’t be recouped. A more sophisticated model could turn this into an amortization schedule – but we want to show Cash Flow impacts instead.

CapEx = Number_of_New_Servers * Average_Server_Cost

To be clear, when we talk about the Average Cost of hardware we should think of it in two ways:

  1. What is the fully loaded cost? This is more than just the list price of the hardware; it also includes what it takes to install it: cost to procure, security audits, colo service tickets, rack and stack, etc. That said, if you can’t get the fully loaded cost, be conservative and avoid “guesstimation”.
  2. You rarely have a uniform set of hardware. Instead of spending time inventorying your infrastructure use an average cost across hardware. Yes, you might have bigger boxes for db compared to app servers. Use an educated average to make the math simpler.

Some good numbers I’ve used are:

  • $5k for a pizza box
  • $100k for a high density compute server

Virtual servers also require CapEx. You have two options. One is to translate the number of VMs per server into actual hardware (don’t forget to factor in a vMotion buffer). The other, if your IT group has already computed the cost, is to use the cost per VM. Bottom line: be consistent.

Calculating Operating Expenses (OpEx)

There are many ways to calculate the cost of operations for your application. The easiest way is to look at it in an aggregate view. That is, how many servers in total are used to deliver your app – regardless of the role.

The assumption we start with is that your current infrastructure is necessary to deliver the current level of performance. Likewise, as user traffic increases, your infrastructure will need to grow proportionally.

Most Co-Location providers these days use a simple billing model of charging only for energy used. The beauty of this model is that it usually includes everything you need for hosting as well. Functionally you can assume for the price of energy you get all the cooling, floor space and bandwidth you need.

Of course, each datacenter is a special snowflake. Don’t get bogged down in the details and keep the formula general. It is better to underestimate than to overestimate or worse, spend 6 weeks and arrive at a similar number.

The equation works out to:

OpEx = Number_of_Servers * Average_Server_KVA * CoLo_Cost_per_KVA

The KVA per server can be the most challenging number to calculate. Some hardware manufacturers provide a power calculator for server configurations. Many provide a range of possible values. Here are my recommendations:

  • Don’t try to calculate each piece of hardware. Pick one configuration that is representative
  • If in doubt, use the newest hardware’s power consumption since it will likely be the most efficient
  • If the manufacturer offers a fully loaded power use, use 80% of the value
  • If the manufacturer only offers one power profile – assume it to already be 80% loaded
  • Most hardware reports power as Watts and BTU. Assume a power-factor of 0.9 and use: KVA = Watts / 900
  • When in doubt I use these approximation numbers:
    • 0.5KVA for an average pizza box server
    • 3.5KVA for a 6U high density compute chassis
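The rules of thumb above can be sketched as a single helper. This is only an illustration: the function name and its parameters are hypothetical, and the 0.9 power factor and 80% load figure are the approximations from the list, not measured values.

```python
def server_kva(rated_watts: float,
               is_fully_loaded_rating: bool = True,
               power_factor: float = 0.9,
               typical_load: float = 0.8) -> float:
    """Estimate the KVA one representative server draws.

    rated_watts: the figure from the manufacturer's power calculator.
    is_fully_loaded_rating: True when the figure is the fully loaded
        draw (we then take 80% of it); False when only one power
        profile is published (assumed to already be about 80% loaded).
    """
    watts = rated_watts * typical_load if is_fully_loaded_rating else rated_watts
    # KVA = Watts / (power_factor * 1000), i.e. Watts / 900 at PF 0.9
    return watts / (power_factor * 1000)


# Hypothetical server whose single published profile is 450 W.
print(server_kva(450, is_fully_loaded_rating=False))
```

A 450 W single-profile figure lands on the 0.5KVA pizza box approximation, which is a useful sanity check on the defaults.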

As I mentioned, most colo providers charge by electricity used and bundle all the other amenities into this price. Like all colo solutions there is a range of offerings from high-end ($0.70/KVA/mo) to low-end ($0.20/KVA/mo). In my experience, I’ve found that the cost to run your own datacenter can be pretty close to the average cost of renting colo space.

You’ll probably have a hard time getting procurement to offer up the price you pay for colo, so to save you the time I’d recommend using a number around $0.50/KVA/mo. This should also be sufficiently padded to account for any MPLS lines or dedicated circuits that your data center might need.

What about IaaS?

So far this model assumes you own or lease your infrastructure. If you use IaaS, however, this model will fall short because you don’t own any capital equipment and everything is pure operational expense. The tricky part is that the cost of operations is not based on hardware procured but on utilization. Savings can still be realized and modeled, but it requires a slightly different formula and ultimately more insight into your cost of operations. This is worthy of a different talk.

PageView and Interaction Cost

A useful way to measure user load on a system is to look at the maximum Page Views per second. Consider a Page View to be any request that returns Content-Type: text/html. The principle is that each user ‘interaction’ or ‘transaction’ with your website will return HTML.

Using Page Views isn’t always perfect – especially for single-URL apps. The goal is to find a metric that everyone can agree on and that consistently represents the volume of user activity on your site.

Your current configuration of infrastructure is designed to meet a peak in volume of traffic. That is, you have built it to sustain the peak traffic throughout the year and get by with the least number of user complaints. This peak could be Black Friday or it could be annual performance review time.

Using the maximum page view per second will tell us how much money is spent to maintain this peak traffic:

Interaction_Cost = (OpEx + CapEx) / Page_View_per_Second

Each year, you expect to grow the business. As you grow, you will build more hardware in lock step. If you do nothing to improve the performance, then you should expect the Interaction Cost to remain constant, year over year.
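To see why the Interaction Cost stays flat, here is a small sketch that grows the fleet and the traffic in lock step. The starting figures are illustrative assumptions (140 servers, 400 Page Views per second, 30% growth, $5k per server), as is the yearly OpEx of $3 per server, which comes from 0.5KVA at $0.50/KVA/mo over 12 months. Fractional servers are kept on purpose so rounding does not hide the pattern.

```python
def interaction_cost(opex: float, capex: float,
                     page_views_per_second: float) -> float:
    """Interaction_Cost = (OpEx + CapEx) / Page_View_per_Second."""
    return (opex + capex) / page_views_per_second


# Illustrative assumptions, not measured data.
growth = 0.30
servers, peak_pv = 140.0, 400.0
server_cost = 5_000.0
yearly_opex_per_server = 0.5 * 0.50 * 12  # KVA * $/KVA/mo * months

costs = []
for year in range(1, 4):
    new_servers = servers * growth   # hardware added in lock step
    servers += new_servers
    peak_pv *= 1 + growth            # traffic grows at the same rate
    costs.append(interaction_cost(opex=servers * yearly_opex_per_server,
                                  capex=new_servers * server_cost,
                                  page_views_per_second=peak_pv))

for year, cost in enumerate(costs, start=1):
    print(f"year {year}: ${cost:,.2f} per Page View/s")
```

Every year prints the same value, because both the numerator and the denominator scale by the same growth factor. A performance improvement shows up as a break in that flat line.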

Example:

Let me share an example use of this model – based on real life events.

In this example, let’s assume a retailer, with this configuration:

  • 140 “pizza box” type servers ($5k, 0.5KVA)
  • 400 Page View/s peak
  • 30% YoY growth rate

The problem is that the home page and one key category page account for 40% of the site traffic, none of which can be cached by the local Varnish or CDN layers, so every request must go back to the datacenter. This is because:

  • html included basic personalization rendered by server side (“Hello Colin”)
  • unique shopper cookie is generated for each request (intrinsic for business KPIs)

You’re probably shaking your head. I know. But this kind of web application is all too common.

Investing in changes to the website html generation would improve the local cache layer and the offload of the cdn – not to mention improve the time to first byte substantially. (To accomplish this, client side javascript could be used to inject the personalization as well as guid generation for shopper tracking. This way the business goals can still be met and now a page is cacheable.)

Even a small TTL (e.g., 1 minute) will increase the offload from the datacenter by 40%. Assuming we don’t turn off any previously commissioned hardware, we can delay expanding the data center footprint by one year!
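The one year delay can be checked with a little arithmetic. The sketch below assumes that the newly cacheable pages are served entirely from cache and that the hardware needed scales linearly with origin traffic; both are simplifications, and the function name is hypothetical. It solves (1 - offload) * (1 + growth)^t = 1 for t, the time until origin traffic grows back to today's level.

```python
import math


def years_of_headroom(offload: float, growth: float) -> float:
    """Years until origin traffic returns to today's level.

    offload: fraction of traffic removed from the datacenter (0 to <1).
    growth:  year-over-year traffic growth rate (must be > 0).
    """
    return math.log(1 / (1 - offload)) / math.log(1 + growth)


headroom = years_of_headroom(offload=0.40, growth=0.30)
full_years_delayed = math.floor(headroom)
print(f"{headroom:.2f} years of headroom, "
      f"{full_years_delayed} full year(s) without new hardware")
```

With 40% offload and 30% growth the headroom is a little under two years, so the existing fleet covers one full year of growth before new servers are needed.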

The results are more than enough to justify the cost of investment. The best part is that these projected savings don’t include any costs incurred by the infrastructure teams to maintain the growing footprint. Your I&O teams will approve!

Plugging the numbers in, we can see the cash spent this year (year 0) and the projected cash flow for next year if we don’t make the changes. In contrast, we have now shown how our performance improvement will impact the total cost of ownership.

Looking at it another way, we can see that the cost per interaction also decreases.
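For readers who want to reproduce the comparison without the spreadsheet, here is a minimal sketch of the two scenarios. It is not the spreadsheet itself. The server count, traffic, growth, server cost and KVA come from the example; the $0.50/KVA/mo price, the 12 month annualization, the rule that servers scale linearly with origin traffic, and the choice to divide by total site Page Views are assumptions made for illustration.

```python
import math

# Figures from the example.
BASE_SERVERS = 140       # "pizza box" servers in year 0
BASE_PEAK_PV = 400.0     # peak Page Views per second in year 0
GROWTH = 0.30            # year-over-year growth
SERVER_COST = 5_000.0    # dollars per server
SERVER_KVA = 0.5         # KVA per server
# Assumptions: suggested colo price, billed monthly.
COLO_COST_PER_KVA = 0.50
MONTHS_PER_YEAR = 12


def project(years: int, offload: float = 0.0) -> list[dict]:
    """Project yearly costs; offload is the traffic share served from cache."""
    rows = []
    servers = BASE_SERVERS
    for year in range(1, years + 1):
        peak_pv = BASE_PEAK_PV * (1 + GROWTH) ** year
        origin_pv = peak_pv * (1 - offload)
        # Round before ceil so float noise cannot add a phantom server.
        needed = math.ceil(round(BASE_SERVERS * origin_pv / BASE_PEAK_PV, 6))
        new_servers = max(0, needed - servers)  # never decommission
        servers += new_servers
        capex = new_servers * SERVER_COST
        opex = servers * SERVER_KVA * COLO_COST_PER_KVA * MONTHS_PER_YEAR
        rows.append({
            "year": year,
            "new_servers": new_servers,
            "capex": capex,
            "opex": opex,
            "cash_flow": capex + opex,
            "interaction_cost": (capex + opex) / peak_pv,
        })
    return rows


baseline = project(years=2)
cached = project(years=2, offload=0.40)
for before, after in zip(baseline, cached):
    print(f"year {before['year']}: "
          f"new servers {before['new_servers']} vs {after['new_servers']}, "
          f"cash flow ${before['cash_flow']:,.0f} vs ${after['cash_flow']:,.0f}")
```

Swap in your own colo price and server cost to see how sensitive the comparison is. The direction of the result matters more than the exact dollar values.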

How can I use this model?

The example I gave shows how increasing caching impacts operations but it doesn’t stop there. Any change you make that makes applications more efficient, takes advantage of caches, reduces number of requests or makes requests smaller will impact the cost to the business. Using this simple model we can project the financial impact of those performance improvements.

This is only the beginning. I believe that there are many other financial models that can be used to help convince the business that performance matters!

A copy of my financial model can be found on Google Docs