Three Rules for SLA Management: Tips for Enterprise SaaS Providers

October 1, 2012

While working with a SaaS company that delivers medical information to large pharmacy chains, I was reminded that Service Level Agreement (SLA) management is a challenging and often overlooked part of delivering Software-as-a-Service (SaaS). This is especially true when the services are aimed at enterprise customers. A number of years ago I wrote a white paper on this issue, and I thought it would be good to review it and reproduce it here, in part.

Too often, SaaS providers realize the necessity of sound SLA management practices too late and are faced with vague, unrealistic, unmanageable, or incompatible SLA commitments and over-exposure to financial penalties. In the consumer market, SaaS providers may get away with SLAs that create unrealistic expectations and then mitigate risk through fine print and toothless penalty clauses. However, this approach won’t work in enterprise or government markets. In these markets, the long-term success of a SaaS company will be determined by the quality of its SLA management.

The development of a successful SLA requires SaaS providers to do three things: 1) Effectively define, monitor, and report on core service deliverables; 2) Engage in sound risk analysis; 3) Balance the often conflicting demands of different clients against the need for consistency.

Define, Monitor, and Report

The success of any SaaS SLA depends on the ability to: 1) Define the key service deliverables from the customer’s perspective; 2) Monitor the availability and performance of these deliverables from the customer’s perspective; 3) Reliably report the resulting SLA metrics in a way that is aligned with SLA commitments.

Defining the key service deliverables from the customer’s perspective is an essential first step in building an effective SaaS SLA. These deliverables must be expressed in terms of the user’s fundamental interactions with the service.
This may be as simple as the download time of a web page or as complex as the successful completion of a query or form request from a particular region within a defined amount of time during a specified time of day.

Unfortunately, too many SLAs refer only to “application” availability or responsiveness. That is too broad a deliverable to be meaningful within a SaaS SLA. Rather, the SaaS provider and the customer must agree on a few specific and quantifiable user interactions with the SaaS application that best represent the quality of service that is expected.

To be useful, these deliverables must be modeled and ultimately scripted as “synthetic” or “simulated” transactions and then integrated into a global monitoring system with geo-distributed “checkpoints” that simulate end users in various regions (and networks) around the globe. In this way, the simulated transactions generate the data points needed to quantify the deliverable set forth in the SLA, i.e. the availability and responsiveness of the representative user transactions.

It is critical for the management and enforcement of the SLA that collecting and reporting on the data points associated with your SLA’s deliverables is straightforward and accessible. Unless this reporting is automated, the SLA management process will be compromised.

It is also important to understand the costs associated with SLA reporting. Enterprise-grade SLA monitoring and reporting services from industry leaders like Keynote and Gomez can be expensive. The more performance and availability metrics you need to report, the more expensive it will be. While these services are excellent for troubleshooting complex performance issues at a fine level of granularity, there are affordable alternatives on the market, such as Uptrends, that can address the baseline SLA management requirements of most SaaS providers.
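To make the idea of turning synthetic transactions into SLA metrics concrete, here is a minimal Python sketch. It assumes each checkpoint run is recorded with a transaction name, a region, a success flag, and a response time. The record layout, the names `CheckResult` and `summarize`, and the sample transaction and region names are hypothetical and do not correspond to any particular monitoring product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CheckResult:
    """One synthetic transaction run from one checkpoint (hypothetical schema)."""
    transaction: str   # illustrative name, e.g. "drug_lookup"
    region: str        # checkpoint location, e.g. "us-east"
    succeeded: bool    # did the scripted transaction complete?
    seconds: float     # end-to-end response time


def summarize(results, max_seconds):
    """Group runs by (transaction, region) and return, for each group,
    (availability %, % of all runs that succeeded within max_seconds)."""
    groups = defaultdict(list)
    for r in results:
        groups[(r.transaction, r.region)].append(r)

    report = {}
    for key, runs in groups.items():
        ok = [r for r in runs if r.succeeded]
        on_time = [r for r in ok if r.seconds <= max_seconds]
        availability = 100.0 * len(ok) / len(runs)
        timeliness = 100.0 * len(on_time) / len(runs)
        report[key] = (availability, timeliness)
    return report


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        CheckResult("drug_lookup", "us-east", True, 1.2),
        CheckResult("drug_lookup", "us-east", True, 2.6),
        CheckResult("drug_lookup", "us-east", False, 30.0),
        CheckResult("drug_lookup", "us-east", True, 0.9),
    ]
    for key, (avail, timely) in summarize(sample, max_seconds=2.0).items():
        print(key, f"availability={avail:.1f}%", f"on_time={timely:.1f}%")
```

Reporting per transaction and per region, rather than one blended “application” number, mirrors the point above that the deliverable should be a specific user interaction measured from where the customer actually is.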
Special Tip: A safe way forward when building an SLA is to know what metrics your monitoring tools can actually deliver, as well as the cost of collecting and reporting on that data. Once you know what metrics you can report easily, work backwards to build out your SLA.

Risk Analysis

Managing risk through cost/benefit analysis is fundamental to crafting a successful SLA. Unfortunately, SaaS providers are often unable to calculate risk effectively because they have no accurate information about the actual IT costs associated with meeting particular SLA objectives. The old rule of thumb is that each additional “nine” of availability (as in 99.9% vs. 99.99% vs. 99.999%) costs ten times more than the previous one. In reality, many factors influence the cost of availability, performance, and security as they relate to an SLA. These factors include the structure of the application, usage patterns, the maturity of a SaaS provider’s IT Service Management (ITSM) processes, and the capabilities of the underlying technology platform. SaaS providers must analyze these factors and others to determine the most cost-effective way to achieve enterprise-class SLA objectives.

Understanding the costs of delivering different levels of service is only part of the risk analysis equation facing SaaS providers. To complete the equation, the penalties for being in violation must also be known. A functional SLA will not leave this an open question. It will manage expectations for both parties up front, when goodwill is at its highest and SLA violations are still theoretical, through explicit and fair penalties. With predictable SLA penalties and a realistic idea of the costs associated with avoiding those penalties, better business decisions can be made when analyzing the risk associated with any particular SLA commitment.

Balancing Act

If the audience for your SaaS application is a large enterprise, the SLA is almost always a part of the contract negotiations.
Each “whale” you try to land will want you to cater to their business requirements and established processes. This is a dangerous and potentially costly dynamic. Without a minimum level of consistency, particularly around standard and emergency maintenance windows, you will be locked in a situation where any outage, for any reason, causes an SLA violation.

Moreover, it is imperative that your maintenance windows and notification periods align with your service provider’s maintenance policies. Not all providers are the same. SaaS providers should look very closely at their service provider’s SLA(s) and make sure that the provider’s policies and procedures are compatible with the SLA requirements of their target customers. At Carbon60 Networks, we work with customers to be as flexible and accommodating as possible and have a standard two-week notification period. However, it is impossible for any provider to accommodate all of its customers all the time. A SaaS provider must recognize this reality and make accommodations within its SLAs.

The most effective way for SaaS providers to balance the demands of each enterprise customer against the imperatives of sound SLA management is to be prepared with a well-crafted and fair SLA from day one. Expect to explain to the large enterprise prospect why the SLA is the way it is, why that structure is important to the management of your business, and how it impacts the quality of service you can provide.

Special Tip: The IT department of a large enterprise is sometimes your best source of support when negotiating SLAs. The “techies” are more likely to sympathize with the technical challenges of managing a complex business application, and the legal team will often defer to the IT department’s opinion.

SLAs are a very important part of doing business with enterprises and governments when you are a SaaS provider.
It is much easier and cheaper to establish strong SLA management early on than to suffer the consequences of poor SLA management down the road. SaaS providers must be well prepared out of the gate. Above all else, they must: define, monitor, and automate the reporting of application performance and availability around their core deliverables; do risk analysis by understanding the cost of availability for their application versus the cost of downtime to their business; and not let customers dictate maintenance windows that are out of sync with each other or with the SLAs of their hosting or network service provider(s).
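As a rough illustration of the risk analysis rule, the following Python sketch converts an availability target into a monthly downtime allowance and adds the estimated delivery cost to the expected penalty for each candidate commitment. Every number in it is a made-up input, the 730-hour month is an assumption, and `rule_of_thumb_cost` encodes only the “ten times per nine” rule of thumb, which, as noted earlier, real costs often do not follow. The names `Tier`, `allowed_downtime_minutes`, and `expected_monthly_exposure` are hypothetical.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumption: average month, 8760 hours / 12


@dataclass
class Tier:
    """A candidate SLA commitment (all figures are hypothetical inputs)."""
    availability_pct: float       # e.g. 99.9
    monthly_cost: float           # estimated IT cost of delivering this level
    violation_probability: float  # estimated chance of missing it in a month
    penalty: float                # contractual penalty for a missed month


def allowed_downtime_minutes(availability_pct, hours=HOURS_PER_MONTH):
    """Minutes of downtime permitted per period at a given availability target."""
    return (1.0 - availability_pct / 100.0) * hours * 60.0


def rule_of_thumb_cost(base_cost, extra_nines, factor=10.0):
    """Cost estimate if each additional nine multiplies cost by `factor`."""
    return base_cost * factor ** extra_nines


def expected_monthly_exposure(tier):
    """Delivery cost plus probability-weighted penalty for one month."""
    return tier.monthly_cost + tier.violation_probability * tier.penalty


if __name__ == "__main__":
    base = 1_000.0  # made-up cost of delivering the lowest tier
    tiers = [
        Tier(99.9, rule_of_thumb_cost(base, 0), 0.20, 5_000.0),
        Tier(99.99, rule_of_thumb_cost(base, 1), 0.02, 5_000.0),
    ]
    for t in tiers:
        downtime = allowed_downtime_minutes(t.availability_pct)
        exposure = expected_monthly_exposure(t)
        print(f"{t.availability_pct}%: {downtime:.1f} min/month allowed, "
              f"expected exposure {exposure:,.0f}")
```

The comparison is only as good as its inputs: the violation probabilities and delivery costs have to come from your own monitoring history and infrastructure estimates, which is why the explicit, predictable penalties discussed above matter.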