Application 2022-09-10

SLI, SLO, and SLA Explained: A Practical Guide for Engineers

Learn the differences between SLIs, SLOs, and SLAs in site reliability engineering. Understand how to define service level objectives that balance reliability and velocity.

About SLI, SLO, and SLA

This post summarizes what I have learned about SLIs, SLOs, and SLAs.

What are SLO, SLI, and SLA?

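SLI (service level indicator), SLO (service level objective), and SLA (service level agreement) are, respectively, an indicator, an objective, and an agreement concerning service levels. A service level is a measure of the service provided over a given period, expressed in a specific way. The SLI is the measured value, the SLO is the target set for that value, and the SLA is the agreement with users about what happens when the service level is not met.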

How to Set SLI and SLO

I find the best practices proposed by New Relic easy to implement and effective.

newrelic.com - Best Practices for Setting SLOs and SLIs for Modern Complex Systems

The article describes a method for formulating SLIs and SLOs: define the system boundaries, define the functions provided at each boundary, define what availability means for each function, and define the SLIs that measure that availability.

When you first start operating SLIs and SLOs, it is recommended to begin with simple, loose targets.
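The steps of boundary, function, availability definition, and SLI can be sketched in code. The following is a minimal illustration, not New Relic's implementation. The class name FunctionSLO, the "checkout API" boundary, the availability condition, and the 99% target are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FunctionSLO:
    """One function inside a system boundary, with its availability definition."""
    boundary: str        # hypothetical boundary name, e.g. "checkout API"
    function: str        # what users can do through that boundary
    good_condition: str  # human-readable definition of an "available" event
    slo_target: float    # target fraction of good events, e.g. 0.99


def availability_sli(good_events: int, total_events: int) -> float:
    """SLI as the fraction of events that met the availability definition."""
    if total_events == 0:
        # Assumption for this sketch: no traffic counts as fully available.
        return 1.0
    return good_events / total_events


def meets_slo(slo: FunctionSLO, good_events: int, total_events: int) -> bool:
    """Compare the measured SLI against the SLO target."""
    return availability_sli(good_events, total_events) >= slo.slo_target


# Hypothetical example: all values below are made up for illustration.
checkout = FunctionSLO(
    boundary="checkout API",
    function="place an order",
    good_condition="HTTP status below 500 and latency under 1 second",
    slo_target=0.99,
)
print(meets_slo(checkout, good_events=9_950, total_events=10_000))
```

Keeping one record per function makes it easy to start with a few coarse entries and split them into finer ones later.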

cf. sre.google - Chapter 4 - Service Level Objectives

When I formulated SLIs and SLOs at work, I followed this New Relic practice but made the functional units coarser so that they would not become too detailed.

If the functional units are too detailed from the start, they become difficult to operate, so I think it is better to adjust the granularity as needed during operation.

Tips

Notes on key terms related to SLIs and SLOs.

Difference Between Reliability and Availability

List of Uptime and Downtime, Availability Calculation

Uptime      Annual Downtime    Monthly Downtime
99.0%       87.6 hours         7.3 hours
99.5%       43.8 hours         3.65 hours
99.9%       8.76 hours         43.8 minutes
99.95%      4.38 hours         21.9 minutes
99.99%      52.56 minutes      4.38 minutes
99.999%     5.256 minutes      26.28 seconds
99.9999%    31.536 seconds     2.628 seconds
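Downtime figures like these follow from multiplying the length of the period by the permitted unavailability. Here is a small sketch of that calculation, assuming a 365-day year and an average month of one twelfth of a year (730 hours). The function names are my own.

```python
HOURS_PER_YEAR = 365 * 24               # 8760 hours, ignoring leap years
HOURS_PER_MONTH = HOURS_PER_YEAR / 12   # 730 hours, an average month


def allowed_downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Downtime permitted in a period at the given availability."""
    return period_hours * (1 - availability_percent / 100)


def format_duration(hours: float) -> str:
    """Render a duration in hours, minutes, or seconds, whichever reads best."""
    if hours >= 1:
        return f"{hours:.2f} hours"
    minutes = hours * 60
    if minutes >= 1:
        return f"{minutes:.2f} minutes"
    return f"{minutes * 60:.2f} seconds"


for uptime in (99.0, 99.5, 99.9, 99.95, 99.99, 99.999, 99.9999):
    yearly = format_duration(allowed_downtime_hours(uptime, HOURS_PER_YEAR))
    monthly = format_duration(allowed_downtime_hours(uptime, HOURS_PER_MONTH))
    print(f"{uptime}%  {yearly}  {monthly}")
```

If your SLO is evaluated over a different window, such as a rolling 28 or 30 days, pass that window's length in hours as period_hours.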

What is an Error Budget?

An error budget is the amount of unreliability that an SLO permits, calculated as 100% minus the SLO. Example: SLO 99.99% → error budget 0.01%
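As a rough illustration, the error budget can also be expressed as a number of failed events and tracked as it is consumed. This sketch assumes a request-based SLO. The function names and the request counts are made up for the example.

```python
def error_budget_fraction(slo_percent: float) -> float:
    """Fraction of events that may fail while still meeting the SLO."""
    return 1 - slo_percent / 100


def remaining_budget(slo_percent: float, total_events: int, failed_events: int) -> float:
    """Failures still permitted in the window (negative if the budget is spent)."""
    allowed_failures = error_budget_fraction(slo_percent) * total_events
    return allowed_failures - failed_events


# Hypothetical window: 1,000,000 requests, 40 of which failed, SLO of 99.99%.
left = remaining_budget(99.99, total_events=1_000_000, failed_events=40)
print(f"Remaining error budget: about {round(left)} requests")
```

A negative result means the budget for the window has been used up.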

Impressions

Making service levels measurable lets you observe whether the consumers of a service (people or other systems) are being served satisfactorily. It also gives service providers an indicator for deciding whether the service level needs to be improved.

Tags: SLI SLA SLO