March 2, 2020
by Hannah Culver
When someone asks what type of “shop” your organization is, can you answer confidently that it’s ITIL, DevOps, or SRE?
Maybe some people can, but if you’re a large enterprise, the answer is likely a combination of several of these operating models, especially since SRE has become a key implementation of DevOps. ITIL can work effectively alongside DevOps and SRE principles, though at first glance they appear to be different species.
The trick is to ensure that, regardless of your organization’s different operating models or toolchains, there is shared visibility, communication, and collaboration across teams. This allows your disparate teams to stay aligned while using the best practices from each methodology.
ITIL stands for Information Technology Infrastructure Library, a methodology developed to create a single source of best practices for information technology. According to Sarah K. White and Lynn Greiner:
“Developed by the British government's Central Computer and Telecommunications Agency (CCTA) during the 1980s, the ITIL first consisted of more than 30 books, developed and released over time, that codified best practices in information technology accumulated from many sources (including vendors' best practices) around the world.”
ITIL is now in its fourth version, and the approach has been condensed to nine books. While these books reflect the modern technological era, they remain centered on ITIL’s original core ideals, which include “automating processes, improving service management and integrating the IT department into the business.” ITIL is traditionally a top-down, highly structured, process-driven methodology, and it remains one of the most widely adopted IT frameworks today.
Some of the key practices of ITIL include service catalog and design, monitoring and event management, incident and problem management, release management, configuration management, and more. All of these practices hold regardless of operating model, but they may manifest differently depending on architectural needs and workflows. These practices are often valuable even for organizations that strongly identify as either DevOps or SRE shops.
ITSM, or information technology service management, is the process by which a company manages its IT services. This process is very customer-oriented and typically consists of five steps.
ITIL is a framework for implementing ITSM practices. This framework is organization-neutral and can therefore be applied to almost any business, even if the only customers IT serves are internal ones. Because they are so closely linked, it’s no surprise that ITIL and ITSM align on many issues.
According to itiltraining.com:
“There’s a big emphasis on continual improvement. This involves consistently measuring and improving processes, IT services, and IT infrastructure. Doing so maximizes their efficiency, effectiveness, and cost effectiveness.”
When you follow the ITIL process, your focus is on aligning IT with your organization’s business goals. This dovetails well with the DevOps philosophy of breaking down silos throughout the organization. Breaking down these silos also eliminates bottlenecks in communication, allowing teams to ship the features customers want faster and to follow the CAMS model (culture, automation, measurement, sharing) more closely. But what does this look like when applied to an organization?
Your organization will probably rely on ITIL for some purposes, because it provides a structured framework for managing incidents within IT service management. ITIL incident management is a foundational component of customer support: integrating it helps your business promptly address and resolve IT service disruptions, which supports positive customer experiences and, in turn, sales. At the same time, it may make sense to leverage DevOps best practices between development and operations teams, which need to be aligned on workflows, code pushes, automation, and monitoring.
However, when communicating between parts of the organization that may be running at different speeds, say sales and IT, ITIL practices might come in handy. The graph below gives a few examples of how the two methodologies might be applied in differing situations:
The result of employing a mixture of ITIL and DevOps best practices is better alignment on organization-wide goals. When IT and the rest of an organization function as totally independent entities, one side will likely always feel overworked and under-supported. In “The Phoenix Project,” a novel looking at a fictional organization’s struggles with IT integration, this becomes a central conflict.
In the book, IT was partially responsible for the success of R&D and sales initiatives. R&D required accurate data and inventory reporting in order to replenish inventory and bring new products to market in a timely fashion. Sales required effective CRM, phone/voicemail, and MRP systems. Otherwise, they could not add or change customer orders and had no way to manage customer health.
Without cross-functional communication, there was no way to plan for these necessary changes. Instead, departments made unreasonable demands on each other, balls were frequently dropped, and the company revenue tanked.
This conflict was resolved when IT aligned and communicated with the rest of the organization, and other department heads provided high-level buy-in for IT initiatives. By breaking down silos and working together, the departments resolved many of these issues.
Sometimes the timing of IT initiatives and business initiatives seems out of sync. However, by using ITIL and DevOps best practices together, organizations can create a cohesive timeline. Below is a graph that shows how these processes can work simultaneously to satisfy the whole organization.
Besides the process improvements, creating alignment between DevOps and ITIL frameworks in your organization also leads to another significant benefit: a shift in mindset. DevOps brings innovation to the ITIL framework by encouraging shared ownership and continuous improvement.
When organizational silos are minimized, the goals of the organization become the goals of individuals. Everyone owns the success and failure of the business, because they’re all members of the same team, oriented around the same outcomes. Departments are no longer pitted against one another. Continuous improvement becomes a part of the company culture, and failure is celebrated and recognized as a learning opportunity.
Now that we’ve covered how DevOps and ITIL align, it’s time to talk about how SRE and ITIL align. As SRE is an implementation of DevOps, many of these alignments are similar. It's possible to use the best practices from all three methodologies to help an organization function at peak efficiency. In practice, ITIL and SRE can actually make for a great combination.
The first reason why is simple: every organization wants happy customers, and ITIL and SRE can help different functions work together to make that a reality. Embedding reliability throughout the software lifecycle can ensure a higher rate of customer happiness. With the newest revision of ITIL, which introduces seven guiding principles, SRE and ITIL align even more closely.
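ITIL 4’s seven guiding principles are: focus on value; start where you are; progress iteratively with feedback; collaborate and promote visibility; think and work holistically; keep it simple and practical; and optimize and automate. Let's discuss how each one aligns with SRE.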
Start where you are: Adopting SRE best practices is not one-size-fits-all, and everyone starts somewhere. What matters most is taking the first steps, then implementing and iterating as you go.
Keep it simple and practical: The Google SRE book’s chapter on simplicity states:
“Unlike just about everything else in life, ‘boring’ is actually a positive attribute when it comes to software! We don’t want our programs to be spontaneous and interesting; we want them to stick to the script and predictably accomplish their business goals.”
Simplicity in both software and business operations streamlines communication, increases velocity, and helps ensure that reliability isn’t compromised. Less is more.
Optimize and automate: One of the goals of SRE is to automate toil-heavy processes and free up developer time to focus on innovation instead of unplanned work. This optimizes workflows and allows new features to ship faster.
Progress iteratively with feedback: SREs set alerts on the most important, user-centric metrics. The metrics, the alerts, and the SLOs they’re tied to are all iterated on to satisfy customer needs.
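As a rough illustration of tying an alert to a user-centric metric, the sketch below computes an availability SLI as the fraction of "good" requests in a measurement window and flags an alert only when that SLI drops below the SLO target. The 99.9% target, the request counts, and the names `WindowCounts`, `availability_sli`, and `should_alert` are hypothetical, chosen for illustration rather than taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Request counts observed over one measurement window (hypothetical data source)."""
    good: int   # requests that met the user-facing success criteria
    total: int  # all requests in the window


def availability_sli(counts: WindowCounts) -> float:
    """Fraction of requests that succeeded; an empty window counts as fully available."""
    if counts.total == 0:
        return 1.0
    return counts.good / counts.total


def should_alert(counts: WindowCounts, slo_target: float = 0.999) -> bool:
    """Alert only when the user-centric SLI falls below the (assumed) SLO target."""
    return availability_sli(counts) < slo_target


# Example: 9,985 good requests out of 10,000 is an SLI of 0.9985, below a 99.9% SLO.
window = WindowCounts(good=9_985, total=10_000)
print(f"SLI={availability_sli(window):.4f}, alert={should_alert(window)}")
```

In this framing, iterating means revisiting both the target and the definition of a "good" request as customer feedback comes in.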
Collaborate and promote visibility: SRE is culturally collaborative. It fosters a blameless work culture that values learning from failure and trusts that each team member is doing what he or she thinks is best for the organization.
Focus on value: Without customers, software has no value. Business value is created when customers get what they want and need from a product. SRE best practices ensure that the product is reliable enough to provide value to customers, and therefore to the organization.
Think and work holistically: By breaking down silos and focusing on scalability and reliability at a holistic level, SREs can significantly help the organization mature. Business-wide success is in the hands of every team member, and SREs work to make sure that the company’s product, systems, and procedures are resilient enough to not just meet but exceed customer expectations.
One of ITIL’s best practices is coordinated change management overseen by the change advisory board (CAB). However, as Kaimar Karu, a partner at Mindbridge, notes:
“Having the CAB review every single change request isn't efficient, and it's definitely not common sense, especially when their costs can run to tens of thousands of deployments per hour in some organizations. However, having the CAB review change requests of unknown risk, when parts of the business need to be consulted because they might be impacted, makes a lot of sense.”
SRE can help here, as its core principles facilitate far more flexible and rapid change management. On-call practices make teams accountable around the clock for code in production. Rollbacks can be automated as part of rapid fixes. Incident postmortems facilitate critical learning, and practices such as SLOs help teams align on what matters and cut through the exploding complexity of modern service management.
Additionally, error budgets create a guideline for development teams on when it’s safe to ship a new feature. If there is ample room in the error budget, the change is approved, but if the error budget is depleted or nearing depletion, the change is postponed until the next window.
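To make the error budget guideline concrete, the sketch below derives the budget from an SLO (the failures the SLO tolerates are (1 − SLO) × total requests) and combines it with Karu's suggestion that only changes of unknown risk go to the CAB. The two risk categories, the 10% "nearing depletion" threshold, and every name in the code are hypothetical assumptions, not a prescribed ITIL or SRE policy.

```python
from enum import Enum


class Risk(Enum):
    KNOWN_LOW = "known_low"  # hypothetical: a routine change whose risk is understood
    UNKNOWN = "unknown"      # hypothetical: risk not yet understood; other teams may be affected


def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent).

    The budget is the number of failures the SLO tolerates: (1 - slo_target) * total_requests.
    """
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures <= 0:
        return 0.0  # a 100% SLO (or no traffic) leaves no budget to spend
    return 1 - failed_requests / allowed_failures


def change_decision(risk: Risk, budget_remaining: float, low_budget_threshold: float = 0.10) -> str:
    """Route a change: CAB review for unknown risk, otherwise gate on the error budget."""
    if risk is Risk.UNKNOWN:
        return "send to CAB"
    if budget_remaining <= low_budget_threshold:
        return "postpone to next window"  # budget depleted or nearing depletion
    return "approve"


# Example: a 99.9% SLO over 1,000,000 requests tolerates about 1,000 failures;
# 250 failures leaves roughly 75% of the budget, so a known-low-risk change ships.
budget = error_budget_remaining(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(round(budget, 2), change_decision(Risk.KNOWN_LOW, budget))
```

Keeping the CAB check separate from the budget check mirrors the point above: the board reviews only the changes it can add value to, while routine changes are governed by the error budget.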
This added flexibility is also inspired by the SRE leadership mindset. Instead of the command-and-control philosophy, SRE recognizes the need for flexibility in a constantly changing environment and focuses on context over control. This means that if a business-critical feature must be shipped, it will be shipped.
While some organizations operate within only one of these methodologies, many find that a mixture of the three is the most efficient way to align business and IT goals and create secure, reliable services. Below is a table of the strengths of each methodology. While they share many principles and aim for the same result, the methodologies are in fact different, and highly complementary.
| | ITIL | DevOps | SRE |
| Philosophy & Culture | Align IT with business needs to create a symbiotic relationship; command-and-control and process-driven to mitigate risk | Improve teamwork and eliminate silos; aims to create alignment and minimize silos between development and operations; often oriented toward helping teams improve the velocity and quality of deploys | Eliminate toil, design for operability; treats operations as a software problem to maximize efficiency; ideal for supporting distributed services at scale that need to be hyper-reliable |
| Key Practices & Tooling | Capacity planning; service catalog/CMDB; problem management; change management/advisory board | Capacity planning; on-call; microservices; CI/CD; infra as code; monitoring and logging; comms & collaboration | The same key practices as DevOps, plus progressive rollouts; SLOs & error budgets; observability; chaos engineering |
| Teamwork | Traditional model of centralized process and visibility; work is typically queued (“waterfall”); incidents are routed through a central NOC team | Dev and ops increasingly share the same process and tooling throughout the entire service lifecycle; typically, devs go on-call for what they build but may engage ops for L2 support | SREs often act as consultants to establish reliability-oriented practices; software engineers’ and SREs’ roles converge, aligning around shared process and outcomes |
| Key Measures | Availability, # incidents, # escalations, etc. | Availability, deployment frequency, etc. | SLOs and error budgets, as well as availability, deployment frequency, etc. |
By identifying which practices make the most sense for your team, and with some trial and error, you can find the combination that helps your organization operate at peak efficiency.
Hannah Culver is a content writer at Blameless. She shares posts about SRE best practices, DevOps principles, and how to prioritize reliability alongside innovation. You can find more of her work on the Blameless blog.