Highlight Report

PagerDuty unifies IT operations with a blend of insights and automation

The Situation: The IT operations technology landscape is highly fragmented, posing a significant challenge for organizations becoming cloud native. While many AIOps and observability toolsets provide insights, only a handful of vendors blend those insights with action and automation. Yet those blended capabilities are central to operationalizing the OneOffice.

The IT operations space is as fascinating as it is confusing. On the one hand, organizations have deep investments in this space, and the journey toward cloud native is a central component of services delivery. On the other hand, the vendor landscape is immensely fragmented, capabilities are confined to silos and specific domains, and marketing is full of engineering jargon. Unlike in business operations, where RPA (robotic process automation) and process intelligence providers invest deeply in marketing and market education, there is no obvious thought leader educating the market on the IT side of the house. Against this background, how can operations leaders advance strategies to get closer to enterprise-wide monitoring and automation?

Expanding from incident management to a OneOffice-centric operations cloud allows organizations to manage the complexity of cloud native

PagerDuty aims to evolve from a category leader in incident management toward a holistic operations platform blending actionable intelligence and automation. It accelerated its corporate development when it added automation capabilities through the Rundeck acquisition. Rundeck provides PagerDuty with runbooks for DevOps scenarios and requirements. Rundeck’s $100 million price tag reflects the strategic importance of automation capabilities, given that PagerDuty’s Q3 FY22 revenue came in at $72 million.

As the “PagerDuty” name implies, the platform offers incident management capabilities, including live call routing, self-serve schedule management, and automated escalations. When the company was founded in 2009, those activities were prompted by pagers that engineers carried to receive messages. The core activities of PagerDuty’s incident management approach are:

  • Triaging incidents, problems, and changes and answering questions such as, “What does a problem really mean? Is this an important value-driving service? Is the request urgent or not?”
  • Mobilizing the right individuals and teams to act, especially in DevOps scenarios, while also notifying others across the business who need to be informed.
  • Resolving incidents through auto-remediation before engaging a responder and providing common diagnostic and remediation tasks to responders during incidents.
  • Preventing incidents, with self-healing as the North Star.
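
The triage and mobilization steps above can be illustrated with a minimal sketch. It is not PagerDuty's implementation: the `Incident` fields, the urgency rules, the 15-minute timeout, and the escalation tiers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    service: str
    business_critical: bool  # "Is this an important value-driving service?"
    customer_facing: bool    # stands in for "Is the request urgent or not?"


def triage(incident: Incident) -> str:
    """Assign an urgency level from two yes/no triage questions."""
    if incident.business_critical and incident.customer_facing:
        return "high"
    if incident.business_critical or incident.customer_facing:
        return "medium"
    return "low"


# Assumed escalation tiers, ordered from first to last responder.
DEFAULT_POLICY: List[str] = [
    "primary on-call",
    "secondary on-call",
    "engineering manager",
]


def next_responder(minutes_unacknowledged: int,
                   timeout_minutes: int = 15,
                   policy: List[str] = DEFAULT_POLICY) -> str:
    """Escalate one tier per elapsed timeout, stopping at the last tier."""
    level = min(minutes_unacknowledged // timeout_minutes, len(policy) - 1)
    return policy[level]


if __name__ == "__main__":
    incident = Incident("checkout", business_critical=True, customer_facing=True)
    print(triage(incident), "->", next_responder(minutes_unacknowledged=20))
```

In this sketch an unacknowledged incident moves one tier up the policy for every elapsed timeout, which is one simple way to express an automated escalation.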

Investing in an operations ecosystem while blending actionable insights with automation differentiates PagerDuty from AIOps tools

Two things set PagerDuty apart. First, it operationalizes the cloud-native journey by integrating its platform with an expansive set of operations tools. Second, it aims to overcome the fragmented toolset landscape by combining insights from those tools with concrete actions such as auto-remediation and automation.

PagerDuty focuses on critical event management, such as incident management, especially in cloud-native applications. The primary focus is applying machine learning to observability data, thus correlating across multiple observability vendors to give responders a single pane of glass on operational issues. To provide a sense of the platform’s scale, PagerDuty has more than 650 integrations spanning legacy applications and cloud-native apps such as ServiceNow and Datadog. Consequently, clients get a holistic service experience from a broad operational ecosystem rather than PagerDuty supplanting existing investments.

On the one hand, this provides the foundation for overcoming operational tool fragmentation. On the other hand, it allows PagerDuty to depict the interdependency of operational services in a service graph that, for instance, captures custom contextual markup occurring on incidents. It also allows the responder to click on events to better understand historical and contextual information. This is critical for overcoming the operational challenges of managing cloud-native applications. When organizations run their applications containerized, understanding the interdependencies of those complex building blocks is critical.
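
To make the idea of a service graph concrete, the sketch below walks a small dependency map to find every service affected when one component fails. The service names and the `DEPENDS_ON` structure are hypothetical and do not describe PagerDuty's data model.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency map: each service lists the services it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "checkout": ["payments", "inventory"],
    "payments": ["sql-database"],
    "inventory": ["sql-database"],
    "search": ["index"],
    "sql-database": [],
    "index": [],
}


def impacted_services(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every service that directly or indirectly depends on `failed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents: Dict[str, List[str]] = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first search upward through the dependents.
    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(impacted_services("sql-database", DEPENDS_ON)))
```

With containerized applications the real graph is far larger and changes frequently, which is why keeping such a map current matters more than the traversal itself.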

Exhibit 1: PagerDuty’s AIOps ecosystem mindset demonstrates deep integration with cloud-native operations tools

Source: PagerDuty, 2022

Aligned with PagerDuty’s ambition to progress toward a holistic operations platform is its ecosystem mindset. Exhibit 1 outlines its AIOps architecture, where the fundamental idea is to enable IT operations and DevOps teams through incident management and problem-solving. What differentiates PagerDuty from the many AIOps and observability tools is its ability to provide actionable insights and then act on and automate those insights. Only a few companies, like IPsoft, TCS/Digitate, and Resolve Systems, have similar capabilities. Customers tend to benchmark PagerDuty more against incident management specialists like xMatters, BMC, or ServiceNow. PagerDuty executives summarized the value proposition as orchestrating teams and algorithms to reduce the mean time to resolve problems. When an alert comes in, the question is, “Now what?” If an organization doesn’t have the right people or systems to respond appropriately, it doesn’t matter how “fine-grained” its server data is. The key questions are, “How do you drive the response? How do you get the right data to do a specific action?” To deliver on its operations cloud ambition, PagerDuty must double down on its marketing effort and highlight outcomes that resonate with a broader set of stakeholders outside of incident management.

Driving contextual knowledge for actions and automation to enable organizations to deliver on OneOffice experiences

PagerDuty’s Service Graph is more than “just” a visualization of the interdependencies of services. It tries to tackle one of automation’s big challenges, namely reacting to and capturing changes in the process environment. Capturing historical events in a graph database allows PagerDuty to provide responders with information about past events showing similarities to the events that need attention, providing the foundation for auto-remediation.

From a holistic operations point of view, PagerDuty gathers information from historical events and inputs it receives from the tools it integrates with (such as AIOps and observability tools) and turns both into events using a common events format. Events can be alerts or state changes. In other words, the focus is on an event-driven architecture the PagerDuty platform can leverage, regardless of the inputs.

The goal is to bring machine automation to bear in a domain-agnostic way, translating incoming signals and allowing users to see, for example, that an event is associated with an SQL database. PagerDuty recognizes how different systems signal a problem and uses a common taxonomy to translate those signals and drive responses, including automating a diagnosis or triggering automated repairs. While the strategy is sound, PagerDuty needs to educate the market on the relevance of contextual knowledge; the marketing noise tends to center on insights such as logs and metrics delivered by AIOps and observability providers.
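
The translation of tool-specific signals into a common event format can be sketched as follows. This is an illustration under stated assumptions, not PagerDuty's schema or API: the `CommonEvent` fields, the two vendor payload shapes, and the response names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class CommonEvent:
    """Hypothetical common event shape; not PagerDuty's actual format."""
    source: str      # which tool emitted the signal
    component: str   # what the signal is about, e.g. "sql-database"
    severity: str    # "info", "warning", or "critical"
    kind: str        # "alert" or "state_change"
    summary: str


def from_metrics_tool(raw: Dict[str, Any]) -> CommonEvent:
    # Assumed payload: {"check": str, "level": 0..2, "msg": str}
    severity = ["info", "warning", "critical"][raw["level"]]
    return CommonEvent("metrics-tool", raw["check"], severity, "alert", raw["msg"])


def from_deploy_tool(raw: Dict[str, Any]) -> CommonEvent:
    # Assumed payload: {"service": str, "status": str, "note": str}
    severity = "warning" if raw["status"] == "failed" else "info"
    return CommonEvent("deploy-tool", raw["service"], severity,
                       "state_change", raw["note"])


# One adapter per integrated tool; adding a tool means adding an entry here.
ADAPTERS: Dict[str, Callable[[Dict[str, Any]], CommonEvent]] = {
    "metrics-tool": from_metrics_tool,
    "deploy-tool": from_deploy_tool,
}


def normalize(tool: str, raw: Dict[str, Any]) -> CommonEvent:
    """Translate a tool-specific payload into the common event shape."""
    return ADAPTERS[tool](raw)


def choose_response(event: CommonEvent) -> str:
    """Pick a response from the common fields only, whatever the source tool."""
    if event.kind == "alert" and event.severity == "critical":
        return "run_diagnostics"
    if event.severity == "warning":
        return "notify_on_call"
    return "record_only"


if __name__ == "__main__":
    raw = {"check": "sql-database", "level": 2, "msg": "replication lag"}
    event = normalize("metrics-tool", raw)
    print(event.component, "->", choose_response(event))
```

Because `choose_response` reads only the common fields, the response logic stays the same regardless of which tool raised the signal, which is the sense in which the approach is domain-agnostic.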

The Bottom Line: To manage cloud-native deployments, operations leaders must progress to domain-agnostic AIOps strategies.

In the wondrous world of AIOps and observability, PagerDuty stands out for many reasons. The main distinction is that it blends insights with actions and automation from a domain-agnostic point of view, aiming to overcome the fragmentation common to many operational tools. With that, PagerDuty could evolve into a conduit for operationalizing the OneOffice as it orchestrates a broad array of operational technologies.

If PagerDuty follows ServiceNow and expands its capabilities to business operations, the broader market will pay more attention to its progress.

Operations leaders should heed the lessons from those discussions with PagerDuty to advance their operational strategies and blend the insights from AIOps and observability with the automation that runbooks and other approaches offer. Only then can containerized applications be supported effectively.
