Incident Response Automation

Scenario: When a known failure pattern is detected, an engineer uses the AI to execute a standard response playbook (creating the incident, adding responders, triggering the relevant workflow, and leaving structured notes), all through conversation.

Mode: Write (--enable-write-tools required)

Caution: This use case requires write tools. Ensure you understand the actions being taken before confirming any AI-suggested operations that create or modify PagerDuty resources.

Tools involved

| Tool | Purpose |
| --- | --- |
| create_incident | Open a new incident |
| manage_incidents | Acknowledge or escalate |
| add_responders | Page the right people |
| add_note_to_incident | Log actions and context |
| start_incident_workflow | Trigger a pre-built response workflow |
| list_incident_workflows | Find the right workflow to trigger |
| list_escalation_policies | Identify the correct escalation path |
| list_users | Find responder user IDs |

Example prompts

Create a high-urgency incident titled "Database primary down" on the
database-primary service and assign it to the database escalation policy.

Add a note to incident P123456: "Failover initiated to replica db-west-2.
ETA 5 minutes. Monitoring replication lag."

Find the major incident response workflow and trigger it for incident P123456.

Add the database on-call engineer and the SRE lead as responders to P123456
with the message "Database failover in progress: need DB expert and SRE coverage."

Workflow

  1. Identify the incident type and affected service
  2. Create the incident with correct title, service, and escalation policy
  3. Add the relevant responders with context in the message
  4. Trigger the incident workflow for your team's standard runbook
  5. Leave structured notes as actions are taken to maintain a clear timeline
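
The steps above can be sketched as an ordered plan of tool calls. This is a minimal Python sketch under stated assumptions, not the server's actual API: the tool names come from the table on this page, but the `ToolCall` class, the `build_playbook` function, the argument keys (`title`, `service`, `escalation_policy`, and so on), and the `"<incident_id>"` placeholder are hypothetical and exist only for illustration. The code builds and prints a plan; it does not contact PagerDuty.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One playbook step: a tool name plus illustrative (hypothetical) arguments."""
    tool: str
    args: dict = field(default_factory=dict)


def build_playbook(title: str, service: str, policy: str,
                   responders: list[str], workflow: str) -> list[ToolCall]:
    """Return the response steps in the order listed in the workflow.

    Step 1 (identify the incident type and service) is represented by the
    inputs. The real incident ID is only known once the incident exists,
    so later steps carry a placeholder.
    """
    incident = "<incident_id>"  # hypothetical placeholder, filled in after creation
    return [
        # Step 2: create the incident with title, service, and escalation policy.
        ToolCall("create_incident", {"title": title, "service": service,
                                     "escalation_policy": policy,
                                     "urgency": "high"}),
        # Step 3: add responders, with context in the message.
        ToolCall("add_responders", {"incident": incident,
                                    "responders": responders,
                                    "message": f"{title}: responders needed"}),
        # Step 4: trigger the team's standard incident workflow.
        ToolCall("start_incident_workflow", {"incident": incident,
                                             "workflow": workflow}),
        # Step 5: leave a structured note to start the timeline.
        ToolCall("add_note_to_incident", {"incident": incident,
                                          "note": f"Playbook started for '{title}'"}),
    ]


plan = build_playbook(
    title="Database primary down",
    service="database-primary",
    policy="database",
    responders=["database on-call engineer", "SRE lead"],
    workflow="major incident response",
)
for step_number, call in enumerate(plan, start=1):
    print(f"{step_number}. {call.tool} {call.args}")
```

Writing the plan down before running it gives the engineer a single place to review every write operation the playbook is expected to perform.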

Tips

  • Use Tool Filtering with a focused allow list for response workflows, e.g. only expose create_incident, manage_incidents, add_responders, and add_note_to_incident, to prevent the AI from touching unrelated resources
  • Always review what the AI is about to do before confirming, especially manage_incidents, which can bulk-update multiple incidents at once
  • Combine with read tools to let the AI look up user IDs and policy IDs rather than hardcoding them in prompts
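
To illustrate the allow-list idea from the first tip, here is a small Python sketch that checks a set of requested tools against the four tools named there. It does not show how Tool Filtering is configured in the server itself; the names `ALLOWED_TOOLS`, `disallowed`, and `needs_extra_review` are hypothetical helpers, and the rule that flags `manage_incidents` simply encodes the second tip.

```python
# The four tools named in the allow-list tip above.
ALLOWED_TOOLS = frozenset({
    "create_incident",
    "manage_incidents",
    "add_responders",
    "add_note_to_incident",
})


def disallowed(requested, allowed=ALLOWED_TOOLS):
    """Return the requested tool names that fall outside the allow list, sorted."""
    return sorted(set(requested) - set(allowed))


def needs_extra_review(tool):
    """Flag manage_incidents, which can bulk-update several incidents at once."""
    return tool == "manage_incidents"


requested = ["create_incident", "start_incident_workflow", "list_users"]
print(disallowed(requested))  # -> ['list_users', 'start_incident_workflow']
print(needs_extra_review("manage_incidents"))  # -> True
```

The check shows that the four-tool list alone blocks both start_incident_workflow and the read tools such as list_users, so a playbook that triggers workflows or looks up IDs would need those tools added to its allow list.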