Runbook: Third-Party Outage
Security SpecialistOperations & StrategyDevops
This is an example runbook. Review and customize for your protocol before use. Fill in your specific vendors, dependencies, and failover procedures.
Quick Reference
| Field | Value |
|---|---|
| Typical Severity | P2-P3 |
| Primary Responder | Infrastructure SME |
| Last Updated | [Date] |
| Owner | [Name] |
Identification
Symptoms
- Specific functionality broken
- Errors referencing external service
- Status page shows provider outage
- Timeouts to external endpoints
Common Dependencies
| Service Type | Provider(s) | Criticality | Fallback? |
|---|---|---|---|
| RPC | Critical / High / Med | Yes / No | |
| Hosting | |||
| CDN | |||
| DNS | |||
| Monitoring | |||
| Auth | |||
| Oracle |
Immediate Actions
Step 1: Confirm Third-Party Issue
Why: Verify it's not your own infrastructure
- Check provider status page
- Test provider directly
- Check if other users reporting (Twitter, Discord)
Step 2: Assess Your Impact
- Which of your services affected?
- What functionality is degraded/broken?
- User-facing impact?
Step 3: Enable Fallback (if available)
- Switch to backup provider
- Enable cached/degraded mode
- Disable affected features gracefully
By Service Type
RPC Provider Outage
Fallback options:- Switch to secondary RPC
- Use public endpoints temporarily
- Run own node if available
[Document your RPC failover procedure]Hosting/CDN Outage
Fallback options:- Failover to secondary hosting
- Enable maintenance page
- Redirect to status page
Oracle Outage
Impact: Price feeds may be stale or unavailable
Options:- Pause affected operations
Customize this section for your specific oracle dependencies and what operations depend on them.
Communication
Internal
- Update team on status
- ETA if known
- What's affected
External
If significant user impact:
We're experiencing issues with [affected feature] due to a third-party service outage. We're monitoring and will restore service when the provider recovers.
Include:
- What's affected
- What still works
- Where to check for updates
Monitoring the Outage
- Subscribe to provider status updates
- Set reminder to check every 15-30 min
- Monitor your own metrics for recovery
Recovery
When Provider Recovers
- Verify your services are working
- Check for any data inconsistencies
- Monitor for stability
- Switch back from fallback if used
Post-Outage
- Document timeline
- Review if fallback worked
- Assess if additional redundancy needed
- Consider SLA review with provider
Escalation
- Infrastructure Vendors - contact provider support
- Decision Makers - if extended outage or no ETA
Prevention
Reduce third-party dependency risk:
- Multiple RPC providers
- Multi-region hosting
- CDN with failover
- Graceful degradation
- Cached fallbacks where appropriate
- Regular dependency review
Provider Contacts
| Provider | Status Page | Support Contact |
|---|---|---|