How to Handle Third-Party Downtime with Clients: Communication and Escalation
When Stripe goes down on a Friday afternoon and a client's checkout stops working, someone is going to get a call. If you are a marketing or digital agency managing that client's site, the call is coming to you — even though you did not cause the problem, cannot fix it, and have no visibility into when it will be resolved.
This is one of the most predictable and worst-handled situations in agency client relationships. The agency scrambles to investigate something they cannot control, communicates poorly while under pressure, and the client loses confidence in people who were not at fault.
Here is the playbook that changes that outcome.
The Fundamental Problem: You Are the Interface
When a third-party service causes a client-visible failure, the client does not call Stripe or Cloudflare — they call you. You are the agency that manages their digital presence, so you are the responsible party in their mind.
This is not unfair. It is correct. The agency's job is to manage the client's digital ecosystem, and that includes knowing when upstream vendors are causing problems and communicating that proactively. The vendor outage itself is not the agency's failure; the failure is being surprised by it, finding out later than the client does, or communicating vaguely under pressure.
The agencies that handle third-party downtime well do three things differently: they know about incidents before the client does, they communicate clearly, and they follow a consistent escalation process.
Detection: Knowing Before the Client Does
The worst version of third-party downtime is when the client calls to report that their checkout is broken and you are hearing about the Stripe outage for the first time. The client now knows the agency is not watching their site, or at least not effectively.
The better version is that your monitoring system detects the vendor incident, your team is already aware of it and investigating, and when the client calls you can immediately say: "Yes, we have been watching this — it is a Stripe outage that started 40 minutes ago. Here is what we know."
Achieving this requires monitoring vendor status feeds, not just checking whether your client's URL returns a 200. URL availability checks will eventually reflect the outage, but they will not tell you which vendor caused it, how severe it is, or when it is expected to resolve. That information lives on the vendor's status page.
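To make the difference from a plain URL check concrete, here is a minimal sketch of reducing a status-feed payload to the facts a client update needs. It assumes the vendor publishes a JSON summary with a top-level `status` object and an `incidents` list; that shape, the field names, and the `VendorStatus` and `parse_status_summary` names are illustrative assumptions, so check each vendor's actual feed before relying on it.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class VendorStatus:
    vendor: str
    degraded: bool
    severity: str            # e.g. "none", "minor", "major", "critical"
    description: str
    incident_name: Optional[str]


def parse_status_summary(vendor: str, payload: dict[str, Any]) -> VendorStatus:
    """Reduce a status-feed payload to the facts a client update needs.

    Assumed payload shape (hypothetical; adapt to each vendor's real feed):
      {"status": {"indicator": "major", "description": "..."},
       "incidents": [{"name": "...", "status": "investigating"}]}
    """
    status = payload.get("status", {})
    severity = status.get("indicator", "none")
    # Keep only incidents the vendor has not yet marked as closed.
    open_incidents = [
        incident for incident in payload.get("incidents", [])
        if incident.get("status") not in ("resolved", "postmortem")
    ]
    incident_name = open_incidents[0].get("name") if open_incidents else None
    return VendorStatus(
        vendor=vendor,
        degraded=severity != "none" or bool(open_incidents),
        severity=severity,
        description=status.get("description", ""),
        incident_name=incident_name,
    )
```

Fetching the payload is left out on purpose: how often to poll, and from which URL, depends on the vendor. The point is that the feed yields a severity and an incident name, which a 200 check cannot.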
The First Response (Minutes 0–15)
The goal of the first response is to communicate two things quickly:
- We are aware of the problem.
- We know (or are working to confirm) the cause.
A first response that does both:
"Hi [Name], we are aware that the checkout flow is experiencing issues. We have identified this as an ongoing Stripe outage affecting payment processing — this is not isolated to your site. We are monitoring their status page for updates and will send you a note as soon as they confirm resolution. [Stripe status page link]."
A first response that does neither:
"Thanks for reaching out, we are looking into this and will get back to you shortly."
The second version is worse than saying nothing, because it implies the agency does not yet know what is wrong. In the time it takes to send that message, a client who can search "[vendor] outage" can find more information than the agency is providing.
What to Include in Status Updates
While a third-party outage is affecting a client site, send updates at predictable intervals, every 30–60 minutes, rather than waiting until it resolves. Silence during an outage reads as neglect.
Each update should include:
Current status. Is the outage ongoing? Has it been resolved? Is it partially resolved?
Source. Link to the vendor's own status page. This serves two purposes: it gives the client a place to check between your updates, and it makes clear that the cause is the vendor, not the agency.
Impact scope. Is this affecting all users, some users, a specific region, a specific product? Vendors often provide this detail on their status page.
Estimated resolution. If the vendor has provided an ETA, include it. If they have not, say so explicitly rather than guessing.
What you are doing. During a third-party outage, what the agency is doing is monitoring the vendor's status page and preparing to verify resolution. Be honest about this — "we are monitoring their status page and will verify your site returns to normal as soon as they report resolution" is accurate and professional.
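One way to keep every update complete under pressure is to generate it from a fixed template. The sketch below is one possible approach, not a prescribed format: the `StatusUpdate` fields and the `render_update` function are hypothetical names that mirror the five elements listed above, and the wording of each line is only an example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusUpdate:
    vendor: str
    current_status: str        # e.g. "Ongoing", "Partially resolved", "Resolved"
    status_page_url: str       # link to the vendor's own status page
    impact_scope: str          # who or what is affected, per the vendor
    eta: Optional[str] = None  # leave as None when the vendor has given no ETA


def render_update(update: StatusUpdate) -> str:
    """Render the five elements of a status update, one per line."""
    if update.eta:
        eta_line = f"Estimated resolution: {update.eta}."
    else:
        # Say explicitly that there is no ETA rather than guessing one.
        eta_line = f"Estimated resolution: {update.vendor} has not provided an ETA yet."
    lines = [
        f"Current status: {update.current_status}.",
        f"Source: {update.status_page_url}",
        f"Impact: {update.impact_scope}.",
        eta_line,
        f"What we are doing: monitoring {update.vendor}'s status page and will "
        "verify your site returns to normal as soon as they report resolution.",
    ]
    return "\n".join(lines)
```

Because `eta` defaults to `None`, forgetting to fill it in produces the honest "no ETA yet" line instead of a blank or a guess.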
Communicating That It Is Not Your Fault (Without Being Defensive)
The instinct during a third-party outage is to emphasise that it is not the agency's fault. This is understandable, but executed poorly it sounds defensive and shifts focus away from what the client actually needs, which is information and a resolution timeline.
The better approach is to make the cause clear, once, early, and then focus the rest of the communication on status and resolution. The client does not need to be told repeatedly that it is Stripe's fault — they need to know what is happening and when it will be over.
Phrases that communicate cause without being defensive:
- "This is an active [Vendor] outage affecting payment processing across their platform."
- "Our monitoring picked this up as a [Vendor] incident at [time]."
- "The issue is on [Vendor]'s side — we are tracking their status page."
After Resolution: The Incident Close
When the vendor reports resolution, do not assume the client's site is immediately back to normal. Verify it. Check that the specific functionality that was failing is working again, then communicate:
"Stripe has confirmed the outage is resolved. I have verified that checkout is working normally on your site — tested at [time] from [location]. Total downtime for your checkout: approximately [N] hours from [start time] to [end time]. Let me know if you see anything that still looks off."
This close-out message does three things:
- Confirms resolution (not just the vendor's claim of resolution)
- Gives the client a concrete record of the outage window
- Invites the client to report anything else, which prevents a second call about a secondary issue that was missed
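The outage window in that message is simple arithmetic, but it is easy to get wrong by hand at the end of a long incident. A minimal sketch follows, assuming you have recorded start and end times as `datetime` values in a single timezone; the function names and the one-decimal rounding are illustrative choices.

```python
from datetime import datetime


def downtime_hours(start: datetime, end: datetime) -> float:
    """Return the outage duration in hours, rounded to one decimal place."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return round((end - start).total_seconds() / 3600, 1)


def render_close_out(
    vendor: str,
    feature: str,
    start: datetime,
    end: datetime,
    verified_at: datetime,
    location: str,
) -> str:
    """Fill in the close-out message with the verified facts of the incident."""
    hours = downtime_hours(start, end)
    return (
        f"{vendor} has confirmed the outage is resolved. "
        f"I have verified that {feature} is working normally on your site, "
        f"tested at {verified_at:%H:%M} from {location}. "
        f"Total downtime for your {feature}: approximately {hours} hours "
        f"from {start:%H:%M} to {end:%H:%M}. "
        "Let me know if you see anything that still looks off."
    )
```

For example, an outage recorded from 14:10 to 16:40 renders as "approximately 2.5 hours from 14:10 to 16:40". The function requires `verified_at`, so the message cannot be produced without someone having actually run the check.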
Building the Vendor Monitoring Habit
This playbook works only if you know about vendor outages before the client does. That requires:
- Active monitoring of the status pages for every vendor your clients depend on
- Alerting that reaches your team fast enough to respond before the client calls
- A map of which vendors affect which clients, so alerts reach the right account manager
Without this infrastructure, you are dependent on either your own site checks eventually reflecting the outage, or the client telling you. Neither is acceptable for professional agency operations.
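The vendor-to-client map described above can start as a very small data structure. The sketch below assumes a hand-maintained dictionary of which vendors each client depends on; the client names, email addresses, fallback inbox, and `route_alert` function are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical portfolio data; in practice this might live in a CRM or config file.
CLIENT_VENDORS = {
    "Acme Outdoor": {"Stripe", "Cloudflare"},
    "Birch Books": {"Shopify", "Mailchimp"},
    "Cedar Coffee": {"Stripe", "Mailchimp"},
}
ACCOUNT_MANAGERS = {
    "Acme Outdoor": "dana@agency.example",
    "Birch Books": "lee@agency.example",
    "Cedar Coffee": "dana@agency.example",
}


def route_alert(vendor, client_vendors, account_managers):
    """Return {account manager: [affected clients]} for one vendor incident."""
    routing = defaultdict(list)
    for client in sorted(client_vendors):
        if vendor in client_vendors[client]:
            # Clients with no assigned manager fall back to a shared inbox.
            manager = account_managers.get(client, "oncall@agency.example")
            routing[manager].append(client)
    return dict(routing)
```

With this data, a Stripe incident routes to one account manager with two affected clients, so each manager gets a single alert listing every client they need to contact.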
Merlonix monitors vendor status feeds for Stripe, Cloudflare, Mailchimp, Shopify, and others, maps them to your client portfolio, and alerts the right account manager before your clients call.