Uptime SLAs for Agency Clients: What to Promise and What Monitoring You Need
Most agencies that offer uptime SLAs to clients have not thought through what they are actually promising. "99.9% uptime" sounds concrete: over a 365-day year it allows at most about 8.76 hours (roughly 8 hours 46 minutes) of downtime. But do you know whether any of your clients' sites were down for longer than that last year? And if none were, do you have the monitoring data to prove it?
If the answer is "probably not," you are not alone. Uptime SLAs are common in agency contracts, but the monitoring infrastructure to back them up is rare.
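To make the arithmetic behind a figure like 99.9% concrete, here is a minimal sketch that converts an uptime percentage into a downtime budget. The function name `downtime_budget` is illustrative only, and the result is rounded to whole seconds.

```python
from datetime import timedelta


def downtime_budget(uptime_percent: float, period: timedelta) -> timedelta:
    """Maximum downtime allowed in `period` at the given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    allowed_seconds = period.total_seconds() * (100 - uptime_percent) / 100
    # Round to whole seconds so floating-point noise does not leak into reports.
    return timedelta(seconds=round(allowed_seconds))


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        per_year = downtime_budget(target, timedelta(days=365))
        per_30_days = downtime_budget(target, timedelta(days=30))
        print(f"{target}%: {per_year} per year, {per_30_days} per 30 days")
```

At 99.9%, this works out to 8 hours 45 minutes 36 seconds per 365-day year, or 43 minutes 12 seconds per 30 days. The 30-day figure is the one that matters if your SLA is measured monthly.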
What 99.9% Uptime Actually Means
The number matters less than you might think. What matters is:
- What counts as "downtime" under the SLA?
- How is downtime measured?
- What is the remedy when the SLA is breached?
- What exclusions apply?
An agency that defines "downtime" as "the site returning a non-200 response from our primary monitoring node" and "remedy" as "a service credit of one day's hosting cost" is making a very different commitment than one that defines "downtime" as "any degradation detected from at least two independent measurement points" with a remedy of "30% off the following month's retainer."
Before you include an uptime SLA in a client contract, write out those four definitions. The rest of this guide assumes you are at the stage of making those definitions operational.
Three Tiers of Uptime Commitment
Not every client needs — or should get — the same uptime commitment. A tiered approach keeps your monitoring costs proportional to the value of the commitment.
Tier 1 — Informational only (no SLA)
You monitor the site and share monthly reports, but you make no contractual uptime commitment. You notify the client when downtime occurs. Suitable for small retainer clients where you are managing the site but not the infrastructure.
Tier 2 — Reasonable effort (soft SLA)
You commit to investigating any outage within a defined response window (e.g., 2 hours during business hours). You do not guarantee a specific uptime percentage, but you commit to a documented response process. Suitable for most mid-tier retainer clients.
Tier 3 — Measured SLA (hard SLA)
You commit to a specific uptime percentage over a rolling 30-day window. You have monitoring that measures this from at least two independent vantage points. You have a defined remedy for breaches. Suitable for clients paying a premium retainer where site availability is a material business risk.
Most agencies should offer Tier 3 only to a handful of clients: the ones where the retainer justifies the operational overhead.
What Monitoring You Need to Back a Hard SLA
A hard SLA requires monitoring that is credible enough to function as evidence. This means several things:
Independent measurement. Your monitoring must not run on the same infrastructure as the site you are monitoring. If your client's site and your monitoring tool are both in the same AWS region, a regional outage takes out both at once, and you are left with no record of the downtime. Use a monitoring provider that measures from independent geographic locations.
Not just HTTP checks. A site can return 200 OK from the wrong server (after a DNS change), behind an expired certificate (which triggers browser warnings for real visitors), or while a degraded upstream leaves it only partially functional. An HTTP check that looks only at the status code misses all of these. A defensible SLA measurement should include:
- HTTP/HTTPS availability
- SSL certificate validity (not just that port 443 is open)
- DNS resolution correctness (the record points where it should)
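The three checks above can be sketched with Python's standard library alone. Treat this as an illustration, not a production monitor: the function names are hypothetical, the DNS check uses whatever resolver the local machine is configured with (not independent resolvers), and a real setup would run these from several locations on a schedule.

```python
import http.client
import socket
import ssl
import urllib.request
from datetime import datetime, timezone


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """True if the URL answers with HTTP 200 (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection failures, timeouts, HTTP error statuses and,
        # for https URLs, certificate verification failures.
        return False


def days_until(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate 'notAfter' string."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def cert_days_left(host: str, port: int = 443, timeout: float = 10.0) -> float:
    """Days until the served certificate expires.

    Raises ssl.SSLCertVerificationError if the certificate is already
    expired or otherwise invalid, which is itself a failed check.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            not_after = tls.getpeercert()["notAfter"]
    return days_until(not_after, datetime.now(timezone.utc))


def dns_matches(host: str, expected_ips: set[str]) -> bool:
    """True if every address the host resolves to is in the expected set."""
    try:
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    resolved = {info[4][0] for info in infos}
    return bool(resolved) and resolved <= expected_ips
```

Keeping the three results separate, and not collapsing them into one up/down flag, lets you decide later which of them count toward the SLA percentage and which only trigger alerts.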
Timestamped, exportable records. If a client disputes an SLA calculation, you need records that show when downtime started, when it ended, and how it was classified. These records need to be exportable from your monitoring system — not locked inside a dashboard that might not exist in two years.
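As a rough idea of what an exportable record can look like, the sketch below writes outage records to CSV with UTC timestamps. The `OutageRecord` fields and the classification labels are assumptions made for illustration; they are not the export format of any particular monitoring product.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OutageRecord:
    site: str
    started_at: datetime   # timezone-aware, UTC
    ended_at: datetime     # timezone-aware, UTC
    classification: str    # hypothetical labels: "http", "ssl", "dns", "excluded"


def export_csv(records: list[OutageRecord]) -> str:
    """Render outage records as CSV text, oldest first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(
        ["site", "started_at_utc", "ended_at_utc", "duration_seconds", "classification"]
    )
    for rec in sorted(records, key=lambda r: r.started_at):
        duration = int((rec.ended_at - rec.started_at).total_seconds())
        writer.writerow(
            [rec.site, rec.started_at.isoformat(), rec.ended_at.isoformat(),
             duration, rec.classification]
        )
    return buf.getvalue()
```

A plain file like this can be archived alongside the monthly report, so the evidence survives even if you change monitoring tools.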
Agreed exclusions. Document what does not count against the SLA: scheduled maintenance, outages caused by the client's own changes, outages from third-party services the client has chosen (e.g., their own hosting provider). Get these exclusions signed before they are relevant.
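Once exclusions are agreed, applying them is an interval calculation: take the outage, remove any time that overlaps an excluded window, and count what remains. The sketch below assumes exclusions are recorded as start/end pairs; the name `counted_downtime` is illustrative.

```python
from datetime import datetime, timedelta

Window = tuple[datetime, datetime]  # (start, end)


def counted_downtime(outage: Window, exclusions: list[Window]) -> timedelta:
    """Downtime that counts against the SLA after removing excluded windows."""
    start, end = outage
    # Keep only exclusions that overlap the outage, clipped to its bounds.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in exclusions if s < end and e > start
    )
    excluded = timedelta()
    cursor = start
    for s, e in clipped:
        s = max(s, cursor)  # skip time already covered by an earlier window
        if e > s:
            excluded += e - s
            cursor = e
    return (end - start) - excluded
```

For example, a one-hour outage from 02:00 to 03:00 that overlaps a maintenance window starting at 02:30 contributes 30 minutes of counted downtime.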
Common SLA Mistakes Agencies Make
Promising SLAs they cannot measure. If you do not have monitoring data for a site, you cannot measure an SLA for it. Add the monitoring first; write the SLA second.
Including vendor dependencies in the SLA. If a client's checkout is powered by Stripe and Stripe goes down, that is not your uptime failure. Make sure your SLA explicitly excludes third-party service outages that are outside your control — and make sure your monitoring distinguishes between your infrastructure and upstream vendor issues.
Forgetting SSL and DNS. A client whose SSL certificate has expired can still count as "up" for a monitor that only checks for an HTTP response, but their site will display a browser warning and users will leave. SSL expiry and DNS drift should be covered by your monitoring even if they are not in the SLA percentage calculation.
No escalation path. An SLA without a defined escalation path is just a number. Who gets paged when the site goes down at 2am? How long before they are expected to respond? What happens if the first responder is unreachable? Write this down before you need it.
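Writing the escalation path down can be as simple as an ordered list of who is paged and when. The roles and timings below are placeholders, not recommendations; the point is that the answers to the questions above live in one reviewable place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    contact: str
    page_after_minutes: int  # minutes the alert has gone unacknowledged


# Hypothetical policy: replace roles and timings with your own.
POLICY = [
    EscalationStep("on-call engineer", 0),
    EscalationStep("backup engineer", 15),
    EscalationStep("technical director", 45),
]


def contacts_to_page(minutes_unacknowledged: int,
                     policy: list[EscalationStep] = POLICY) -> list[str]:
    """Everyone who should have been paged by now, in escalation order."""
    return [
        step.contact
        for step in policy
        if minutes_unacknowledged >= step.page_after_minutes
    ]
```

With this policy, an alert that has been unacknowledged for 20 minutes has paged the on-call engineer and the backup engineer, but not yet the technical director.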
Writing the SLA Clause
A clean SLA clause covers these elements:
- Measurement window — "Rolling 30-day period"
- Uptime definition — "Percentage of 5-minute intervals during the measurement window in which the site returned a successful HTTP 200 response from at least two independent monitoring nodes"
- Exclusions — "Scheduled maintenance windows agreed in writing at least 48 hours in advance; outages caused by third-party service providers; outages caused by changes made by or at the direction of the client"
- Measurement method — "Merlonix monitoring, checked every 5 minutes from at least two geographically distributed nodes"
- Remedy — "Service credit of X% of the monthly retainer for each percentage point below the guaranteed uptime, up to a maximum of Y% of the monthly retainer"
- Reporting — "Monthly uptime report delivered within 5 business days of month end"
The specific numbers (X%, Y%) depend on your commercial relationship. The structure above is what makes the clause enforceable.
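The uptime definition and remedy in the clause translate into a short calculation. The sketch below is one possible reading of the clause: an interval counts as up when at least two nodes saw HTTP 200, and the credit is applied pro rata to fractions of a percentage point. Your contract should state whether credits are pro rata or per whole point. All names and the sample numbers are illustrative, and excluded intervals are assumed to have been removed before the calculation.

```python
def interval_is_up(node_statuses: list[int], min_nodes: int = 2) -> bool:
    """An interval counts as up when at least `min_nodes` nodes saw HTTP 200."""
    return sum(1 for status in node_statuses if status == 200) >= min_nodes


def uptime_percent(intervals: list[list[int]]) -> float:
    """Percentage of intervals that count as up.

    Each inner list holds the status codes the monitoring nodes recorded
    for one 5-minute interval (0 here stands for a timeout or no response).
    """
    if not intervals:
        raise ValueError("no intervals to measure")
    up = sum(interval_is_up(statuses) for statuses in intervals)
    return 100 * up / len(intervals)


def service_credit_percent(measured: float, guaranteed: float,
                           credit_per_point: float, max_credit: float) -> float:
    """Credit as a percentage of the monthly retainer (X and Y in the clause)."""
    shortfall = max(0.0, guaranteed - measured)  # percentage points below target
    return min(max_credit, shortfall * credit_per_point)
```

A 30-day window contains 8,640 five-minute intervals, so a 99.9% guarantee is breached once nine or more of them count as down.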
Starting Small
If you have never offered a formal uptime SLA, start with one client — the one whose site is most stable and whose retainer is high enough to justify the operational overhead. Get your monitoring in place first. Run it for 30 days before you write anything into a contract. Use that first month's data to understand what your actual uptime looks like before you promise a number.
Merlonix monitors SSL certificates, DNS records, and HTTP availability from independent resolvers, with timestamped records you can export for SLA reporting.