Built for Fly.io agencies — 14-day free trial

Your Fly CLI says the cert is fine.
One region edge is serving an expired one.

Fly.io agencies running multi-region anycast apps with Let's Encrypt HTTP-01 cert renewal deal with per-region cert distribution lag where individual region edges serve stale or expired certs while the central `fly certs check` CLI reports the latest issued resource as OK, apex A records pinned at registrars that break the moment Fly rotates the documented anycast IPs and the next HTTP-01 challenge can't reach the new edge, and Cloudflare-in-front WAF rules that block Let's Encrypt's validators from /.well-known/acme-challenge/* paths needed for HTTP-01 renewal. Merlonix monitors every served cert at every region edge so per-region drift surfaces before browsers do.

Start free for 14 days Or start free — $0, no card How it works

No credit card either way — start free, or trial the full workspace.

Check cadence (Agency): 1 min
SSL pre-expiry alert: 30 days
Independent DNS resolvers: 3
Vendors watched: 11

Where Fly.io agencies get caught out

Three failure modes specific to Fly.io anycast deployments with per-region cert distribution where individual region edges can serve stale or expired certs while the central CLI reports OK, apex A records pinned at registrars that don't support ALIAS, and Cloudflare-in-front WAF rules that block Let's Encrypt's HTTP-01 validator.

Fly.io agencies deal with per-region cert distribution lag where one region's edge can serve a stale or expired cert while every other region serves the new cert (and the central `fly certs check` CLI says the cert is fine, because the latest-issued resource IS fine — it just hasn't reached every region's edge yet), apex A records pinned at registrars that break the moment Fly rotates the documented anycast IPs as part of routine infrastructure rebalancing, and Cloudflare-in-front WAF rules where CF's bot management blocks Let's Encrypt's validator from reaching the /.well-known/acme-challenge/* paths needed for HTTP-01 renewal.

One region serves a stale cert

One Fly region edge serves a stale cert while fly certs check reports OK and every other region is fine

A Fly.io agency operates a B2B SaaS for a client with users in 6 countries. The app is deployed to 8 Fly regions for low-latency anycast routing. The cert was provisioned 60 days ago and auto-renews 30 days before expiry via Fly's built-in Let's Encrypt integration. The auto-renewal runs; Fly issues a new cert and begins distributing it to every region's edge. The Sydney (syd) edge machine has a transient network issue during the distribution window — the cert-distribution daemon tried to pull the new cert, failed with a timeout, and the retry logic wasn't aggressive enough to recover before the old cert expired. Every other region (iad, sjc, lhr, fra, gru, hkg, sin) successfully pulled the new cert. The old cert expires 30 days later. Sydney users start seeing browser warnings (NET::ERR_CERT_DATE_INVALID). The agency's SRE receives a user complaint, runs `fly certs check example.com`, sees the cert reported as valid (which is true at the metadata layer — the latest issued cert IS valid, with 60 days until expiry). The SRE checks DNS, checks the registrar, checks Fly's status page (no incidents) — every layer reports healthy. Four hours into the investigation, someone runs `curl -v https://example.com` from a Sydney VPN exit and sees the expired cert. The agency restarts the Sydney edge machine; cert distribution retries and succeeds. Total downtime for Sydney users: 18 hours. The agency's monitoring was hitting iad (US East) which served the new cert correctly, so the dashboard showed everything green.

Anycast IP rotation leaves A records stale

A pinned apex A record goes stale after Fly rotates an anycast IP, so the next HTTP-01 renewal challenge fails

A Fly.io agency provisions a new client app eight months ago on Fly using a budget registrar (Namecheap on the basic DNS tier, which doesn't offer ALIAS records at the apex). The agency engineer follows Fly's documented setup: copy the two documented anycast IPs from the Fly docs page, configure A records at the apex (example.com → 66.241.124.111 and example.com → 66.241.125.111), point www CNAME to the apex. The cert provisions cleanly via Fly's Let's Encrypt integration; the app serves traffic. The agency completes the project and moves on; the client app runs untouched. Eight months in, Fly rotates one of the documented anycast IPs as part of an infrastructure rebalancing. The Fly blog notes "we're rotating one of our documented apex IPs over the next 30 days; agencies should ensure their A records are updated. New customers will get the new IP set." The agency engineer who set up this client is on another project and doesn't read Fly's engineering blog. The cert auto-renews three months in (Fly's default is 30 days before expiry); Let's Encrypt issues an HTTP-01 challenge to the apex domain. The validator queries the A records, gets the OLD anycast IP (the one Fly is rotating away), routes the request to that IP — which now lands on a different Fly cluster that doesn't recognize this client's app. The challenge fails with a 404 on the .well-known/acme-challenge path. Fly silently retries every 24 hours. The existing cert continues serving from the other (still-valid) anycast IP for another 30 days. Then the existing cert expires. Half of the user traffic (the half hitting the still-active anycast IP) starts seeing browser warnings; the other half (the half that was hitting the rotated-away anycast IP) was already broken for two months and just nobody noticed because that path was returning 404s, not cert errors. The agency's discovery happens via a customer report.

Cloudflare WAF blocks ACME challenge

A Cloudflare WAF rule blocks the Let's Encrypt validator from the acme-challenge path, so HTTP-01 renewal fails

A Fly.io agency operates a B2C product for a client. The client is approaching a Black Friday launch and wants production hardening — WAF, DDoS protection, bot management. The agency adds Cloudflare in front of the Fly edge using CF's "Orange Cloud" mode (CF terminates TLS at the edge, then proxies the request to Fly which terminates again — double TLS termination). The CF setup is straightforward: the agency configures custom firewall rules for the application's API endpoints, enables CF's default bot management at the "Medium" sensitivity level. The app continues to work; CF's WAF blocks obvious bot traffic; clients are happy with the upgrade. Three months later, the cert at Fly's side approaches renewal. Fly initiates the Let's Encrypt HTTP-01 challenge. Let's Encrypt's validator (a specific known IP range and user agent — the Let's Encrypt docs document both) requests https://example.com/.well-known/acme-challenge/<token>. The request hits Cloudflare first. CF's bot management at "Medium" sensitivity examines the request: the user agent is unusual (the Let's Encrypt validator doesn't mimic a real browser), the request pattern is automated (one-off requests to a specific path with no session establishment). CF returns a 403 with a JavaScript challenge page. The Let's Encrypt validator doesn't execute JavaScript; the challenge fails. Fly receives a "challenge_failed" response from Let's Encrypt; the cert renewal cycle pauses. Fly silently retries every 24 hours. Every retry fails identically. After 90 days (when the existing cert expires) the app starts serving the now-expired cert. The agency's discovery happens via a customer report — the engineering team looks at Fly's cert state (`fly certs check` shows "renewal failing for 90 days" if anyone scrolls past the latest issued cert), looks at Cloudflare's firewall events (sees the bot management challenges blocking acme-challenge requests), realizes the issue, and adds a CF page rule bypassing the WAF for the /.well-known/acme-challenge/* path. The fix is simple once identified, but discovery took 4 hours of investigation while the app served browser warnings.

How it works

SSL and DNS monitoring for Fly.io agencies across per-region cert distribution lag, anycast IP rotations breaking apex A records, and Cloudflare-in-front WAF rules blocking Let's Encrypt HTTP-01 renewal challenges.

Merlonix monitors SSL expiry and DNS integrity across every Fly-attached subdomain — app.* and api.* (with per-region checks for multi-region deployments), plus the apex domain where A records can drift after Fly rotates documented anycast IPs — and catches per-region cert distribution failures where one edge serves a stale or expired cert while the central CLI reports OK, apex A records pinned at registrars that don't support ALIAS breaking the next HTTP-01 renewal cycle, and Cloudflare-in-front WAF rules where bot management blocks Let's Encrypt's validator from the /.well-known/acme-challenge/* paths — before clients see browser warnings.

Add Fly application domains — apex, www., app., api.* — with DNS TXT verification that catches per-region cert distribution lag, apex A record drift after Fly anycast IP rotations, and CF-in-front WAF blocking HTTP-01 renewal challenges

Verify ownership with a DNS TXT record on the apex domain. All subdomains under that apex — app.* (Fly-edged), api.* (Fly-edged), plus any per-region staging endpoints — are added without additional verification. Monitoring every Fly-attached subdomain from multiple geographic check points catches the per-region cert distribution failures (a Sydney edge serving an expired cert while the central CLI reports OK), the apex A record drift after Fly rotates anycast IPs (the next HTTP-01 challenge fails because the A record points at a stale IP), and the CF-in-front renewal failures (CF's WAF blocks Let's Encrypt's validator from the acme-challenge path). Under two minutes per client.

CNAME and A record monitoring across Fly anycast IP rotations and registrar configuration drift — surfacing the apex A record staleness that breaks HTTP-01 renewal silently

Three independent DNS resolvers check every apex A record on every monitoring interval. When Fly rotates one of its documented anycast IPs and the agency's registrar A record stays pinned to the old IP, the drift is surfaced immediately as an alert — well before the next renewal cycle attempts an HTTP-01 challenge against the stale IP. Cert renewal failures show up as ongoing alerts (renewal has been failing for X days), not just as final expiry events 90 days later when the existing cert runs out.

SSL monitoring 30 days before expiry across every Fly-attached subdomain — including independent per-region edge checks to catch the cert distribution lag that the Fly CLI doesn't surface

Full SSL chain validation on every Fly-attached subdomain — apex, app.*, api.* — with checks from multiple geographic points so that a Sydney edge serving a 2-day-expired cert surfaces in the first check cycle, not 18 hours later when an Australian customer reports a browser warning. Each Fly app gets the same 30-day pre-expiry alert across the full set of regions where it's deployed, so per-region distribution failures are caught at the point of failure rather than at final expiry.

Vendor status for Fly.io (global), Let's Encrypt, Cloudflare (when used as WAF in front of Fly), plus DNS providers used at the registrar to distinguish Fly platform incidents from per-tenant SSL configuration failures

Merlonix monitors Fly.io's global status alongside client SSL and DNS. When a Fly anycast event causes per-region cert distribution failures across multiple client tenants simultaneously, you see the vendor event — not a cluster of individual SSL alerts that each require separate investigation to determine whether the root cause is a Fly platform incident, an apex A record stale after an IP rotation, a CF-in-front WAF rule blocking HTTP-01 validation, or genuine per-region edge cert distribution lag.

What the numbers mean for Fly.io agencies

Monitoring built for Fly.io agencies where one client app means 8 region edges each with independent cert distribution, an apex A record that depends on Fly's anycast IPs staying stable, and (when CF-in-front is added for production hardening) WAF rules that need to bypass /.well-known/acme-challenge/* paths — each with independent failure modes.

Fly.io agencies running multi-region anycast deployments with per-region cert distribution lag, apex A records pinned at budget registrars without ALIAS support, and CF-in-front hardening that can silently block Let's Encrypt's validator need monitoring that covers every region's edge — because per-region cert distribution failures are silent (the central `fly certs check` CLI reports OK while one region serves an expired cert), apex A record drift after a Fly anycast IP rotation is silent (the existing cert keeps serving until expiry, when the next renewal fails), and CF-in-front WAF rules blocking acme-challenge paths are silent (renewal retries every 24 hours, failing the same way every time).

< 10 min

Time from DNS change to alert — catches Fly anycast IP rotations leaving apex A records stale, registrar nameserver changes that strip HTTP-01 validation infrastructure, and CF-in-front configuration changes that newly block /.well-known/acme-challenge/* paths

30 days

SSL expiry warning lead time — enough time to identify a Fly per-region cert distribution failure (one edge serving a stale cert while the rest serve the new one), an apex A record gone stale after a Fly anycast IP rotation (HTTP-01 renewal can't reach the new edge), or a CF-in-front WAF rule newly blocking the Let's Encrypt validator — and correct it before clients see browser warnings

11 vendors

Upstream services monitored — Fly.io global status, Let's Encrypt status, Cloudflare (when in front of Fly for WAF), plus DNS providers used at the registrar included to distinguish Fly platform incidents from per-tenant SSL configuration failures

250 assets

Maximum monitored domains on the Agency plan — covers Fly app.* and api.* across all regions of a multi-region deployment, plus per-environment staging endpoints, across a full Fly client portfolio

Pricing

Flat monthly fee. Every Fly region, every per-region edge cert, every CF-in-front config included.

No per-region charges. No per-app fees. Pick the tier that fits your Fly portfolio and per-region check count and monitor every region's edge cert without billing surprises.

See full feature comparison →

Starter

For individual Fly.io developers managing a small client portfolio with single-region deployments.

$19/ month

15 monitored assets
3 seats
5 min check cadence
SSL + DNS + vendor monitoring
Email + Slack alerts

Start with Starter

Most chosen

Team

For Fly.io agencies managing multi-region deployments where per-region cert distribution lag is a real risk and apex A records depend on stable anycast IPs.

$79/ month

60 monitored assets
10 seats
1 min check cadence
SSL + DNS + vendor monitoring
Email + Slack alerts

Start with Team

Agency

For agencies with a full Fly.io client roster including CF-in-front production hardening where WAF rules can silently block Let's Encrypt HTTP-01 validation, plus multi-region deployments across 8+ regions.

$199/ month

250 monitored assets
Unlimited seats
1 min check cadence
SSL + DNS + vendor monitoring
Email + Slack alerts

Start with Agency

Compliance

For regulated-vertical teams that need continuous, audit-ready evidence.

$699/ month

500 monitored assets
Unlimited seats
1 min check cadence
SSL + DNS + vendor monitoring
Email + Slack alerts

Start with Compliance

Know when one of your Fly region edges is serving a stale cert — 18 hours before an Australian customer complains.

Add your first Fly.io client domain in under two minutes. Multi-region deployments, apex A records, and CF-in-front configurations are monitored from the same dashboard. 14-day trial, no card required.

Start free for 14 days Or start free — $0 Read the agency SSL checklist →

Your Fly CLI says the cert is fine.One region edge is serving an expired one.