Built for Fly.io agencies — 14-day free trial

`fly certs check` reports OK while a Sydney edge serves a 2-day-expired cert.
The CLI shows the latest issued resource. The browser hitting the Sydney edge shows an expired cert.

Fly.io agencies running multi-region anycast apps with Let's Encrypt HTTP-01 cert renewal face three recurring problems: per-region cert distribution lag, where individual region edges serve stale or expired certs while the central `fly certs check` CLI reports the latest issued resource as OK; apex A records pinned at registrars, which break the moment Fly rotates the documented anycast IPs and the next HTTP-01 challenge can't reach the new edge; and Cloudflare-in-front WAF rules that block Let's Encrypt's validators from the /.well-known/acme-challenge/* paths needed for HTTP-01 renewal. Merlonix monitors the cert served at every region edge, so per-region drift surfaces before your users' browsers report it.

No credit card for the trial. Cancel any time.

Check cadence (Agency): 5 min
SSL pre-expiry alert: 30 days
Independent DNS resolvers: 3
Vendors watched: 11

Where Fly.io agencies get caught out

Three failure modes specific to Fly.io anycast deployments: per-region cert distribution, where individual region edges can serve stale or expired certs while the central CLI reports OK; apex A records pinned at registrars that don't support ALIAS; and Cloudflare-in-front WAF rules that block Let's Encrypt's HTTP-01 validator.

Fly.io agencies deal with three things. First, per-region cert distribution lag: one region's edge can serve a stale or expired cert while every other region serves the new cert, and the central `fly certs check` CLI says the cert is fine because the latest-issued resource IS fine; it just hasn't reached every region's edge yet. Second, apex A records pinned at registrars, which break the moment Fly rotates the documented anycast IPs as part of routine infrastructure rebalancing. Third, Cloudflare-in-front WAF rules, where CF's bot management blocks Let's Encrypt's validator from reaching the /.well-known/acme-challenge/* paths needed for HTTP-01 renewal.

Fly.io issues one logical cert per custom domain via Let's Encrypt and distributes it to every region's edge where the app runs. `fly certs check` reports the state of the latest-issued cert resource, but says nothing about per-region distribution. When an edge in a specific region (Sydney, Frankfurt, São Paulo) fails to pull the new cert before the old one expires, that region serves the old (now expired) cert while every other region serves the new cert. End users in that region see browser warnings; users in other regions are unaffected. The agency's monitoring (if it checks only one region or one IP) misses the failure entirely.

A Fly.io agency manages a multi-region SaaS app deployed to 8 Fly regions (iad, sjc, lhr, fra, syd, gru, hkg, sin). The cert auto-renews via Fly's Let's Encrypt integration. During one renewal cycle, the Sydney (syd) edge fails to pull the new cert: the underlying machine has a transient pull failure during the distribution window. Every other region successfully pulls the new cert. `fly certs check` reports the cert as valid (the new cert IS the latest issued resource). Australian users start seeing browser warnings. The agency's SRE checks `fly certs check`, sees OK, and looks for other root causes, wasting 4 hours before someone thinks to check what each region is actually serving.

A Fly.io agency operates a B2B SaaS for a client with users in 6 countries. The app is deployed to 8 Fly regions for low-latency anycast routing. The cert was provisioned 60 days ago and auto-renews 30 days before expiry via Fly's built-in Let's Encrypt integration. The auto-renewal runs; Fly issues a new cert and begins distributing it to every region's edge. The Sydney (syd) edge machine has a transient network issue during the distribution window: the cert-distribution daemon tries to pull the new cert, fails with a timeout, and the retry logic isn't aggressive enough to recover before the old cert expires. Every other region (iad, sjc, lhr, fra, gru, hkg, sin) successfully pulls the new cert. The old cert expires 30 days later. Sydney users start seeing browser warnings (NET::ERR_CERT_DATE_INVALID). The agency's SRE receives a user complaint, runs `fly certs check example.com`, and sees the cert reported as valid, which is true at the metadata layer: the latest issued cert IS valid, with 60 days until expiry. The SRE checks DNS, checks the registrar, checks Fly's status page (no incidents); every layer reports healthy. Four hours into the investigation, someone runs `curl -v https://example.com` from a Sydney VPN exit and sees the expired cert. The agency restarts the Sydney edge machine; cert distribution retries and succeeds. Total downtime for Sydney users: 18 hours. The agency's monitoring was hitting iad (US East), which served the new cert correctly, so the dashboard showed everything green.
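The step that finally exposed the failure, inspecting the cert actually served to a client in the affected region, can be scripted. The following is a minimal Python sketch, not Merlonix's implementation. It assumes it runs from a vantage point inside each region of interest (anycast routes a connection to the nearest edge, so a check from US East only ever sees iad). The function names and the 30-day threshold are illustrative.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_left(not_after: str, now: datetime) -> float:
    """Days from `now` until a cert's notAfter string (negative if expired)."""
    return (ssl.cert_time_to_seconds(not_after) - now.timestamp()) / 86400


def classify(days: float, warn_days: int = 30) -> str:
    """Map remaining days to a coarse status (threshold is an assumption)."""
    if days < 0:
        return "expired"
    if days < warn_days:
        return "expiring"
    return "ok"


def served_cert_status(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Report the status of the cert served to THIS vantage point."""
    ctx = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as exc:
        # verify_code 10 is OpenSSL's X509_V_ERR_CERT_HAS_EXPIRED.
        if exc.verify_code == 10:
            return "expired"
        return f"invalid: {exc.verify_message}"
    return classify(days_left(cert["notAfter"], datetime.now(timezone.utc)))


if __name__ == "__main__":
    # Run this from a host in each region, e.g. a Sydney VPN exit.
    print(served_cert_status("example.com"))
```

Comparing the result from each vantage point is what separates "the latest issued cert is valid" from "every edge is serving it".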

Fly publishes documented anycast IPs for apex domain pointing (A records at the apex when the registrar doesn't support ALIAS/ANAME). Fly explicitly recommends using Fly DNS or ALIAS records, but many registrars (older ones, GoDaddy without their premium DNS tier) don't support ALIAS at the apex, forcing agencies to use the documented A records. Fly periodically rotates these anycast IPs (infrastructure rebalancing, capacity additions, occasional retirements). The agency's registrar A records become stale. The next Let's Encrypt HTTP-01 renewal cycle fails silently, because the challenge can't reach the new Fly edge through the old A record.

A Fly.io agency provisions a new client app on Fly using their existing registrar (a budget DNS provider that doesn't support ALIAS). The agency engineer follows Fly's docs, copies the two documented anycast IPs (e.g., 66.241.124.111 and 66.241.125.111, example values from Fly docs), and adds them as A records at the apex of the client's domain. The setup works: the cert provisions, the app serves traffic, the client is happy. Six months later, Fly rotates one of the documented anycast IPs as part of a capacity addition. The cert is approaching renewal; Fly initiates an HTTP-01 challenge. Let's Encrypt's validator queries the A record, gets the OLD anycast IP, and attempts the challenge against an IP that no longer routes to the client's app. The challenge fails. Fly retries every 24 hours; every retry fails. The existing cert continues serving from the active anycast IP until expiry, and then begins serving as expired.

Eight months ago, a Fly.io agency provisioned a new client app on Fly using a budget registrar (Namecheap on the basic DNS tier, which doesn't offer ALIAS records at the apex). The agency engineer followed Fly's documented setup: copy the two documented anycast IPs from the Fly docs page, configure A records at the apex (example.com → 66.241.124.111 and example.com → 66.241.125.111), point www CNAME to the apex. The cert provisioned cleanly via Fly's Let's Encrypt integration; the app served traffic. The agency completed the project and moved on; the client app ran untouched. Eight months in, Fly rotates one of the documented anycast IPs as part of an infrastructure rebalancing. The Fly blog notes "we're rotating one of our documented apex IPs over the next 30 days; agencies should ensure their A records are updated. New customers will get the new IP set." The agency engineer who set up this client is on another project and doesn't read Fly's engineering blog. When the next renewal comes due (Fly's default is 30 days before expiry), Let's Encrypt issues an HTTP-01 challenge to the apex domain. The validator queries the A records, gets the OLD anycast IP (the one Fly is rotating away), and routes the request to that IP, which now lands on a different Fly cluster that doesn't recognize this client's app. The challenge fails with a 404 on the .well-known/acme-challenge path. Fly silently retries every 24 hours. The existing cert continues serving from the other (still-valid) anycast IP for another 30 days. Then the existing cert expires. The half of user traffic hitting the still-active anycast IP starts seeing browser warnings; the other half, which was hitting the rotated-away anycast IP, had been broken since the rotation, and nobody noticed because that path was returning 404s, not cert errors. The agency's discovery happens via a customer report.
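The timeline above implies a simple inference: if the served cert is already inside the renewal window and has not been replaced, renewal is probably failing. A minimal Python sketch of that arithmetic follows. The function name is hypothetical and the 30-day window is taken from the renewal default described above; it is an illustration, not a description of how Fly or Merlonix track renewal state.

```python
import ssl
from datetime import datetime, timezone


def renewal_overdue_days(not_after: str, now: datetime,
                         renew_before_days: int = 30) -> float:
    """Days by which renewal appears overdue.

    Returns 0.0 while the cert is still outside the renewal window.
    `not_after` uses the format found in a cert's notAfter field,
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    remaining = (ssl.cert_time_to_seconds(not_after) - now.timestamp()) / 86400
    return max(0.0, renew_before_days - remaining)


if __name__ == "__main__":
    now = datetime(2017, 12, 26, 9, 34, 43, tzinfo=timezone.utc)
    # 10 days remain on a cert that should have renewed at 30 days left.
    print(renewal_overdue_days("Jan  5 09:34:43 2018 GMT", now))
```

A non-zero result that keeps growing from one check to the next is the signal worth alerting on, well before the expiry date itself.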

Agencies adding Cloudflare in front of Fly for WAF / DDoS protection (typical hardening for production apps) often forget to configure the page rule that bypasses the WAF for /.well-known/acme-challenge/* paths. Cloudflare's default bot management blocks the Let's Encrypt validator (which presents as an unverified user agent) when it requests the challenge token. The HTTP-01 challenge fails silently. Fly retries every 24 hours; every retry fails. The existing cert continues serving until expiry, then begins serving as expired. The CF dashboard shows the request as "blocked by bot management", but the agency never looks there during renewal failures because they don't connect the two events.

A Fly.io agency hardens a client's production app by adding Cloudflare in front of the Fly edge for WAF and DDoS protection. The CF setup is straightforward: orange-cloud the apex, configure firewall rules for the application's specific patterns. The app continues to work; clients are happy. Three months later, the cert is approaching renewal. Fly initiates the HTTP-01 challenge. Let's Encrypt's validator requests http://example.com/.well-known/acme-challenge/<token> (HTTP-01 validation starts over plain HTTP on port 80). The request hits Cloudflare first; CF's default bot management identifies the request as a known automation pattern and challenges it with a JavaScript challenge. The Let's Encrypt validator doesn't run JavaScript; the challenge is rejected. Fly receives "challenge failed" from Let's Encrypt and retries 24 hours later. Every retry fails the same way. When the existing cert expires, the app starts serving an expired cert.

A Fly.io agency operates a B2C product for a client. The client is approaching a Black Friday launch and wants production hardening: WAF, DDoS protection, bot management. The agency adds Cloudflare in front of the Fly edge using CF's "Orange Cloud" mode (CF terminates TLS at the edge, then proxies the request to Fly, which terminates again: double TLS termination). The CF setup is straightforward: the agency configures custom firewall rules for the application's API endpoints and enables CF's default bot management at the "Medium" sensitivity level. The app continues to work; CF's WAF blocks obvious bot traffic; clients are happy with the upgrade. Three months later, the cert at Fly's side approaches renewal. Fly initiates the Let's Encrypt HTTP-01 challenge. Let's Encrypt's validator (which identifies itself with a recognizable user agent but connects from IP addresses that Let's Encrypt deliberately does not publish) requests http://example.com/.well-known/acme-challenge/<token>. The request hits Cloudflare first. CF's bot management at "Medium" sensitivity examines the request: the user agent is unusual (the Let's Encrypt validator doesn't mimic a real browser), and the request pattern is automated (one-off requests to a specific path with no session establishment). CF returns a 403 with a JavaScript challenge page. The Let's Encrypt validator doesn't execute JavaScript; the challenge fails. Fly receives a "challenge_failed" response from Let's Encrypt; the cert renewal cycle pauses. Fly silently retries every 24 hours. Every retry fails identically. When the existing cert expires, the app starts serving the now-expired cert.
The agency's discovery happens via a customer report. The engineering team looks at Fly's cert state (`fly certs check` shows the renewal failing if anyone scrolls past the latest issued cert), looks at Cloudflare's firewall events (sees the bot management challenges blocking acme-challenge requests), realizes the issue, and adds a CF page rule bypassing the WAF for the /.well-known/acme-challenge/* path. The fix is simple once identified, but discovery took 4 hours of investigation while the app served browser warnings.
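A rough way to spot this failure before renewal time is to request a path under /.well-known/acme-challenge/ and see whether Cloudflare answers with a challenge instead of passing the request through. The Python sketch below is illustrative only. The probe token is hypothetical, the user agent string is an approximation of what the validator sends, and the check relies on the `cf-mitigated: challenge` response header that Cloudflare attaches to challenge pages. A probe from your own IP will not be scored by bot management the same way the real validator is, so treat the result as a hint, not proof.

```python
import urllib.error
import urllib.request

# Assumption: approximates the user agent sent by Let's Encrypt's validator.
VALIDATOR_UA = ("Mozilla/5.0 (compatible; Let's Encrypt validation server; "
                "+https://www.letsencrypt.org)")


def classify_probe(status: int, headers: dict) -> str:
    """Classify a response to a request under /.well-known/acme-challenge/."""
    lowered = {k.lower(): str(v).lower() for k, v in headers.items()}
    if lowered.get("cf-mitigated") == "challenge":
        return "blocked: Cloudflare challenge"
    if status in (401, 403, 429, 503):
        return f"suspect: HTTP {status}"
    if status in (200, 404):
        # A 404 is expected: the probe token was never issued.
        return "reachable"
    return f"unknown: HTTP {status}"


def probe_acme_path(host: str, token: str = "probe-token-not-issued",
                    timeout: float = 10.0) -> str:
    """Request a made-up challenge token over plain HTTP and classify the reply."""
    url = f"http://{host}/.well-known/acme-challenge/{token}"
    req = urllib.request.Request(url, headers={"User-Agent": VALIDATOR_UA})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify_probe(resp.status, dict(resp.headers.items()))
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the status and headers are still available.
        return classify_probe(exc.code, dict(exc.headers.items()))


if __name__ == "__main__":
    print(probe_acme_path("example.com"))
```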

How it works

SSL and DNS monitoring for Fly.io agencies across per-region cert distribution lag, anycast IP rotations breaking apex A records, and Cloudflare-in-front WAF rules blocking Let's Encrypt HTTP-01 renewal challenges.

Merlonix monitors SSL expiry and DNS integrity across every Fly-attached subdomain: app.* and api.* (with per-region checks for multi-region deployments), plus the apex domain, where A records can drift after Fly rotates documented anycast IPs. It catches three failures before clients see browser warnings: per-region cert distribution failures where one edge serves a stale or expired cert while the central CLI reports OK, apex A records pinned at registrars that don't support ALIAS breaking the next HTTP-01 renewal cycle, and Cloudflare-in-front WAF rules where bot management blocks Let's Encrypt's validator from the /.well-known/acme-challenge/* paths.

01

Add Fly application domains — apex, www.*, app.*, api.* — with DNS TXT verification that catches per-region cert distribution lag, apex A record drift after Fly anycast IP rotations, and CF-in-front WAF blocking HTTP-01 renewal challenges

Verify ownership with a DNS TXT record on the apex domain. All subdomains under that apex (app.* and api.*, both Fly-edged, plus any per-region staging endpoints) are added without additional verification. Monitoring every Fly-attached subdomain from multiple geographic check points catches the per-region cert distribution failures (a Sydney edge serving an expired cert while the central CLI reports OK), the apex A record drift after Fly rotates anycast IPs (the next HTTP-01 challenge fails because the A record points at a stale IP), and the CF-in-front renewal failures (CF's WAF blocks Let's Encrypt's validator from the acme-challenge path). Under two minutes per client.

02

CNAME and A record monitoring across Fly anycast IP rotations and registrar configuration drift — surfacing the apex A record staleness that breaks HTTP-01 renewal silently

Three independent DNS resolvers check every apex A record on every monitoring interval. When Fly rotates one of its documented anycast IPs and the agency's registrar A record stays pinned to the old IP, the drift is surfaced immediately as an alert, well before the next renewal cycle attempts an HTTP-01 challenge against the stale IP. Cert renewal failures show up as ongoing alerts (renewal has been failing for X days), not just as final expiry events when the existing cert runs out.
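The resolver comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Merlonix's implementation: it queries two public DNS-over-HTTPS JSON endpoints (Google and Cloudflare) as stand-ins for independent resolvers, the expected IP set is whatever Fly currently lists for the app and is supplied by the caller, and all function names are hypothetical. The addresses in the example are reserved documentation addresses.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Public DoH JSON endpoints used here as stand-in independent resolvers.
DOH_ENDPOINTS = {
    "google": "https://dns.google/resolve",
    "cloudflare": "https://cloudflare-dns.com/dns-query",
}


def parse_a_records(doh_json: dict) -> set[str]:
    """Extract A record values (type 1) from a DoH JSON response."""
    return {a["data"] for a in doh_json.get("Answer", []) if a.get("type") == 1}


def query_a_records(endpoint: str, name: str, timeout: float = 10.0) -> set[str]:
    """Ask one resolver for the A records of `name`."""
    query = urllib.parse.urlencode({"name": name, "type": "A"})
    req = urllib.request.Request(f"{endpoint}?{query}",
                                 headers={"Accept": "application/dns-json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_a_records(json.load(resp))


def find_drift(expected: set[str],
               observed: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """Per resolver, list IPs that should not be there and IPs that are absent."""
    drift = {}
    for resolver, ips in observed.items():
        if ips != expected:
            drift[resolver] = {"stale": ips - expected, "missing": expected - ips}
    return drift


if __name__ == "__main__":
    expected = {"192.0.2.10", "192.0.2.11"}  # placeholder: the app's current IPs
    observed = {name: query_a_records(url, "example.com")
                for name, url in DOH_ENDPOINTS.items()}
    print(find_drift(expected, observed))
```

Querying more than one resolver guards against a single cached or poisoned answer hiding the drift.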

03

SSL monitoring 30 days before expiry across every Fly-attached subdomain, including independent per-region edge checks to catch the cert distribution lag that the Fly CLI doesn't surface

Full SSL chain validation on every Fly-attached subdomain (apex, app.*, api.*) with checks from multiple geographic points, so that a Sydney edge serving a 2-day-expired cert surfaces in the first check cycle, not 18 hours later when an Australian customer reports a browser warning. Each Fly app gets the same 30-day pre-expiry alert across the full set of regions where it's deployed, so per-region distribution failures are caught at the point of failure rather than at final expiry.
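Once each vantage point has reported the notAfter date of the cert it was served, spotting a lagging region is a comparison against the newest cert seen anywhere. The Python sketch below shows one way to do that. The function name and the region-to-date mapping are hypothetical inputs, and the dates are made up for illustration.

```python
import ssl


def lagging_regions(not_after_by_region: dict) -> dict:
    """Return regions serving an older cert than the newest one observed.

    Keys are region codes; values are notAfter strings as found in a cert,
    e.g. "Jun  1 00:00:00 2025 GMT". The result maps each lagging region
    to how many days earlier its cert expires than the newest cert.
    """
    expiry = {region: ssl.cert_time_to_seconds(text)
              for region, text in not_after_by_region.items()}
    if not expiry:
        return {}
    newest = max(expiry.values())
    return {region: (newest - ts) / 86400
            for region, ts in expiry.items() if ts < newest}


if __name__ == "__main__":
    observed = {
        "iad": "Jun  1 00:00:00 2025 GMT",
        "fra": "Jun  1 00:00:00 2025 GMT",
        "syd": "Mar  3 00:00:00 2025 GMT",  # still serving the previous cert
    }
    print(lagging_regions(observed))
```

A region that shows up here shortly after a renewal is a distribution failure in progress, even though no cert has expired yet.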

04

Vendor status for Fly.io (global), Let's Encrypt, Cloudflare (when used as WAF in front of Fly), plus DNS providers used at the registrar to distinguish Fly platform incidents from per-tenant SSL configuration failures

Merlonix monitors Fly.io's global status alongside client SSL and DNS. When a Fly anycast event causes per-region cert distribution failures across multiple client tenants simultaneously, you see the vendor event, not a cluster of individual SSL alerts that each require separate investigation to determine whether the root cause is a Fly platform incident, an apex A record stale after an IP rotation, a CF-in-front WAF rule blocking HTTP-01 validation, or genuine per-region edge cert distribution lag.

What the numbers mean for Fly.io agencies

Monitoring built for Fly.io agencies where one client app means 8 region edges each with independent cert distribution, an apex A record that depends on Fly's anycast IPs staying stable, and (when CF-in-front is added for production hardening) WAF rules that need to bypass /.well-known/acme-challenge/* paths — each with independent failure modes.

Fly.io agencies running multi-region anycast deployments with per-region cert distribution lag, apex A records pinned at budget registrars without ALIAS support, and CF-in-front hardening that can silently block Let's Encrypt's validator need monitoring that covers every region's edge — because per-region cert distribution failures are silent (the central `fly certs check` CLI reports OK while one region serves an expired cert), apex A record drift after a Fly anycast IP rotation is silent (the existing cert keeps serving until expiry, when the next renewal fails), and CF-in-front WAF rules blocking acme-challenge paths are silent (renewal retries every 24 hours, failing the same way every time).

< 10 min

Time from DNS change to alert — catches Fly anycast IP rotations leaving apex A records stale, registrar nameserver changes that strip HTTP-01 validation infrastructure, and CF-in-front configuration changes that newly block /.well-known/acme-challenge/* paths

30 days

SSL expiry warning lead time: enough time to identify a Fly per-region cert distribution failure (one edge serving a stale cert while the rest serve the new one), an apex A record gone stale after a Fly anycast IP rotation (HTTP-01 renewal can't reach the new edge), or a CF-in-front WAF rule newly blocking the Let's Encrypt validator, and correct it before clients see browser warnings

11 vendors

Upstream services monitored: Fly.io global status, Let's Encrypt status, Cloudflare (when in front of Fly for WAF), plus the DNS providers used at the registrar, included to distinguish Fly platform incidents from per-tenant SSL configuration failures

200 assets

Maximum monitored domains on the Agency plan — covers Fly app.* and api.* across all regions of a multi-region deployment, plus per-environment staging endpoints, across a full Fly client portfolio

Pricing

Flat monthly fee. Every Fly region, every per-region edge cert, every CF-in-front config included.

No per-region charges. No per-app fees. Pick the tier that fits your Fly portfolio and per-region check count and monitor every region's edge cert without billing surprises.

See full feature comparison →

Starter

For individual Fly.io developers managing a small client portfolio with single-region deployments.

$29/month

  • 10 monitored assets
  • 1 seat
  • 15-min check cadence
  • SSL + DNS + vendor monitoring
  • Email + Slack alerts

Most chosen

Team

For Fly.io agencies managing multi-region deployments where per-region cert distribution lag is a real risk and apex A records depend on stable anycast IPs.

$79/month

  • 50 monitored assets
  • 5 seats
  • 10-min check cadence
  • SSL + DNS + vendor monitoring
  • Email + Slack alerts

Agency

For agencies with a full Fly.io client roster including CF-in-front production hardening where WAF rules can silently block Let's Encrypt HTTP-01 validation, plus multi-region deployments across 8+ regions.

$199/month

  • 200 monitored assets
  • 15 seats
  • 5-min check cadence
  • SSL + DNS + vendor monitoring
  • Email + Slack alerts

Know when one of your Fly region edges is serving a stale cert — 18 hours before an Australian customer complains.

Add your first Fly.io client domain in under two minutes. Multi-region deployments, apex A records, and CF-in-front configurations are monitored from the same dashboard. 14-day trial, no card required.