Built for Railway agencies — 14-day free trial

Cloudflare-in-front of Railway silently blocks Let's Encrypt's HTTP-01 validator.
Railway dashboard shows the domain as Active. Railway is serving its catch-all cert. Browsers reject it.

Railway agencies running Railway Services with custom domains, PR Environments with per-PR Let's Encrypt cert provisioning, and Cloudflare-in-front-of-Railway hit three silent failure modes. Cloudflare bot management blocks Let's Encrypt's HTTP-01 validator: the Railway dashboard reports the domain as Active because DNS resolves and the service responds, but Railway serves its generic catch-all cert. Underlying ingress rotations during platform maintenance break the long-lived registrar CNAME. And PR Environments mapped to subdomains of the production domain hit Let's Encrypt's 50-certs-per-week-per-registered-domain rate limit, blocking production cert renewal for the next 7+ days. Merlonix monitors every Railway-attached subdomain so the wrong-cert and rate-limit-blocked modes surface before clients see browser warnings.

No credit card for the trial. Cancel any time.

Check cadence (Agency)
5 min
SSL pre-expiry alert
30 days
Independent DNS resolvers
3
Vendors watched
11

Where Railway agencies get caught out

Three failure modes specific to Railway deployments: Cloudflare bot management blocking Let's Encrypt validators, underlying ingress rotations during platform maintenance breaking long-lived registrar CNAMEs, and PR Environments under the production domain exhausting Let's Encrypt rate-limit budgets.

Railway agencies hit all three. Cloudflare bot management silently blocks Let's Encrypt's HTTP-01 validator — the Railway dashboard shows the custom domain as Active because DNS resolves and the service responds, but Railway is serving its catch-all cert. Underlying ingress rotations during platform maintenance windows break the agency's long-lived registrar CNAME, leaving it pinned to a now-stale Railway internal endpoint. And PR Environments mapped to subdomains of the production domain consume the 50-certs-per-week rate-limit budget, blocking production cert renewal for the next 7+ days.

Railway plans don't bundle a WAF, so production hardening typically puts Cloudflare in front for DDoS protection and bot management. Cloudflare's default bot management treats Let's Encrypt's HTTP-01 validator (specific user agent, automated request pattern, one-off connection per challenge) as a bot: it serves the validator a JavaScript challenge, the validator doesn't execute JS, and the HTTP-01 challenge fails. Railway's cert provisioning silently retries every 6 hours. The Railway dashboard custom-domain panel still shows the domain as Active because DNS resolves and the service responds — Railway only reports the cert state from its own provisioning side, not what's actually served at the edge. Railway falls back to serving its catch-all cert (a generic *.up.railway.app cert that doesn't match the custom domain), and browsers reject it.
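The wrong-cert mode is externally observable: pull the leaf cert actually served on port 443 and check whether any of its SANs cover the custom domain. Below is a minimal sketch of just the matching logic — in a real monitor you would first extract the SAN list from the served certificate (e.g. with the `cryptography` package); the hostnames here are illustrative.

```python
def san_matches(hostname: str, san: str) -> bool:
    """RFC 6125-style match: a wildcard SAN covers exactly one
    left-most label (so *.up.railway.app does NOT cover app.example.com)."""
    host_labels = hostname.lower().rstrip(".").split(".")
    san_labels = san.lower().rstrip(".").split(".")
    if len(host_labels) != len(san_labels):
        return False
    head, *rest = san_labels
    if head == "*":
        return rest == host_labels[1:]
    return san_labels == host_labels

def is_wrong_cert(hostname: str, served_sans: list[str]) -> bool:
    """True when none of the served cert's SANs cover the custom domain --
    the signature of Railway's catch-all cert being served while the
    dashboard still reports the domain as Active."""
    return not any(san_matches(hostname, s) for s in served_sans)

# The catch-all *.up.railway.app cert does not cover a custom domain:
print(is_wrong_cert("app.example.com", ["*.up.railway.app"]))  # True
```

Checking the served cert directly, rather than trusting the platform's reported cert state, is exactly what separates this check from the dashboard's Active badge.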

A Railway agency hardens a production deployment by putting Cloudflare in front for DDoS + WAF + caching. The agency configures CF to proxy traffic to the Railway service, switches DNS to Cloudflare, and confirms the site is responding through CF. Two days later, the cert at the edge expires because Let's Encrypt couldn't reach Railway through Cloudflare to validate the HTTP-01 challenge. The Railway dashboard still shows the custom domain as Active.

A Railway agency is wrapping up a production launch for a client e-commerce site. The infrastructure: Railway Service for the Node.js API, Railway Postgres for the database, Railway Redis for sessions. The agency adds Cloudflare in front for DDoS protection, WAF rules, and edge caching because Railway plans don't bundle any of these. The agency engineer switches the registrar nameservers to Cloudflare, configures Cloudflare to proxy traffic to the Railway service via a CNAME record (orange-cloud enabled, "Proxied" status), and confirms the production URL is responding through Cloudflare. The Railway dashboard custom-domain panel shows the domain as "Active" — Railway's side of the integration is healthy because DNS resolves to Cloudflare, which forwards to Railway.

Two days later, the cert that Railway provisioned during the initial pre-CF setup expires. Railway's automated renewal flow triggers: Let's Encrypt issues an HTTP-01 challenge to the Railway service. The challenge request comes from Let's Encrypt's validator pool (specific IP ranges, a user agent like "Let's Encrypt validation server") and hits Cloudflare first. Cloudflare's default bot management treats the validator request as automated traffic and serves a JavaScript challenge — the standard "Please verify you are human" interstitial. The validator doesn't execute JavaScript, so the HTTP-01 challenge fails. Railway logs the renewal failure internally and retries every 6 hours; the dashboard shows the domain as Active throughout, because Railway doesn't flag the cert renewal failure on the user-facing dashboard. Railway falls back to serving its catch-all cert (a wildcard *.up.railway.app cert). Browsers receive a cert that doesn't match the custom domain hostname; cert-mismatch warnings begin.

Cloudflare's Universal SSL is still terminating TLS at the edge for the customer, so end-users may not see the warning at all — but any service-to-service call that hits the origin directly (Stripe webhooks, GA4 server-side requests, partner API integrations) bypasses Cloudflare and hits Railway directly, getting the wrong cert. Webhook deliveries fail with TLS errors. The agency engineer eventually discovers the failure when Stripe disables the webhook endpoint after 14 days of delivery failures.

Railway runs on a Kubernetes-based platform where individual services are scheduled across underlying nodes. During platform maintenance windows (typical cadence: every 2-4 weeks for security patching, monthly for infrastructure upgrades), services migrate between nodes. The underlying ingress target for the service can rotate — Railway's internal DNS for the service points at the new node's ingress controller. Railway's public-facing service domain (<service>.up.railway.app) stays the same because Railway's edge handles the routing. But the agency's long-lived CNAME at the registrar pointing at the Railway service for a custom domain stays pinned to the previous target if the agency originally set it up as a direct A record (bypassing Railway's recommendation to use CNAME) or as a CNAME pointing at a specific Railway internal endpoint (some older Railway integration guides recommended this pattern).

A Railway agency manages a client portfolio where the original Railway integration was set up 18 months ago by a senior engineer who has since left. The CNAME at the registrar points at a Railway internal endpoint that was the recommended target at the time. Railway has since rotated the underlying ingress during a platform maintenance window. The CNAME is now pinned to a target that Railway no longer serves the service from. Railway serves the catch-all cert from the new ingress; the cert keeps validating for the old target (the cert resource is still issued); browsers receive a cert that doesn't match the custom domain hostname.

A Railway agency was founded in 2022 and operates 15+ long-running client services on Railway. The agency's lead Railway engineer left 18 months ago; the current Railway lead picked up the client portfolio and runs renewals from the standard Railway dashboard. For one of the clients (a B2B SaaS for HR teams), the original Railway integration was set up before Railway's current custom-domain documentation existed; the senior engineer at the time configured the CNAME at the registrar to point at a specific Railway internal endpoint that was the recommended target in 2023. Railway has since updated its custom-domain documentation to recommend CNAMEs pointing at the service's public domain (<service>.up.railway.app), which is stable across infrastructure changes. The agency's CNAME is still pinned to the 2023-era internal endpoint.

Railway's platform team rotates the underlying ingress for the service during a routine maintenance window. The new ingress assigns a different internal endpoint. Railway's public service domain continues to work because Railway's edge handles the routing — but the agency's long-lived CNAME at the registrar still points at the old internal endpoint. When users hit the custom domain, DNS resolves to the old endpoint; that endpoint now serves as a Railway catch-all (the new ingress is what actually hosts the service); the catch-all serves the *.up.railway.app cert. Browsers throw cert-mismatch warnings. The Railway dashboard shows the cert for the custom domain as valid (the cert resource is still issued and Railway's side considers the integration healthy). Discovery requires knowing that Railway's internal endpoints are not stable across maintenance windows and that the CNAME pattern recommended in 2023 is no longer the recommended pattern in 2026 — and the current Railway lead doesn't know this, because the original setup was tribal knowledge that left with the senior engineer.

Railway PR Environments create a per-PR deployment with its own custom domain. Agencies often map PR Environments to subdomains of the production domain (e.g., pr-123.example.com, pr-456.example.com) for stakeholder review. Each PR Environment provisions its own Let's Encrypt cert. With 50+ concurrent PRs in a busy sprint, the registered domain hits Let's Encrypt's 50-certs-per-week-per-registered-domain rate limit. Production cert renewal then silently fails for the next 7+ days until the rate-limit window resets. Railway's dashboard reports the failed renewal as a cert provisioning error in the per-service log — but the production cert is still valid for another 60-89 days, so the alert doesn't escalate. Three weeks later, when the production cert approaches expiry and Railway's automated renewal tries again, the rate-limit window has reset, but provisioning may fail again if PR activity hasn't slowed down. Production HTTPS breaks.

A Railway agency runs a busy delivery sprint with 60+ concurrent PR Environments mapped to subdomains of the production domain. Let's Encrypt's 50-certs-per-week-per-registered-domain rate limit is hit on Tuesday of week 1. Production cert renewal silently fails when it triggers two weeks later; the production cert is still valid for another 30 days; the rate-limit window resets four days before production cert expiry. Railway's automated renewal retries — but the next sprint has just opened 20 new PR Environments and the rate limit is hit again. The production cert expires; HTTPS breaks.

A Railway agency manages a client product with a high-velocity engineering team. Every PR opens a Railway PR Environment with a custom subdomain under the production domain — e.g., pr-1834.example.com, pr-1835.example.com. Each PR Environment provisions its own Let's Encrypt cert via HTTP-01 challenge. During a release-prep sprint, the team opens 60+ PRs across the week. Each PR provisions a cert. The 50-certs-per-week-per-registered-domain rate limit for example.com is hit on Tuesday. Subsequent PR Environments fail to provision their certs and the Railway dashboard shows the affected PR Environments with a cert provisioning error (these are visible if the engineer clicks into the per-PR environment log, but the agency-wide dashboard view doesn't aggregate cert errors).

Two weeks later, the production cert for example.com hits its 60-day pre-renewal threshold. Railway's automated renewal tries to provision a new cert. The rate limit window for example.com has reset (it's a sliding 7-day window from the rate limit hit), but the next sprint has just opened 20 new PR Environments and consumed 20 of the available 50 certs already this week. Railway's renewal request for example.com is rejected by Let's Encrypt. Railway retries every 6 hours; each retry fails because the rate limit budget is being consumed by new PR Environment provisioning. The production cert continues serving (it's still valid for ~30 days). Three weeks pass; the agency engineer doesn't notice the production cert renewal failures in the dashboard because the per-service log entries are visually similar to the PR Environment provisioning errors. The production cert hits expiry. Railway's last-ditch renewal attempt fails. Railway falls back to the catch-all cert. Production HTTPS breaks for end-users; the agency only discovers it when customer-success starts forwarding browser-warning screenshots from the client.

How it works

SSL and DNS monitoring for Railway agencies across Cloudflare-blocked Let's Encrypt validators, Railway ingress rotations during platform maintenance, and PR Environment rate-limit pressure consuming the production domain's renewal budget.

Merlonix monitors SSL expiry and DNS integrity across every Railway-attached subdomain — production custom domains (apex, app.*, api.*), PR Environment subdomains mapped under the production domain, and the underlying Railway service domain that the agency's registrar CNAME points at. It catches Cloudflare bot management blocking Let's Encrypt's HTTP-01 validator (Railway serves the catch-all cert while the dashboard reports the domain as Active), Railway ingress rotations during platform maintenance breaking the agency's long-lived registrar CNAME, and production cert renewal silently failing because PR Environments under the production domain have consumed the Let's Encrypt rate-limit budget — before clients see browser warnings.

01

Add every Railway-attached custom domain — apex, app.*, api.*, plus PR Environment subdomains — with DNS TXT verification that catches Cloudflare-blocked Let's Encrypt validators and Railway ingress rotations

Verify ownership with a DNS TXT record on the apex domain. All subdomains under that apex — app.* (production), api.* (production), pr-*.* (PR Environments if mapped under the production domain) — are added without additional verification. Monitoring every Railway-attached subdomain catches the Cloudflare-blocked LE validator pattern (the served cert is Railway's catch-all *.up.railway.app cert, not the LE cert that the Railway dashboard claims is provisioned) and the ingress rotation pattern (Railway's public service domain still resolves correctly while the agency's CNAME at the registrar, pinned to an internal endpoint, is now stale). Under two minutes per client.
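The apex-level verification works because subdomain coverage reduces to a suffix check: a hostname is covered when it equals the verified apex or sits anywhere under it. A sketch of that check (note the leading dot, which stops lookalike domains from matching):

```python
def covered_by_apex(hostname: str, verified_apex: str) -> bool:
    """True when hostname is the verified apex or any subdomain of it,
    so it can be monitored without a second TXT verification.
    The '.' prefix prevents 'evilexample.com' matching 'example.com'."""
    hostname = hostname.lower().rstrip(".")
    verified_apex = verified_apex.lower().rstrip(".")
    return hostname == verified_apex or hostname.endswith("." + verified_apex)
```

This is why one TXT record on example.com covers app.example.com, api.example.com, and every pr-*.example.com a sprint produces.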

02

CNAME monitoring across registrar nameserver changes, Railway service redeployments, and PR Environment provisioning — surfacing the Railway ingress rotations and rate-limit-blocked renewals that Railway's dashboard does not flag

Three independent DNS resolvers check every CNAME delegation on every monitoring interval. When Railway rotates an underlying ingress during a platform maintenance window, the drift from the agency's long-lived registrar CNAME (often pinned to an older internal endpoint from a 2023-era integration pattern) is surfaced immediately. When Cloudflare's proxy mode strips an apex DNS record or changes the resolution chain, the change is detected within the check interval. The Railway dashboard may report the domain as Active for hours or days while the served cert is wrong; Merlonix monitors the served cert directly, not Railway's cert resource state, so the failure surfaces in the first check cycle rather than waiting for a customer browser-warning report.

03

SSL monitoring 30 days before expiry across production custom domains and PR Environment subdomains — catching production renewals silently blocked by Let's Encrypt rate limits consumed by PR Environment provisioning

Full SSL chain validation on every Railway-attached subdomain — apex, app.*, api.*, plus PR Environment subdomains under the production domain. Independent checks catch the Let's Encrypt rate-limit pattern (production cert renewal silently failing because PR Environments under the production domain have consumed the 50-certs-per-week budget) 30 days before production cert expiry — well before Railway's automated retry runs out of attempts. Each PR Environment subdomain gets the same 30-day pre-expiry alert as the production custom domain, so the agency can see the rate-limit pressure building before it blocks production renewal.
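The pre-expiry alert itself reduces to comparing the served cert's notAfter timestamp against a lead-time threshold. A sketch — the 30-day default mirrors the alert window described above, and an already-expired cert also trips it:

```python
from datetime import datetime, timezone

ALERT_LEAD_DAYS = 30  # the pre-expiry window described above

def expiry_alert(not_after: datetime, now: datetime,
                 lead_days: int = ALERT_LEAD_DAYS) -> bool:
    """True once the served cert is inside the pre-expiry alert window
    (or already past expiry, since days-left goes negative)."""
    return (not_after - now).days <= lead_days
```

The timestamp would come from the cert actually served at the edge, not from the platform's cert resource state, for the reasons covered in the failure modes above.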

04

Vendor status for Railway (platform), Railway Postgres, Railway Redis, plus Cloudflare and Let's Encrypt to distinguish Railway platform incidents from per-tenant SSL configuration failures and from upstream rate-limit windows

Merlonix monitors Railway's status page alongside client SSL and DNS. When a Railway platform incident causes cert provisioning failures across multiple client tenants simultaneously, you see the vendor event — not a cluster of individual SSL alerts that each require separate investigation. When Let's Encrypt reports a rate-limit issue at a wider scope (e.g., LE's renewal API degradation), that's also visible alongside the cert state per-tenant — so you can distinguish "Railway-side cert provisioning broken because of Cloudflare-in-front" from "all Railway tenants experiencing renewal failures because of an LE outage" without spending an hour on root-cause investigation.

What the numbers mean for Railway agencies

Monitoring built for Railway agencies where one client portfolio means a Railway Service with a production custom domain, Cloudflare-in-front for DDoS + WAF + caching that can silently block Let's Encrypt's HTTP-01 validator, and PR Environments mapped to subdomains of the production domain that consume the Let's Encrypt rate-limit budget for every PR opened in the sprint.

Railway agencies managing Railway Services with custom domains, PR Environments with per-PR cert provisioning, and Cloudflare-in-front-of-Railway need monitoring that covers every Railway-attached subdomain — because all three failure modes are silent. The Cloudflare-blocked LE validator failure is silent: Railway's dashboard shows the domain as Active because DNS resolves and the service responds, but Railway is serving its catch-all cert. The ingress rotation failure is silent: the agency's long-lived registrar CNAME continues to resolve to an internal endpoint that Railway no longer serves the service from. And the rate-limit-blocked renewal is silent: the production cert continues serving for another 30+ days while Railway's automated renewal fails in the background.

< 10 min

Time from DNS change to alert — catches Railway ingress rotations during platform maintenance windows where the agency's long-lived registrar CNAME stays pinned at the old target, Cloudflare proxy-mode changes stripping apex records, and PR Environment subdomains failing to provision certs because the production-domain rate-limit budget is exhausted

30 days

SSL expiry warning lead time — enough time to identify Let's Encrypt rate-limit pressure building from PR Environment provisioning under the production domain (so the agency can throttle PR Environment cert provisioning before production renewal is blocked), a Cloudflare-in-front bot-management rule blocking Let's Encrypt's HTTP-01 validator (so the agency can add a page rule bypassing the WAF for /.well-known/acme-challenge/* before the current cert expires), or a Railway ingress rotation breaking the registrar CNAME — and correct it before clients see browser warnings
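The WAF-bypass fix mentioned above is typically a Cloudflare rule that exempts ACME challenge paths from bot checks. A rough sketch — the exact rule type, action names, and UI wording vary by Cloudflare plan and product version, so treat this as the shape of the rule rather than exact syntax:

```
# Cloudflare custom rule (sketch; names and actions vary by plan/UI version)
# If:   the request is an ACME HTTP-01 challenge
#       (http.request.uri.path contains "/.well-known/acme-challenge/")
# Then: Skip -> bot management / managed challenges,
#       so Let's Encrypt's validator reaches the origin
#       without hitting a JavaScript interstitial
```

The challenge path carries a one-time token issued by Let's Encrypt, so exempting it from bot checks does not meaningfully widen the attack surface.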

11 vendors

Upstream services monitored — Railway platform status, Railway Postgres, Railway Redis, Cloudflare proxy/WAF status, and Let's Encrypt API status included to distinguish Railway platform incidents from per-tenant SSL configuration failures and from upstream rate-limit windows that affect every Railway tenant simultaneously

200 assets

Maximum monitored domains on the Agency plan — covers Railway production custom domains across every client portfolio plus PR Environment subdomains under the production domain (where the per-PR cert provisioning consumes the production-domain rate-limit budget). At 50+ concurrent PRs per client during release sprints, the asset count adds up fast — the Agency plan absorbs it without per-domain fees

Pricing

Flat monthly fee. Every Railway service, every PR Environment, every Cloudflare-in-front layer included.

No per-environment charges. No per-PR fees. Pick the tier that fits your Railway portfolio and PR Environment cadence and monitor every production custom domain and every PR Environment subdomain under it without billing surprises.

See full feature comparison →

Starter

For individual Railway developers managing a small client portfolio with a single production custom domain and a handful of PR Environments per sprint.

$29/month

  • 10 monitored assets
  • 1 seat
  • 15-min check cadence
  • SSL + DNS + vendor monitoring
  • Email + Slack alerts
Most chosen

Team

For Railway agencies managing multiple client services with Cloudflare-in-front, where PR Environment subdomains under each client's production domain need to be monitored alongside the production custom domain.

$79/month

  • 50 monitored assets
  • 5 seats
  • 10-min check cadence
  • SSL + DNS + vendor monitoring
  • Email + Slack alerts

Agency

For agencies with a full Railway client roster running release-cadence engineering teams with 50+ concurrent PR Environments per client, plus production custom domains across every client's apex and subdomains.

$199/month

  • 200 monitored assets
  • 15 seats
  • 5-min check cadence
  • SSL + DNS + vendor monitoring
  • Email + Slack alerts

Know when Cloudflare is silently blocking Railway's Let's Encrypt validator — 30 days before the cert expires, not 14 days into Stripe webhook deliveries failing.

Add your first Railway client domain in under two minutes. Production custom domains, PR Environment subdomains, and the underlying Railway service domain are monitored from the same dashboard. 14-day trial, no card required.