cert-manager hits Let's Encrypt rate limits during a Friday-night staging redeploy. Renewal silently fails for the rest of the week.
kubectl get certificate shows Ready=True. The Ingress serves the controller's self-signed default cert.
Kubernetes agencies running cert-manager behind multiple Ingress controllers (nginx-ingress, traefik, AWS ALB, GKE Ingress, Istio Gateway) hit three recurring failure modes: Issuer vs ClusterIssuer scope confusion that exhausts Let's Encrypt's 50-certs-per-registered-domain-per-week rate limit during staging redeploys; Ingress annotation drift, where annotations written for one controller are silently ignored by another (the Ingress accepts the manifest, but cert provisioning quietly fails); and Secret lifecycle conflicts, where GitOps strict pruning (ArgoCD, Flux) deletes cert-manager-created Secrets on every reconcile cycle. Merlonix monitors the served cert directly, so silent cert-manager failures surface at the Ingress endpoint where users actually connect.
No credit card for the trial. Cancel any time.
- Check cadence (Agency): 5 min
- SSL pre-expiry alert: 30 days
- Independent DNS resolvers: 3
- Vendors watched: 11
Where Kubernetes agencies get caught out
Three failure modes specific to Kubernetes deployments with cert-manager: Issuer vs ClusterIssuer scope hitting Let's Encrypt rate limits during staging redeploys, Ingress controller annotation drift across nginx / traefik / AWS ALB / GKE Ingress, and GitOps strict-pruning deleting cert-manager-created Secrets during reconcile cycles.
cert-manager's Issuer vs ClusterIssuer scope confusion lets high-frequency staging redeploys exhaust Let's Encrypt's 50-certs-per-registered-domain-per-week rate limit, then silently block production renewals weeks later when the renewal collides with the still-exhausted window. Annotation drift between Ingress controllers (nginx, traefik, AWS ALB, GKE Ingress, Istio Gateway) means annotations copy-pasted across controller types are silently ignored: the Ingress accepts the manifest, but cert provisioning quietly fails. And GitOps strict pruning (ArgoCD --prune=true, Flux prune: true) deletes the Secrets cert-manager creates at runtime, a prune-and-reissue loop that can exhaust Let's Encrypt rate limits within hours.
cert-manager supports two issuer scopes: Issuer (namespace-scoped) and ClusterIssuer (cluster-wide). Agencies setting up a new cluster typically configure a single ClusterIssuer pointing at Let's Encrypt prod and reference it from every namespace's Certificate resources. Staging and production share the same domain pattern (staging.example.com, example.com) and the same ClusterIssuer. Staging redeploys recreate Certificate resources frequently (every PR merge, every feature-branch test), and each recreation requests a new cert from Let's Encrypt, so the rate limit (50 certs per registered domain per week) gets exhausted on Friday afternoon. cert-manager keeps retrying; every retry fails with "rate limited". The cert-manager logs note the failure, but the Certificate resource shows Ready=True from the last successful issuance, and the production Ingress still serves the existing, still-valid cert. The production renewal cycle then hits the same rate limit on Tuesday, and the production cert expires the following week.
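A minimal sketch of the shared-issuer setup alongside the usual mitigation: a second ClusterIssuer that points high-churn staging Certificates at Let's Encrypt's staging endpoint, which has far looser rate limits. The names, contact email, and solver choice here are illustrative, not taken from any real cluster.

```yaml
# Illustrative ClusterIssuer pair; names, email, and the http01 solver
# are assumptions, not a prescribed configuration.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com                   # hypothetical contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
---
# Pointing staging Certificates here keeps PR-merge redeploy churn from
# consuming the production 50-per-registered-domain weekly quota.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
```

Browsers won't trust staging-issued certs, which is usually acceptable for internal staging hosts; the point of the split is that staging churn stops eating the production quota.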
A Kubernetes agency deploys a new client app to a managed GKE cluster. cert-manager is installed via Helm with a single ClusterIssuer pointing at Let's Encrypt prod. Staging is on staging.example.com; production is on example.com. The agency's CI pipeline redeploys staging on every PR merge, typical for an agency running six active client projects in parallel. Each staging redeploy recreates the Certificate resource (the GitOps reconciler treats it as part of the application manifest), which requests a new cert. The rate limit is hit at the end of week two; production renewal fires 30 days later and silently fails for the same rate-limit reason.
A Kubernetes agency operates a managed GKE platform across 6 client projects. The cluster has cert-manager installed via the cert-manager Helm chart with a single ClusterIssuer pointing at Let's Encrypt prod (acme-v02.api.letsencrypt.org/directory). Each client project has its own namespace; each namespace has Certificate resources for the production hostname and the staging hostname. The agency's CI pipeline (GitHub Actions deploying via Argo CD) redeploys staging on every PR merge: the GitOps reconciler treats the Certificate resource as part of the application manifest, so a PR merge that touches anything in the namespace triggers a Certificate recreation, and each recreation requests a new cert from Let's Encrypt. With 6 active clients merging an average of 8 PRs per week, the agency requests 48+ certs per week against the example.com registered domain. Let's Encrypt's rate limit (50 certs per registered domain per week) is hit on Friday afternoon. cert-manager keeps retrying for the remaining two days of the week; every retry fails with "rate limit exceeded for example.com". cert-manager logs the failure, but the Certificate resource shows the LAST successful issuance state (Ready=True from Monday), not the current renewal failure. The cluster operator runs `kubectl get certificate -A` and sees everything Ready=True; nobody checks `kubectl describe certificate` or cert-manager pod logs unless something is visibly broken. Two weeks later, production cert renewal fires, but staging recreations the previous week have exhausted the rate-limit window, so production renewal fails with the same rate-limit error on every retry. The production cert expires 30 days later. The agency discovers the problem via a customer report: the cluster operator checks cert-manager logs, sees the rate-limit history, and traces it back to staging redeploys.
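The shape of that blind spot, as a hypothetical, abbreviated Certificate status (field values invented for illustration): the Ready condition still reflects Monday's successful issuance, while the failing renewal is visible only in the Issuing condition and the controller logs.

```yaml
# Hypothetical excerpt of `kubectl get certificate example-com -o yaml`
# during a rate-limit failure; all values are invented for illustration.
status:
  conditions:
  - type: Ready
    status: "True"               # still True from Monday's issuance
    reason: Ready
    message: Certificate is up to date and has not expired
  - type: Issuing
    status: "False"
    reason: Failed
    message: 'The certificate request has failed to complete and will be
      retried: ... rateLimited: too many certificates already issued ...'
  notAfter: "2025-08-01T09:00:00Z"   # the existing cert is still valid
```

`kubectl get certificate` surfaces only the Ready column, which is why the dashboard view stays green while renewals fail.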
cert-manager works with multiple Ingress controllers, but each controller has its own annotation schema and validation rules. The cert-manager.io/cluster-issuer annotation works on all of them (it's a cert-manager annotation, not a controller annotation). Controller-specific annotations differ: nginx-ingress uses nginx.ingress.kubernetes.io/* prefixes, traefik uses traefik.ingress.kubernetes.io/*, AWS ALB uses alb.ingress.kubernetes.io/*, and GKE Ingress uses networking.gke.io/* plus kubernetes.io/ingress.* annotations. Agencies running multiple controllers (nginx for one app, traefik for another, ALB for AWS-native ones) copy-paste annotations across manifests. The Ingress accepts the manifest, because unknown annotations are silently ignored rather than rejected, but cert provisioning silently fails because the cert-manager / controller integration is broken at a different layer.
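A sketch of how the drift looks in a manifest, with hypothetical names throughout: the cert-manager annotation is controller-agnostic, the nginx-prefixed annotations are not, and applying this against a traefik ingress class raises no error at all.

```yaml
# Hypothetical Ingress copied from an nginx-ingress app into a
# traefik-fronted namespace; nothing here fails at apply time.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: client-app
  namespace: client
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod    # portable: read by cert-manager
    nginx.ingress.kubernetes.io/ssl-redirect: "true"    # nginx-only: traefik ignores it
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"  # nginx-only: traefik ignores it
spec:
  ingressClassName: traefik            # mismatched with the annotations above
  tls:
  - hosts:
    - app.client.example
    secretName: client-com-tls         # cert-manager writes the issued cert here
  rules:
  - host: app.client.example
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: client-app
            port:
              number: 80
```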
A Kubernetes agency manages a mixed-stack cluster: one client app on nginx-ingress, another on traefik, and a third (AWS-native) on the AWS ALB Ingress Controller. An engineer copies an Ingress manifest from the nginx-ingress app to bootstrap a new traefik-fronted app, edits the relevant fields, and applies. The Ingress accepts the manifest; the kubectl describe ingress output looks normal. But the nginx-specific annotations (e.g., nginx.ingress.kubernetes.io/ssl-redirect) are silently ignored by traefik, and cert provisioning never completes because the cert-manager / traefik integration expects different wiring. The Ingress falls back to the controller's default cert (a self-signed one). The team's observability stack reports the Ingress as healthy (the Ingress IS responding); the served cert is the default.
A Kubernetes agency operates a cluster with three different Ingress controllers (nginx-ingress for legacy apps, traefik for newer apps, the AWS ALB Ingress Controller for AWS-native ones). The engineer who set up the cluster left the agency six months ago; the current team relies on copy-paste patterns from existing manifests. A new client onboarding requires a new Ingress for their app, so an engineer takes the manifest pattern from a working nginx-ingress app (which has the cert-manager.io/cluster-issuer annotation plus several nginx-specific annotations like nginx.ingress.kubernetes.io/ssl-redirect, nginx.ingress.kubernetes.io/force-ssl-redirect, and nginx.ingress.kubernetes.io/proxy-body-size), changes the relevant fields (host, backend service, namespace), and applies. The Ingress lands in a namespace that's configured for traefik. Traefik accepts the manifest and silently ignores the nginx-specific annotations (unknown annotations are valid in Kubernetes). cert-manager sees the cert-manager.io/cluster-issuer annotation and creates the Certificate resource, but this cluster routes traefik traffic through IngressRoute CRDs (traefik's native routing resource), and the traefik-specific TLS wiring (annotations like traefik.ingress.kubernetes.io/router.entrypoints and traefik.ingress.kubernetes.io/router.tls) was absent from the copied nginx manifest. The IngressRoute never references the resulting Secret, so traefik continues serving the default cert (a self-signed one bundled with the controller). The agency's monitoring stack reports the Ingress as healthy (HTTP 200 on /). The HTTPS endpoint serves a self-signed cert that browsers reject; the client's end-users see browser warnings; the client reports to the agency. Discovery requires understanding the controller-specific annotation differences and how cert-manager integrates with each controller's native CRD pattern.
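For clusters that route through traefik's native CRD, the issued Secret has to be referenced explicitly. A hypothetical IngressRoute showing the wiring the copied nginx manifest never set up (names are illustrative; older traefik v2 installs use the traefik.containo.us/v1alpha1 API group):

```yaml
# Traefik's native routing resource; unlike a standard Ingress with
# spec.tls, TLS only works if the cert-manager Secret is named here.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: client-app
  namespace: client
spec:
  entryPoints:
  - websecure                        # traefik's HTTPS entrypoint
  routes:
  - match: Host(`app.client.example`)
    kind: Rule
    services:
    - name: client-app
      port: 80
  tls:
    secretName: client-com-tls       # the Secret cert-manager issued
```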
cert-manager stores issued certs in Kubernetes Secret resources in the same namespace as the Certificate. The Secret is created by cert-manager AFTER successful issuance; it's not part of the Git repo. When agencies use GitOps tooling with strict pruning enabled (ArgoCD with --prune=true and --self-heal=true, Flux with prune: true), the reconciler can identify the cert-manager-created Secret as drift from the desired state (because it's not in the Git source) and delete it during the next reconcile cycle. cert-manager detects the missing Secret and re-creates it, re-issuing the cert from Let's Encrypt, and the Ingress experiences a roughly 30-second window where it falls back to the controller's default cert. Compounded across multiple namespaces and frequent reconcile cycles, the agency exhausts Let's Encrypt rate limits.
A Kubernetes agency adopts ArgoCD for GitOps deployment with strict pruning enabled (the recommended best practice for ensuring cluster state matches Git). cert-manager creates Secrets with names like example-com-tls. ArgoCD's reconciler sees the Secret in the cluster but not in Git (because cert-manager creates it dynamically), treats it as extra, and prunes it during the next reconcile. cert-manager observes the Secret is gone and re-issues the cert from Let's Encrypt. The cycle repeats every 3 minutes (the default ArgoCD reconcile interval): at roughly 20 re-issuances per hour, the weekly rate limit is exhausted within hours.
A Kubernetes agency manages 8 client projects with ArgoCD. The engineering lead enables strict pruning (--prune=true --self-heal=true) for all ArgoCD Applications because the docs recommend it as best practice and the agency's SOC 2 audit asked for "deterministic cluster state matching Git." Each client project has a few Certificate resources in its namespace, and cert-manager creates corresponding Secret resources (e.g., client-com-tls) after issuance. These Secrets are NOT in the Git repo; they're a runtime side effect of cert-manager processing the Certificate resource. ArgoCD's reconciler runs every 3 minutes (the default refresh interval). Each reconcile cycle, ArgoCD compares cluster state to Git, sees the client-com-tls Secret in the cluster but not in Git, identifies it as drift, and prunes it. cert-manager (which watches the Certificate resource) sees the Secret has been deleted, marks the Certificate as Issuing again, and requests a new cert from Let's Encrypt. Let's Encrypt issues the cert (within rate limit) and cert-manager creates the new Secret; 3 minutes later, ArgoCD reconciles again and prunes it. This loops every 3 minutes, roughly 20 prune-and-reissue cycles per hour, so the Let's Encrypt 50-certs-per-domain-per-week rate limit is exhausted by hour 3 (and Let's Encrypt's stricter duplicate-certificate limit, 5 per week for an identical set of names, even sooner). Subsequent re-issuance attempts fail. The Ingress continues serving the controller's default cert (a self-signed one); browsers reject it. Discovery happens within an hour because the failure surface is wide: every namespace using cert-manager + ArgoCD strict pruning is affected simultaneously. The fix is to add the cert-manager-created Secrets to the ArgoCD Application's ignoreDifferences list or to disable pruning for those resources, as sketched below. But for the rest of the week, all cert renewals across the cluster are blocked by the exhausted rate limit.
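One declarative way to break the loop, assuming cert-manager v1.5 or newer (which added spec.secretTemplate): have cert-manager stamp the Secrets it creates with ArgoCD's opt-out annotations so the reconciler neither diffs nor prunes them. Resource names here are illustrative.

```yaml
# Hypothetical Certificate whose generated Secret carries ArgoCD
# opt-out annotations; both annotations are standard ArgoCD controls.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: client-com
  namespace: client
spec:
  secretName: client-com-tls
  dnsNames:
  - client.example
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  secretTemplate:
    annotations:
      argocd.argoproj.io/compare-options: IgnoreExtraneous  # don't report the Secret as drift
      argocd.argoproj.io/sync-options: Prune=false          # never prune it
```

Scoping the exclusion to the generated Secret keeps strict pruning intact for everything that genuinely lives in Git.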
How it works
SSL and DNS monitoring for Kubernetes agencies, covering cert-manager Issuer/ClusterIssuer rate-limit exhaustion, Ingress controller annotation drift across nginx / traefik / AWS ALB / GKE Ingress, and GitOps strict-pruning of cert-manager-created Secrets during reconcile.
Merlonix monitors SSL expiry and DNS integrity across every Ingress endpoint: app.* and api.* (cert-manager-managed), plus per-environment staging endpoints, where high-frequency redeploys can exhaust Let's Encrypt rate limits. It catches cert-manager rate-limit failures (the Ingress serves the controller's default cert when re-issuance is blocked), Ingress controller annotation drift (cert-manager creates the Certificate resource, but the controller-specific Ingress binding silently fails), and GitOps strict-pruning lifecycle issues (the Secret cert-manager creates gets pruned every reconcile cycle, exhausting rate limits within hours) before clients see browser warnings.
01
Add Kubernetes Ingress endpoints (apex, www.*, app.*, api.*) via DNS TXT verification, then monitor them for cert-manager rate-limit failures, Ingress controller annotation drift, and GitOps strict-pruning lifecycle issues
Verify ownership with a DNS TXT record on the apex domain. All subdomains under that apex — app.* (cert-manager-managed), api.* (cert-manager-managed), plus any per-environment staging endpoints — are added without additional verification. Monitoring every Ingress-attached subdomain catches the cert-manager rate-limit failures (the Ingress serves the controller's default cert when cert-manager can't re-issue), the annotation drift across controllers (the Ingress accepts the manifest but the served cert is the default), and the GitOps strict-pruning loops (the served cert flips between Let's Encrypt-issued and self-signed-default in 3-minute cycles). Under two minutes per client.
02
CNAME monitoring across Ingress controller switches, namespace migrations, and Service ExternalName changes — surfacing the cert provisioning gaps that controller-specific annotation drift creates
Three independent DNS resolvers check every CNAME delegation on every monitoring interval. When a client app migrates from nginx-ingress to traefik (or vice versa) and an engineer copy-pastes controller-specific annotations that the new controller silently ignores, the cert provisioning failure surfaces as a served-cert-mismatch alert before users hit browser warnings. The Ingress accepts the manifest, kubectl describe ingress looks normal, and kubectl get certificate may even show Ready=True from an earlier provisioning attempt; Merlonix watches the cert actually served at the Ingress endpoint, not the Certificate resource state, so these silent failures surface on the first check.
03
SSL monitoring 30 days before expiry across every Ingress endpoint, with independent checks per Ingress controller, so cert-manager rate-limit collisions and Secret lifecycle issues surface at the served-cert layer, not the Certificate resource layer
Full SSL chain validation on every Ingress-attached subdomain — apex, app.*, api.* — across whichever Ingress controller is fronting it (nginx-ingress, traefik, AWS ALB, GKE Ingress, Istio Gateway). Each Ingress endpoint gets the same 30-day pre-expiry alert. When cert-manager hits Let's Encrypt rate limits (a Friday staging redeploy collision), the existing cert continues to serve until expiry — Merlonix surfaces the renewal-failure state 30 days before expiry rather than waiting for the expiry event. When GitOps strict-pruning is deleting Secrets in a loop, the served-cert oscillation between Let's Encrypt-issued and self-signed-default is caught in the first check cycle.
04
Vendor status for major managed Kubernetes platforms (GKE, EKS, AKS) and Let's Encrypt, plus the DNS providers configured at the registrar, to distinguish platform incidents from per-tenant cert-manager configuration failures
Merlonix monitors managed-K8s status alongside client SSL and DNS. When a GKE regional incident affects cert-manager operation across multiple client clusters simultaneously, you see the vendor event — not a cluster of individual SSL alerts that each require separate investigation to determine whether the root cause is a managed-K8s platform incident, a cert-manager rate-limit exhaustion from a staging redeploy cascade, an Ingress controller annotation mistake from a recent app migration, or GitOps strict-pruning eating Secrets in a loop.
What the numbers mean for Kubernetes agencies
Monitoring built for Kubernetes agencies where one client portfolio means cert-manager managing certs across multiple namespaces, multiple Ingress controllers each with different annotation schemas, and GitOps reconcilers operating on the Secret resources that cert-manager creates dynamically — each with independent failure modes.
Kubernetes agencies managing cert-manager-driven Ingress portfolios across mixed nginx-ingress / traefik / AWS ALB / GKE Ingress clusters, with GitOps tools like ArgoCD or Flux enforcing strict cluster-state-matches-Git invariants on the Secrets cert-manager creates dynamically, need monitoring that watches the served cert directly at every Ingress endpoint. cert-manager rate-limit exhaustion is silent (the existing cert keeps serving until expiry). Ingress controller annotation drift is silent (the Ingress accepts the manifest, kubectl get certificate reports Ready=True from an earlier provisioning, and the controller serves a self-signed default). GitOps strict-pruning loops are silent too: the Secret gets recreated faster than users notice, but the Let's Encrypt rate limit burns out within hours.
< 10 min
Time from DNS change to alert — catches cluster-level Ingress migrations (nginx-ingress to traefik) leaving cert-manager unable to bind the issued cert to the new controller, namespace deletions removing the cert-manager Certificate resource entirely, and Service ExternalName changes pointing the Ingress at a backend that doesn't terminate TLS
30 days
SSL expiry warning lead time — enough time to identify a cert-manager rate-limit exhaustion (renewal has been failing on every retry for two weeks; the existing cert is still valid, but the next renewal won't succeed until the rate-limit window resets), Ingress controller annotation drift (the served cert is the controller default, not the cert-manager-issued one), or a GitOps strict-pruning loop (the served cert oscillates), and correct it before the existing cert expires
11 vendors
Upstream services monitored — GKE / EKS / AKS managed Kubernetes platforms, Let's Encrypt (rate-limit windows plus acme-v02.api.letsencrypt.org service health), and the DNS providers configured at the registrar, included to distinguish managed-K8s platform incidents from per-tenant cert-manager configuration failures
200 assets
Maximum monitored domains on the Agency plan — covers app.* and api.* across multiple namespaces and Ingress controllers, plus per-environment staging endpoints, across a full Kubernetes client portfolio
Pricing
Flat monthly fee. Every namespace, every Ingress controller, every Certificate resource included.
No per-namespace charges. No per-cluster fees. Pick the tier that fits your Kubernetes portfolio and check cadence, and monitor every Ingress endpoint without billing surprises.
Starter
For individual Kubernetes developers managing a small client portfolio with single-cluster cert-manager deployments.
$29/month
- 10 monitored assets
- 1 seat
- 15-min check cadence
- SSL + DNS + vendor monitoring
- Email + Slack alerts
Team
For Kubernetes agencies managing cert-manager across multiple namespaces and Ingress controllers where annotation drift between controllers is a real risk.
$79/month
- 50 monitored assets
- 5 seats
- 10-min check cadence
- SSL + DNS + vendor monitoring
- Email + Slack alerts
Agency
For agencies with a full Kubernetes client roster including GitOps tooling (ArgoCD, Flux) where strict-pruning on dynamic Secret resources can exhaust Let's Encrypt rate limits within hours, plus mixed nginx / traefik / AWS ALB / GKE Ingress clusters.
$199/month
- 200 monitored assets
- 15 seats
- 5-min check cadence
- SSL + DNS + vendor monitoring
- Email + Slack alerts
Know when cert-manager has been silently failing renewals for two weeks, a full 30 days before the existing cert expires.
Add your first Kubernetes client Ingress in under two minutes. cert-manager-managed certs across nginx-ingress, traefik, AWS ALB, and GKE Ingress are monitored from the same dashboard. 14-day trial, no card required.