Until now I have mostly dealt with certificates for a specific host, and there is usually nothing wrong with doing it this way. There are easy-to-use tools for Let’s Encrypt which automate the retrieval of a certificate for a (sub-)domain. On a docker-compose based, (mostly) manually managed stack you would usually use something like certbot, maybe with an ingress like nginx. When moving to Kubernetes, there’s cert-manager, which is of great help in automating and integrating certificate challenges for Let’s Encrypt.
This post deals with wildcard certificates, DNS challenges and Traefik in Kubernetes, and shows how to set this up with an example. In my case I based this on K3s in version v1.21.0+k3s1, which comes with Traefik in version 2.4.8. One way to deal with certificates in this environment is to use an HTTP challenge with cert-manager: it retrieves a (sub-)domain-specific certificate by deploying an ephemeral endpoint for this specific domain, which Let’s Encrypt then requests to verify that you control said domain.
Challenges and (security) issues
This approach is easy, well documented and usually good enough. However, there are some challenges as well. As it’s an HTTP challenge, a global ingress redirect can break the communication: if Traefik (or another ingress controller) is configured to redirect all HTTP traffic to HTTPS, a challenge which relies on HTTP is redirected as well, which can break the necessary verification. This can be solved by using an HTTPS challenge or by only redirecting HTTP on specific ingress routes with a Traefik middleware, a topic which is described in more detail in my most recent post. However, a real (security) issue may be the transparency of issued certificates: every certificate is recorded in public Certificate Transparency logs. This does not matter for a base domain, but the existence of sub-domains is exposed and may open a possibility for attack. The existing certificates for a domain and all its subdomains can be queried with crt.sh.
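As a rough sketch of the second option, a route-specific redirect could be defined as a Traefik Middleware and attached only to selected ingresses. The name redirect-https and the default namespace are assumptions for illustration; the details depend on your setup.

```yaml
# Hypothetical example: redirect HTTP to HTTPS only where this middleware is attached.
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: redirect-https   # assumed name
  namespace: default     # assumed namespace
spec:
  redirectScheme:
    scheme: https
    permanent: true
```

An ingress would then opt in through the annotation traefik.ingress.kubernetes.io/router.middlewares: default-redirect-https@kubernetescrd, where the value follows the pattern <namespace>-<name>@kubernetescrd.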
The DNS challenge
A wildcard certificate obtained through a DNS challenge with cert-manager solves the transparency issue when serving certificates with Traefik in Kubernetes, as only *.example.com shows up in the logs rather than each individual subdomain. The challenge is not answered by creating an endpoint on the system behind the domain (as is done for an HTTP / HTTPS challenge) but by creating a DNS entry which can then be checked. For this reason, a tool which answers a DNS challenge needs access to the DNS records. When using cert-manager in Kubernetes, it needs a token for the DNS provider to create this entry. The cert-manager tutorial on DNS validation describes this in more detail. This is also why I switched my DNS provider from Digital Ocean to Cloudflare: Digital Ocean only allows API keys (the needed token) for the entire account, even though I only want to grant access to DNS records. The tutorial linked above contains the necessary configuration to retrieve the certificate; however, for people with less experience in Traefik and Kubernetes it may be unclear how to actually use this certificate.
Setup with Cloudflare and Traefik
To use the cert-manager DNS challenge with Cloudflare you’ll have to set up an API token with the necessary permissions, which are listed in the documentation. This API token is then applied to Kubernetes as a Secret resource.
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token-secret
  namespace: cert-manager
type: Opaque
stringData:
  api-token: <myapitoken>
To issue a certificate, an Issuer resource is needed. This can be a regular Issuer, which is namespace-scoped, or a ClusterIssuer, which can be used cluster-wide. I decided to use a ClusterIssuer. Note that you need to provide an e-mail address, which Let’s Encrypt uses to notify you of an expiring certificate. You also need to provide your Cloudflare user, which is the e-mail address you use to log in to Cloudflare. The password (the token) is referenced from the secret created above.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
  namespace: cert-manager
spec:
  acme:
    email: <my-email-for-letsencrypt-notifications>
    #server: https://acme-staging-v02.api.letsencrypt.org/directory
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: issuer-letsencrypt-staging
    solvers:
    - dns01:
        cloudflare:
          apiTokenSecretRef:
            key: api-token
            name: cloudflare-api-token-secret
          email: <my-cloudflare-login-email>
The ClusterIssuer is now set up as a “client” to retrieve the certificate. The only thing missing now is the certificate request, which is applied as a Certificate resource. The Certificate references the ClusterIssuer to be used. In my example both resources are named for the staging environment of Let’s Encrypt. Lines 9 and 10 of the ClusterIssuer contain the staging and production endpoints, of which only one is active at a time, but it is advisable to create separate resources for each environment (staging + production). The Certificate resource below has two dnsNames: we request a certificate for the base domain (example.com) as well as a wildcard for any subdomain (*.example.com).
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-cert
  namespace: kube-system
spec:
  # Secret names are always required.
  secretName: wildcard-secret
  # At least one of a DNS Name, URI, or IP address is required.
  dnsNames:
  - "*.example.com"
  - example.com
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-staging
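To follow the advice of keeping one issuer per environment, a separate production ClusterIssuer could look roughly like this. The names letsencrypt-production and issuer-letsencrypt-production are assumptions chosen for illustration; everything else mirrors the issuer shown earlier.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production              # assumed name
spec:
  acme:
    email: <my-email-for-letsencrypt-notifications>
    # Production endpoint; a staging issuer would instead use
    # https://acme-staging-v02.api.letsencrypt.org/directory
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: issuer-letsencrypt-production   # assumed name
    solvers:
    - dns01:
        cloudflare:
          apiTokenSecretRef:
            key: api-token
            name: cloudflare-api-token-secret
          email: <my-cloudflare-login-email>
```

A Certificate would then select it by setting issuerRef.name to letsencrypt-production, which makes it harder to mix up staging and production by toggling a comment.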
Of course you’ll need to configure the domain in your DNS provider to route traffic for all subdomains to the specific server. If the IP is identical to the base-domain IP, you may just add a new CNAME entry for *.example.com which points to example.com.
The DNS challenge order may take up to a few minutes to complete. To check the status, you can describe each resource (for example with kubectl describe) to identify issues with the Certificate, CertificateRequest or Order. Once the order is completed, there will be a new TLS secret with the name specified, in the example above wildcard-secret. This secret has to be in the namespace in which Traefik resides, so that Traefik can pick it up. For this reason I put the Certificate resource in the kube-system namespace, which also contains Traefik.
Now you do not want Traefik or the ingress to deliver a specific certificate for each subdomain. Delete any tls part in the ingress of each service, as it is not needed anymore. When no additional tls properties are specified in the ingress resource, Traefik serves a self-signed default certificate for each ingress. The last step is to have Traefik serve the created wildcard certificate instead of the self-signed one. The default can be overridden by creating a TLSStore resource with the name default (I’m quite sure it has to be called default, as it will not be picked up otherwise) that references the created secret as the defaultCertificate.
apiVersion: traefik.containo.us/v1alpha1
kind: TLSStore
metadata:
  name: default
  namespace: kube-system
spec:
  defaultCertificate:
    secretName: wildcard-secret
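For illustration, an ingress that relies on the default certificate might look like the following sketch. The service name whoami, the host test.example.com, the namespace and the port are hypothetical, and the annotations assume the websecure entrypoint that the Traefik bundled with K3s defines by default.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: whoami                 # hypothetical service
  namespace: default           # assumed namespace
  annotations:
    # Assumes the default "websecure" entrypoint. TLS is enabled on the router
    # without naming a certificate, so the default certificate is served.
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  # Note: there is deliberately no tls section here.
  rules:
  - host: test.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: whoami
            port:
              number: 80
```

Because the ingress names no secret of its own, it can live in any namespace while the wildcard secret stays next to Traefik in kube-system.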
Now the Traefik pods have to be restarted so that Traefik picks up the new default certificate. If you access example.com, the certificate should be issued for example.com, and for any subdomain (e.g. test.example.com) Traefik should serve the wildcard certificate for *.example.com. After some playing around, this seems to me to be the easiest way to combine wildcard certificates, DNS challenges and Traefik in Kubernetes.