Securing Akka cluster communication in Kubernetes

A feature of Akka that we’ve been using in production for some time now but haven’t made a big deal about is Akka remoting’s support for using mTLS certificates that are frequently rotated. This support is designed to work with cert-manager and other Kubernetes based secret providers with an absolute minimum of configuration. All you need is two lines of configuration in Akka’s configuration file, and you’re ready to go on the Akka side.

This feature is important for secure Akka deployments. Akka clusters use a proprietary protocol to communicate with each other. This protocol by default contains no authentication or encryption, and so to prevent malicious hosts from joining your Akka clusters or eavesdropping on your Akka cluster communication, you need to use mTLS to secure the communication.

In a Kubernetes environment, many people turn to a service mesh such as Istio, Linkerd or Consul to authenticate and encrypt their network communications. Unfortunately, this is not an option for Akka cluster communication. The goal of a service mesh is to ensure that services do not need to be aware of where and how the services they talk to are deployed, so service meshes hide this: the service thinks it’s only talking to one logical service, while the service mesh handles concerns such as load balancing, encryption, authentication and authorization, canary and A/B deploys, and so on. Akka clusters, however, need to understand how and where they are deployed in order to implement their stateful features such as sharding, replication and P2P messaging. So, when deploying a service that uses Akka clustering to a service mesh, the Akka cluster communication must bypass the service mesh.

§Prerequisites

In this blog post, I’ll explain how to provision the certificates needed by Akka using cert-manager. I’ll assume you have a Kubernetes cluster with a standard cert-manager installation.

§Understanding the certificates

First off, let me explain a few concepts. cert-manager has a concept of Certificates and Issuers. A Certificate is a CRD that you deploy, which cert-manager reconciles into a Kubernetes Secret containing a TLS certificate. The Certificate references an Issuer, and the Issuer describes how Certificates that reference it should be issued.

In order to support frequently rotated certificates, Akka can’t just use a self signed certificate, since self signed certificates need to be identical at both ends to authenticate each other, and while a certificate is being rotated, two Akka nodes may hold different certificates. Instead, Akka needs certificates issued by a certificate authority (CA). Each node trusts any certificate signed by the CA, so during rotation the old and the new certificate can work together, because both are signed by the same CA. So, when we issue our certificates, we’ll use cert-manager’s CA Issuer type.

The CA Issuer itself needs a certificate to do its signing, and we’ll also provision this certificate using cert-manager. We’re not going to rotate that certificate - its private key never gets shared with anything outside of cert-manager, so rotating it is not as necessary. Because of this, it will use a self signed certificate, which we can provision using cert-manager’s self signed Issuer type.

So, in total, we’re going to have two issuers, a self signed issuer that issues certificates for the CA issuer, and then that CA issuer will issue certificates that are frequently rotated for our Akka service to use. The self signed issuer, certificate, and CA issuer can be reused across different Akka deployments - more on that later.

§Kubernetes resources

First we deploy the self signed issuer:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: self-signed-issuer
spec:
  selfSigned: {}

We’re creating this for the whole cluster; since self signed issuers don’t have any state or configuration, there’s no reason to have more than one for your entire cluster.

Next we create a self signed certificate for our CA issuer to use, referencing the self signed issuer we just created:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: akka-tls-ca-certificate
  namespace: default
spec:
  issuerRef:
    name: self-signed-issuer
    kind: ClusterIssuer
  secretName: akka-tls-ca-certificate
  commonName: default.akka.cluster.local
  # 100 years
  duration: 876000h
  # 99 years
  renewBefore: 867240h
  isCA: true
  privateKey:
    rotationPolicy: Always

We’ve created this in the default namespace, which will be the same namespace that our Akka service is deployed to. If you’re using a different namespace, you’ll need to update accordingly.

The commonName isn’t very important - it’s not actually used anywhere, though it may be useful for debugging purposes if you’re ever looking into why a particular certificate isn’t trusted by a service. We use a naming convention for common names and DNS names that follows the pattern <service-name>.<namespace>.akka.cluster.local. The CA uses the same convention without the service name. This convention doesn’t need to be followed, but it makes it easy to reason about the purpose of any given certificate.

Now we create the CA issuer:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: akka-tls-ca-issuer
  namespace: default
spec:
  ca:
    secretName: akka-tls-ca-certificate

This uses the secret that we configured to be provisioned in the certificate above. Finally, we provision the certificate that our Akka service is going to use - we’re assuming that the name of the service in this case is my-service:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-service-akka-tls-certificate
  namespace: default
spec:
  issuerRef:
    name: akka-tls-ca-issuer
  secretName: my-service-akka-tls-certificate
  dnsNames:
  - my-service.default.akka.cluster.local
  duration: 24h
  renewBefore: 16h
  privateKey:
    rotationPolicy: Always

The actual dnsName configured isn’t important, as long as it’s unique to the service within the issuer, since Akka cluster doesn’t use this name to look up the service. Akka’s mTLS support will verify that the DNS name in an incoming connection’s certificate matches the DNS name in its own secret, and reject the connection otherwise. Again, we’re using the naming convention for the dnsName mentioned above.

This certificate is configured to last for 24 hours, and to be renewed 16 hours before it expires - so in practice it gets rotated roughly every 8 hours.

§Configuring Akka

To configure your Akka application, you need to have Artery based remoting enabled, which will be the case if you’ve followed the Akka guide for configuring cluster bootstrap in Kubernetes. Then add the following configuration:

akka.remote.artery {
  transport = tls-tcp
  ssl.ssl-engine-provider = "akka.remote.artery.tcp.ssl.RotatingKeysSSLEngineProvider"
}

This instructs Akka to use TLS with the RotatingKeysSSLEngineProvider, an SSL engine provider that is designed to pick up Kubernetes TLS secrets and poll the file system for when they get rotated. It also applies authorization, by matching the DNS name of the incoming connection’s certificate against the DNS name of its own certificate.

§Configuring the Akka deployment

Having configured Akka and built a new Docker image, you can now configure your Akka deployment. To do this, you need to mount the certificate at the path /var/run/secrets/akka-tls/rotating-keys-engine. This is the default path that the RotatingKeysSSLEngineProvider uses to pick up its certificates. So, add the following volume to your pod:

      volumes:
      - name: akka-tls
        secret:
          secretName: my-service-akka-tls-certificate

And then you can mount that in your container:

        volumeMounts:
        - name: akka-tls
          mountPath: /var/run/secrets/akka-tls/rotating-keys-engine

Your complete deployment YAML, configured as described in our Kubernetes deployment guide, might look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
  namespace: default
  labels:
    app: my-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
      - name: my-service
        image: my-service:latest
        readinessProbe:
          httpGet:
            path: /ready
            port: management
        livenessProbe:
          httpGet:
            path: /alive
            port: management
        ports:
        - name: management
          containerPort: 8558
          protocol: TCP
        - name: http
          containerPort: 8080
          protocol: TCP
        resources:
          limits:
            memory: 1024Mi
          requests:
            cpu: 2
            memory: 1024Mi
        volumeMounts:
        - name: akka-tls
          mountPath: /var/run/secrets/akka-tls/rotating-keys-engine
      volumes:
      - name: akka-tls
        secret:
          secretName: my-service-akka-tls-certificate

And now you have secured your Akka cluster communication with mTLS using frequently rotated certificates. This will prevent both eavesdropping and malicious services attempting to join your Akka cluster.

Note that if you apply this to an existing running cluster deployment for the first time, you will need to do a full Akka cluster restart. This is because the new nodes will attempt to speak TLS to the old nodes, which aren’t configured to speak TLS, and so the new nodes will be unable to connect. The easiest way to restart the Akka cluster is to scale the deployment down to 0, and then back up to what it was before.

If you have more Akka services that you wish to deploy in the same namespace, you can reuse the same CA Issuer; you only need to deploy an additional Certificate for each service.

Rotating secrets in Kubernetes

I’m building a multitenant SaaS on top of Kubernetes at the moment, and one principle we’ve gone with is that all secrets should be rotated regularly. I’m surprised by the distinct lack of documentation on best practices for how to do this in Kubernetes.

Of course, when it comes to certificates, it’s fairly straightforward to rotate a certificate using cert-manager, but even that isn’t quite solved - while you can rotate a service’s certificate, there’s no straightforward way to rotate the certificate of the CA that it and the services it authenticates with trust.

When it comes to secrets that aren’t certificates, such as symmetric encryption keys, passwords, or asymmetric keys used for things like JWTs, there is very little out there explaining how to rotate them.

Of course, it’s absolutely possible to do it, and I can come up with multiple different ways in my head to do this without requiring downtime. There are also multiple third party solutions, such as HashiCorp Vault, that make it possible. But I’d like a Kubernetes native mechanism - after all, Kubernetes does provide a secret management API, so it should be possible to use it in a way that supports secret rotation.

So, I’m going to propose a convention, that perhaps could become a best practice, for how to manage secrets in Kubernetes in a way that is compatible with secret rotation. What I’m proposing is all manual, but it shouldn’t be too hard to build tooling around this approach to make it automatic.

§A few principles

Firstly, for secret rotation to work, I need to outline a few principles.

§Secrets should be read, and reread, from the filesystem

One of the great things about Kubernetes secrets is that when they are mounted as filesystem volumes, an update to the secret is automatically propagated to the pods consuming it. No restart or redeploy is needed; all you have to do is update the secret. However, to take advantage of this, the code consuming the secret must read it from the filesystem - it doesn’t work for secrets passed using environment variables. Additionally, the code must, at least periodically, reread the secret from the filesystem, otherwise it won’t pick up the changes.

We’ve had success with this approach in Akka for encryption keys and certificates. When we read the secret, we track a timestamp of when we read it. The next time we access the certificate (the next time we receive or make a new TLS connection), if the timestamp is more than 5 minutes old, we reread it. This way, we can rotate secrets with zero interruption to the service.
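
As a rough sketch of this principle (not the actual Akka implementation - the class name, path and TTL below are just assumptions for the example), a consumer might look like this:

import java.nio.file.{Files, Paths}
import java.time.{Duration, Instant}

// Minimal sketch: cache a secret read from a mounted volume, and reread it once
// the cached copy is older than the TTL, so rotated secrets are picked up
// without a restart.
class ReloadingSecret(path: String, ttl: Duration = Duration.ofMinutes(5)) {
  private var cached: Option[(Instant, Array[Byte])] = None

  def value: Array[Byte] = synchronized {
    cached match {
      case Some((readAt, bytes)) if Duration.between(readAt, Instant.now()).compareTo(ttl) < 0 =>
        bytes // still fresh, use the cached copy
      case _ =>
        val bytes = Files.readAllBytes(Paths.get(path)) // reread from the filesystem
        cached = Some((Instant.now(), bytes))
        bytes
    }
  }
}

// Hypothetical usage - the secret is consulted on every new connection or request:
// val signingKey = new ReloadingSecret("/var/run/secrets/my-app/r1.key")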

§Secrets must be identified by an id

When it comes to TLS certificates, the id of the secret is built into the certificate, and TLS implementations can typically be configured with multiple trusted certificates to use. For JWTs, there is a semi built in mechanism: you set the key id in the JWT’s kid (key id) header parameter. However, most JWT implementations don’t make you set this by default, and sometimes provide only limited support, if any, for dynamically selecting a key to use based on the passed in key id.
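
To illustrate the kind of support I mean, here’s a sketch using the Nimbus JOSE+JWT library to set the kid header when signing and select a key by it when verifying - the key ids and in-memory key map are made up for the example, and any JOSE library with key id support would do:

import java.security.SecureRandom
import com.nimbusds.jose.{JWSAlgorithm, JWSHeader, JWSObject, Payload}
import com.nimbusds.jose.crypto.{MACSigner, MACVerifier}

// Two 256 bit HMAC keys identified by id, standing in for keys loaded from a
// mounted secret volume.
val random = new SecureRandom()
def newKey(): Array[Byte] = { val k = new Array[Byte](32); random.nextBytes(k); k }
val keys = Map("r1" -> newKey(), "r2" -> newKey())
val primaryId = "r2" // the key currently used for signing

// Sign, recording which key was used in the kid header.
val jws = new JWSObject(
  new JWSHeader.Builder(JWSAlgorithm.HS256).keyID(primaryId).build(),
  new Payload("""{"sub":"some-user"}""")
)
jws.sign(new MACSigner(keys(primaryId)))
val token = jws.serialize()

// Verify, selecting the key to use based on the kid header of the incoming token.
val parsed = JWSObject.parse(token)
val valid = keys.get(parsed.getHeader.getKeyID).exists(key => parsed.verify(new MACVerifier(key)))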

When encrypting arbitrary data, there’s not usually any built in mechanism to indicate the id of the key used. In my case, when we encrypt small amounts of data, we encode it using the format <keyid>:<base64ed initialization vector>:<base64ed cypher text>. This allows us to associate each piece of encrypted data with the key that was used to encrypt it.
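
As an illustration, a sketch of that encoding might look like the following - the choice of AES-GCM and the helper names are assumptions for the example, not part of the convention:

import java.security.SecureRandom
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, SecretKeySpec}

// Encrypt with AES-GCM and encode as <keyid>:<base64 iv>:<base64 ciphertext>.
def encrypt(keyId: String, key: Array[Byte], plaintext: Array[Byte]): String = {
  val iv = new Array[Byte](12)
  new SecureRandom().nextBytes(iv)
  val cipher = Cipher.getInstance("AES/GCM/NoPadding")
  cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv))
  val ciphertext = cipher.doFinal(plaintext)
  val b64 = Base64.getEncoder
  s"$keyId:${b64.encodeToString(iv)}:${b64.encodeToString(ciphertext)}"
}

// Decrypt by looking up the key id embedded in the encoded value.
def decrypt(keys: Map[String, Array[Byte]], encoded: String): Array[Byte] = {
  val parts = encoded.split(":", 3)
  val (keyId, ivB64, ctB64) = (parts(0), parts(1), parts(2))
  val b64 = Base64.getDecoder
  val cipher = Cipher.getInstance("AES/GCM/NoPadding")
  cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(keys(keyId), "AES"), new GCMParameterSpec(128, b64.decode(ivB64)))
  cipher.doFinal(b64.decode(ctB64))
}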

For passwords, it gets harder again, because generally, only one password can work at a time. One possibility here is to support multiple usernames, and use the username as the key id.

§Rotation mechanism

Having set out the principles that the code consuming our secrets will abide by, we can now propose a mechanism for configuring and rotating secrets in Kubernetes.

Kubernetes secrets allow multiple key value pairs, and we can make use of this. When these secrets get mounted as volumes, each filename corresponds to a key in the key value pairs. Given the name clash between cryptographic keys and key value keys, I’m going to refer to the key value keys as filenames.

Consider the case where you might have a symmetric key, perhaps used for signing JWTs. Each key will get an identifier - this could simply be a counter, a timestamp, a UUID, etc. When there is only one key in use, the filename might be <key-id>.key. When your code loads the keys, it looks in the directory of the volume mount for any files named *.key, loads them all, and stores them in a map of key ids to the actual key contents.

Now, this works great when validating JWTs, since you have a key id before you start, and you just need to select the secret for that key id. However, when creating a JWT, if you have multiple keys configured, which one do you use? It’s important that you choose the right one: if the secret has only just been updated to add a second key, other pods may not yet have picked up that second key, and so if you use it to sign a JWT and send it to those pods, they may fail to validate it. So, to handle this, we will also support specifying a primary secret by naming it <key-id>.key.primary. There must only be one primary secret, and it will always be the one used to sign or encrypt data.
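
Putting these pieces together, a consumer of such a secret might look something like this sketch - the directory layout is the one described above, but the names and the (lack of) error handling are just assumptions for the example:

import java.io.File
import java.nio.file.Files

// Hypothetical holder for the keys found in a mounted secret directory.
case class LoadedKeys(primaryId: String, keys: Map[String, Array[Byte]])

// Scan the volume mount for *.key and *.key.primary files, using the part of
// the filename before the suffix as the key id.
def loadKeys(dir: File): LoadedKeys = {
  val files = dir.listFiles().toSeq // assumes the directory exists and is readable
  val primary = files.collect {
    case f if f.getName.endsWith(".key.primary") =>
      f.getName.stripSuffix(".key.primary") -> Files.readAllBytes(f.toPath)
  }
  val others = files.collect {
    case f if f.getName.endsWith(".key") =>
      f.getName.stripSuffix(".key") -> Files.readAllBytes(f.toPath)
  }
  val (primaryId, _) = primary.head // exactly one primary is expected
  LoadedKeys(primaryId, (primary ++ others).toMap)
}

// Hypothetical usage, rereading periodically as described above:
// val loaded = loadKeys(new File("/var/run/secrets/my-app/jwt"))
// sign with loaded.keys(loaded.primaryId), verify with loaded.keys(kid)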

So, given this set up, this will be our rotation mechanism:

  • A given secret starts with a single key configured - let’s say its id is r1. The id can be anything; we’re using r to stand for revision and 1 to indicate it’s the first key, but the id could be a UUID or anything else. The key will be placed in a Kubernetes secret, with a filename of r1.key.primary.
  • When it comes time to rotate the key, a new key will be generated and a new id assigned, r2. The Kubernetes secret will be updated so that it now has both keys, with r1.key.primary being the filename for the old key, and r2.key being the filename for the new key.
  • Once we are sure that all nodes have picked up the new key, we can change the new key to be the primary. So, the secret is again updated, with r1.key being the filename for the old key, and r2.key.primary being the filename for the new key.
  • After some time, e.g. once we are sure that all JWTs signed by the old key have expired, we will delete the old key, updating the secret so it only contains one key, with a filename of r2.key.primary.

This approach can also work when a secret comes in multiple parts, such as asymmetric keys or self signed certificates - simply replace key with the name of the thing, so for example I might have r2.private and r2.public, or r2.crt.

§Why bother?

It’s not too hard for someone to implement the above themselves, but why bother proposing it as a best practice? If this approach were adopted by many different people, it would open up the following possibilities:

§Secret consumption support

Secret consumers could provide built in support for this convention. For example, a JWT implementation might offer it: users would just need to pass it the directory containing the keys, and it would use them, periodically rereading the directory to pick up new keys. HTTP servers and clients could do likewise, as could database drivers. Generic libraries that implement the convention could be provided so that arbitrary secret usage, such as for encryption, could easily consume the keys.

§Automatic rotation

If enough consumers were using the convention, tools for automatic secret rotation could be implemented. This could be as simple as an operator that would allow you to configure a secret to be generated and rotated, given parameters such as how frequently to rotate secrets, how long to wait before making the new secret the primary, and then how long to wait before deleting the old secret. Such a tool could also be configured to create and wait for migration jobs, to allow data encrypted at rest to be decrypted and re-encrypted using the new key.

§Pros and cons

§Pros

Many of the pros are self evident in the explanation above, but here are a few more that I can think of:

§Not Kubernetes specific

The way code consumes keys is not at all specific to Kubernetes. It can work on any platform that can pass keys using a filesystem. This includes development environments where perhaps static keys might be used.

§Kubernetes native

In spite of not being specific to Kubernetes, this mechanism is native to the way Kubernetes works. It uses the built in secret mechanism, and it’s workable without any third party tooling. It will work on any Kubernetes distribution, and if you already understand how Kubernetes secrets work, it’s straightforward to understand how this works.

§No vendor lock in

This also provides a vendor neutral way to rotate and consume secrets. Today, if you’re using HashiCorp Vault for example, you need to use the Vault client in your code to connect to the Vault server to get keys, which ties your code to Vault. This convention allows whatever is managing the keys to be decoupled from the consumers. This can also be advantageous in development and test environments: you might not be able to run your vendor’s secrets manager on your local machine, for example for licensing or cost reasons, so you can substitute a different one in those environments.

§Cons

The convention is not without its cons of course.

§Reliance on the filesystem

Some people may object to using the filesystem for distributing secrets, preferring to only pass them through authenticated connections. Of course, at some point some secret needs to be passed to the code - if the code is going to authenticate with a third party secret manager to retrieve secrets, the secret for that authentication needs to be stored somewhere, such as the filesystem or an environment variable.

§Reliance on Kubernetes secrets

Some people may be concerned with the way Kubernetes stores secrets itself. As I understand it, this is either pluggable or can be encrypted (GKE, for example, supports integration with Cloud KMS to encrypt the secrets stored by Kubernetes). But in some circumstances this might not be good enough for people.

§Changes to the way code consumes secrets

The requirement to read secrets from the filesystem may be disruptive for libraries that typically consume secrets from configuration files or environment variables.

§Conclusion

So, does this convention sound useful? Please comment if you have anything to add!
