mTLS Security: Are You Doing It Right?

Mutual TLS (mTLS) is increasingly favored for securing system-to-system API calls, offering not only encryption but also mutual authentication. This is particularly effective in system-to-system interactions where both endpoints can be tightly controlled, and certificates can be provisioned in advance. The result is a secure, encrypted communication channel where both parties can trust the legitimacy of their counterpart.

However, achieving this level of security requires more than simply enabling mTLS. Without deliberate attention to implementation details, even a compliant setup can fall short of providing meaningful security.

Common Insecure Implementations

Unfortunately, it’s not difficult to design mutual TLS in an insecure way. You might think you’re adequately protected, but in reality, the control can be easily circumvented. Because TLS is fundamental for secure communications over the internet, there have been many cryptographic attacks against TLS over the years. In this article, I will not focus on the cryptographic weaknesses (you can read about them here) but on weak design and implementation. Let’s start with the common mistakes:

No Revocation check

The client certificate has an expiry, but it’s usually quite a long-lived certificate – on the order of at least a year and commonly two or more – because generating another certificate and adding it to the client is usually a manual process. This means that if the client certificate is lost or stolen, an attacker could use that still-valid certificate to impersonate the client. Because of this, it’s a good idea to have a way to revoke that certificate so it can no longer be used.

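The traditional way to solve this is a Certificate Revocation List (CRL) published by the Certificate Authority – the server that issued the certificate. If your client certificate’s serial number is added to the CRL, then servers are supposed to reject that client certificate. Each server has to download the list and keep it fresh, and CRLs can grow large. The Online Certificate Status Protocol (OCSP) instead lets a server ask the CA about a single certificate, but doing that for every new connection adds latency and can cause an immense increase in traffic to the CA. The newer method is OCSP Stapling, where the certificate holder periodically fetches a signed, time-limited status response from the CA and presents it during the handshake, so the other side doesn’t need to contact the CA on every connection.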

The problem is that revocation checks in general – CRL or OCSP Stapling – are rarely implemented. In my experience it gets missed for several reasons, mostly related to operational maintenance. Not implementing revocation means that if your certificate is compromised, there’s only one way of stopping an attacker from using it: removing the CA from the trust store. Yes, this would invalidate the stolen cert, but it would also invalidate every other cert issued by that CA, which could have quite an adverse impact.
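As a minimal sketch of what switching on revocation checking can look like, the following uses Python’s standard ssl module. It assumes the CA publishes a PEM-encoded CRL that you refresh out of band. The function name and the two file parameters are hypothetical, and a real server would also load its own certificate and key with load_cert_chain.

```python
import ssl


def build_mtls_server_context(ca_file=None, crl_file=None):
    """Build a server-side context that requires a client cert and checks a CRL.

    ca_file and crl_file are hypothetical paths to the private CA chain and its
    current CRL (both PEM). They are optional only so that the sketch can be
    exercised without real files.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Reject clients that do not present a certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    # Check the presented (leaf) certificate against the loaded CRLs.
    ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if crl_file:
        # load_verify_locations also accepts CRLs in PEM format.
        ctx.load_verify_locations(cafile=crl_file)
    return ctx
```

With this flag set, verification should fail when no CRL for the issuing CA is loaded, so the check fails closed. That is the operational cost mentioned above: something has to fetch the CRL and reload it before it goes stale.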

Not inspecting the client certificate

Mutual TLS uses a “client authenticated TLS handshake,” validating the client certificate when establishing the mTLS connection. This process checks that the certificate is valid (has not expired and, if revocation checking is configured, has not been revoked), that it has been issued by a trusted Certificate Authority and that the client still possesses the private key.

The problem is that these checks alone don’t positively identify the client – they just prove that the certificate is issued by a trusted CA. The application accepting the client connection must explicitly check the client certificate to ensure it expects a specific client certificate. This check isn’t part of the client authenticated TLS handshake.

If this check isn’t made, then any client can show any unexpired client certificate issued by any Certificate Authority that is trusted by that server, and that client will succeed in establishing an mTLS connection. This means a threat actor can re-use the certificate issued for another machine or another project entirely and still be allowed to connect.

This behaviour is desired in the more common use case of a web browser connecting to an unauthenticated public web server, where the web browser is somewhat anonymous. But in a system-to-system use case where mTLS would be used and increased security is desired, the identity in the client certificate must be checked. This is similar in concept to “certificate pinning” in a mobile app.

Relying on a successful TLS handshake and not checking the certificate details defeats the whole purpose of mTLS – that you can prove the identity of the client system. If the server is configured to check only validity and expiry, a valid mTLS connection will be established, which will satisfy compliance checks, but the control is significantly weakened and easily circumvented. This is a good example of the phrase “compliance doesn’t equal security” – where just ticking the box does not provide much additional security but makes people feel all warm and fuzzy inside because they assume it’s more secure.

There is operational overhead in adding this check – the list of permitted domains on the server or container needs to be maintained, including allowing updates when the permitted clients change. This is the primary argument I hear against doing this – that it’s too onerous to implement and maintain.
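The identity check described in this section can be sketched in a few lines. This example assumes the application can get at the validated client certificate in the dictionary form returned by Python’s ssl.SSLSocket.getpeercert(). The allowlist contents and function names are hypothetical.

```python
def client_dns_names(peer_cert):
    """Return the DNS subjectAltName entries from a getpeercert() style dict."""
    return {
        value
        for kind, value in peer_cert.get("subjectAltName", ())
        if kind == "DNS"
    }


def is_expected_client(peer_cert, allowed_names):
    """True only if the certificate carries at least one SAN on the allowlist."""
    # getpeercert() gives None or an empty dict when there is no validated cert.
    if not peer_cert:
        return False
    # Exact match on SAN entries; no substring or wildcard matching.
    return bool(client_dns_names(peer_cert) & set(allowed_names))


# Hypothetical allowlist and certificate, for illustration only.
ALLOWED_CLIENTS = {"billing.internal.example"}
example_cert = {
    "subject": ((("commonName", "billing.internal.example"),),),
    "subjectAltName": (("DNS", "billing.internal.example"),),
}
print(is_expected_client(example_cert, ALLOWED_CLIENTS))
```

The sketch matches on the SAN rather than the CN and uses exact comparison, so a certificate issued by the same CA for another machine or project is rejected even though the handshake itself succeeded.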

Using the Client Certificate for Authentication

Most APIs nowadays are authenticated – even if registration and use are free. Public APIs that don’t require authentication are a dying breed and are frequently subject to availability attacks via DDoS.

The thought process around using a single credential for authentication goes like this:

If the client certificate is treated as a login credential and can be securely stored on the client machine – usually in the same way that a clientid & secret (or username & password) is stored; and
If an attacker compromises the client and gains access to the clientid & secret, they will also have access to the client certificate since they are being stored in the same manner; and
The target server supports a variety of different authentication methods – it could be a certificate, a clientid & secret or token-based authentication. Each one of these methods is assumed to be equivalent.

Then it follows that we can save a whole bunch of operational overhead and management by presenting just a single credential to the server for authentication. And if you present a client certificate, you also get to use mTLS, making a client certificate a good pick if you want to use single credential authentication.

And I get it. It’s like saying using two usernames and two passwords for authentication is not multi-factor because you’re just re-using the same factor. However, I would advise against ignoring defense in depth; having multiple controls still matters. If there are multiple ways for an attacker to steal credentials, perhaps the method they use only allows them to steal one, which means you just avoided getting popped and live to fight another day.

Because of this, I always recommend using something like basic auth (clientid & secret) or token-based auth in addition to a client certificate.
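A minimal sketch of that recommendation, assuming the application already has the SAN from the validated client certificate and a secret presented by the caller. The credential store, its contents and the function name are hypothetical, and hashing the secret with plain SHA-256 assumes a long, randomly generated client secret rather than a human-chosen password.

```python
import hashlib
import hmac

# Hypothetical credential store: certificate SAN -> SHA-256 of the client secret.
CLIENT_SECRETS = {
    "billing.internal.example": hashlib.sha256(b"s3cr3t-for-billing").hexdigest(),
}


def authenticate(cert_san, presented_secret, secrets=CLIENT_SECRETS):
    """Require both a known certificate identity AND its matching secret."""
    expected = secrets.get(cert_san)
    if expected is None:
        # The certificate identity is not registered for this API.
        return False
    presented = hashlib.sha256(presented_secret.encode()).hexdigest()
    # Constant-time comparison to avoid leaking how much of the value matched.
    return hmac.compare_digest(presented, expected)
```

Because the secret is looked up by certificate identity, a stolen certificate on its own is not enough, and a stolen secret cannot be replayed with a different client’s certificate.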

Trusting a public Certificate Authority

When you expose an API externally to your organisation, depending on the nature of the relationship with the consumers of the API, you may decide to use a Certificate Authority that is not under your control. If you have a B2B relationship with your consumer, you may trust them enough to add their private CA’s certificate chain to your server trust store, which means they can issue client certificates and connect to your API using mTLS.

But there may be a reason you might not want to trust their organisational CA. Or potentially, your consumer may not have their own CA. In this instance, you might be tempted to use a public CA to issue the X.509 certificates.

This is a recipe for disaster, since anybody with a credit card can also provision a certificate issued by that same public CA and connect to your API. Perhaps this is what you intended, but in general, you should never trust a public CA without a specific reason and understanding the implications of doing so.
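One way to make the trust decision explicit is to build the server’s trust store from nothing and add only the CA you intend to trust. The sketch below uses Python’s standard ssl module. The function names and the private_ca_file parameter are hypothetical.

```python
import ssl


def build_private_trust_context(private_ca_file=None):
    """Server context whose trust store holds only an explicitly chosen CA."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED
    # Deliberately NOT calling ctx.load_default_certs(): that would add the
    # system's public CAs as acceptable issuers of client certificates.
    if private_ca_file:
        ctx.load_verify_locations(cafile=private_ca_file)
    return ctx


def trusted_ca_count(ctx):
    """Number of CA certificates currently loaded in the context's trust store."""
    return ctx.cert_store_stats()["x509_ca"]
```

Checking trusted_ca_count in a deployment test is a cheap way to notice when a public CA bundle has crept into the trust store. If a public CA really is required, the SAN allowlist check from earlier becomes the only thing standing between that CA’s other customers and your API.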

Insecure certificate provisioning process

There’s a well-established process to generate a client certificate. Before automation, somebody would log into the client machine and generate a private key – which is supposed to never leave the client. They then generate a Certificate Signing Request (CSR) using the private key and enter all the required information that will be encoded into the client certificate, such as the SAN and DN/CN. The CSR then gets sent to the Certificate Authority, where it’s signed using the CA’s private key, and you’re given your client certificate.

The problem is that commonly, the CSR isn’t generated on the client anymore. You might need a certificate issued for a VM or container that isn’t even running yet. These assets are generated ahead of time, stored separately, and then transferred to where they are needed later.

In large companies, the process for issuing certificates might be fully automated end-to-end. You might generate a CSR and submit that into a ticketing system, and the CA signs your CSR, and you get your cert emailed to you (or perhaps you download it from the ticketing system). Also, the person who generates the client cert may not be the same person who installs the certificate, so you end up having certs and keys hanging around in ticketing systems, stored on email servers and sitting on the local hard disk of engineers.

If your company has an automated process for issuing certificates, what verification or validation process is in place that limits the information that can be encoded in those CSRs? If I submit a CSR for a client SAN or DN that I don’t own, will I be issued a certificate? From what I’ve seen, the answer is yes in many cases. This is great for reducing friction, enabling self-service and decreasing operational complexity but terrible if you desire a secure end-to-end process – because now literally anybody in the organisation can raise a ticket and be issued a valid client certificate.

It’s critical that this process be protected by an approval step: at minimum a review by the manager of the person raising the request or, even better, identifying the owner of the asset via your CMDB and having them approve the issuance of a client cert for your specific use case.
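The ownership check can be sketched as a gate in front of the signing step. This assumes the SANs have already been parsed out of the CSR and that a CMDB extract mapping assets to owning teams is available. The data, team names and function name are all hypothetical.

```python
# Hypothetical CMDB extract: asset name (SAN) -> owning team.
ASSET_OWNERS = {
    "billing.internal.example": "team-billing",
    "ledger.internal.example": "team-finance",
}


def review_csr(requester_team, requested_sans, asset_owners=ASSET_OWNERS):
    """Split requested SANs into auto-approvable ones and ones needing sign-off.

    Returns (approved, needs_approval), where needs_approval is a list of
    (san, owner) pairs and owner is None for assets missing from the CMDB.
    """
    approved, needs_approval = [], []
    for san in requested_sans:
        owner = asset_owners.get(san)
        if owner == requester_team:
            approved.append(san)
        else:
            # Unknown asset, or owned by another team: never auto-issue.
            needs_approval.append((san, owner))
    return approved, needs_approval
```

For example, a request from team-billing for both names above would be approved for billing.internal.example and routed to team-finance for ledger.internal.example, instead of being signed automatically.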

Self Signed Certificates and Environment Segregation

Individual self-signed client certs might be an easy way to get up and running in non-production environments, but because each self-signed cert needs to be added to the server trust store, this quickly becomes unmanageable. A non-production CA that has implemented a good onboarding process means the root cert can be loaded and all child certificates trusted.

But doing this means environmental segregation must be maintained and checked carefully. You must have separate non-prod and prod CAs. Your non-prod servers should not trust the prod CA, and production should definitely not trust non-prod.
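Segregation like this can be checked mechanically by comparing the CA certificates loaded in each environment’s trust store. The sketch below assumes you can export those certificates as DER bytes (for example, Python’s ssl.SSLContext.get_ca_certs(binary_form=True) returns them in that form). The function names are hypothetical.

```python
import hashlib


def fingerprints(der_certs):
    """SHA-256 fingerprints of a collection of DER-encoded certificates."""
    return {hashlib.sha256(der).hexdigest() for der in der_certs}


def shared_cas(prod_der_certs, nonprod_der_certs):
    """Fingerprints of CA certificates trusted by BOTH environments.

    An empty result is what segregation requires; anything else means a
    certificate issued in one environment would be accepted in the other.
    """
    return fingerprints(prod_der_certs) & fingerprints(nonprod_der_certs)
```

Running a check like this in a pipeline turns “production should not trust non-prod” from a design statement into something that fails loudly when a trust store drifts.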

Certificate Rotation

Certificates do expire, and a correctly configured server checks the expiry during the handshake, so once a certificate expires the mTLS connection must be rejected. I have seen misconfigured APIs that accepted expired certificates for months after expiry because only signature validation was being performed – which is a serious control failure (but easy to overlook if ‘click ops’ config is being used). Rotation therefore needs to be planned so that a replacement certificate is in place before the old one expires.
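An explicit expiry check is small enough to add as a safety net and as an early warning for rotation. This sketch assumes the certificate is available in the dictionary form returned by Python’s ssl.SSLSocket.getpeercert(), whose notAfter field can be parsed with ssl.cert_time_to_seconds. The function names and the 30-day warning threshold are hypothetical.

```python
import ssl
import time


def seconds_until_expiry(peer_cert, now=None):
    """Seconds remaining before the certificate's notAfter time (negative if past)."""
    not_after = ssl.cert_time_to_seconds(peer_cert["notAfter"])
    now = time.time() if now is None else now
    return not_after - now


def expiry_status(peer_cert, warn_days=30, now=None):
    """Classify a certificate as 'expired', 'renew-soon' or 'ok'."""
    remaining = seconds_until_expiry(peer_cert, now)
    if remaining <= 0:
        return "expired"
    if remaining <= warn_days * 86400:
        return "renew-soon"
    return "ok"
```

The 'expired' result should never be reachable if the TLS layer is configured correctly, so seeing it in logs is a signal that the server is only validating signatures. The 'renew-soon' result gives the client’s owner time to rotate before connections start failing.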

Final Thoughts

Any of these “mis” configurations might be fine in isolation. It might even be a deliberate decision not to implement some of these controls. But the danger is when you ignore these issues in combination.

If you’re using the client certificate for authentication and you’re also not checking the SAN or CN, well now you have an API that can be called by almost any developer in the organisation.

Similarly, not adequately protecting the internal organisational X.509 cert issuing process can be significant. Automation and self-service are great in a large organisation and cut down on a lot of manual processes. Configured insecurely, any staff member can generate and be issued client certs for any server, signed by the internal CA. Insiders no longer need to steal a certificate – they can just request a new one with their desired details.

These decisions can compound and lead to unintended security exposures. When reviewing a design, architects can’t simply check to see if “mTLS” is written on the diagram and automatically assume the integration is secure.

Sometimes – oftentimes even – perfection isn’t warranted. Perhaps the operational overhead is, in fact, too onerous, and after some analysis, it’s determined that there are sufficient compensating controls that bring the identified risk position to an acceptable level. But to make those decisions in an informed way, the exposures must be identified and discussed so that the person tasked with accepting the risk understands what they are accepting. Hopefully, this article has helped you ensure that your secure system lives up to its promise.