Encryption is a fundamental part of nearly every modern application, whether you’re storing data, sending data to customers, or sharing data between backend services. Most organizations have a data encryption strategy, and nearly every web page is using HTTPS, thanks to initiatives like Let’s Encrypt.

But setting up encryption isn’t a one-time initiative. Over time, the certificates backing modern encryption expire and need to be replaced. While there are tools to automate this process, they’re not always reliable. Plus, what happens when your services rely on third-party services that are less diligent about rotating certificates?

In this blog, we’ll show you how to test your entire TLS certificate chain for expiring certificates using Gremlin. You’ll also see how to automate this process for all of your services so you can stay ahead of upcoming expiration dates.

How does TLS encryption work?

At a high level, Transport Layer Security (TLS) creates a secure, encrypted connection between two devices. For example, your browser is using TLS encryption to read this blog post. TLS has two main goals: encrypt data so that unauthorized parties can’t access it, and provide verification that the devices are who they claim to be.

TLS is built on a concept called asymmetric key cryptography. As an example, let’s look at gremlin.com. The server hosting gremlin.com has a key pair: two mathematically linked keys. One is the private key, which stays on the server and is used to decrypt data sent to it. The other is the public key, which is published in the server’s certificate and which a client (e.g. your browser) uses to encrypt data before sending it to the server. This way, you can send and receive data to and from gremlin.com securely. Think of it like a mailbox: clients can drop mail into your mailbox using your public key, but they can’t unlock the box or see who else has sent you mail. Only you can open your mailbox using your private key and read the mail that was sent to you.
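To make the key-pair idea concrete, here is a toy sketch of textbook RSA in Python. It is an illustration only: the primes are tiny, there is no padding, and real TLS handshakes are considerably more involved. The function names `encrypt` and `decrypt` are made up for this example.

```python
# Toy RSA key pair built from tiny primes. Illustration only: real keys use
# primes hundreds of digits long and add padding, both omitted here.
p, q = 61, 53
n = p * q                    # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)          # safe to share with anyone
private_key = (d, n)         # never leaves the server


def encrypt(message: int, key: tuple) -> int:
    """Anyone holding the public key can encrypt a (small) integer message."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: tuple) -> int:
    """Only the holder of the private key can recover the message."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


ciphertext = encrypt(65, public_key)
plaintext = decrypt(ciphertext, private_key)
```

Here `plaintext` ends up equal to the original message, 65, while `ciphertext` is a different number that is of no use to someone who lacks the private key.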

Without TLS, the contents of this blog would be transmitted to your device in cleartext. Anyone capable of monitoring network traffic between your device and gremlin.com will be able to see the full details of your request and the server’s response. This might not be a big deal for a blog, but it’s absolutely a concern for financial institutions, governments, companies, and other groups that need to protect sensitive data.

What is TLS certificate expiration?

TLS certificates have a critical feature: they expire after a set amount of time. To understand why, imagine an attacker gained access to your private key. If certificates never expired, the attacker could impersonate you indefinitely. But because certificates expire, the attacker will only be able to impersonate you up to that date. After the expiration date, most applications will refuse to use the certificate, or they’ll print a big warning message to users like the one below: 

Screenshot of a browser warning message for an expired TLS certificate.
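As a minimal sketch of the check a client performs, the Python below parses a certificate’s expiration timestamp and compares it to the current time. It uses the standard library’s `ssl.cert_time_to_seconds`. The helper names `parse_not_after` and `is_expired` are hypothetical, and the date strings in the usage note are made up.

```python
import ssl
from datetime import datetime, timezone
from typing import Optional


def parse_not_after(not_after: str) -> datetime:
    """Convert a certificate's notAfter string into a timezone-aware UTC datetime.

    Expects the format used by Python's ssl module,
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    seconds = ssl.cert_time_to_seconds(not_after)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def is_expired(not_after: str, now: Optional[datetime] = None) -> bool:
    """Return True if the expiration time is at or before `now` (default: current time)."""
    now = now or datetime.now(timezone.utc)
    return parse_not_after(not_after) <= now
```

For example, `is_expired("Jan  5 09:34:43 2018 GMT")` returns `True` on any date after January 5, 2018.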

To avoid this, servers must periodically renew and replace their certificates. This is called rotation. A server can request a new certificate from a Certificate Authority, which verifies the server’s identity before issuing a new certificate. This new certificate replaces the old one, and the Certificate Authority can revoke the old certificate to prevent impersonation.

Historically, teams manually rotated their certificates. As you can imagine, this created tons of logistical overhead, especially as the size and complexity of deployments grew. Even just one expired certificate could cause a global outage in a service. Modern tools support automatic rotation, but some certificates can still fall through the cracks.

How do you keep track of expiring TLS certificates?

Security teams have developed several methods for tracking old TLS certificates, but they generally fall into one of two categories: manual or automated.

Manually tracking certificates

The old-fashioned way of tracking expiring certificates is manually. This means going through each of your services one by one, pulling its certificate, and documenting the expiration date. As a best practice, you can add these dates to a calendar and set alerts to notify you of upcoming expiration dates in advance.

As you can imagine, there are a lot of problems with this approach. For one, it doesn’t scale well, which creates a high risk of missing one or more services and leaving gaps in your documentation. It also doesn’t account for automated certificate renewal processes, which could swap out a certificate right after you’ve finished documenting it. While this method may have been useful in the past, it’s outdated today, and has been mostly replaced by automation.
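For reference, pulling a certificate and recording its expiration date by hand amounts to something like the sketch below, written with Python’s standard library. The function names `fetch_not_after` and `build_inventory` are hypothetical. The sketch reads only the server’s own (leaf) certificate rather than the full chain, and because it uses default verification, a certificate that has already expired raises an error during the handshake instead of returning a date.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 5.0) -> datetime:
    """Connect to hostname:port over TLS and return the leaf certificate's expiry (UTC)."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()  # dict form of the validated leaf certificate
    seconds = ssl.cert_time_to_seconds(cert["notAfter"])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def build_inventory(hostnames, fetch=fetch_not_after):
    """Return (hostname, expiry) pairs sorted so the soonest expiry comes first.

    `fetch` is injectable so the inventory logic can be exercised without a network.
    """
    pairs = [(hostname, fetch(hostname)) for hostname in hostnames]
    return sorted(pairs, key=lambda pair: pair[1])


# Example usage (requires network access):
#   for host, expiry in build_inventory(["gremlin.com"]):
#       print(host, expiry.isoformat())
```

Even in this small form, the scaling problem is visible: the list of hostnames has to be maintained by hand, and the output is stale as soon as a certificate is rotated.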

Using an automated tool

Automated tools can detect resources in your environment that use TLS certificates, document them in a single location, and regularly monitor them for upcoming expirations. Better still, they can look at the entire certificate chain, not just your service’s certificate.

Gremlin also supports this functionality with the Certificate Expiry experiment. In this experiment, you provide Gremlin with the name or IP address of a system, and Gremlin examines its entire certificate chain to make sure none of the certificates in it expire within a given time frame (this defaults to 720 hours, or 30 days). This experiment works for any network-based service, whether it’s a dependency, an internal service, or a third-party SaaS service.

Screenshot of the Gremlin web app showing how to add a hostname to an experiment.

Like other Gremlin experiments, this experiment runs on a target host or container via the Gremlin agent. When it runs, it contacts the hostname from the target and scans its certificate chain. If the certificate is valid past the “Not Less Than” time, the experiment will return successful. Otherwise, it will return a failure. One major benefit of running the experiment from a specific host or container is that it can test for certificate pinning and detect systems that may be using older certificates.

Screenshot of the Gremlin web app showing a failed Certificate Expiry test. The test failed because the cert expires within the 30 day test parameter.
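The pass/fail rule described above can be sketched in a few lines of Python. This is not Gremlin’s implementation. The function names and the `not_less_than_hours` parameter are hypothetical, and treating a certificate that expires exactly at the threshold as a failure is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def passes_expiry_check(
    not_after: datetime,
    not_less_than_hours: int = 720,  # 720 hours = 30 days
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the certificate stays valid for more than the given number of hours."""
    now = now or datetime.now(timezone.utc)
    return not_after - now > timedelta(hours=not_less_than_hours)


def chain_passes(
    expiries: Iterable[datetime],
    not_less_than_hours: int = 720,
    now: Optional[datetime] = None,
) -> bool:
    """A chain passes only if every certificate in it passes the check."""
    now = now or datetime.now(timezone.utc)
    return all(passes_expiry_check(e, not_less_than_hours, now) for e in expiries)
```

Checking every certificate in the chain matters because an intermediate certificate that expires soon fails the whole chain, even when the service’s own certificate has months left.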

Standardizing and automating certificate expiry tests

A one-time scan for expiring certificates is good, but how can we automate this?

Gremlin has built-in tools for automating tests like these. In Gremlin, you can define a service, which is any workload running on a host, Kubernetes Deployment, or container. For each service, Gremlin creates a suite of reliability tests that you can run in a single click. Gremlin also automatically detects network dependencies that your service communicates with, based on DNS traffic.

Once Gremlin knows your service and its dependencies, it creates the reliability tests. One of these tests is a Certificate Expiry test, which performs the same function as the experiment in the previous section of this blog. After you’ve run one or more tests, the result feeds into the service’s overall reliability score.

Screenshot of the Gremlin web app showing a network dependency with a failed Certificate Expiry test.

As you add services and dependencies, Gremlin creates Certificate Expiry tests for them as well. This way, you’ll never have to worry about missing a service or losing track of a dependency. As long as you have the Gremlin agent deployed to the host and your services defined in Gremlin, you’ll be able to test and track them without problems.

Conclusion

Modern applications require extensive security, and TLS certificates are a key pillar of that security. The Certificate Expiry experiment is a fast, easy, and safe way to test your certificate chain in any environment, and on any platform. Once you’ve run the experiment, you can schedule it to run weekly, ensuring you’ll never miss another expiring certificate. If you’re new to Gremlin, you can try out this and all other experiments by signing up for a free 30-day trial.

Andre Newman
Sr. Reliability Specialist