How to use your Gremlin reliability score in Jenkins to ensure reliable releases
Introduction
Adding Gremlin to your CI/CD pipeline is a key step in automating your reliability efforts. We previously wrote a tutorial on how to run a Chaos Engineering experiment as part of a Jenkins pipeline. The resulting pipeline ran a chaos experiment every time you deployed your code to a test environment. But this approach has a limitation: you have to either wait for the experiment to finish and check the results programmatically, or allow the build process to continue regardless of the results.
This tutorial expands on the previous one by using the Gremlin reliability score, which is a more proactive indicator of reliability. A reliability score is calculated by running a series of experiments (called a Test Suite). The main benefits are:
- We can run these experiments at any time, not just at deployment time.
- The score is standardized across all services, so we can set a single minimum score to apply to all services.
In this tutorial, we'll create a complete Jenkins pipeline that checks a service's reliability score using the Gremlin API. We'll compare the score against a required minimum score, and if the score meets that minimum, we'll promote the service to production. You'll learn how to create API keys in Gremlin and use the Gremlin API. And while this tutorial uses code specific to Jenkins, you can use the same concepts with any CI/CD tool.
Overview
This tutorial will show you how to:
- Use the Gremlin REST API
- Create a Jenkins Pipeline using Groovy
- Check and compare a service's reliability score using the Gremlin API and Groovy
Prerequisites
Before starting this tutorial, you’ll need the following:
- A Jenkins instance (refer to this tutorial to see how to deploy Jenkins via Docker)
- A Gremlin account (log into an existing account or sign up for a free 30-day trial)
- A service that has had at least one test run on it. Refer to our Quick Start Guide if you don't yet have one.
Step 1 - Download the Jenkins pipeline template
The first step is to define the Jenkins pipeline. We already wrote a simple Groovy file that you can download from GitHub. Copy and paste the contents of the file to your computer, or use the "Download raw file" button. Alternatively, you can copy the file contents from the code block below:
Step 2 - Create a Gremlin API key and add it to the file
To use Gremlin's REST API, we need to add our authentication details to the script. You'll need two things:
Once you have the API key, paste it into the following line in the <span class="code-class-custom">releasePipeline.groovy</span> file:
Save the file.
Step 3 - Retrieve your Gremlin team ID and service ID
You'll need two additional pieces of information from Gremlin: your team ID and the service ID. The team ID is the unique ID for your Gremlin team, and the service ID is the unique ID of the service you want to check the score for.
We'll start with the team ID. To get the team ID, look in the bottom-left corner of the Gremlin web app. You'll see your name, and underneath that, your team name. Click the icon next to the team name to copy your team ID to your clipboard. From there, open your <span class="code-class-custom">releasePipeline.groovy</span> file and paste it in the following line:
For the service ID:
Save the file.
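At this point the file should hold three Gremlin values: the API key from Step 2, plus the team ID and service ID from this step. As a rough sketch of what that might look like, the snippet below uses the variable names `gremlinApiKey`, `teamId`, and `serviceId`. These names are assumptions for illustration and may differ from the ones in the file you downloaded, so match them to your copy.

```groovy
// Hypothetical variable names; match them to the ones in your copy of releasePipeline.groovy.
def gremlinApiKey = 'YOUR_GREMLIN_API_KEY' // created in Step 2
def teamId        = 'YOUR_TEAM_ID'         // copied from the Gremlin web app
def serviceId     = 'YOUR_SERVICE_ID'      // the service whose score the pipeline checks
```

Hard-coded values like these are fine for a first run, but the API key in particular is better stored as a Jenkins credential once the pipeline works.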
Step 4 - Add the Groovy file to your Jenkins pipeline
In this step, we'll create a pipeline using our Groovy file. But before we do, there's one last tweak we need to make: we need to set the score threshold.
The score threshold is the minimum reliability score the service must have before it can deploy to production. This is defined in the <span class="code-class-custom">minScore</span> variable. In the sample file, we set <span class="code-class-custom">minScore = 80.0</span>, which means the service must have a score of at least 80% to deploy. Anything below this score will stop the pipeline and raise an error. You can change this threshold to any value between 0 and 100 by editing this line:
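To make the threshold behavior concrete, here is a small standalone Groovy sketch of the comparison. The `minScore = 80.0` value comes from the sample file; the `passesGate` helper is a hypothetical name used only for this illustration and is not part of the downloaded script.

```groovy
// Minimum reliability score (0 to 100) a service needs before it can deploy.
def minScore = 80.0

// Hypothetical helper: true when the score meets or exceeds the threshold.
def passesGate(score, threshold) {
    return score >= threshold
}

assert passesGate(85.5, minScore)   // above the threshold: deploy may continue
assert passesGate(80.0, minScore)   // exactly at the threshold still passes
assert !passesGate(79.9, minScore)  // below the threshold: the pipeline should stop
```

Note that a score exactly equal to `minScore` passes, which matches the "at least 80%" wording above.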
Now we're ready to add this file to our Jenkins Pipeline. To do this:
- Open your Jenkins web application.
- From the Dashboard, click on New Item.
- Enter a name for the pipeline (e.g. "[service name]-gremlin-release-gate").
- Select Pipeline as the type, then click OK.
- Click the Pipeline tab at the top of the page to scroll down to the Pipeline section.
- Enter the contents of the Groovy file in the Script text area.
- Click Save.
Step 5 - Run your Jenkins pipeline
After you click Save in the previous step, click Build Now to run the pipeline. The pipeline will retrieve the service's score from Gremlin and check whether it is greater than or equal to <span class="code-class-custom">minScore</span>. If it is, Jenkins marks the build as successful. Otherwise, it marks the build as failed.
From here, you can make changes to better integrate the pipeline into your build process. Instead of hard-coding values like your service ID, use environment variables so you can pass a different ID for each service, and use Jenkins credentials to store your Gremlin API key.
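If you'd like to see the whole check in one place, the following is a minimal sketch of such a pipeline, not the exact file from GitHub. It assumes `curl` is installed on the Jenkins agent and that the Pipeline Utility Steps plugin is available for `readJSON`. The endpoint path, the `Authorization: Key` header format, and the `score` field in the response are assumptions here, so verify them against the Gremlin API reference before relying on this.

```groovy
// Sketch only. Assumes curl on the agent and the Pipeline Utility Steps plugin (readJSON).
def minScore = 80.0

pipeline {
    agent any
    environment {
        // Placeholder values; replace with your own.
        GREMLIN_API_KEY    = 'YOUR_GREMLIN_API_KEY'
        GREMLIN_TEAM_ID    = 'YOUR_TEAM_ID'
        GREMLIN_SERVICE_ID = 'YOUR_SERVICE_ID'
    }
    stages {
        stage('Check reliability score') {
            steps {
                script {
                    // Assumed endpoint and header format; confirm both in the Gremlin API docs.
                    // Single quotes let the shell, not Groovy, expand the environment variables.
                    def response = sh(
                        script: 'curl -sf -H "Authorization: Key $GREMLIN_API_KEY" "https://api.gremlin.com/v1/services/$GREMLIN_SERVICE_ID/score?teamId=$GREMLIN_TEAM_ID"',
                        returnStdout: true
                    ).trim()

                    // Assumes the response is a JSON object with a numeric "score" field.
                    def score = readJSON(text: response).score
                    echo "Reliability score: ${score} (minimum: ${minScore})"

                    if (score < minScore) {
                        // error() stops the pipeline and marks the build as failed.
                        error("Reliability score ${score} is below the required minimum of ${minScore}")
                    }
                }
            }
        }
    }
}
```

Because `curl -f` exits with a non-zero status on HTTP errors, a rejected API key or an unknown service ID also fails the build instead of passing silently.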
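As one possible way to do this, the sketch below reads the service ID from a build parameter and the API key from the Jenkins credentials store. The credential ID `gremlin-api-key` and the parameter name `SERVICE_ID` are assumptions made for this example; you would create a "Secret text" credential with whatever ID you prefer and reference it here.

```groovy
pipeline {
    agent any
    parameters {
        // Hypothetical parameter so one pipeline definition can check different services.
        string(name: 'SERVICE_ID', defaultValue: '', description: 'Gremlin service ID to check')
    }
    environment {
        // 'gremlin-api-key' is an assumed ID of a "Secret text" credential stored in Jenkins.
        GREMLIN_API_KEY    = credentials('gremlin-api-key')
        GREMLIN_SERVICE_ID = "${params.SERVICE_ID}"
    }
    stages {
        stage('Show configuration') {
            steps {
                // Only the service ID is printed; the API key is never echoed.
                echo "Checking service ${env.GREMLIN_SERVICE_ID}"
            }
        }
    }
}
```

With this layout, the same pipeline can gate several services, and the API key never appears in the script itself.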
We've also included a section in the Groovy script where you can enter commands for deploying your service to production. This runs immediately after Jenkins compares the service's reliability score against <span class="code-class-custom">minScore</span>:
Lastly, you can change the "failure" condition to perform other steps, such as notifying the service's owner by sending an email or calling a service like PagerDuty. You can also track the status of your builds by integrating with a monitoring tool like Datadog and alert on failed builds that way.
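For example, a `post` section with a `failure` block could email the service owner whenever the gate fails. This is a sketch under a few assumptions: the Jenkins `mail` step is available and an SMTP server is configured in Jenkins, the recipient address is a placeholder, and the stage body stands in for the real score check.

```groovy
pipeline {
    agent any
    stages {
        stage('Check reliability score') {
            steps {
                // Placeholder for the real score check; error() fails the build.
                error('Reliability score is below the required minimum')
            }
        }
    }
    post {
        failure {
            // Runs only when the build fails. The address below is a placeholder.
            mail to: 'service-owner@example.com',
                 subject: "Release gate failed: ${env.JOB_NAME} #${env.BUILD_NUMBER}",
                 body: "The reliability score check failed. Details: ${env.BUILD_URL}"
        }
    }
}
```

The same `failure` block is where a call to an incident tool or a monitoring integration would go.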
Conclusion
Congratulations on setting up a reliability gate in Jenkins! This will ensure that your service only gets pushed to production if it meets your minimum reliability score.
To ensure your scores stay up to date, make sure to autoschedule reliability tests on your service to run at least once a week. Going longer than one week without re-running a test will cause that test to expire, reducing your score. Remember that you can also use the Run All button to re-run all of the service's tests and regenerate its score.