Deploying Failure Flags on the Istio service mesh via Envoy
The Failure Flags Envoy extension lets you run Gremlin experiments on services connected by the Istio service mesh. This is an Envoy HTTP filter extension that instruments inbound and outbound HTTP connections and makes them available as Failure Flags, which you can then run experiments on.
Deploying the Envoy extension
Prerequisites
Before deploying the extension, you’ll need the following information from the Gremlin web app:
- Your Gremlin team ID
- Your Gremlin team’s certificate
- Your Gremlin team's private key
You can get your team ID from the Gremlin web app's Team Settings page. To download your certificate and private key, follow the instructions in Download a certificate pair.
Create a secret for your Gremlin credentials
The first step is to create a Kubernetes secret, which allows the extension to authenticate and communicate with the Gremlin API.
Before applying this manifest, add your Gremlin authentication information to the `gremlin_team_id`, `gremlin_team_certificate`, and `gremlin_team_private_key` fields. Each of these values must be base64 encoded. For example, if your team ID is `355a3e8b-af61-49c8-8518-09b379379945`, the base64-encoded value (encoded without a trailing newline) is `MzU1YTNlOGItYWY2MS00OWM4LTg1MTgtMDliMzc5Mzc5OTQ1`.
Save the file, then apply it by opening a terminal and running `kubectl apply -f <filename>.yaml`.
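As a sketch of the encoding step, the following Python snippet base64-encodes the three credentials and prints a Secret manifest as JSON, which `kubectl apply -f` also accepts. The Secret name `gremlin-team-secret`, the placeholder certificate and key strings, and the placement of the three fields under `data` are assumptions made for illustration; prefer the names used in the manifest Gremlin provides.

```python
import base64
import json


def b64(value: str) -> str:
    """Base64-encode a string exactly as given (no trailing newline is added)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def build_secret(team_id: str, certificate: str, private_key: str) -> dict:
    """Build a Kubernetes Secret manifest holding the Gremlin credentials.

    Assumption: the three fields live under `data`, and the Secret name
    is a hypothetical placeholder.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "gremlin-team-secret"},  # hypothetical name
        "type": "Opaque",
        "data": {
            "gremlin_team_id": b64(team_id),
            "gremlin_team_certificate": b64(certificate),
            "gremlin_team_private_key": b64(private_key),
        },
    }


if __name__ == "__main__":
    secret = build_secret(
        team_id="355a3e8b-af61-49c8-8518-09b379379945",
        certificate="<contents of your team certificate file>",
        private_key="<contents of your team private key file>",
    )
    # Redirect this output to a file, then apply it with kubectl.
    print(json.dumps(secret, indent=2))
```

Encoding the ID followed by a newline, as `echo` does without `-n`, produces a different value that ends in `Cg==`.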
Configure your services to allow Gremlin experiments
Before you can run experiments on your services, you need to configure them to use the Envoy extension. The extension will only run experiments on services that have been configured this way.
First, add the following to the Pod’s annotations section. These instruct Istio on how to find and configure the plugin.
Next, add the following to your Pod’s volumes section:
Last, add the following to your Pod’s initContainers section:
Patch your Envoy filter
The final step is to patch Envoy itself. This creates a new Gremlin filter for Envoy to proxy traffic through, and is also where experiments are applied.
Before applying the patch, make sure to replace the following values:
- `namespace`: the namespace to deploy the filter to.
- `spec.configPatches.match.listener.portNumber`: the port number where the Pod’s service is exposed.
- `spec.configPatches.patch.value.typed_config.plugin_config.value.cloud/region`: the cloud environment and region where the service is deployed, respectively.
Once you apply the patch, Envoy will automatically incorporate the filters.
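The replacements listed above could be scripted along these lines. This is a sketch only: the skeleton manifest contains just the fields named in this section, `customize_filter` is a hypothetical helper, mapping `namespace` to `metadata.namespace` is an assumption, and the filter name, port, cloud, and region values are placeholders. Start from the filter manifest Gremlin provides.

```python
import copy
import json


def customize_filter(manifest: dict, namespace: str, port: int,
                     cloud: str, region: str) -> dict:
    """Return a copy of an EnvoyFilter manifest with site-specific values set."""
    result = copy.deepcopy(manifest)
    # Assumption: `namespace` refers to metadata.namespace.
    result.setdefault("metadata", {})["namespace"] = namespace
    for patch in result["spec"]["configPatches"]:
        # Port number where the Pod's service is exposed.
        patch["match"]["listener"]["portNumber"] = port
        # Cloud environment and region where the service is deployed.
        values = patch["patch"]["value"]["typed_config"]["plugin_config"]["value"]
        values["cloud"] = cloud
        values["region"] = region
    return result


if __name__ == "__main__":
    # Minimal stand-in for the real manifest; only the paths named above.
    skeleton = {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "EnvoyFilter",
        "metadata": {"name": "gremlin-filter"},  # hypothetical name
        "spec": {"configPatches": [{
            "match": {"listener": {"portNumber": 0}},
            "patch": {"value": {"typed_config": {"plugin_config": {"value": {}}}}},
        }]},
    }
    updated = customize_filter(skeleton, "shop", 8080, "aws", "us-east-1")
    print(json.dumps(updated, indent=2))
```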
Running service mesh experiments
Service mesh experiments work differently from regular Failure Flags experiments. After you patch Envoy and your Pods, Gremlin automatically creates Failure Flags for each service. Gremlin also creates two new default Failure Flags:
- `envoy-inbound` impacts network traffic going to your service.
- `envoy-dependency` impacts network traffic going from your service.
For example, consider an application with three services: `service_a`, `service_b`, and `service_c`. A simplified service map might look like this:
Imagine you want to run an experiment that injects latency in all traffic from `service_a` to `service_c`. To create this experiment:
- Start a new Failure Flags experiment by opening the experiment creation screen.
- Enter an Experiment Name.
- In the Failure Flag Selector drop-down, select the service that you want to impact traffic to. In this example, this would be `service_c`.
- In the Service Selector drop-down, select the service that will run the experiment. In this example, this would be `service_a`. In other words, the experiment will run on `service_a` and impact traffic to `service_c`.
- Configure the rest of the experiment by following the running Failure Flags experiments docs.
Alternatively, if you select `envoy-dependency` as the Failure Flag, the experiment will impact traffic outbound from `service_a` to its dependencies. Likewise, selecting `envoy-inbound` impacts traffic received by `service_a`.
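The selector combinations described above can be summarized in a small lookup. The function below is a hypothetical illustration of the documented behavior, not part of any Gremlin SDK or API.

```python
def impacted_traffic(failure_flag: str, service: str) -> str:
    """Describe which traffic an experiment affects.

    failure_flag: the value chosen in the Failure Flag Selector.
    service: the value chosen in the Service Selector (where the experiment runs).
    """
    if failure_flag == "envoy-dependency":
        return f"outbound traffic from {service} to its dependencies"
    if failure_flag == "envoy-inbound":
        return f"inbound traffic received by {service}"
    # Otherwise the Failure Flag names the destination service.
    return f"traffic from {service} to {failure_flag}"


if __name__ == "__main__":
    for flag in ("service_c", "envoy-dependency", "envoy-inbound"):
        print(f"{flag}: {impacted_traffic(flag, 'service_a')}")
```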
Service selector attributes
By default, Gremlin adds the following labels to each service identified by the Envoy extension. These labels can be used as selectors for limiting experiments to specific services.
- `region` identifies the cloud region the service is running in (e.g., us-east-1 for AWS).
- `cloud-region` is the combination of the `region` and `cloud` values that you set in your Envoy filter configuration.
- `gremlin-runtime-type` identifies the type of Gremlin agent that detected the service. Possible values include `Envoy`, `AWSLambda`, `AWSECSContainer`, etc.
- `gremlin-runtime-version` identifies the version of the Gremlin agent.
Failure Flag selector attributes
By default, the Gremlin Service Mesh Extension adds the following labels to each service: `accept`, `accept-encoding`, `Content-Type`, `Content-Length`, `method`, and `path`. These labels can be used as selectors for limiting experiments to only impact specific types of requests.
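To illustrate how such labels can narrow an experiment's scope, the sketch below matches a selector against one request's labels. The selector shape (each key mapped to a list of acceptable values), the rule that every key must match, and the `matches` helper itself are assumptions for illustration; consult the Failure Flags documentation for the exact selector semantics.

```python
def matches(selector: dict[str, list[str]], labels: dict[str, str]) -> bool:
    """Return True when every selector key appears in labels with an allowed value.

    An empty selector matches every request.
    """
    return all(labels.get(key) in allowed for key, allowed in selector.items())


if __name__ == "__main__":
    # Hypothetical labels for a single request seen by the extension.
    request_labels = {
        "method": "GET",
        "path": "/checkout",
        "Content-Type": "application/json",
    }
    print(matches({"method": ["GET"], "path": ["/checkout"]}, request_labels))
    print(matches({"method": ["POST"]}, request_labels))
```

With these sample labels, the first selector matches the request and the second does not.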
Troubleshooting
For additional troubleshooting, the Gremlin extension can generate logs. To enable logging, add `spec.configPatches.patch.value.typed_config.plugin_config.value.gremlin_debug: true` to your Gremlin Envoy filter manifest.
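As a sketch of that change applied programmatically, the hypothetical `enable_debug` helper below sets `gremlin_debug` on every config patch of an EnvoyFilter manifest loaded as a dictionary. The skeleton manifest contains only the path named above and stands in for your real filter manifest.

```python
import copy
import json


def enable_debug(manifest: dict) -> dict:
    """Return a copy of the manifest with gremlin_debug set to true in each patch."""
    result = copy.deepcopy(manifest)
    for patch in result["spec"]["configPatches"]:
        values = patch["patch"]["value"]["typed_config"]["plugin_config"]["value"]
        values["gremlin_debug"] = True
    return result


if __name__ == "__main__":
    # Minimal stand-in for the real manifest (assumption).
    skeleton = {
        "spec": {"configPatches": [
            {"patch": {"value": {"typed_config": {"plugin_config": {"value": {}}}}}},
        ]},
    }
    print(json.dumps(enable_debug(skeleton), indent=2))
```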
FAQ
Where is the container image for the Gremlin Service Mesh extension hosted?
The gremlin-envoy plugin container is available on Docker Hub.
Do I need to add a Failure Flag SDK to my applications before I can use Gremlin Service Mesh Extension?
No. You don’t need to make any additional changes to your applications.
Do I need to replace my existing service mesh to use Gremlin Service Mesh Extension?
No, but installation means augmenting your existing Istio service mesh.
Is this safe to use in production environments?
Yes. Gremlin Service Mesh Extension only affects network traffic when you run experiments.
Will this let me conduct network experiments in Gremlin Reliability Management?
Gremlin Service Mesh Extension allows for network experiments via Failure Flags. Failure Flags are currently available for Gremlin Fault Injection only. Contact your account executive for details on the availability of Failure Flags with Gremlin RM.