Deploying Failure Flags on the Istio service mesh via Envoy

Supported platforms:

N/A

The Gremlin Service Mesh Extension lets you run Gremlin experiments on services connected by the Istio service mesh. This is an Envoy HTTP filter extension that instruments inbound and outbound HTTP connections and makes them available as Failure Flags, which you can then run experiments on.

Note: Envoy v1.30 or later is required. Instructions are also available on Docker Hub.

Deploying the Envoy extension

Prerequisites

Before deploying the extension, you’ll need the following information from the Gremlin web app:

Your Gremlin team ID
Your Gremlin team’s certificate
Your Gremlin team's private key

You can get your team ID from the Gremlin web app's Team Settings page. To download your certificate and private key, follow the instructions in Download a certificate pair.

Create a secret for your Gremlin credentials

The first step is to create a Kubernetes secret, which allows the extension to authenticate and communicate with the Gremlin API.

Before applying this manifest, add your Gremlin authentication information to the gremlin_team_id, gremlin_team_certificate, and gremlin_team_private_key fields. Each of these values must be base64 encoded. For example, if your team ID is 355a3e8b-af61-49c8-8518-09b379379945, the base64-encoded value is MzU1YTNlOGItYWY2MS00OWM4LTg1MTgtMDliMzc5Mzc5OTQ1. Encode the value itself, without a trailing newline: printf '%s' '<value>' | base64 produces this result, whereas echo appends a newline that is encoded as an extra Cg== at the end.

YAML


apiVersion: v1
kind: Secret
metadata:
  name: gremlin-service-mesh-config
  namespace: [namespace]
type: Opaque
stringData:
  config.yaml: |
    gremlin_team_id: [base-64-encoded-team-id]
    gremlin_team_certificate: [base-64-encoded-team-cert]
    gremlin_team_private_key: [base-64-encoded-team-private-key]

Save the file, then apply it by opening a terminal and running kubectl apply -f <filename>.yaml.

Configure your services to allow Gremlin experiments

Before you can run experiments on your services, you need to configure them to use the Envoy extension. The extension will only run experiments on services that are configured this way.

First, add the following to the Pod’s annotations section. These instruct Istio on how to find and configure the plugin.

YAML


sidecar.istio.io/userVolumeMount: '[{"name":"gremlin-plugin","mountPath":"/gremlin/filter"}, {"name":"gremlin-service-mesh-config","mountPath":"/gremlin/config"}]'
sidecar.istio.io/userVolume: '[{"name":"gremlin-service-mesh-config", "secret":{"secretName":"gremlin-service-mesh-config"}}]'
sidecar.istio.io/logLevel: 'info'

Next, add the following to your Pod’s volumes section:

YAML


volumes:
  - name: gremlin-plugin
    emptyDir: { }

Last, add the following to your Pod’s initContainers section:

YAML


initContainers:
- name: install-gremlin-plugin
  image: docker.io/gremlin/envoy-plugin:1.0
  imagePullPolicy: Always
  volumeMounts:
    - name: gremlin-plugin
      mountPath: /gremlin/filter
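
Taken together, the three changes above might look like the following in a Deployment's Pod template. This is a sketch rather than an official manifest: the Deployment name, the app label, the application container, its image, and port 8080 are hypothetical placeholders. Only the annotations, the gremlin-plugin volume, and the install-gremlin-plugin init container come from the steps above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a                  # hypothetical name
  namespace: default               # replace; must be the namespace that holds the gremlin-service-mesh-config Secret
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-a               # hypothetical label
  template:
    metadata:
      labels:
        app: service-a
      annotations:
        # Step 1: tell Istio where to mount the plugin and its configuration
        sidecar.istio.io/userVolumeMount: '[{"name":"gremlin-plugin","mountPath":"/gremlin/filter"}, {"name":"gremlin-service-mesh-config","mountPath":"/gremlin/config"}]'
        sidecar.istio.io/userVolume: '[{"name":"gremlin-service-mesh-config", "secret":{"secretName":"gremlin-service-mesh-config"}}]'
        sidecar.istio.io/logLevel: 'info'
    spec:
      # Step 2: shared volume that receives the plugin binary
      volumes:
        - name: gremlin-plugin
          emptyDir: {}
      # Step 3: init container that installs the plugin into the shared volume
      initContainers:
        - name: install-gremlin-plugin
          image: docker.io/gremlin/envoy-plugin:1.0
          imagePullPolicy: Always
          volumeMounts:
            - name: gremlin-plugin
              mountPath: /gremlin/filter
      containers:
        - name: service-a                        # hypothetical application container
          image: example.com/service-a:latest    # hypothetical image
          ports:
            - containerPort: 8080                # hypothetical port
```

The Secret is referenced as a Pod volume, so the Deployment has to live in the same namespace as the Secret.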

Patch your Envoy filter

The final step is to patch Envoy by applying an EnvoyFilter resource. The patch adds a Gremlin HTTP filter that Envoy passes traffic through, and this filter is where experiments are applied.

Before applying the patch, make sure to replace the following values:

namespace: the namespace containing the application(s) you want to apply the filter to.
spec.configPatches.match.listener.portNumber: the port number that your service sends and receives traffic on.
spec.configPatches.patch.value.typed_config.plugin_config.value.cloud and .region: the cloud environment and the region where the service is deployed. You can use these values as service selectors when creating experiments.

YAML


apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: gremlin-http-filter
  namespace: [namespace]
spec:
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: ANY
        listener:
          portNumber: [port where the service is exposed]
          filterChain:
            filter:
              name: "envoy.filters.network.http_connection_manager"
              subFilter:
                name: "envoy.filters.http.router"
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.golang
          typed_config:
            "@type": "type.googleapis.com/envoy.extensions.filters.http.golang.v3alpha.Config"
            library_id: gremlin-http-filter
            library_path: "/gremlin/filter/gremlin-http-filter.so"
            plugin_name: gremlin-http-filter
            plugin_config:
              "@type": type.googleapis.com/xds.type.v3.TypedStruct
              value:
                region: [your region] # e.g. us-east-2
                cloud: [name of your cloud] # e.g. aws

Once you apply the patch, Envoy will automatically incorporate the filters.
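
Without a selector, Istio applies an EnvoyFilter to every workload in its namespace. If you only want the Gremlin filter on some workloads, Istio's EnvoyFilter resource supports an optional spec.workloadSelector field. The fragment below is a sketch, not part of Gremlin's published manifest: it assumes your Pods carry a hypothetical app: service-a label, and it should be tested in a non-production namespace first.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: gremlin-http-filter
  namespace: default             # replace with your namespace
spec:
  # Optional: limit the filter to Pods whose labels match.
  workloadSelector:
    labels:
      app: service-a             # hypothetical Pod label
  configPatches:
    # Same configPatches entries as in the manifest above (omitted here).
```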

Running service mesh experiments

After you patch Envoy and your Pods, Gremlin automatically creates Failure Flags for each service it detects. Gremlin also creates two new Failure Flags:

envoy-inbound impacts network traffic going to your service.
envoy-dependency impacts network traffic going from your service.

For example, consider an application with three services: service_a, service_b, and service_c.

[Figure: simplified service map of service_a, service_b, and service_c]

Imagine you want to run an experiment that injects latency in all traffic from service_a to service_c. To create this experiment:

Start a new Failure Flags experiment by opening the experiment creation screen.
Enter an Experiment Name.
In the Failure Flag Selector drop-down, select the service that you want to impact traffic to. In this example, this would be service_c.
In the Service Selector drop-down, select the service that will run the experiment. In this example, this would be service_a. In other words, the experiment will run on service_a and impact traffic to service_c.
Configure the rest of the experiment by following the Running Failure Flags experiments documentation.

Alternatively, if you select envoy-dependency as the Failure Flag, the experiment will impact traffic outbound from service_a to its dependencies. Likewise, selecting envoy-inbound impacts traffic received by service_a.

Service selector attributes

By default, Gremlin adds the following labels to each service identified by the Envoy extension. These labels can be used as selectors for limiting experiments to specific services.

region identifies the cloud region the service is running in (e.g., us-east-1 for AWS).
cloud-region is the combination of the region and cloud values that you set in your Envoy filter configuration.
gremlin-runtime-type identifies the type of Gremlin agent that detected the service. Possible values include Envoy, AWSLambda, AWSECSContainer, etc.
gremlin-runtime-version identifies the version of the Gremlin agent.

Failure Flag selector attributes

By default, the Gremlin Service Mesh Extension adds the following labels to each service: accept, accept-encoding, Content-Type, Content-Length, method, and path. These labels can be used as selectors for limiting experiments to only impact specific types of requests.

Troubleshooting

For additional troubleshooting, the Gremlin extension can generate logs. To enable logging, add spec.configPatches.patch.value.typed_config.plugin_config.value.gremlin_debug: true to your Gremlin Envoy filter manifest:

YAML


...
plugin_name: gremlin-http-filter
plugin_config:
  "@type": type.googleapis.com/xds.type.v3.TypedStruct
  value:
    gremlin_debug: true
    ...

Privileges required

Privilege	Description
CLIENTS_READ	Allows reading all client information within the team
CLIENTS_WRITE	Allows editing all client information within the team
EXPERIMENTS_RUN	Allows running an experiment within a team
EXPERIMENTS_READ	Allows reading all experiment information within a team
EXPERIMENTS_WRITE	Allows creating or updating an experiment for a team

FAQ

Where is the container image for the Gremlin Service Mesh extension hosted?

The gremlin-envoy plugin container is available on Docker Hub.

Do I need to add a Failure Flag SDK to my applications before I can use Gremlin Service Mesh Extension?

No. You don’t need to make any additional changes to your applications.

Do I need to replace my existing service mesh to use Gremlin Service Mesh Extension?

No. Installing the extension augments your existing Istio service mesh rather than replacing it.

Is this safe to use in production environments?

Yes. Gremlin Service Mesh Extension only affects network traffic when you run experiments.

Will this let me conduct network experiments in Gremlin Reliability Management?

Gremlin Service Mesh Extension allows for network experiments via Failure Flags. Failure Flags are currently available for Gremlin Fault Injection only. Contact your account executive for details on the availability of Failure Flags with Gremlin RM.
