Service meshes like Istio have become an essential way to securely and reliably distribute network traffic, especially with ephemeral, service-based architectures such as Kubernetes.

However, the constantly shifting nature of these architectures can make it hard to target specific services for resilience tests. Infrastructure-based testing is designed to target specific IP addresses, allowing precision testing of applications, VMs, and nodes. Unfortunately, this kind of targeting has limited granularity in a service mesh, where ephemeral workloads and built-in resilience capabilities mean the IP address behind a service can change at any time.

The Gremlin Service Mesh Extension takes an application-level approach, allowing you to accurately target specific services on a service mesh with the same safety and ease you’ve come to expect from Gremlin reliability and resilience testing.

Gremlin Service Mesh Extension is now in private beta. Contact our team to find out how to join the beta and enable service mesh targeting for tests!

More granular targeting for application-level testing

Instead of the IP addresses used by infrastructure-based targeting, Gremlin Service Mesh Extension targets traffic by HTTP path, which is application-level data. In practice, this means more granular targeting: you can target specific interactions with an application instead of all traffic from a service.

Both of these targeting approaches are crucial for completely understanding the reliability of your application, and combining this additional granularity with other tests will give you a more holistic, complete view of your reliability posture.

As an example, we can look at the common use case of an API gateway where several services are behind a single reverse proxy. On an infrastructure level, it’s important to know what happens to your systems if the entire gateway is unavailable or fails, and IP targeting would be used to find this out.

But it’s also important to know what happens if specific services are unavailable or return bad responses. By targeting the HTTP path for that application, you can fail just the one service rather than the entire gateway, giving you a more precise understanding and a greater capability to track down issues before they cause outages.
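To make the difference concrete, here is a minimal Python sketch of the two matching strategies for requests that all pass through one gateway. It is an illustration only, not Gremlin's implementation: the `Request` type, the `matches_ip` and `matches_path` helpers, the gateway address, and the paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    dest_ip: str  # address of the gateway that receives the request
    path: str     # HTTP path of the request


def matches_ip(request: Request, target_ip: str) -> bool:
    """Infrastructure-style targeting: match on destination address."""
    return request.dest_ip == target_ip


def matches_path(request: Request, path_prefix: str) -> bool:
    """Application-style targeting: match on a whole-segment path prefix."""
    prefix = path_prefix.rstrip("/")
    return request.path == prefix or request.path.startswith(prefix + "/")


# Hypothetical gateway fronting three services.
GATEWAY_IP = "10.0.0.5"
requests = [
    Request(GATEWAY_IP, "/orders/42"),
    Request(GATEWAY_IP, "/payments/charge"),
    Request(GATEWAY_IP, "/inventory/sku/7"),
]

# IP targeting selects every request that reaches the gateway.
ip_hits = [r.path for r in requests if matches_ip(r, GATEWAY_IP)]
# Path targeting selects only the traffic for one service.
path_hits = [r.path for r in requests if matches_path(r, "/payments")]

print(ip_hits)    # ['/orders/42', '/payments/charge', '/inventory/sku/7']
print(path_hits)  # ['/payments/charge']
```

Because every request shares the gateway's address, the IP match cannot tell the three services apart, while the path match isolates the payments traffic.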

Currently available experiments

The private beta includes three experiments for service mesh:

Network

  • Latency - Introduce latency into your service mesh networks.
  • Blackhole - Drop all traffic to a service or application to simulate it failing.

Application

  • Unexpected service response codes - Throw a specified error or response from the application.

These experiments can be used separately or in combination for testing use cases like:

  1. Test how your applications or services behave when a database or service dependency is unavailable.
  2. Validate your automated recovery/failover systems.
  3. Validate your monitoring and alerting configurations.
  4. Test how the rest of your system and services behave when a service fails or returns a bad response code.
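As a rough illustration of what these experiments exercise, the following Python sketch simulates the three fault types against a toy client with a fallback. Everything in it is an assumption made for the example: the `Fault` type, `call_with_fault`, `fetch_with_fallback`, and the simple prefix match are hypothetical and do not reflect Gremlin's API or configuration format.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Fault:
    kind: str                   # "latency", "blackhole", or "status"
    path_prefix: str            # only requests under this path are affected
    delay_seconds: float = 0.0  # used by "latency"
    status_code: int = 503      # used by "status"


def healthy_backend(path: str) -> int:
    """Stand-in for a working service: always answers 200."""
    return 200


def call_with_fault(
    path: str,
    fault: Optional[Fault],
    backend: Callable[[str], int] = healthy_backend,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Return the status code the caller sees, or None if traffic was dropped."""
    if fault is None or not path.startswith(fault.path_prefix):
        return backend(path)          # request is not targeted
    if fault.kind == "latency":
        sleep(fault.delay_seconds)    # delay, then answer normally
        return backend(path)
    if fault.kind == "blackhole":
        return None                   # no response ever arrives
    if fault.kind == "status":
        return fault.status_code      # injected response code
    raise ValueError(f"unknown fault kind: {fault.kind}")


def fetch_with_fallback(
    path: str,
    fault: Optional[Fault],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Toy client: fall back when the dependency is silent or returns 5xx."""
    status = call_with_fault(path, fault, sleep=sleep)
    if status is None or status >= 500:
        return "fallback"
    return "live"
```

Calling `fetch_with_fallback("/payments/charge", Fault("blackhole", "/payments"))` returns `"fallback"`, while a request to `/orders/1` under the same fault still returns `"live"`. The question a real experiment answers is whether your services degrade this gracefully.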

Gremlin Service Mesh Extension is available for Istio 1.22.x (Envoy 1.30) today, with additional versions coming in the next few months.

Gain confidence in the resilience of your service mesh

Once installed, you can configure and launch experiments quickly from the Gremlin UI or via the API. As with Failure Flags, the sidecar sits safely next to your Istio service mesh without adding another point of failure. Additionally, the sidecar is failsafe: if it can’t connect to Gremlin’s control plane, then your service mesh will function normally.
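The failsafe behavior described above follows a general fail-open pattern, which can be sketched in a few lines of Python. This is a conceptual sketch under stated assumptions, not the extension's actual code: `decide`, `fetch_target_prefix`, and `unreachable_control_plane` are hypothetical names, and the real sidecar's logic is not documented here.

```python
from typing import Callable, Optional


def decide(path: str, fetch_target_prefix: Callable[[], Optional[str]]) -> str:
    """Fail open: inject a fault only when an experiment is known to apply."""
    try:
        # Hypothetical lookup of the active experiment's target path.
        target_prefix = fetch_target_prefix()
    except (ConnectionError, TimeoutError):
        # Control plane unreachable: leave traffic untouched.
        return "pass-through"
    if target_prefix is None or not path.startswith(target_prefix):
        return "pass-through"
    return "inject-fault"


def unreachable_control_plane() -> Optional[str]:
    """Simulates a control plane that cannot be contacted."""
    raise ConnectionError("control plane unreachable")


print(decide("/payments/charge", unreachable_control_plane))  # pass-through
print(decide("/payments/charge", lambda: "/payments"))        # inject-fault
```

The design choice is that every uncertain case resolves to normal traffic flow, so losing contact with the control plane can end an experiment but cannot cause an outage on its own.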

Find out more about deploying the Gremlin Service Mesh Extension in our Docs or get a demo and see it for yourself!

Gavin Cahill
Sr. Content Manager