
Test serverless and application-level reliability with Failure Flags
It’s been a year and a half since Failure Flags was released. Since then, customers have used Failure Flags to run thousands of tests for applications running on serverless platforms, containers, and service meshes. (Check out this blog post to see how easy it was for a major retailer to set up and test a critical service on AWS Lambda in less than 30 minutes.)
We’ve also been hard at work improving Failure Flags’ capabilities and ease of use, which is why we’re officially announcing that it’s out of Beta!
Let’s take a brief look at Failure Flags, how it works, and some of the significant improvements from the last year.
Resilience tests for serverless, containers, Kubernetes, and service mesh
Failure Flags lets you run tests at the application level, which is essential for managed services where the infrastructure layer is abstracted away and outside your control. It does this by using three components: the Gremlin SaaS API, the Failure Flags Sidecar or Lambda Extension, and one of the SDKs integrated into your application code.
This combination allows you to run tests on applications using:
- AWS Lambda
- AWS ECS
- Kubernetes
- Istio Service Mesh (via Envoy)
The interaction between these three pieces is essential for safety and making sure your application isn’t impacted except during a test. Once installed, the sidecar/extension and SDK will sit passively unless an experiment is running in Gremlin, so it can safely remain in any application.
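To make the "passive unless an experiment is running" idea concrete, here is a minimal sketch in Python. It is not the Gremlin SDK: the `Experiment` class, the `ACTIVE_EXPERIMENTS` list, and the flag name `http-ingress` are all hypothetical stand-ins, and the in-memory list replaces whatever the sidecar or extension actually learns from the Gremlin API.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical stand-in for an experiment defined in Gremlin."""
    flag_name: str
    labels: dict = field(default_factory=dict)  # labels the call site must carry


# Hypothetical in-memory list. In the real system, the sidecar or Lambda
# Extension would learn about running experiments from the Gremlin API.
ACTIVE_EXPERIMENTS: list[Experiment] = []


def is_targeted(name: str, labels: dict) -> bool:
    """Return True only if a running experiment matches this call site."""
    for exp in ACTIVE_EXPERIMENTS:
        if exp.flag_name == name and all(
            labels.get(key) == value for key, value in exp.labels.items()
        ):
            return True
    return False


def handle_request(path: str) -> str:
    """Example call site: behaves normally unless an experiment targets it."""
    if is_targeted("http-ingress", {"path": path}):
        return "impacted"
    return "ok"
```

With an empty `ACTIVE_EXPERIMENTS` list, `handle_request` always takes the normal path, which is the property that makes it reasonable to leave the check in place permanently.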
Create application errors, latency, and data issues with Failure Flags
Failure Flags gives you the capability to create latency, cause specific error codes, and, in Node.js, modify data such as variables. Application errors caused by these issues represent the bulk of the problems that teams deal with day-to-day, including:
- Incorrect or corrupt data
- Customer-specific failures
- Lock-contention on hot data
- Breaking API changes
- Unexpected API responses
- Partial service failures
- Message double-delivery or ordering issues
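As a rough illustration of the latency and error effects described above, the sketch below applies an effect expressed as a plain dictionary. The keys `latency_ms` and `error`, the `InjectedFault` exception, and the `apply_effect` helper are assumptions made for this example and are not the format or API that Failure Flags itself uses.

```python
import time


class InjectedFault(Exception):
    """Raised when a (hypothetical) experiment asks for an error."""


def apply_effect(effect: dict, sleep=time.sleep) -> None:
    """Apply a latency and/or error effect described by a plain dict.

    The keys "latency_ms" and "error" are illustrative assumptions.
    The sleep function is injectable so the delay can be tested cheaply.
    """
    latency_ms = effect.get("latency_ms", 0)
    if latency_ms > 0:
        # Delay first, so a slow-then-fail dependency can be modeled.
        sleep(latency_ms / 1000)
    if "error" in effect:
        raise InjectedFault(effect["error"])
```

Calling `apply_effect({})` does nothing, while `apply_effect({"latency_ms": 250, "error": "upstream returned 503"})` would pause for a quarter of a second and then raise, which is one way to imitate a slow dependency that eventually fails.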
Beyond specific errors, you can use Failure Flags to test how your application interacts with other parts of your system, allowing you to verify key pieces such as observability, alert configuration, and automated recovery.
Run Failure Flags in Node.js, Python, Java, Go—and .NET
You can install Failure Flags using an SDK. These are designed to be fail-safe if the agent is misconfigured, can't communicate with your application, or can't communicate with the Gremlin API. That means you can leave the SDK in your application without worrying about it impacting anything outside of your experiment’s parameters.
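The fail-safe behavior can be pictured as a wrapper that treats every failure to reach the agent as "no experiment". This is only a sketch of the general pattern, assuming a hypothetical `fetch` callable that asks the sidecar for the current effect; it does not show how the Gremlin SDKs are implemented.

```python
from typing import Callable


def safe_fetch(fetch: Callable[[], dict]) -> dict:
    """Return the experiment effect, or an empty effect if anything goes wrong.

    `fetch` is a hypothetical callable that queries the sidecar or extension.
    """
    try:
        effect = fetch()
    except Exception:
        # Agent unreachable, misconfigured, or timed out: do nothing.
        return {}
    # A malformed response is also treated as "no experiment".
    return effect if isinstance(effect, dict) else {}
```

The design choice here is that the application never sees an error caused by the testing machinery itself: the worst case is that an experiment has no effect.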
Failure Flags has SDKs for the most common serverless and managed container languages, including Node.js, Python, Java, and Go. We’re pleased to announce that a .NET SDK is now available as well!
Set up Failure Flags experiments in the Gremlin UI
When Failure Flags first launched in Beta, experiments had to be manually set up using JSON, but that changed in the second half of 2024. Now, you can select your Failure Flag, attributes, services, and effects using drop-down boxes. There’s still a JSON tab if you would prefer to create experiments that way, and any changes you make in one are reflected in the other.

Failure Flags is GA…and we’re just getting started
With all the customer-led improvements and optimizations over the last year, we’re pleased to make Failure Flags Generally Available for all customers.
And we won’t stop here! We’re currently hard at work on even more Failure Flags improvements to help make it easier for you to not only run application-level resilience tests, but also to standardize those tests so you can scale your efforts across your organization.
Want to see what the fuss is about? Check out the interactive Failure Flags walkthrough below, or contact us to set up a demo!
