WEBINAR

Incident Repro & Playbook Validation with Chaos Engineering

Learn how you can use Chaos Engineering to reproduce high-severity incidents, ensure your post-incident fixes are working as expected, and validate that your incident management playbooks are up to date.

On-demand

About this webinar

In this session, we will explore how Gremlin can be used to determine whether your system is resilient to specific high-severity outages. You will learn how to use Gremlin and FireHydrant together for incident management and incident reproduction.

The session also includes a Q&A segment in which our experts answer audience questions.

Agenda

First, Tammy and Bobby will introduce an example of a real-world, high-severity incident
Then, you will see how you can reproduce the outage conditions using Gremlin
Next, we will explore how you can use FireHydrant to improve your incident management program
Finally, you will see how Gremlin and FireHydrant can be used together to ensure your systems are resilient to specific types of real-world outages

About the speakers

Tammy Butow

Principal SRE

Gremlin

Tammy Butow is a Principal SRE at Gremlin, where she works on Chaos Engineering, the facilitation of controlled experiments to identify systemic weaknesses. Gremlin helps engineers build resilient systems using its control plane and API. Tammy previously led SRE teams at Dropbox responsible for Databases and Storage systems used by over 500 million customers. Prior to this, Tammy worked at DigitalOcean and at one of Australia's largest banks in Security Engineering, Product Engineering and Infrastructure Engineering.

Robert "Bobby Tables" Ross

CEO

FireHydrant

Bobby is the co-founder and CEO of FireHydrant.io, an incident response tool. He previously worked as a staff software engineer at Namely and built things at DigitalOcean. Bobby has been interested in incident response ever since he started maintaining production systems. He likes bleeding-edge tech and making software that helps teams build better systems.

Avoid downtime. Use Gremlin to turn failure into resilience.

Gremlin empowers you to proactively root out failure before it causes downtime. See how you can harness chaos to build resilient systems by requesting a demo of Gremlin.