How to Use Gremlin Scenarios to Reproduce the AWS S3 Outage
Overview
Gremlin is a simple, safe and secure service for performing Chaos Engineering experiments through a SaaS-based platform.
This tutorial will show you how to use Gremlin Scenarios to reproduce the AWS S3 Outage.
If you are interested in learning more about the outage, we have shared a detailed analysis of the 2017 S3 Outage on our Gremlin Blog. A good question to ask yourself at every postmortem is “how do we ensure this never happens again?”
- Step 1 - Create a Sample App
- Step 2 - Build and run your Sample app using Docker and NGINX
- Step 3 - View your Sample app in your browser
- Step 4 - View your Sample app via VNC
- Step 5 - Running the Gremlin Unavailable Dependencies Scenario to reproduce the S3 outage
- Step 6 - Viewing the result of the Gremlin Unavailable Dependencies Scenario
Prerequisites
- An Ubuntu 18.04 host with the Gremlin agent and Docker
- A Gremlin Account (sign up here)
Step 1 - Create a Sample App
Connect to your host with ssh and create a Dockerfile:
Create the following index.html file:
Save the index.html file.
Step 2 - Build and run your Sample app using Docker and NGINX
Next, build the Dockerfile by running the following:
Now we can run our image by using
Step 3 - View your Sample app
Now you can see your sample running @ <span class="code-class-custom">your_server_ip:8080</span>
Step 4 - View your Sample app via VNC
The Gremlin Unavailable Dependencies Scenario uses a Blackhole attack. We will be using this Blackhole attack to disallow images stored in S3 from loading. To see the results of this Blackhole Network attack we will be using a service called VNC.
On your host, install the Xfce and TightVNC packages:
To complete the VNC installation run the following command, you will be prompted to enter a password:
Next, test the VNC connection on your local computer. Run the following command which uses port forwarding :
Now you can use a VNC client to connect to the VNC server at <span class="code-class-custom">localhost:5901</span>. You’ll be prompted to authenticate. Use the password you set up earlier. You can use the built-in program for Mac called Screen Sharing or VNC Viewer to view your Xfce Desktop.
On your host you will need to ensure you have a browser installed, install Firefox by running the following command:
Before you move onto the next step, ensure that you are able to view the sample app using VNC. Connect to <span class="code-class-custom">localhost:5901</span> and click on <span class="code-class-custom">Applications > Internet > Firefox Web Browser</span>:
Using Firefox, navigate to localhost:8080, you should see the following:
Now we’re ready to run the Gremlin Unavailable Dependency Scenario to reproduce the S3 Outage.
Step 5 - Running the Gremlin Unavailable Dependencies Scenario to reproduce the S3 outage
First, navigate to Recommended Scenarios within the Gremlin UI and choose the Unavailable Dependency Scenario:
Next, select Add Targets and run. Then select your host using the local-hostname option:
Then click customize:
Next, you will click Add Attacks, this will take you to the Gremlin Attack configuration. To reproduce the S3 outage modify the default scenario to include 1 x Blackhole attack impacting AWS S3 us-east-1. You will need to make these changes in the Providers section of the Blackhole attack, see the screenshot below:
Next, click Unleash Scenario. Your Gremlin Scenario will now be running and it will begin to reproduce the S3 Outage:
Step 6 - Viewing the result of the Gremlin Unavailable Dependencies Scenario
To view the S3 Outage being reproduced open your VNC viewer and reload your Firefox tab, you will notice that the images stored in us-east-1 on S3 no longer load:
Conclusion
This tutorial has demonstrated how you can use the Gremlin Recommended Scenario “Unavailable Dependency” to reproduce the AWS S3 Outage. This demonstrates that our sample app is not resilient to this outage.
To ensure we could reliably handle this scenario, we could run this Gremlin Scenario again after rectifying this situation with one of many options:
- S3 failover
- Multi-cloud storage
- Multi-CDN
- Handling image errors in the browser using React
Avoid downtime. Use Gremlin to turn failure into resilience.
Gremlin empowers you to proactively root out failure before it causes downtime. See how you can harness chaos to build resilient systems by requesting a demo of Gremlin.