Highlight Chaos Engineering experiments with AppDynamics and Gremlin webhooks

Taylor Smith
Technical Product Marketer
Last Updated: August 11, 2020

Chaos Engineering with Gremlin is a powerful way to tune your monitoring to ensure you are gathering actionable data and to train your teams to leverage these tools so that observability expertise isn't siloed in your organization. AppDynamics is an application performance management tool used by companies worldwide to monitor their workloads. Used in combination, these two tools can help you lower your mean time to detection (MTTD) and increase the availability of your applications. This tutorial will walk you through how you can correlate an attack from Gremlin to its impact in AppDynamics.

Prerequisites

Step 1: Make a custom event role in AppDynamics

First, add a role that allows Gremlin to post custom events to AppDynamics. In AppDynamics, go to Settings -> Administration.

Select Roles -> Create and provide a Name: <span class="code-class-custom">Gremlin Events</span>. Then select Applications, check “View”, click “Edit”, and check “Create Events”.

Click “Save”.

Step 2: Make a custom event user

Now you need a user that has the Gremlin Events role. Go to Users -> Display Users from “AppDynamics”, click “Create”.

Enter Username: gremlin, Email: <span class="code-class-custom">{your_email}</span>, Name: <span class="code-class-custom">Gremlin Events</span>, Password: <span class="code-class-custom">{password}</span>.

Roles -> Add “Gremlin Events”

Click “Save”.

Step 3: Gather your account name and app ID

You’ll need your account name and application ID to build the credentials and the endpoint for posting events to AppDynamics. In AppDynamics, go to Settings -> License.

Go to Account and take note of your Account Name next to “Name”. You’ll use it in the next step.

Open the Applications dashboard and select the application you wish to experiment on. In the URL, you’ll find <span class="code-class-custom">application={app_id}</span>. You can see mine below is <span class="code-class-custom">10691</span>. Grab that app_id number for Step 5.
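If you would rather not pick the number out of the address bar by eye, a small shell helper can do it. This is only a sketch: the function name extract_app_id and the sample URL are made up for illustration, and the only thing taken from the step above is that the URL contains application={app_id}.

```shell
# Print the numeric value of the application= parameter in a dashboard URL.
# Prints nothing if the parameter is not present.
extract_app_id() {
  printf '%s\n' "$1" | sed -n 's/.*[?&]application=\([0-9][0-9]*\).*/\1/p'
}

# Example with a made-up URL (copy the real one from your browser):
# extract_app_id 'https://controller.example.com/controller/#/location=APP_DASHBOARD&application=10691&dashboardMode=force'
# prints 10691
```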

Step 4: Encode your login key

Now that you’ve gathered that information, you’ll need to encode it for the Authorization header in the webhook. In your terminal (Mac or Linux) enter:


echo -n 'gremlin@{account_name}:{password}' | base64

or in PowerShell (Windows) enter:


[Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes('gremlin@{account_name}:{password}'))

Save that output for the next step.
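Before pasting the key into Gremlin, it can help to confirm that it decodes back to the string you expect. The sketch below wraps the encoding in a function and adds a decoder. The function names are hypothetical, and it assumes a base64 tool that accepts --decode, as the GNU and macOS versions do.

```shell
# Build the Authorization header value.
# Arguments: username, account name, password (all placeholders here).
appd_basic_auth() {
  # printf '%s' avoids a trailing newline in the encoded input;
  # tr removes the line wraps some base64 builds add to long output.
  printf 'Basic %s\n' "$(printf '%s' "$1@$2:$3" | base64 | tr -d '\n')"
}

# Decode a header value to check what it really contains.
appd_decode_auth() {
  printf '%s' "${1#Basic }" | base64 --decode
}

# Example:
# header="$(appd_basic_auth gremlin '{account_name}' '{password}')"
# appd_decode_auth "$header"   # should show gremlin@{account_name}:{password}
```

If the decoded text has a stray newline or a missing character, re-run the encoding before moving on, since the webhook will otherwise fail authentication.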

Step 5: Create 2 Gremlin webhooks

The next step is to create two webhooks: one for when the Gremlin attack starts and one for when it finishes. In Gremlin, go to Settings -> Team Settings.

Select Webhooks -> New Webhook. Enter the Name <span class="code-class-custom">AppD Basic Webhook Start</span>, your Description and the following URL with your own controller’s address:


https://{controller_address}/controller/rest/applications/{app_id}/events?severity=INFO&summary=gremlinStart&eventtype=CUSTOM&customeventtype=gremlinStart

Then add a header with the key <span class="code-class-custom">Authorization</span> and, as its value, the encoded string you generated in Step 4:


Basic {your_encoded_key}

Then select “Attack Running” and click Save.

Add a second webhook with the Name <span class="code-class-custom">AppD Basic Webhook Finish</span>, your Description and the following URL with your own controller’s address:


https://{controller_address}/controller/rest/applications/{app_id}/events?severity=INFO&summary=gremlinFinish&eventtype=CUSTOM&customeventtype=gremlinFinish

Then add a header with the key <span class="code-class-custom">Authorization</span> and, as its value, the encoded string you generated in Step 4:


Basic {your_encoded_key}

Then select “Attack Finished” and click Save.
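You can try the endpoint by hand before relying on the webhooks. The sketch below builds the same event URL used above and sends it with curl, printing only the HTTP status code. The function names and controller.example.com are placeholders, and it assumes the webhook issues a POST with no body, which is how this example calls the endpoint.

```shell
# Build the custom event URL.
# Arguments: controller address, app_id, event name (e.g. gremlinStart).
appd_event_url() {
  printf 'https://%s/controller/rest/applications/%s/events?severity=INFO&summary=%s&eventtype=CUSTOM&customeventtype=%s\n' "$1" "$2" "$3" "$3"
}

# POST to the URL and print only the HTTP status code.
# Arguments: URL, Authorization header value ("Basic {your_encoded_key}").
appd_post_event() {
  curl -sS -o /dev/null -w '%{http_code}\n' -X POST -H "Authorization: $2" "$1"
}

# Example (placeholders, not real values):
# url="$(appd_event_url controller.example.com 10691 gremlinStart)"
# appd_post_event "$url" 'Basic {your_encoded_key}'
```

A 2xx status suggests the event was accepted. A 401 usually points back to the encoded key from Step 4 or to the role from Step 1.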

Step 6: Create a dashboard in AppDynamics

You’ll need a way to visualize the attack. In AppDynamics, go to Dashboards & Reports -> Create Dashboard. Enter the Name <span class="code-class-custom">Gremlin Attack Dashboard</span>.

Click “Add a Widget” -> “Time Series Graph” and click the + sign under Data. Under “Select Data Source” select “Servers” and under “Select a Metric” select “Hardware Resources|CPU|%Busy” and Save.

Under Events, select “Show Events” and for the Data Source select the application whose app_id you noted in Step 3. Under “Filter Criteria”, unselect all items and select “Custom”. Click Save.

Click “Add Widget” again and select “Health Rules & Events” then “Event List”.

Under Events, select Show As “Timeline” and set the Data Source to the application you chose in Step 3. Under Filter Criteria, unselect all items, then select Custom. Click Save.

Your simple dashboard is all set up.

Step 7: Run a CPU attack

In Gremlin, go to Attacks -> New Attack. Select the host(s) where you have the AppDynamics agent(s) installed. Select “Choose a Gremlin” and Resource -> CPU. Set the length to <span class="code-class-custom">300</span> seconds, CPU Capacity of <span class="code-class-custom">60</span>%, and <span class="code-class-custom">All Cores</span>.

Click “Unleash Gremlin” and head back over to your AppDynamics dashboard. There you can see the event marking where the attack started, the CPU spike that follows, and the event marking where the attack finished and CPU usage wound down.
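If the events do not show up on the dashboard, you can read them back over REST to see whether the webhooks reached AppDynamics at all. This sketch assumes the AppDynamics event retrieval endpoint and its time-range-type, duration-in-mins, event-types, severities, and output query parameters. Check those names against your controller’s API documentation. The function names and controller.example.com are placeholders.

```shell
# Build a URL that lists recent custom INFO events for an application.
# Arguments: controller address, app_id, look-back window in minutes.
appd_recent_events_url() {
  printf 'https://%s/controller/rest/applications/%s/events?time-range-type=BEFORE_NOW&duration-in-mins=%s&event-types=CUSTOM&severities=INFO&output=JSON\n' "$1" "$2" "$3"
}

# Fetch the events.
# Arguments: URL, Authorization header value ("Basic {your_encoded_key}").
appd_recent_events() {
  curl -sS -H "Authorization: $2" "$1"
}

# Example (placeholders, not real values):
# url="$(appd_recent_events_url controller.example.com 10691 15)"
# appd_recent_events "$url" 'Basic {your_encoded_key}'
```

Seeing gremlinStart and gremlinFinish in the response but not on the dashboard would suggest the widget filter, not the webhook, needs another look.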

Conclusion

The CPU attack is a great first attack, but with Gremlin and AppDynamics together you can run many more experiments. For example, you can trace how a small amount of backend latency propagates to the front end and watch for latency that compounds along the way. You can also use Gremlin to test your thresholds and tune your AppDynamics alerting to prevent noisy alerts: fire up an attack and make sure your alerts fire at the appropriate time. Target random hosts to make sure you cover your whole application.

We look forward to seeing what you build!
