How to install Gremlin on ECS
This advanced installation guide will walk you through installing Gremlin Docker containers in your ECS environment and verifying that you can run a CPU attack against the freshly installed Gremlin agents. In the verification steps we will create a container that runs htop, exposed as a web interface via port 8888, which will allow us to visualize changes in real time as a simple CPU attack is run against the container.
Prerequisites
- A functional ECS cluster, built in the region of your choice, that uses EC2 backing instances. Fargate-backed ECS is not currently supported.
- Private subnets in the ECS VPC that route through a NAT gateway. Gremlin will be deployed in those private subnets.
- Certificate-based authentication should be used, and the certificates should be made available to the Gremlin daemon in `/var/lib/gremlin`.
Additionally, to get the most from this installation guide you should already be familiar with running Gremlin as a container. You can reference Install Gremlin in a Docker Container for help getting started with Gremlin and Docker.
Step 1: Create the Task Definition
- Copy the provided JSON task definition into a text editor and supply values for `my-team-id`, `my-team-secret`, and `my-aws-account`. Additionally, review the task definition's limits and CPU architecture values to ensure they match your target environment.
- In the AWS management console, navigate to `Task Definitions` in the ECS service and choose `Create New Task Definition`.
- Select `EC2` for the launch type compatibility and click `Next Step`.
- Scroll down to the bottom of the page and click the `Configure via JSON` button.
- Paste the edited task definition into the JSON text field and click the `Save` button.
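If you would rather fill in the placeholders with a script than by hand, the sketch below shows one way to do it. The placeholder names come from this guide; the `EXAMPLE_TEMPLATE` string, its role name, and its environment variable names are hypothetical stand-ins, not the real Gremlin task definition.

```python
import json

# Placeholder names used in this guide's task definition.
PLACEHOLDERS = ("my-team-id", "my-team-secret", "my-aws-account")

# Hypothetical stand-in template for illustration only. Use the task
# definition JSON provided with this guide instead.
EXAMPLE_TEMPLATE = """{
  "family": "gremlin",
  "executionRoleArn": "arn:aws:iam::my-aws-account:role/example-role",
  "containerDefinitions": [
    {"name": "gremlin", "environment": [
      {"name": "EXAMPLE_TEAM_ID", "value": "my-team-id"},
      {"name": "EXAMPLE_TEAM_SECRET", "value": "my-team-secret"}
    ]}
  ]
}"""


def fill_task_definition(template_text, values):
    """Replace every placeholder, then confirm the result is still valid JSON."""
    missing = [name for name in PLACEHOLDERS if name not in values]
    if missing:
        raise ValueError("missing values for: " + ", ".join(missing))
    filled = template_text
    for name in PLACEHOLDERS:
        filled = filled.replace(name, values[name])
    json.loads(filled)  # raises json.JSONDecodeError if the edit broke the JSON
    return filled
```

The returned text can be pasted into the `Configure via JSON` field. Parsing it first catches a stray quote or comma before the console does.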
Step 2: Create the Daemon Service Definition
- In the AWS management console, navigate to `Clusters` in the ECS service.
- Select the cluster you want to deploy Gremlin into.
- On the `Services` tab, click the `Create` button.
- On the `Configure service` page, set the parameters as follows:
  - Select the `Launch Type` compute option.
  - Select the `EC2` launch type.
  - Select the `Service` application type.
  - `Task Definition` -> `Family`: `gremlin`
  - `Task Definition` -> `Revision`: `latest`
  - `Service type`: `DAEMON`
- The rest of the defaults are acceptable; click `Create`.
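The same service can also be created programmatically. The sketch below is a rough equivalent of the console settings above, assuming the boto3 SDK is installed and credentials are configured; the function names and the `gremlin` service name are illustrative choices, not something this guide prescribes.

```python
def daemon_service_params(cluster, task_family="gremlin", service_name="gremlin"):
    """Build the arguments for ECS CreateService that mirror the console steps."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        # A bare family name resolves to the latest ACTIVE revision.
        "taskDefinition": task_family,
        "launchType": "EC2",
        # DAEMON places one task on each container instance in the cluster.
        "schedulingStrategy": "DAEMON",
    }


def create_daemon_service(cluster):
    """Call ECS with the parameters above (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    ecs = boto3.client("ecs")
    return ecs.create_service(**daemon_service_params(cluster))
```

Keeping the parameter builder separate from the API call makes it easy to review the request before anything is created.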
Step 3: Verify the Installation
- In the AWS management console, navigate to `Clusters` in the ECS service.
- Select the cluster you just deployed Gremlin into.
- On the `Services` tab, you should now see the `Gremlin` service.
- Verify that `Desired tasks` matches the number of ECS hosts in your cluster.
- Verify that `Running tasks` matches the number of `Desired tasks`. Note that it can take several minutes for the ECS scheduler to launch Gremlin to full capacity.
- Once the `Gremlin` service is running at full capacity, navigate to https://app.gremlin.com/clients/infrastructure
- You can search via the tag `platform=ecs` to verify that the Gremlin control plane can see the freshly launched ECS daemons.
- Navigate to https://app.gremlin.com/attacks/new and click on the `Containers` tab.
- Verify that the application containers and tags currently running on your ECS cluster are being imported into the Gremlin control plane.
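The desired-versus-running check can be scripted as well. This is a minimal sketch that inspects a response shaped like the one returned by the ECS `DescribeServices` API (for example via boto3's `describe_services`); the function name and the sample data are illustrative assumptions.

```python
def service_at_full_capacity(describe_services_response, service_name):
    """Return True when the named service runs as many tasks as it desires."""
    for service in describe_services_response.get("services", []):
        if service["serviceName"] == service_name:
            desired = service["desiredCount"]
            running = service["runningCount"]
            # A desired count of zero means nothing has been scheduled yet.
            return desired > 0 and running == desired
    raise LookupError("service not found in response: " + service_name)


# Hand-written sample resembling a DescribeServices response for a
# three-host cluster where one Gremlin task is still starting.
sample = {
    "services": [
        {"serviceName": "gremlin", "desiredCount": 3, "runningCount": 2, "pendingCount": 1}
    ]
}
print(service_at_full_capacity(sample, "gremlin"))
```

Because the scheduler can take several minutes, a script using this check would typically poll until it returns `True` or a timeout expires.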
Step 4: Create a HTOP Elastic Container Repository with image
This will create a docker container that exposes htop via shellinaboxd on port 8888. htop is an interactive process viewer for Unix systems. We'll use htop as the target for an attack in step 8. Using htop isn't a requirement for installing Gremlin in Docker, but for this tutorial, using it makes it easier to see the impact of our attacks.
- In the AWS management console, navigate to `Repositories` in the ECS service.
- If you don't already have a repository, click `Get started` at the top; otherwise click `New repository`.
- In `Repository name`, type `htop`, then click `Next step`.
- Take note of the endpoint to push your Docker image to, then click `Done`.
- SSH into an instance in your AWS environment that has the AWS command line tools and Docker installed (e.g. a jump box).
- Authenticate the Docker client against ECR. The command below uses `aws ecr get-login`, which exists only in version 1 of the AWS CLI; version 2 replaces it with `aws ecr get-login-password`:
sudo $(aws ecr get-login --no-include-email --region us-east-1)
- Create and change to the `~/docker-htop` directory:
mkdir -p ~/docker-htop; cd ~/docker-htop
- Create the docker file:
cat <<< 'FROM alpine:latest
RUN apk --no-cache add --update htop && rm -rf /var/cache/apk/*
RUN apk --no-cache add --repository http://dl-cdn.alpinelinux.org/alpine/edge/testing shellinabox
ENTRYPOINT ["shellinaboxd", "-t", "-p8888", "-s/:nobody:nogroup:/:htop"]' > Dockerfile
- Create the Docker image:
sudo docker build -t htop .
- Tag the image. To push to the repository, you'll need the endpoint details from creating the repository:
sudo docker tag htop:latest <<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com/htop:latest
- Push the image to ECR. Again, you'll need the endpoint details from creating the repository:
sudo docker push <<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com/htop:latest
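The registry endpoint in the commands above follows a fixed pattern, so the command sequence can be generated from an account ID and region. The sketch below only builds the command strings and does not run them. It assumes AWS CLI version 2, where the login step is `aws ecr get-login-password` piped into `docker login`; the function names are illustrative.

```python
def ecr_registry(account_id, region="us-east-1"):
    """Return the ECR registry hostname for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def push_commands(account_id, repo="htop", tag="latest", region="us-east-1"):
    """Return the shell commands to log in, build, tag, and push the image."""
    registry = ecr_registry(account_id, region)
    image = f"{registry}/{repo}:{tag}"
    return [
        # AWS CLI v2 login (assumption: v2 is installed on the jump box).
        f"aws ecr get-login-password --region {region} | "
        f"sudo docker login --username AWS --password-stdin {registry}",
        f"sudo docker build -t {repo} .",
        f"sudo docker tag {repo}:{tag} {image}",
        f"sudo docker push {image}",
    ]


for command in push_commands("123456789012"):
    print(command)
```

The account ID `123456789012` is a dummy value; substitute the one shown in your repository endpoint.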
Step 5: Create the HTOP Task Definition
- In the AWS management console, navigate to `Task Definitions` in the ECS service and choose `Create New Task Definition`.
- Select `EC2` for the launch type compatibility and click `Next Step`.
- On the `Configure task and container definitions` page, set the parameters as follows:
  - `Task Definition Name`: `htop`
  - `Task Role`: Leave blank
  - `Network Mode`: Leave as `<default>`
  - `Task execution role`: Leave as `none`
  - `Task memory (MiB)`: `128`
  - `Task CPU (unit)`: `128`
- Click `Add container`, and in the `Add container` modal enter the following information, leaving defaults unless otherwise specified:
  - `Container name`: `htop`
  - `Image`: `<<ACCNTID>>.dkr.ecr.us-east-1.amazonaws.com/htop:latest`
  - `Private repository authentication`: Leave unchecked
  - `Memory Limits (MiB)`: `Hard limit` `128`
  - `Port mappings`: `Host port`: `8888`; `Container port`: `8888`
  - Scroll down to the `Docker Labels` section and enter appropriate key-value tags; at a minimum we suggest `app:htop`
  - Click the `Add` button
- Scroll down and click the `Create` button.
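For reference, the console settings above correspond roughly to the task definition below, expressed as the dictionary you could pass to the ECS `RegisterTaskDefinition` API or paste as JSON. This is a sketch under the assumption that the console defaults add nothing else relevant; the function name and the dummy account ID are illustrative.

```python
import json


def htop_task_definition(account_id, region="us-east-1"):
    """Build a task definition mirroring the console values in this step."""
    image = f"{account_id}.dkr.ecr.{region}.amazonaws.com/htop:latest"
    return {
        "family": "htop",
        "requiresCompatibilities": ["EC2"],
        # Task-level CPU and memory are strings in the ECS API.
        "cpu": "128",
        "memory": "128",
        "containerDefinitions": [
            {
                "name": "htop",
                "image": image,
                "memory": 128,  # hard limit in MiB
                "essential": True,
                "portMappings": [
                    {"hostPort": 8888, "containerPort": 8888, "protocol": "tcp"}
                ],
                # The label used later to find the container in Gremlin.
                "dockerLabels": {"app": "htop"},
            }
        ],
    }


print(json.dumps(htop_task_definition("123456789012"), indent=2))
```

Mapping a fixed host port means only one copy of this task can run per container instance, which is fine for the single-task service used in this guide.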
Step 6: Create a service definition for HTOP
- In the AWS management console, navigate to `Clusters` in the ECS service.
- Select the cluster you just deployed Gremlin into.
- On the `Services` tab, click the `Create` button.
- On the `Configure service` page, set the parameters as follows:
  - `Launch type`: `EC2`
  - `Task Definition` -> `Family`: `htop`
  - `Task Definition` -> `Revision`: `latest`
  - `Cluster` -> The cluster you wish to deploy into
  - `Service name`: `htop`
  - `Service type`: `REPLICA`
  - `Number of tasks`: 1
- Click `Next step` to bring you to the `Set Auto Scaling` page, and `Next step` again.
- Review the service details to ensure accuracy, and if everything looks good click `Create Service`.
Step 7: Open HTOP
- In the AWS management console, navigate to `Clusters` in the ECS service.
- Select the cluster you just deployed Gremlin into.
- On the `Services` tab, click the `htop` service we just created.
- Click on the `Tasks` tab.
- Click on the task ID for the running HTOP task.
- Expand the htop container by clicking on the arrow next to the container name `htop`.
- In the `Network bindings` section, click on the provided `External Link`.
  - If the external link does not work, you may need to go into the security group associated with your ECS cluster and open port `8888`.
Congratulations, you should now see the `htop` interface in your web browser. Leave this open, as we'll be referring to it in the next step.
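If the external link is blocked, opening port 8888 on the security group can be done through the EC2 API as well as the console. The sketch below builds the arguments for `AuthorizeSecurityGroupIngress` and assumes boto3 with configured credentials for the actual call; the function names, the rule description, and the idea of restricting access to a single source CIDR are illustrative choices.

```python
import ipaddress


def htop_ingress_params(security_group_id, source_cidr):
    """Build an ingress rule allowing TCP 8888 from one CIDR block."""
    ipaddress.ip_network(source_cidr)  # raises ValueError on a malformed CIDR
    return {
        "GroupId": security_group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": 8888,
                "ToPort": 8888,
                "IpRanges": [
                    {"CidrIp": source_cidr, "Description": "htop web UI"}
                ],
            }
        ],
    }


def open_htop_port(security_group_id, source_cidr):
    """Apply the rule (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    ec2 = boto3.client("ec2")
    return ec2.authorize_security_group_ingress(
        **htop_ingress_params(security_group_id, source_cidr)
    )
```

Since htop exposes an unauthenticated view of the host, limiting the rule to your own address (for example a `/32`) is safer than opening the port to everyone.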
Step 8: Run a test attack
- In a new browser window, open the link to the Gremlin Attacks UI: https://app.gremlin.com/attacks
- Click `New Attack`.
- Select the `Containers` tab.
- Select the HTOP container we created as your target by clicking the checkbox next to the container ID. You should be able to find it based on the `Docker` key-value pair we added, `app:htop`.
- Click `Choose a Gremlin`.
- Select the `Resource` category and the `CPU` attack.
- Default values should be fine; click `Unleash Gremlin` to launch the attack.
- Observe the increased CPU load on the Docker container in the open `htop` browser window.
Additional ECS Configurations
Now that you've run a basic attack in ECS, there are some advanced configuration options you should be aware of:
- `networkMode` - This option determines which network space we would like to affect. In our example, we have it set to `awsvpc`, which means the task can only affect the awsvpc interface. Other options are `host`, `bridge`, and `none`. For more information, please consult the AWS guide on Network mode.
- `pidMode` - This parameter allows you to configure the containers to share their process ID namespace with either the host or other containers in the task. By default, the setting is not specified. When performing process killer attacks, it may prove useful to set this parameter to `host`. For more information, please consult the AWS guide on PID mode.
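As an illustration of where these two settings live, the sketch below applies them as top-level keys of a task definition dictionary and rejects values ECS does not accept. The helper name and the sample definition are hypothetical; the allowed values are those listed above for `networkMode`, plus `host` and `task` for `pidMode`.

```python
NETWORK_MODES = {"awsvpc", "bridge", "host", "none"}
PID_MODES = {"host", "task"}


def with_modes(task_definition, network_mode=None, pid_mode=None):
    """Return a copy of the task definition with the given modes applied."""
    updated = dict(task_definition)
    if network_mode is not None:
        if network_mode not in NETWORK_MODES:
            raise ValueError("unsupported networkMode: " + network_mode)
        updated["networkMode"] = network_mode
    if pid_mode is not None:
        if pid_mode not in PID_MODES:
            raise ValueError("unsupported pidMode: " + pid_mode)
        updated["pidMode"] = pid_mode
    return updated


# Hypothetical minimal definition; only the two mode keys matter here.
base = {"family": "gremlin", "networkMode": "awsvpc"}
print(with_modes(base, pid_mode="host"))
```

Leaving a parameter as `None` keeps whatever the definition already specifies, which matches the "not specified by default" behavior described for `pidMode`.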
Conclusion
You now have Gremlin up and running in your ECS environment, and have validated its functionality against a running `htop` container. For security, you should remove the `htop` container from your running cluster, as it's an unsecured metric view into your running environment.
Feel free to expand this to other ECS environments and have fun running Chaos Experiments!
Gremlin Task Definition JSON