Amazon CloudWatch Health Check

Supported platforms:

N/A

Gremlin offers two ways of creating AWS Health Checks: adding an Amazon CloudWatch alarm as a Health Check, or letting Gremlin create Health Checks automatically using Intelligent Health Checks. In both cases, the first step is to grant Gremlin permission to access your CloudWatch environment, if you haven't already done so.

Authenticating Gremlin to AWS

Before creating an AWS Health Check, you’ll need to grant Gremlin permission to read your CloudWatch environment. Gremlin supports two methods of authenticating to AWS: using an IAM role, or using a service account. IAM roles are the recommended method, as they allow you to grant access without sharing your AWS credentials. We’ll explain both methods below, starting with IAM.

Note

Gremlin requires the cloudwatch:DescribeAlarms permission to use CloudWatch alarms as Health Checks.

Authenticating Gremlin to AWS using an IAM role

Gremlin can authenticate using IAM in one of two ways:

Automatically, by deploying a CloudFormation template. This is the easiest and fastest way to create the necessary permissions.
Manually, by creating IAM policies and roles for Gremlin. This is slower, but gives you greater control over the created resources.

To authenticate Gremlin using an IAM role:

Log into the AWS Console and navigate to IAM (or click on this link). Keep this screen open in a separate browser window or tab.
In a different browser window or tab, open the Health Checks page in the Gremlin web app, click + Health Check, then select AWS from the Integrations drop-down.
Under Authentication, select IAM Role.
Choose the method you want to use to grant Gremlin permissions.
If you want Gremlin to create the permissions for you using CloudFormation, select CloudFormation, then click Launch Stack. Follow the instructions, then skip the manual configuration steps below.
If you want to configure the IAM role manually, select Manual.
1. In the AWS Console, click on Policies in the left-hand navigation menu.
2. Click Create policy.
3. Change the Policy editor type from Visual to JSON.
4. Enter the JSON shown under the "Policy JSON" heading below, then click Next.
5. Give the policy a name, such as “gremlin-policy”. Review the changes, then click Create policy.
6. After creating the policy, click Roles in the left-hand navigation menu, then click Create role.
7. Select Custom trust policy, then enter the text shown under the "Custom trust policy JSON" heading below.
8. Click Next.
9. On the Permissions policies screen, search for the policy you just created. Click on the checkbox next to its name to select it, then click Next.
10. Click Next.
11. Enter a name for your role, such as “gremlin-role”. Review the changes, then click Create role.
Select your newly created IAM role from the list and look for the ARN field. You’ll see an alphanumeric string starting with “arn:aws:iam”. Copy this string and paste it into the AWS IAM Role ARN field in the Gremlin web app.
In the Gremlin web app, click Save to finish creating your authentication.
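
Before pasting the ARN into Gremlin, it can help to confirm that you copied a role ARN and not, for example, the ARN of the policy created a few steps earlier. The Python sketch below is only an illustration: the helper name parse_role_arn is hypothetical, the account ID in the example is a made-up 12-digit number, and the pattern only covers the standard "aws" partition mentioned in the steps above.

```python
import re

# Role ARNs in the standard partition look like:
#   arn:aws:iam::<12-digit account ID>:role/<optional path/><role name>
ROLE_ARN = re.compile(r"^arn:aws:iam::(\d{12}):role/([\w+=,.@/-]+)$")


def parse_role_arn(arn: str) -> tuple[str, str]:
    """Return (account_id, role_name), or raise ValueError for a non-role ARN."""
    match = ROLE_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    account_id, role_path = match.groups()
    # The role name is the last path segment.
    return account_id, role_path.rsplit("/", 1)[-1]


# Hypothetical account ID, paired with the example role name from the steps above.
print(parse_role_arn("arn:aws:iam::123456789012:role/gremlin-role"))
```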

Policy JSON

JSON


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:Describe*",
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "route53:GetAccountLimit",
                "route53:GetChange",
                "route53:GetCheckerIpRanges",
                "route53:GetDNSSEC",
                "route53:GetGeoLocation",
                "route53:GetHealthCheck",
                "route53:GetHealthCheckCount",
                "route53:GetHealthCheckLastFailureReason",
                "route53:GetHealthCheckStatus",
                "route53:GetHostedZone",
                "route53:GetHostedZoneCount",
                "route53:GetHostedZoneLimit",
                "route53:GetQueryLoggingConfig",
                "route53:GetReusableDelegationSet",
                "route53:GetReusableDelegationSetLimit",
                "route53:GetTrafficPolicy",
                "route53:GetTrafficPolicyInstance",
                "route53:GetTrafficPolicyInstanceCount",
                "route53:ListCidrBlocks",
                "route53:ListCidrCollections",
                "route53:ListCidrLocations",
                "route53:ListGeoLocations",
                "route53:ListHealthChecks",
                "route53:ListHostedZones",
                "route53:ListHostedZonesByName",
                "route53:ListHostedZonesByVPC",
                "route53:ListQueryLoggingConfigs",
                "route53:ListResourceRecordSets",
                "route53:ListReusableDelegationSets",
                "route53:ListTagsForResource",
                "route53:ListTagsForResources",
                "route53:ListTrafficPolicies",
                "route53:ListTrafficPolicyInstances",
                "route53:ListTrafficPolicyInstancesByHostedZone",
                "route53:ListTrafficPolicyInstancesByPolicy",
                "route53:ListTrafficPolicyVersions",
                "route53:ListVPCAssociationAuthorizations",
                "route53:TestDNSAnswer",
                "elasticloadbalancing:DescribeListeners",
                "elasticloadbalancing:DescribeLoadBalancerAttributes",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeRules",
                "elasticloadbalancing:DescribeSSLPolicies",
                "elasticloadbalancing:DescribeTags",
                "elasticloadbalancing:DescribeTargetGroupAttributes",
                "elasticloadbalancing:DescribeTargetGroups",
                "elasticloadbalancing:DescribeTargetHealth",
                "elasticloadbalancing:DescribeLoadBalancerPolicies",
                "elasticloadbalancing:DescribeLoadBalancerPolicyTypes",
                "elasticloadbalancing:DescribeInstanceHealth",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribePolicies",
                "ec2:DescribeRegions",
                "ec2:DescribeAvailabilityZones"
            ],
            "Resource": "*"
        }
    ]
}
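
The note earlier on this page says Gremlin needs the cloudwatch:DescribeAlarms permission, and the policy above grants it through the "cloudwatch:Describe*" wildcard. The Python sketch below illustrates that matching on an abbreviated copy of the policy. It is a simplification, not IAM's actual evaluation logic: it ignores Deny statements, Resource, Condition, and NotAction. The helper name policy_allows is hypothetical.

```python
import json
from fnmatch import fnmatchcase


def policy_allows(policy: dict, action: str) -> bool:
    """Return True if any Allow statement lists a pattern that matches `action`."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        patterns = statement.get("Action", [])
        if isinstance(patterns, str):  # a single action may be given as a string
            patterns = [patterns]
        # IAM action names are case-insensitive, so compare in lowercase.
        if any(fnmatchcase(action.lower(), p.lower()) for p in patterns):
            return True
    return False


# Abbreviated copy of the policy above (only a few of its actions are listed).
policy = json.loads("""
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:Describe*",
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "route53:GetHealthCheckStatus",
                "elasticloadbalancing:DescribeTargetHealth"
            ],
            "Resource": "*"
        }
    ]
}
""")

print(policy_allows(policy, "cloudwatch:DescribeAlarms"))  # True
print(policy_allows(policy, "cloudwatch:PutMetricAlarm"))  # False
```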


Custom trust policy JSON

JSON


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::157733958145:role/GremlinReliabilityAnalyzer"
                ]
            },
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "$YourCompanyID"
                }
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
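
If you script the role setup, the trust policy can be generated instead of edited by hand. The Python sketch below assumes that "$YourCompanyID" in the JSON above is a placeholder to be replaced with your own external ID; this page does not say where that value comes from, so treat that as an assumption. The function name build_trust_policy and the sample ID are hypothetical.

```python
import json

# Principal copied from the trust policy above.
GREMLIN_PRINCIPAL = "arn:aws:iam::157733958145:role/GremlinReliabilityAnalyzer"


def build_trust_policy(external_id: str) -> str:
    """Render the trust policy with the given external ID filled in."""
    # Assumption: a value starting with "$" is an unreplaced placeholder.
    if not external_id or external_id.startswith("$"):
        raise ValueError("replace the placeholder with a real external ID")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": [GREMLIN_PRINCIPAL]},
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=4)


# "example-company-id" is a made-up value for illustration only.
print(build_trust_policy("example-company-id"))
```

Rejecting the unreplaced placeholder guards against pasting the template verbatim, which would produce a role whose external ID condition never matches a real value.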


Authenticating Gremlin to AWS using a Service Account

To authenticate Gremlin using a service account:

Open the AWS Console and log into your AWS account.
Navigate to Identity and Access Management (IAM), or click this link.
Select Users from the left-hand navigation menu.
Select the user you want to use as the service account, or create a new user. This user must have access to read CloudWatch alarms.
On the user’s account page, select the Security credentials tab.
Under Access keys, click Create access key.
1. Select Third-party service as the use case.
2. Read the Confirmation, then check the box and click Next.
3. Enter a Description for the key, such as “Gremlin service account”.
4. Click Create access key. Keep this screen open.
In the Gremlin web app, enter your AWS account ID in the AWS Account ID field. You can find this ID by clicking your account name in the top-right corner of the AWS Console.
Copy the value from the Access key field in AWS to the AWS Access Key ID field in Gremlin.
Copy the value from the Secret access key field in AWS to the AWS Secret Access Key field in Gremlin.
Click Save to validate and save your new AWS authentication.


Using Intelligent Health Checks

For AWS services that are mapped to an Elastic Load Balancer (ELB), Gremlin can automatically create Health Checks for you. To enable Intelligent Health Checks:

Open the service's Settings page and navigate to Health Checks.
Double-check the Mapped ELB field to ensure the correct ELB is mapped to this service. Using the wrong ELB will result in inaccurate testing scores.
Click on the checkbox labeled Use Intelligent Health Checks for this service. Gremlin will generate a set of Health Checks for your service.

These Health Checks can be used instead of, or in tandem with, regular Health Checks, but they can't be used in Scenarios.

Figure: Enabling Intelligent Health Checks in Gremlin.

Adding an AWS CloudWatch alarm as a Health Check

Instead of using Intelligent Health Checks, you can use any CloudWatch alarm as a Health Check. To add a CloudWatch alarm as a Health Check:

Open the Gremlin web app and navigate to Health Checks, or click this link.
Click + Health Check.
From the Observability Tool drop-down, select AWS. If you’ve already authenticated Gremlin to your AWS account, select your account ID from the AWS Account ID box. Otherwise, follow the instructions above. Click Next.
Enter a Name for the Health Check. We recommend using the same name that you use in CloudWatch.
Select Create a Health Check from an AWS CloudWatch Alarm URL.
Open the alarm you wish to use in the AWS Console, then copy its URL from your browser window.
Go back to the Gremlin web app and paste the URL into the Monitor or Alert URL box.
Click Test Health Check to confirm that Gremlin can access your monitor, and that it’s reporting back as healthy.
Click Create Health Check.

Alternatively, you can define custom success criteria for your Health Check by using the AWS API directly.

After entering the name of your Health Check, select Create a Health Check from AWS API.
Copy and paste the URL of your CloudWatch alarm into the Monitor or Alert URL box.
Click Test Connection to confirm that Gremlin can access your monitor. Gremlin will also show the HTTP response code and the JSON body of the response.
Set the Success Evaluation Criteria. These are the criteria Gremlin uses to determine whether the alarm is healthy or in an alarm state. By default, Gremlin checks whether the value of `.DescribeAlarmsResponse.DescribeAlarmsResult.MetricAlarms[0].StateValue` equals OK. You can use any field in the response and compare it to any value. You can also specify the HTTP status code to expect, and set a maximum response timeout.
Click Create Health Check.
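
To make the Success Evaluation Criteria concrete, the Python sketch below shows how a path such as the default one could be resolved against a JSON response and compared to an expected value. It is not Gremlin's implementation: the lookup and is_healthy helpers are hypothetical, the path parser supports only the ".key" and "[index]" forms that appear in the default path, and the sample response is invented to match that path's shape.

```python
import re

# One path step: either ".key" or "[index]".
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")

DEFAULT_PATH = ".DescribeAlarmsResponse.DescribeAlarmsResult.MetricAlarms[0].StateValue"


def lookup(document, path: str):
    """Walk `document` following a path made of .key and [index] steps."""
    value, position = document, 0
    while position < len(path):
        match = _TOKEN.match(path, position)
        if match is None:
            raise ValueError(f"unsupported path syntax at offset {position}: {path!r}")
        key, index = match.groups()
        value = value[key] if key is not None else value[int(index)]
        position = match.end()
    return value


def is_healthy(status_code: int, body: dict, path: str = DEFAULT_PATH,
               expected: str = "OK", expected_status: int = 200) -> bool:
    """Healthy only if the HTTP status matches and the field equals `expected`."""
    if status_code != expected_status:
        return False
    try:
        return lookup(body, path) == expected
    except (KeyError, IndexError, TypeError):
        # A missing field or an empty alarm list counts as unhealthy.
        return False


# Invented sample response, shaped to match the default path.
sample = {
    "DescribeAlarmsResponse": {
        "DescribeAlarmsResult": {
            "MetricAlarms": [{"AlarmName": "example-alarm", "StateValue": "OK"}]
        }
    }
}

print(is_healthy(200, sample))  # True
```

Treating lookup failures as unhealthy is a deliberate choice in this sketch: a response that no longer contains the alarm should not pass the check silently.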

Figure: Retrieving the URL for a CloudWatch Health Check.

Figure: Creating a new CloudWatch Health Check in Gremlin.

Figure: Confirming the validity of a CloudWatch alarm.