To use Gremlin on your systems, you'll need to install the Gremlin Agent. The Gremlin Agent is an executable that you install on the resources you want to run tests on (i.e. hosts, containers, and Kubernetes clusters). The Gremlin Agent authenticates with Gremlin's backend servers (also called the Gremlin Control Plane), which then allows you to use the Gremlin web app, REST API, or CLI to view your systems and run tests.
A service is a discrete unit of functionality provided by one or more systems in your environment. For example, a web server deployed as a load balancer for your backend systems is a service. In Gremlin, services are the units used to test and measure the reliability of your system. This page will show you how to add, manage, and test your services using the Gremlin web app.
A Health Check monitors the state of your systems before, during, and after an experiment, Scenario, or reliability test to ensure they're still operating within your expectations. Health Checks also provide a level of safety when running tests: if your systems become unstable, unresponsive, or unhealthy, Health Checks automatically halt ongoing tests and return your systems to normal operation.
Each reliability test checks a specific behavior of your service, such as autoscaling CPU and memory, zone and host redundancy, and dependency failures. While a test is running, Gremlin continuously monitors your service's state using its Health Checks. If any of your Health Checks becomes unhealthy during a test, the test is immediately halted and marked as failed. Otherwise, it's marked as passed.
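
The pass/fail rule for reliability tests can be illustrated with a small simulation. This is only a sketch of the logic described above, not Gremlin's implementation or API: the names `HealthCheck`, `TestResult`, `run_reliability_test`, and the `polls` parameter are all hypothetical, and real Health Checks query your monitoring tools rather than a local function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class HealthCheck:
    """Hypothetical stand-in for a Health Check: a name plus a probe."""
    name: str
    is_healthy: Callable[[], bool]


@dataclass
class TestResult:
    status: str                      # "passed" or "failed"
    halted_by: Optional[str] = None  # name of the check that halted the test


def run_reliability_test(checks: List[HealthCheck], polls: int) -> TestResult:
    """Poll every check `polls` times; halt on the first unhealthy reading."""
    for _ in range(polls):
        for check in checks:
            if not check.is_healthy():
                # Any unhealthy check halts the test immediately.
                return TestResult(status="failed", halted_by=check.name)
    # No check ever reported unhealthy, so the test passes.
    return TestResult(status="passed")


if __name__ == "__main__":
    readings = iter([True, True, False, True])
    uptime = HealthCheck("uptime", lambda: True)
    latency = HealthCheck("latency", lambda: next(readings))

    print(run_reliability_test([uptime], polls=3).status)      # passed
    result = run_reliability_test([uptime, latency], polls=4)
    print(result.status, result.halted_by)                     # failed latency
```

In the second run, the simulated latency check reports unhealthy on its third reading, so the test stops there and the fourth poll never happens.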