- New Failure Flags now supports services running on the Istio service mesh (via Envoy). Learn more in our docs.
- New You can select Argo Rollouts when running experiments and creating services. These will appear in the Kubernetes tab as “Rollouts.”
- New You can now select custom targets when creating a service via AWS Elastic Load Balancers (ELBs).
- New We’ve updated the Failure Flags experiment creation screen. You can now define Failure Flags experiments using drop-downs, instead of entering JSON. You can still define experiments using JSON by selecting “Custom” for the experiment type.
- Info We’ve improved the accuracy of our dependency detection algorithm. Gremlin will now only suggest dependencies that have open network connections.
- New You can now create and assign custom roles to users via customizable role-based access controls (RBAC). This lets you control which actions users can perform, including which experiments they can run.
- New Private Network Integration (PNI) agents can now be scoped to individual teams. This lets you deploy and use different PNI agents for each team.
- Info Network experiments have been disabled for Gremlin agent versions 2.44.0 to 2.48.1. Please upgrade to a more recent version.
- New Gremlin has a new onboarding process for AWS users that makes it easier to add services and create Health Checks.
- New Gremlin can now automatically generate Health Checks for AWS services added via an Elastic Load Balancer (ELB).
- New The Well-Architected Cloud Test Suite is a brand new test suite for workloads running on cloud platforms. It’s built on the existing Gremlin Recommended Test Suite, with additional tests that cover cloud-specific reliability risks.
- New Added several AWS-specific Detected Risks: Load Balancer Availability Zone Redundancy, Load Balancer Deletion Prevention, and Cross-zone Load Balancing.
- New Include and exclude specific Detected Risks in your custom reliability test suite
- New Reliability Management is now available for Windows-based services
- New New Experiment type: Process Exhaustion for Windows
- New The Gremlin agent now supports OpenShift 4.14
- New New team setting: limit running experiments during certain time windows
- New API Audit Logs wrappers: Convenience APIs for auditing test activity
- New New Experiment type: Thread Exhaustion for Linux simulates running processes on a target to consume process IDs (PIDs).
- New DNS-based dependency discovery for faster and more accurate discovery
- Info A number of UX improvements and bug fixes
- New Secret Management: Gremlin now integrates with AWS Key Management Service for agent authentication
- New The Company Summary report can now be filtered by tags, enabling custom views and reporting of risks, services, and scores
- Info Detected Risks are now included in the reliability score for each service.
- New You can now mark a service dependency as a single point of failure to prevent tests being run against it
- New New company setting: limit running experiments during certain time windows
- New New company setting: disable the “Target All” option on the Experiment and Scenario screens
- New Custom Test Suites - You can now create custom RM Test Suites using Scenarios. This also lets you customize how reliability scores are calculated.
- New Parallel Scenarios - Gremlin now supports running multiple experiments in parallel within a Scenario.
- New CI/CD Integration - We've added CI/CD examples for running a Scenario, running an RM test, and getting an RM score using GitHub Actions and Jenkins pipelines.
- Info Improved view for Company Summary reports (previously called the Dashboard). Plan usage has been moved to Company Settings.
- Info Scenarios can now be deleted from the Scenarios page.
- New Gremlin now supports delegation of Namespaces to a Team for service creation (manual and automatic)
- New Added Detected Risks for automatically detecting high-priority reliability concerns in a Kubernetes environment.
- New Launched the beta release of Failure Flags, Gremlin's new framework for running Chaos Engineering experiments on AWS Lambda functions, serverless workloads, and containers.
- New Added service annotations, which let you automatically register your Kubernetes services in Gremlin by adding a simple annotation.
- New Added web app support for managing multiple services simultaneously. This also includes adding a Health Check to multiple services. The Services list has been reworked to reflect this change.
- New AWS CloudWatch update: improved search and automated Health Check creation.
- New Label targeting support for Kubernetes in FI allows finer-grained targeting (and service creation heuristics) based on Kubernetes labels.
- New Configurable Session TTL and Session Renewal Options
- New Fixed an issue affecting some customers who requested shared access to new namespaces from the experiment targeting page
- Fix Fixed an issue where the TEAM_VIEWER role was missing access to some endpoints for viewing attacks
- New Certificate Expiry test added to the list of available Fault Injection experiments.
- New Amazon CloudWatch added as an observability tool for Health Checks.
- Info Renamed "attacks" to "experiments" (this does not change the Gremlin REST API /attacks endpoint).
- New Shared Health Checks - Teams can now share Health Checks with each other at the Team level, making it easier to collaborate and ensure system availability.
- New Datadog Health Checks - Datadog customers can now access a searchable drop-down menu of monitors, making it easier to find and select the right monitor for their needs.
- New Additional Providers for Network Tests - We have extended the set of network IP ranges that we are collecting, providing more comprehensive network test coverage.
- New Added the ability to create a service without having to add Health Checks. You will still need to add Health Checks to run reliability tests, however.
- New The Gremlin Kubernetes Agent now supports ARM64.
- New Attacks can now target Kubernetes resources with restricted network access.
- New Gremlin now supports targeting DeploymentConfig objects in OpenShift.
- New Added an overall reliability score trend line to the Company Dashboard, as well as the ability to export the Company Dashboard to PDF.
- New The screen shown when running a reliability test now shows a visual timeline of the test, including when health checks were performed.
- Info "Status Checks" and "Golden Signals" have been renamed to "Health Checks".
- New First public release of the Gremlin Reliability Management (RM) API. See the relevant API documentation for details.
- New Made several improvements to the RM Services dashboard, including a trend line graph showing each service's reliability score over the past 30 days.
- New Added the ability to review and add suggested dependencies to Gremlin RM services.
- New Added the ability to disable API key usage at the company level.
- New Gremlin now supports running network attacks on multiple NICs.
- New Added a "Run All Tests" button to the service creation process and on the service overview page.
- New Added support for additional Datadog regions, specifically the EU Region.
- New Added the ability to flag services as Production, which adds additional warnings before you can run tests.
- New In Gremlin Fault Injection (FI), newly discovered agents will automatically join ongoing attacks (as long as they meet the targeting criteria).
- Info The Getting Started page will now automatically refresh to show newly detected agents.
- New Added the ability to assign an owner to a service. By default, this is the user who created the service. This can be changed in the Service Settings screen.
- Info We increased the process collection frequency for new accounts. This means new Gremlin users will no longer need to wait up to an hour before being able to create their first service.
- New New users can now sign up for a 30-day trial of Gremlin Reliability Management.
- New Added the ability to integrate with load testing tools.
- New Added support for integrating with Datadog's EU Region for Health Checks.
- New Added an in-app Getting Started page to Gremlin Reliability Management.
- New Process Discovery is enabled by default for new agent installations.
- New Added the option to Autoschedule all reliability tests.
- New Gremlin Reliability Management Platform released! Includes a pre-defined suite of reliability tests and service reliability scores to proactively measure reliability posture.
- New A Rerun Scenario button is now available for every Scenario on the GameDay Runs tab.
- Info Attachments on the GameDays Summary tab are now limited to 50 MiB.
- New GameDays are now available in Gremlin. You can perform all GameDay activities, from planning and running to sharing results, in Gremlin. See GameDays Overview for more information.
- New Jira integration was added to Scenario Runs and GameDay Summaries. See Tracking results in Jira for details.
- New Updated the Gremlin Datadog integration to include Kubernetes objects for observability of Gremlin attacks in Datadog. Tags added include cluster, namespace, specific object (deployment, statefulset, replicaset, and so on), and pod name/ID.
- New Custom Azure tags are now supported for the Linux agent, version 2.22.5.
- New You can now run multiple network attacks on the same target by specifying different network devices.
- Info The Application Level Fault Injection (ALFI) feature was deprecated.
- New On the Attacks page, unhealthy Kubernetes pods now have a visual warning indicating that they should not be targeted. Targets containing unhealthy pods now have a visual warning as well.
- New Gremlin now supports network attacks for IPv6.
- New Error codes from killed executions are now shown on the Attack Details page.