How reliability differs between monolithic and microservice-based architectures
Microservices have forever changed the way we build applications. Tools like Docker and Kubernetes made microservice-based architectures widely accessible to software developers, and cloud platforms like Amazon EKS made deploying containers fast and inexpensive. They've also enabled even small engineering teams to deploy code faster, leverage fault tolerance and redundancy, scale more efficiently, and take full ownership of their services from development all the way into production. Between 2020 and 2026, the percentage of organizations using containers in production is expected to rocket from under 10% to 90%.
But just because microservices are popular doesn't mean they're easy to adopt, especially for developers who are more familiar with monolithic applications. DevOps teams need to consider how their methods of building, maintaining, and operating applications will change. Otherwise, not only will you miss out on many benefits of microservices, but your applications will be even more susceptible to reliability risks.
Key differences between monoliths and microservices
First, let's consider how microservice-based applications differ from monolithic applications.
In a monolith, the entire application is developed, packaged, and deployed as a single unit. If a developer modifies any part of the application, the entire package needs to be rebuilt, retested, and redeployed. Since the monolith is one unit, scaling or replicating it means redeploying the entire application, resulting in wasted resources.
Conversely, microservices split up an application's functionality into many small, lightweight, independently operable units called services. These services communicate over networks via APIs, but are otherwise separate from each other. This means teams can update, restart, scale, and replicate individual services without impacting others, while consuming only the resources their service needs. This makes microservices much more efficient and manageable, despite the increased complexity.
Reliability concerns when using microservices
Microservices have unique reliability risks that might not be obvious at first. There are five in particular that we'll look at:
- Servers are no longer highly specialized and heavily customized, but disposable and interchangeable.
- Applications are short-lived and ephemeral.
- Data persistence becomes much more difficult.
- Communication between functions takes place over the network instead of locally.
- Engineering teams need to be trained on an entirely new way of developing and supporting applications.
Servers are disposable and interchangeable
Before orchestration tools and cloud computing, it was common for system administrators to manually provision, configure, and maintain servers using tools like SSH and Remote Desktop. This led to servers that were highly customized, hard to recreate, and critical to the business. With cloud computing, this perception started to change. Now, servers are temporary commodities that can be rented and returned more easily than a library book.
This is perfect for containers, which are the most common method of deploying microservices. Since a container will run the same way no matter where it's deployed, we don't need to worry about configuring individual servers. We just need to configure the container itself, then deploy it to our cluster. Orchestration tools like Kubernetes handle the behind-the-scenes work of finding a node to deploy to, or scaling up the cluster to make room. This makes it much easier to add or remove servers based on price, capacity, or customer traffic, but it requires a shift in how we think about and manage servers.
Containers are ephemeral
Applications no longer run indefinitely. Instead, services constantly start and stop as developers push updates, as nodes are added to and removed from the cluster, and as services scale up and down in response to customer traffic. Kubernetes also monitors the status of containers and automatically restarts them if they become unresponsive or return errors. Developers can take advantage of this to build applications that automatically restart after a crash, critical bug, or other problem that would normally take an application offline.
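For Kubernetes to restart an unresponsive container, the service needs some way to report its status. One common approach is to expose an HTTP health endpoint that a liveness probe can poll. The following is a minimal sketch using only the Python standard library. The /healthz path, the default port 8080, and the `healthy` flag are assumptions made for illustration, not part of any particular service.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical health flag: a real service would clear it when its own
# checks fail (for example, a worker thread has stopped making progress).
healthy = threading.Event()
healthy.set()


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /healthz with 200 while healthy and 503 otherwise."""

    def do_GET(self):
        if self.path != "/healthz":
            status, body = 404, b"not found"
        elif healthy.is_set():
            status, body = 200, b"ok"
        else:
            status, body = 503, b"unhealthy"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the service logs


def start_health_server(port=8080, host="0.0.0.0"):
    """Serve the health endpoint on a background thread and return the server."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A service would call start_health_server() during startup and point its probe configuration at the same path and port. When the endpoint starts returning 503 or stops answering, the orchestrator can treat the container as failed and replace it.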
Data persistence becomes more difficult to manage
Ephemerality introduces the problem of stable, persistent data storage. Since servers can start and stop at any time, and containers can run on any node, storing application data on a local hard drive can lead to data loss. Instead, we need distributed filesystems like Gluster, cloud storage services like Amazon Elastic Block Store, or more traditional networked filesystems like NFS to make persistent storage available cluster-wide. Kubernetes supports a wide range of storage types that appear transparent to the containers using them.
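From the application's side, this transparency means the code only needs to know where its data directory is mounted, not what backs it. Here is a rough sketch, assuming the mount path is passed in through a hypothetical DATA_DIR environment variable (defaulting to /data) and that state fits in a single JSON file. Both are illustrative assumptions.

```python
import json
import os
import tempfile
from pathlib import Path


def data_dir():
    """Return the directory where the persistent volume is assumed to be mounted."""
    # DATA_DIR is a hypothetical variable name chosen for this example.
    return Path(os.environ.get("DATA_DIR", "/data"))


def save_state(state, name="state.json"):
    """Write state to a temp file, then rename it over the target file."""
    directory = data_dir()
    directory.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # The rename is atomic on local POSIX filesystems; guarantees on
        # networked filesystems vary, so treat this as best effort.
        os.replace(tmp_path, directory / name)
    except BaseException:
        os.unlink(tmp_path)  # do not leave a partial temp file behind
        raise


def load_state(name="state.json", default=None):
    """Load previously saved state, or return default on a fresh volume."""
    try:
        with open(data_dir() / name) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

Because a container can be stopped at any moment, writing to a temporary file and renaming it reduces the chance that a restart leaves a half-written file for the next instance to read.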
Switching from local to network communication adds latency
Passing data between functions in a monolithic application is usually extremely fast and reliable, since the entire application is running on the same host. However, sharing data between services in a microservice application is much less predictable, since nearly all communication takes place over the network. We have to assume that two services are running on different hosts at any given time, which means added latency. We also need to account for the increased risk of the two services losing connection to each other due to host outages, network outages, or one of the services needing time to scale up. All of this requires changes to our applications to make them more fault-tolerant.
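One common way to tolerate brief connection losses is to retry failed calls with exponential backoff and jitter. The sketch below is one possible shape for such a helper, not a prescribed implementation. The function name, the default attempt count, and the delay values are all assumptions chosen for illustration.

```python
import random
import time


def call_with_retries(func, attempts=4, base_delay=0.2, max_delay=5.0,
                      retry_on=(ConnectionError, TimeoutError),
                      sleep=time.sleep):
    """Call func(), retrying on the given exceptions with backoff and jitter.

    The delay ceiling doubles after each failure, capped at max_delay. The
    actual wait is drawn at random below that ceiling so that many clients
    recovering from the same outage do not all retry at the same instant.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

The caller wraps each network request in a function and passes in the exception types its client library raises for transient failures. Retries suit only operations that are safe to repeat, and each request should also carry its own timeout so that a slow service cannot block the caller indefinitely.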
Engineering teams need educating and training
Moving from a monolithic architecture to microservices is a significant shift in how engineers conceptualize, build, and maintain applications. Teams need time to learn the new architecture, embrace best practices for building in it, identify common bugs and "gotchas" to avoid, and much more. All of this adds to development time and increases the risk of failure modes being introduced, at least for the initial onboarding period.
For example, if a team is used to programming one specific component of a standalone monolithic application, network communication could be an entirely new domain for them. They may need to learn about REST APIs, RPC, and fault tolerance before touching their first container. And of course, there are reliability best practices specific to Kubernetes such as Chaos Engineering and scanning for risks.
Conclusion
Migrating from monoliths to microservices can have incredible benefits, but it also comes with risks. Identifying and addressing these risks will help streamline your migration, improve the experience for your customers, and allow your developers to work more effectively.
Gremlin's automated reliability platform empowers you to find and fix availability risks before they impact your users. Start finding hidden risks in your systems with a free 30-day trial.
To learn more about Kubernetes failure modes and how to prevent them at scale, download a copy of our comprehensive ebook.
If you're adopting Kubernetes, you need Chaos Engineering
When Ticketmaster started their Kubernetes migration, they had to address a huge problem: whenever ticket sales opened for a popular event, as many as 150 million visitors flooded their website, effectively causing distributed denial of service (DDoS) attacks. With new events happening every 20 minutes and $7.6 billion in revenue at stake, outages could mean hundreds of thousands in lost sales.
Updated January 24, 2020