Strategies for migrating to Kubernetes
Migrating to a new platform can often feel like navigating a maze of technical challenges, especially when the platform is as complex as Kubernetes. Kubernetes has a vast number of features designed to help with deploying and managing large applications, but learning how to use it effectively can be just as challenging as moving your workloads over. This doesn’t mean it’s impossible, of course, and there are several strategies for easing this process.
In this blog, we’ll focus on three migration techniques: rehosting (also known as “lift and shift”), refactoring, and replatforming. We’ll explain how each one works, how it’s done, and what the drawbacks are.
Rehosting (“Lift and shift”)
Rehosting, or “lift and shift,” essentially means taking an application as-is and redeploying it onto another platform. In the case of Kubernetes, this means packaging your existing application into a container image and deploying it onto a Kubernetes cluster. The application itself doesn’t change significantly; only the way it’s deployed does.
There are benefits to this approach, such as not having to rewrite the entire application to fit the new platform. It's also the fastest migration method, letting you move from your current platform onto Kubernetes without having to maintain both systems side-by-side. Later, once the team is comfortable managing the application on Kubernetes, you can consider refactoring your services to better fit the Kubernetes model.
The downside is that your application isn’t as optimized as it could be. While it might be running on Kubernetes, it’s not designed for Kubernetes. Monolithic applications generally take longer to start and stop than smaller, containerized services, and often aren’t built to scale dynamically or run multiple instances simultaneously. This matters because redundancy is a core feature of Kubernetes, and it can also cause problems when migrating or scaling containers across multiple nodes.
Lastly, Kubernetes allocates computing resources as efficiently as possible by “packing” containers onto nodes. Smaller containers are easier to deploy and scale, since they have lower capacity requirements than larger containers. Larger containers—like those running monolithic applications—are more difficult to deploy, since their resource requirements are much higher. You may end up with large chunks of unusable capacity, simply because your containers are too demanding.
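To see why oversized containers strand capacity, consider a toy first-fit packing sketch. The node sizes and container requests below are made up purely for illustration; the real Kubernetes scheduler is far more sophisticated than this:

```python
# Toy first-fit packing: hypothetical node sizes and container requests,
# meant only to illustrate stranded capacity (the real Kubernetes
# scheduler is far more sophisticated).

def first_fit(node_capacity, node_count, requests):
    """Place each request on the first node with room. Returns the
    per-node free capacity and the number of unschedulable requests."""
    free = [node_capacity] * node_count
    unscheduled = 0
    for req in requests:
        for i, capacity in enumerate(free):
            if capacity >= req:
                free[i] -= req
                break
        else:
            unscheduled += 1
    return free, unscheduled

# Four nodes with 8 "units" each (32 total). Sixteen small 2-unit
# containers pack perfectly:
small_free, small_missed = first_fit(8, 4, [2] * 16)  # all 16 scheduled

# Six large 5-unit containers don't: each node fits only one, leaving
# 12 units stranded and 2 containers unschedulable.
big_free, big_missed = first_fit(8, 4, [5] * 6)
```

Even though the cluster has 32 units of capacity and the large containers only request 30 in total, a third of the cluster ends up unusable.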
Refactoring
Refactoring involves rewriting an application to take better advantage of a different architecture. Kubernetes is significantly different from traditional deployment models: it’s container-based, it's designed primarily for stateless applications (although it does support workloads with persistent data), and it has built-in redundancy and scalability controls. Traditional applications are often deployed into custom-tailored environments, are stateful by default, and rely on external tooling for redundancy and scalability.
So what could a refactor entail? The first step would be splitting off application functionality into standalone modules, which you can then package into containers. Kubernetes heavily relies on network communication, so part of this process will involve creating well-defined APIs for each module. Because each module is now independent, they can also fail independently of each other, which means engineers will need to build fault detection and tolerance into their modules in case any of their dependencies become unavailable.
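As a hedged sketch of what that fault tolerance might look like, here's a module guarding a call to a dependency it no longer shares a process with. The service URL, endpoint, and fallback payload are all hypothetical, not a prescribed pattern:

```python
# Hedged sketch: a refactored module guarding a call to a dependency it
# no longer shares a process with. The service URL, endpoint, and
# fallback payload are all hypothetical.
import json
import urllib.error
import urllib.request

DEFAULT_USERS_URL = "http://users-service.default.svc.cluster.local"

def get_user_profile(user_id, base_url=DEFAULT_USERS_URL, timeout=2.0):
    """Fetch a profile from a (hypothetical) users service, degrading to
    a safe default instead of cascading the failure upward."""
    try:
        with urllib.request.urlopen(f"{base_url}/profiles/{user_id}",
                                    timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, urllib.error.URLError, json.JSONDecodeError):
        # Dependency down, slow, or returning garbage: serve a degraded
        # response and let monitoring surface the failure.
        return {"id": user_id, "display_name": "unknown", "degraded": True}
```

The key design choice is that the caller always gets a well-formed response, so an outage in one module degrades the experience rather than taking down everything upstream of it.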
Refactoring gives you the most bang for your buck, but it’s also the longest and most difficult of the three strategies. Not only are you rewriting your application, but your engineers are likely still learning Kubernetes and don’t yet understand all of its nuances and limitations. Engineers are also likely to rely on their existing knowledge of monolithic application design when working in Kubernetes, which can result in containers that straddle the line between microservices and monoliths.
When refactoring, it’s especially important to prioritize reliability testing throughout the development process. This helps proactively uncover failure modes so that your engineers can address them before they happen in production. In turn, this saves your customers and your incident response teams from the stress of a live system outage.
Replatforming
Instead of migrating or rewriting the entire application all at once, replatforming involves migrating one component at a time. Deploying a new feature is the perfect use case for this method: instead of building the feature into your monolithic application, build it as a microservice, deploy it using Kubernetes, and use API calls to communicate with your monolith. This lets you start using Kubernetes right away without having to rewrite or rehost your current application, while giving you time to learn how to effectively use and manage Kubernetes. Once you’ve proven the success of the initial deployment, you can then start replatforming existing parts of your application, with the goal of migrating all of them.
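One common shape for this cutover is a strangler-style dispatch inside the monolith: the new microservice handles the feature when a flag is on, and the legacy code path remains as both the default and the fallback. The sketch below is illustrative only; the feature, flag name, and service are all hypothetical:

```python
# Strangler-style dispatch inside the monolith. The feature, flag, and
# service are hypothetical; the point is the shape of the cutover.
import os

def legacy_recommendations(user_id):
    # Existing in-process monolith logic, untouched during the migration.
    return ["editors-pick"]

def remote_recommendations(user_id):
    # Stand-in for an API call to the new Kubernetes-hosted service,
    # e.g. an HTTP GET against the service's cluster DNS name.
    raise NotImplementedError("new service not wired up in this sketch")

def recommendations(user_id):
    """Route to the replatformed service behind a flag, keeping the
    legacy path as both the default and the fallback."""
    if os.environ.get("USE_RECS_SERVICE") == "1":
        try:
            return remote_recommendations(user_id)
        except Exception:
            pass  # Any failure in the new service falls back to legacy.
    return legacy_recommendations(user_id)
```

Because the flag gates the new path and the legacy path never goes away until you're ready, a failed cutover is a configuration change rather than a rollback of the whole migration.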
Replatforming has a number of benefits. For one, it greatly reduces the risk of a failed migration by limiting each change to a single, small module. Where a failed refactoring could have significant costs in engineering time and effort, a failed replatformed module is a much lower investment. Replatforming also lowers the impact that reliability issues could have on the application by containing them to individual services.
Still, reliability risks can pop up during this process. Running reliability tests after replatforming each component helps uncover problems that otherwise would’ve remained hidden until the end of the process. Testing throughout leaves you with a resilient, well-understood deployment model that's designed to take advantage of Kubernetes' reliability features.
Which Kubernetes migration method is best?
Of these three choices, there’s no one clear “best” option. Lift-and-shift gets teams onboard with Kubernetes quickly, but at the risk of not taking full advantage of Kubernetes’ scaling and redundancy features. Refactoring lets teams get the most out of Kubernetes by adapting their applications to it, at the cost of upfront development time. Replatforming gives the best of both worlds, but adds complexity as teams have to manage two different deployment models simultaneously.
Regardless of the method you choose, remember that a migration is an ongoing process. It won’t be completed in a single weekend, and there will be surprises. Take the time to understand how Kubernetes’ architecture differs from more traditional architectures, and how to best leverage its redundant, scalable, and fault-tolerant design.
If you’d like to dig deeper into Kubernetes migration best practices, check out our ebook: Improving reliability during Kubernetes migrations.
Gremlin's automated reliability platform empowers you to find and fix availability risks before they impact your users. Start finding hidden risks in your systems with a free 30-day trial.
Updated January 24, 2020