Wednesday, October 12, 2016

Containerize your application to achieve Scale

This post talks in general about containers: their evolution and their contribution to scaling systems.

Once upon a time, applications used to run on servers configured on bare metal sitting in companies' own data centers. Provisioning used to take anywhere from a few days to a few weeks. Then came virtual machines, which use hardware virtualization to provide isolation. They take minutes to create because they require significant resources. Then finally, here comes a brand new guy in the race, which takes 300 ms to a couple of seconds to bootstrap a new instance; yes, I am talking about containers. They don't use hardware virtualization. They interface directly with the host's Linux kernel.


Managing VMs at scale is not easy. In fact, I find it difficult to manage even a couple of VMs :D So just imagine how difficult it would be for companies like Google and Amazon, which operate at internet scale.

Two features which have been part of the Linux kernel since 2007 are cgroups and namespaces. Engineers at Google started exploring process isolation using these kernel features (to manage and scale their millions of computing units). This eventually resulted in what we know today as containers. Containers are inherently lightweight, and that makes them super flexible and fast. If a container even thinks of misbehaving, it can easily be replaced by a brand new container because the cost of doing so is not high at all. This also means they need to be run in a managed and well-guarded environment. Their small footprint helps in using them for a specific purpose, and they can easily be scheduled, re-arranged and load balanced.
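To get a feel for how little magic is involved, below is a minimal sketch in Go (Linux only, typically run as root) that launches a shell inside fresh UTS and PID namespaces using nothing but the standard library. It only illustrates the kernel primitives; real runtimes like Docker layer cgroups, filesystem isolation and much more on top of them.

    package main

    import (
        "os"
        "os/exec"
        "syscall"
    )

    func main() {
        // Start a shell in new UTS (hostname) and PID namespaces --
        // the same kernel primitives container runtimes build on.
        // Linux only; typically needs root privileges.
        cmd := exec.Command("/bin/sh")
        cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
        cmd.SysProcAttr = &syscall.SysProcAttr{
            Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
        }
        if err := cmd.Run(); err != nil {
            panic(err)
        }
    }

Inside that shell, echo $$ prints 1 (it is the first process in its PID namespace), and changing the hostname does not affect the host.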

So one thing is quite clear: containers are not a brand new product or technology. They use existing features of the OS.

With containers, the old problem of making every single component of a system resilient and bullet proof no longer applies. This seems contradictory: we want to make systems more resilient, yet containers themselves are very fragile, so any component deployed in them automatically becomes unreliable.
The answer is to design the system with the assumption that containers are fragile. If any instance fails, just mark it as bad and replace it with a new instance. With containers, the real hard problems are not isolation but orchestration and scheduling. A rough sketch of that "mark it bad, replace it" idea follows below.
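Here is that idea sketched in Go. Everything in it (Instance, healthy, replace, the health URL) is a hypothetical placeholder for what a real orchestrator such as Kubernetes does for you with health checks and rescheduling; it is not the API of any real tool.

    package main

    import (
        "log"
        "net/http"
        "time"
    )

    // Instance is a hypothetical running container we keep track of.
    type Instance struct {
        ID  string
        URL string // health endpoint, e.g. http://10.0.0.5:8080/health (illustrative)
    }

    // healthy polls the instance's health endpoint with a short timeout.
    func healthy(i Instance) bool {
        client := &http.Client{Timeout: 2 * time.Second}
        resp, err := client.Get(i.URL)
        if err != nil {
            return false
        }
        defer resp.Body.Close()
        return resp.StatusCode == http.StatusOK
    }

    // replace stands in for asking the orchestrator to schedule a
    // fresh container; here it is just a placeholder.
    func replace(i Instance) Instance {
        log.Printf("replacing failed instance %s", i.ID)
        return Instance{ID: i.ID + "-new", URL: i.URL}
    }

    func main() {
        instances := []Instance{{ID: "web-1", URL: "http://localhost:8080/health"}}
        for {
            for idx, inst := range instances {
                if !healthy(inst) {
                    // Don't repair a fragile instance, throw it away.
                    instances[idx] = replace(inst)
                }
            }
            time.Sleep(10 * time.Second)
        }
    }

The point of the sketch is the mindset, not the code: individual instances are disposable, and the resilience lives in the loop that watches and replaces them.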

Read more in detail on Containers vs VMs.


Containers are also described as a jail which guards the inmates to make sure that they behave themselves. Currently, one of the most popular container platforms is Docker. At the same time, there are tools available to manage or orchestrate containers (one of the most popular is Kubernetes from Google).


Happy learning!!!