Virtual Machines, Containers, and Orchestrators: What Do Those Terms Mean?
Terminology is a wall that is hard to see over. It is easy to get confused by the cascade of terminology coming out of various software providers, and sometimes it’s not obvious what the Next Big Thing is until long after it has established itself. By then, most people are chasing the next “Next Big Thing” and not leading the pack. So, let’s look at one of the most important shifts in Information Technology: the growth of virtualization.
Why Were Physical Machines A Problem? Servers, servers and more servers.
In the 1990s and into the early 2000s, each time a company needed to provide an electronic service to its employees, partners, or customers, it usually needed to buy a new server to run that service. When I started my career, a company that wanted to run an ecommerce website from its offices would need to buy at least a webserver and a database server. It would also need a second webserver and database server for its web development team to work on a future release, and often a “Quality Assurance” or “Test” environment where the developers would practice deploying the website before going live. Before hosted email was prevalent, you would often see an email server in these companies, as well as file servers, print servers, accounting servers, and sometimes a few industry-specific servers. Each of these required its own physical space, either as a boxy “tower” or in a rack in a dedicated server room, and each usually cost between several hundred and several thousand dollars. The practical upshot was that even core services cost thousands of dollars to roll out before any software or licensing costs, which was a major obstacle for many companies.
Out of necessity, many companies would stuff multiple services onto the same server: file servers would also act as print servers, email and web hosting would run on the same machine, and so on. As a result, users often complained that a problem with one service, like editing documents, could suddenly cause all the printers in the building to stop working. This could also lead to misconceptions about which services were intertwined; I was once asked by a very concerned user, quite earnestly, whether the bathrooms would still work during a scheduled email outage.
The Rise of Virtual Machines
Although computer emulation has been around almost as long as computers have, it was only in the 21st century that it became a mainstay of IT. The general idea is simple: if a computer has enough processing power, it can simulate another, smaller computer. That’s essentially what virtualization is: a big computer simulating many small ones. This is valuable for a number of reasons, the first being the ability to exploit economies of scale and idle time. Virtualization provided the separation needed to keep problems with one service from affecting other services, while also cutting down on the need for hardware.
Most servers don’t work at full power all the time; your webserver only needs to think when somebody asks it for a web page, your print server is idle when nobody’s printing anything, and even when they’re working, they usually use only a fraction of their resources. If you have one big computer simulating those smaller servers, those small bits of idle time add up. To run 10 virtual servers, you might need only one physical server that costs three times as much as each of the servers it replaces: you pay for three servers instead of ten, a 70% savings on the capital cost of hardware. That alone was enough of an argument for VMware to start selling its first enterprise-class virtualization solution in 2001.
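The 70% figure is simple arithmetic. Here is a minimal Python sketch of it, assuming each replaced physical server costs one unit and the single virtualization host costs three units; `consolidation_savings` is a hypothetical helper, and the calculation deliberately ignores software, licensing, power, and staff costs.

```python
def consolidation_savings(num_servers: int, host_cost_multiple: float) -> float:
    """Fraction of hardware capital cost saved by consolidating onto one host.

    Assumption (hypothetical model): every replaced physical server costs
    1 unit, and the one virtualization host costs `host_cost_multiple` units.
    """
    old_cost = num_servers * 1.0         # ten separate boxes
    new_cost = host_cost_multiple * 1.0  # one bigger host running all ten as VMs
    return (old_cost - new_cost) / old_cost


savings = consolidation_savings(num_servers=10, host_cost_multiple=3)
print(f"{savings:.0%}")  # prints "70%"
```

Changing the inputs shows how sensitive the result is: if the host cost five times as much as an average server, the same consolidation would save 50%.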
Virtualization also brought other benefits, and new risks. Administration and provisioning became much easier. When the IT department needed to add a new server, they only had to decide how big that server needed to be, rather than ordering new parts, waiting for delivery, and physically assembling and deploying a new machine. In addition, because the computers were completely virtual, so were their peripherals. Where a network failure or other problem might once have required a technician to venture into the datacenter and plug a monitor and keyboard into a physical box, with a virtualized solution the same work could be done remotely from anywhere. Better yet, when more computing power was eventually needed, another large server could be ordered to expand the growing cluster of machines. It didn’t need to be sized to a particular task; it could be bought at whatever price point provided the most computational bang for your buck, without worrying about specific software requirements.
In terms of risks, let’s talk eggs. With virtualization, you are putting more eggs in fewer baskets. If something did happen to the physical machine running all of these virtual computers, whether it be a power issue or a hard disk failure, many services would go down simultaneously, and would remain down until the problem could be solved. Through the 2000s, we saw a proliferation of technologies designed to deal with these issues: redundant power, redundant storage, load balancers, instant fail-over, and many other high-priced, specialized technologies.
In addition, especially early on, there were problems finding qualified people who knew how to manage all these virtual baskets and all their delicate eggs. There are complications anytime you have many different systems using the same hard disk, or network connection, or anything else. Learning how to balance the needs of email, webservers, databases, and other systems when they’re using a shared pool of resources is not trivial. Many IT workers found themselves having mysterious problems because they treated virtualized servers just like physical servers. Careers and companies suffered during this training phase.
Finally, there was the overhead. Although modern virtualization servers are marvelously efficient, there are still significant overhead costs in simulating the smaller servers, not to mention the cost of running the “parent” operating system that the virtualization system itself runs on. These virtual machines also tended to consume a lot of resources, because every single one needed its own copy of its operating system, its own virtual hard drive, its own backups, and so on. These costs could be trimmed with aggressive management, but doing so was expensive in IT effort. To solve this problem, a similar but lighter solution was needed. Enter operating-system-level virtualization, known in the industry today as “containerization.” Yes, I know: wonky terminology at work.
Although containers in one form or another have existed since the 1980s, it was the rise of Docker in 2013 that catapulted them to their current prominence. The idea behind containers is that we don’t need to simulate an entire computer to create an independent environment; we just need to make sure that a group of programs running on a server is isolated from the other programs on that server and confined to its own batch of resources, for example by limiting how much CPU, disk, memory, or network bandwidth it can use. This keeps resource-hungry programs safely in their container instead of using their vampire-like powers to suck other programs dry. The result: you get many of the best parts of a virtual machine without the overhead of running an entire separate simulated computer.
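On Linux, that “batch of resources” is typically enforced by the kernel’s control groups (cgroups), while namespaces provide the isolation. The sketch below assumes cgroup v2 and only computes the text a container runtime would write into a container’s `cpu.max` and `memory.max` interface files; it does not touch the filesystem, which would require root. `ContainerLimits` and `cgroup_v2_settings` are hypothetical names, and disk and network limits are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

CPU_PERIOD_US = 100_000  # cgroup v2 default CPU accounting period: 100 ms


@dataclass
class ContainerLimits:
    cpus: Optional[float] = None        # e.g. 0.5 = half of one CPU; None = unlimited
    memory_bytes: Optional[int] = None  # hard memory cap; None = unlimited


def cgroup_v2_settings(limits: ContainerLimits) -> dict[str, str]:
    """Map limits to the strings that would be written to cgroup v2 files."""
    if limits.cpus is None:
        cpu_max = f"max {CPU_PERIOD_US}"  # no quota within each period
    else:
        # cpu.max takes "QUOTA PERIOD": microseconds of CPU time allowed per period
        quota = round(limits.cpus * CPU_PERIOD_US)
        cpu_max = f"{quota} {CPU_PERIOD_US}"
    memory_max = "max" if limits.memory_bytes is None else str(limits.memory_bytes)
    return {"cpu.max": cpu_max, "memory.max": memory_max}


print(cgroup_v2_settings(ContainerLimits(cpus=0.5, memory_bytes=512 * 1024**2)))
# {'cpu.max': '50000 100000', 'memory.max': '536870912'}
```

A container capped this way can use at most half a CPU and 512 MiB of memory, no matter how hungry the programs inside it get.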
A lot of the marketing and articles written about containers early on described them as essentially virtual machines, which led to a great deal of skepticism within the IT community. After all, we had already had virtual machines for a decade or more; why did we need a version of virtual machines that forced you to use the same operating system as the host?
Where containers shone, though, was in the speed and efficiency of deployment. An example serves best here. Let’s say a systems engineer is deploying a new webserver. The developers keep the website in a repository, which they update every few days. The systems engineer writes a short script that describes how this web server should be built: start with a copy of Apache (a popular web server program), copy the files from the developers’ repository into the container, and start serving to the internet. That tiny script, three lines long, can now be sent to any of his container servers, and that server will start serving the website in question. He can send it to as many servers as he wants. If he needs to update it, or the developers tell him they need another piece of software installed, he can just update the script and send it out again to each server. There are no per-server provisioning issues for the systems engineer to resolve; it just works across his environment.
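With Docker, the “short script” is a Dockerfile. The minimal Python sketch below renders the three-line recipe described above, assuming the official `httpd` (Apache) image, whose document root is `/usr/local/apache2/htdocs/`. `render_build_script` is a hypothetical helper and `./public-html` is an assumed location for the developers’ checkout. The `CMD` line repeats the `httpd` image’s own default and is included only to mirror the “start serving” step.

```python
def render_build_script(base_image: str, site_dir: str, docroot: str) -> str:
    """Return a three-line Dockerfile for a static site served by Apache httpd."""
    lines = [
        f"FROM {base_image}",            # 1. start with a copy of Apache
        f"COPY {site_dir}/ {docroot}/",  # 2. copy the developers' files into the image
        'CMD ["httpd-foreground"]',      # 3. run Apache in the foreground and serve
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths: the developers' checkout and the httpd image's docroot.
    script = render_build_script("httpd:2.4", "./public-html", "/usr/local/apache2/htdocs")
    print(script, end="")
```

Saved as `Dockerfile` next to the website files, this recipe could be built with `docker build -t mysite .`, and the resulting image run unchanged on any container host; updating the site means editing the recipe or the files and rebuilding.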
Containers made it vastly easier to deploy services across generalized virtual clusters and to keep them up to date. Rather than provisioning 60 or 100 GB of hard disk space to each virtual machine, container platforms would download a couple of gigabytes of “image” from a public server containing most of what they needed, and the customized part of the application a company ran would be a couple of kilobytes of human-readable data. An entire architectural deployment could be attached to an email.
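To make the size difference concrete, here is a back-of-the-envelope Python sketch using the figures above. It assumes ten services on one host, 60 GB per virtual machine, a single 2 GB base image shared by all ten containers (container engines such as Docker store identical image layers once per host), and about 2 KB of configuration per service. The helper names are hypothetical, and the model ignores the writable layer each running container adds.

```python
GB = 1024**3
KB = 1024


def vm_storage(num_services: int, disk_per_vm: int) -> int:
    # Every VM carries its own full disk, operating system included.
    return num_services * disk_per_vm


def container_storage(num_services: int, shared_image: int, per_service_config: int) -> int:
    # The base image is stored once and shared; each service adds only its small config.
    return shared_image + num_services * per_service_config


n = 10
vm = vm_storage(n, 60 * GB)
ct = container_storage(n, 2 * GB, 2 * KB)
print(f"VMs: {vm / GB:.0f} GB, containers: {ct / GB:.2f} GB")
# VMs: 600 GB, containers: 2.00 GB
```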
As a result, containers became smaller and smaller, and more and more numerous. To protect your website, you could run it on 4 or 5 servers, and just have your gateway routers randomly choose which one to send a particular request to. Even if one or two of those servers had to be taken down for maintenance, you wouldn’t lose any service. One container would run a database, one would run a webserver, and as webservices proliferated, sometimes many would run various APIs and other utilities.
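The gateway behavior described above can be sketched in a few lines of Python: for each request, pick one of the copies of the website at random, skipping any that are down for maintenance. `pick_backend` and the `web1` to `web5` names are hypothetical, and a production load balancer would also run health checks and often use smarter policies than a uniform random choice.

```python
import random


def pick_backend(backends: dict[str, bool], rng: random.Random) -> str:
    """Randomly choose one healthy backend, as the gateway does for each request."""
    healthy = [name for name, up in backends.items() if up]
    if not healthy:
        raise RuntimeError("no healthy backends: the site is down")
    return rng.choice(healthy)


backends = {f"web{i}": True for i in range(1, 6)}  # five copies of the website
backends["web2"] = False  # taken down for maintenance
backends["web4"] = False  # taken down for maintenance

rng = random.Random(42)  # fixed seed so the run is repeatable
served = [pick_backend(backends, rng) for _ in range(1000)]
print(sorted(set(served)))  # only the three remaining servers answer requests
```

With two of the five servers out of service, every request is still answered; the site goes down only if all five are unavailable at once.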
Eventually it became clear that higher-level structures were needed. Containers needed a way to represent dependencies on each other, to check if other containers were running, for start-up processes to be automated, and for monitoring systems to be able to detect and correct problems in individual containers. In short, we needed to orchestrate this new ecosystem of containers we’d created. Cue the next term in our terminology game.
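Representing dependencies and automating start-up boils down to ordering: each container should start only after the containers it depends on. A minimal Python sketch, assuming dependencies are declared as a simple map and using the standard library’s `graphlib.TopologicalSorter` (Python 3.9+); the service names are hypothetical. A circular dependency would make `static_order()` raise `graphlib.CycleError` instead of returning an order.

```python
from graphlib import TopologicalSorter

# Hypothetical services; each maps to the set of containers it depends on.
depends_on = {
    "database": set(),
    "api": {"database"},
    "webserver": {"api"},
    "monitoring": {"database", "api", "webserver"},
}


def start_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every container starts after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


print(start_order(depends_on))
# ['database', 'api', 'webserver', 'monitoring']
```

Stopping the cluster safely is the same list reversed: the monitoring container goes first and the database goes last.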
Container Automation and Orchestration
In the beginning, container automation would only start up containers and assign them resources in a certain order. These tools were basically scripts that would start a couple of containers and give them links to the disks they needed and to each other. This was great in that it provided reproducible scripts that could start or stop whole clusters of services on an individual server, but most of these early solutions worked only on a per-server level, or only on specific hard-coded sets of servers. If you added new machines to your containerization environment, you had to copy, modify, or otherwise maintain your scripts. Many of these tools were simply existing automation tools turned to the task of managing containers.
A new approach arrived with a wave of container orchestration tools, the leader of which is now Kubernetes. These tools let administrators hand a pool of resources to a cluster of container servers, labeling those resources with tags that indicate “this disk is fast” or “this network goes out to the internet.” The administrators also give the orchestration tools descriptions of sets of containers (tagged with requirements like “this container needs a fast disk”) and can trust the tools to figure out how to get each container the resources it needs, and to intercede if a container runs into issues. These systems are smart enough to detect when a container has crashed or a virtualization server has gone down, and to restart that container on a different, available node so that the service continues to serve customers, employees, partners, or other users.
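Tag-based placement can be sketched as a toy scheduler in Python: each container goes to a running node whose tags cover everything the container needs, preferring the least-loaded node, and when a node fails the placement is simply computed again. This is loosely analogous to Kubernetes node labels and a pod’s `nodeSelector`, but it is not how the Kubernetes scheduler actually works; it weighs many more factors. `Node`, `Container`, `schedule`, and the tag names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    tags: set[str]
    up: bool = True


@dataclass
class Container:
    name: str
    needs: set[str] = field(default_factory=set)


def schedule(containers: list[Container], nodes: list[Node]) -> dict[str, str]:
    """Place each container on a running node whose tags cover its needs.

    Spreads load by choosing the eligible node with the fewest containers so far.
    """
    load = {n.name: 0 for n in nodes}
    placement: dict[str, str] = {}
    for c in containers:
        eligible = [n for n in nodes if n.up and c.needs <= n.tags]
        if not eligible:
            raise RuntimeError(f"no running node satisfies {c.name}: {sorted(c.needs)}")
        target = min(eligible, key=lambda n: load[n.name])
        placement[c.name] = target.name
        load[target.name] += 1
    return placement


nodes = [
    Node("node-a", {"fast-disk"}),
    Node("node-b", {"fast-disk", "internet"}),
    Node("node-c", {"internet"}),
]
containers = [Container("database", {"fast-disk"}), Container("webserver", {"internet"})]

print(schedule(containers, nodes))  # {'database': 'node-a', 'webserver': 'node-b'}
nodes[0].up = False                 # node-a goes down
print(schedule(containers, nodes))  # {'database': 'node-b', 'webserver': 'node-c'}
```

When `node-a` fails, the database moves to the only other node with a fast disk, and the webserver shifts to `node-c` to keep the load spread out.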
This orchestration marks a transition from systems administrators telling a computer to do specific things in a specific order, to describing how they want the system to be configured and trusting the orchestrator to figure out how best to do that with the resources it has available.
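At the heart of this declarative style is reconciliation: compare the state the administrator described with the state actually observed, and act on the difference, over and over. Below is a minimal Python sketch of one pass, assuming state is nothing more than a replica count per service. `reconcile` and the service names are hypothetical; orchestrators such as Kubernetes run many such control loops over much richer state.

```python
def reconcile(desired: dict[str, int], running: dict[str, int]) -> list[str]:
    """Compare desired replica counts with what is running and list the needed actions."""
    actions = []
    for service in sorted(desired.keys() | running.keys()):
        want = desired.get(service, 0)  # services not described should not run
        have = running.get(service, 0)
        if have < want:
            actions.append(f"start {want - have} x {service}")
        elif have > want:
            actions.append(f"stop {have - want} x {service}")
    return actions


desired = {"webserver": 4, "database": 1}
running = {"webserver": 2, "database": 1, "old-api": 1}
print(reconcile(desired, running))
# ['stop 1 x old-api', 'start 2 x webserver']
```

The administrator only edits `desired`; if a container later crashes, the next pass notices the shortfall and starts a replacement without anyone spelling out the steps.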
At the end of the day, we’re doing the same things we were doing in the 90s. We’re trying to provide our users with the tools they need to do their jobs, live their lives, communicate, and collaborate. The evolution from physical machines to virtual machines to containers has been a quest to do more and more work with fewer resources overall: fewer servers, fewer hours of labor, fewer shipments of components across the ocean.
In doing so, though, complexity has risen dramatically as well. In the 1990s, an administrator could easily tell what a particular program was doing at any moment, what disk it was accessing, and where it was sending its data; now there are two, three, or more levels of abstraction between the person working with a service and the actual hardware it runs on. This makes planning, creating, and deploying much easier, but it exacts a cost on the support side when it comes to diagnosing problems. Containerization is undoubtedly the wave of the future of commercial computing, and its economic benefits will continue to drive its adoption into ever more corners of the computing world.
Give us a call at Deep Core Data to discuss how containerization and orchestration are going to change your business, and how your company can get ahead of those changes.