April 1, 2020
Last November, at the end of lunch with Emily (one of my mentees), I used the items on our table to express some of the fundamental concepts of Kubernetes. The table itself became a region. The napkin holder joined the stacks of jelly and butter as master nodes; our plates were workers. Sometime after explaining the purpose of the Kubernetes scheduler, but mercifully before I was forced to redeploy our silverware as availability zones and subnets, she asked me a question I spent the next several weeks thinking about.
“What does Kubernetes accomplish that can’t be achieved with an auto-scaling group and an Elastic Load Balancer?” In other words, why would this open-source project be a better choice than a popular (and proprietary) industry standard?
To some, this may be a startling question, a relic from a bygone era—2012 or worse! Others might nod their heads, expecting an answer that didn’t amount to much (though I expect they may have broadened the scope of the alternative technology stack). Levvel is a business and technology consultancy. Describing enhanced capabilities available from new technologies and the tradeoffs inherent in adopting them is a major component of our service to our clients. With businesses in all verticals investing more in technology to drive revenue and lower costs, the accelerating rates of technical change mean it is even more important to find facts to aid in decision making. A similar question prompted my initial research into containers, some years ago now, and I was pleased that she asked.
I was less pleased at the prospect of having to conjure an appropriately complete answer! I pledged to take the question with me to KubeCon and answer it as best I could, and I hope you, dear reader, may benefit from my answer as well.
I spent the first half of my career doing software development. After being the lead on a project that was difficult to keep running at “Web Scale™,” I’ve spent the second half learning about the operational challenges of supporting software, with an eye towards making the right thing to do easy for development teams. They have enough to worry about already, in my view.
The second half of my career began with a part-time support shift at a full-service hosting company whose motto at the time was, “You write it, we run it.” Not exactly a groundbreaking concept—it matched the technology organizations in place at a large majority of software engineering shops at companies of all sizes. Running software doesn’t have a lot in common with writing it. The skills used day-to-day by the engineers differ; the experiences accumulated differ; the technical expertise gained is different. You can see why the organizations diverge and specialize.
Today, we have some evidence that once the domain of the operating software (and the complexity of the organization that produces it) becomes large enough, this contract begins to fail. I still remember handling the support ticket filed by an engineering team that included s3fs in their architecture (1). When operations can’t support what development has produced, that is a real problem, and its solutions might be very expensive. In my experience, this kind of problem can happen anywhere. Notice also that my own example comes from “the cloud.” The introduction of cloud architectures has not eliminated this family of problems; it has expanded and complicated them!
Into this landscape come container architectures in general, and Kubernetes specifically. The technical requirements of Kubernetes imply discarding “you write it, we run it.” A software shop may still have development teams and operations teams, certainly, but their mutual responsibilities change pretty significantly. Not every organization is ready to undergo “digital transformation” without knowing what they’ll get out of it, so let’s consider one of our hallmark examples of when container architecture makes sense.
We frequently encounter development shops that want to take a small team to refactor and expand one of their software offerings. To do so, they’ll need a new subdomain, a database with a schema they control for some limited amount of time, and isolated access to some number of support systems. This team plans to make changes no one wants affecting production accidentally. The operations team responds with a list of the tasks that they will need to do to support this effort. It’s very common for the operational effort level to outstrip the development effort. Essentially the contract in these cases is “you write it, we run it three weeks from now.” When we observe this sort of problem at a client shop, we do recommend they consider a container architecture targeting dev, test, and pre-production.
What’s the reasoning behind this recommendation? Our recommendation introduces some of the disruptive aspects of Kubernetes with respect to technical skill and organization; we need to receive a commensurately sizable benefit to potentially outweigh the costs of this disruption. In this use case, Kubernetes shifts some of the responsibilities borne solely by operations entirely into developer control. This isn’t feasible without additional training and preparation on the part of the development team, but our experience indicates that usually, senior development staff can handle it, with some support.
If, in the above example, our operations team has already delivered a Kubernetes cluster to the development staff, this request is a non-event. The development team already has what they need to independently deploy and manage this software. Whether or not the developers can also update the CI/CD pipeline independently, the effort required to deliver on this team’s initiative is certainly of a different order of magnitude. In our experience, developers are usually happy to deal with additional complexity for what they consider “quality of life” benefits from increased agency regarding their software development life cycle.
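To make the "non-event" a little more concrete, here is a minimal sketch of what such a self-service request might look like when the cluster already exists. It is an illustration under stated assumptions, not a description of any particular client setup: the team name, container image, and hostname are hypothetical, the container is assumed to listen on port 8080, and the cluster is assumed to run an ingress controller and serve the `networking.k8s.io/v1` Ingress API. The database from the example is left out; it could be another workload in the same namespace or an external service.

```python
import json


def team_environment(team: str, image: str, host: str, replicas: int = 1) -> dict:
    """Build a Kubernetes List describing an isolated environment for one team.

    All names passed in are hypothetical placeholders chosen by the caller.
    """
    labels = {"app": team}

    # A namespace gives the team an isolated scope for its resources.
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team},
    }

    # The deployment runs the refactored service (assumed to listen on 8080).
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": team, "namespace": team},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "web",
                            "image": image,
                            "ports": [{"containerPort": 8080}],
                        }
                    ]
                },
            },
        },
    }

    # The service gives the pods a stable in-cluster address on port 80.
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": team, "namespace": team},
        "spec": {
            "selector": labels,
            "ports": [{"port": 80, "targetPort": 8080}],
        },
    }

    # The ingress routes the new subdomain to the service.
    ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": team, "namespace": team},
        "spec": {
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {"name": team, "port": {"number": 80}}
                                },
                            }
                        ]
                    },
                }
            ]
        },
    }

    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [namespace, deployment, service, ingress],
    }


if __name__ == "__main__":
    manifest = team_environment(
        team="checkout-refactor",                      # hypothetical team name
        image="registry.example.com/checkout:dev",     # hypothetical image
        host="checkout-refactor.dev.example.com",      # hypothetical subdomain
    )
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed output could be piped to `kubectl apply -f -` by a developer with rights to create namespaces. The point is less the specific manifest than who gets to write it: the request that used to be a list of operations tasks becomes a file the development team owns.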
(1) Why an operator cannot be expected to support s3fs in production is left as an exercise to the reader, or if enough people ask, I can explain!
On the operations side, however, even though the fundamentals of how software behaves in the physical world do not change, the tools, techniques, and instincts about how to apply them change drastically. Though our experience is that a few operations engineers find outfitting the development team as described above an exciting, fulfilling assignment, not every engineering team experiences a lighter, more fulfilling load as a short-term benefit. Many of their existing investments in knowledge and toolkit require revisiting.
Some operations team leaders may also push back on the characterization of the “load lightening,” as the ecosystem surrounding Kubernetes clusters is changing as rapidly as anything publicly documented. For those who are readiest for the challenge, however, operations can begin to drive business value creation independently of developers. For operations managers constantly under pressure to do more with less, Kubernetes may look like a way to get ahead of the curve and stop being considered a cost center. Radical indeed.
KubeCon 2019 gave me an answer that I would never have considered sitting at the Redeye Diner with Emily. She and I both commonly work with n-tier, RDBMS-backed web applications. For those workloads, you can make the case that preferring Helm and Dockerfiles to, say, Packer and Ansible is a matter of taste. There are many critical technology initiatives that have little or no similarity to web application development, however, and Kubernetes supports those too!
One in particular worth mentioning is Kubeflow, the “Machine Learning Toolkit” for Kubernetes. Machine learning problems do not need to sit around for a human user to form an intention and begin interacting with the software. In ML workflows, engineers have access to some impressively large dataset, perform computationally intensive interactions on that dataset to some end, test their results, and either go back to square one or talk to people about them. Updating the notebook for their software simulations is not that similar to deploying web services to an active user base. For this use case, the notion of being able to target Kubernetes as a platform instead of a specific cloud provider offering is very significant! One aspect of the lunch table metaphor is still useful here—if the food all tastes the same, go eat at the cheapest diner you can find! Pick a new one every week or day, or however long it takes you to get satisfied.
The existence of a uniform platform that sits atop a set of compute, storage, and networking resources is significant. It wasn’t realistic for something like OpenStack to unify APIs and assimilate product offerings across providers like AWS, Azure, and Google Cloud. Today, however, those providers are all offering up a Kubernetes service. Will market pressure result in helping keep the offerings uniform with respect to “write once, run anywhere”? Machine learning is far from the only kind of workload to benefit from that uniformity, so we’ll have to see. Even if the offerings do have divergences, the total costs of being a customer of a particular provider are much clearer. Let us not forget the competitive technologies that Kubernetes outpaced on its way here.
I took this photo during the opening keynote, and there are whole sections not pictured. KubeCon growth is explosive, with over 12,000 attendees for 2019’s event in San Diego. Engineers and small teams should begin experimenting. Do you want to be one of them?
Kubernetes isn’t necessary. There are plenty of effective large infrastructures that have been built without it. Properly managed teams of sufficiently skilled individuals can achieve results with or without it. There are new technical vistas, but oceans of new complexity come along with them. So many of the value judgments of Kubernetes from the organizational perspective can only be evaluated in a specific context. No two clusters will be identical, after all.
But if you’re a young engineer, isn’t the idea of developing to a serverless deployment more freeing than requiring, say, Heroku? To an operations engineer early in their career, isn’t the idea of being able to wield Prometheus and Grafana for monitoring and dashboard construction an appealing place to begin? So much engineering energy is going to raise the level of architectural abstractions available. But Kubernetes is not the first platform that’s received this level of attention—is there anything out there that gives us a sense of whether this energy will be harnessed instead of wasted?
I found one KubeCon keynote informative along these lines. The (Open)Telemetry keynote is not just a harrowingly complex live demo, but also a story about how what might have been a very difficult interpersonal problem was addressed by those involved. The OpenCensus and OpenTracing projects had both reached the “incubation” stage, meaning they had some degree of support from the broader community. Since this is open-source, problems of pride and prestige can be particularly pernicious, but they found a way to move forward together. If this is a sign of how this community can work together, then the future is bright indeed.
Jim Van Fleet
Principal Engineering Consultant
Jim Van Fleet is a Principal Engineering Consultant at Levvel. After an early career focused on helping startups successfully grow and scale, Jim now applies those lessons to the software development lifecycles at Fortune 500 companies.