How AWS and cloud computing shifted the complexity of our infrastructure
We often talk about how engineering and operating software has gotten more "complex." But how has it gotten more complex? Understanding the “how” will help inform the solutions to our current and future complexity. A significant contributor to this complexity is the rise of Everything as a Service, driven by cloud computing and APIs. These abstractions didn't just appear out of nowhere, though. Instead, they were highly influenced by the rise and adoption of Amazon Web Services.
In order to understand the hockey stick of growth that cloud computing and APIs have had, you must first look at Amazon Web Services. While you can make an argument that multiple factors led to the point where compute and storage were cheap enough to make AWS possible, AWS itself was a significant inflection point.
In the early 2000s, AWS came out of a recurring need within Amazon: a faster technology department. Merchant.com was helping third-party vendors build online shopping sites on top of Amazon's e-commerce engine, and time to market was slower than desired. Simultaneously, inside Amazon, many internal development teams kept bumping into the same problem. There was a need for scalable, reliable infrastructure services internally.
After 10 years in operation, Amazon had built up a good amount of infrastructure competence. So, the company decided to develop solutions that would also work for external partners. To do so, it would need an effective way to communicate with them over HTTP. Their code didn't always sit on the same resources, and there needed to be a good plug-and-play solution. The answer was: APIs.
Before AWS, infrastructure looked totally different. The idea of managing infrastructure via HTTP and the internet was not common. Andy Jassy, CEO of AWS, said to Forbes in 2016 that "if you believe developers will build applications from scratch using web services as primitive building blocks, then the operating system becomes the Internet." This was a brand new approach to development at the time. These primitive building blocks have come a long way since AWS' creation in 2006. It is hard to put a number on the growth of web APIs, but ProgrammableWeb has been the canonical source from early on.
As you can see below, the number of web APIs has skyrocketed since 2006. They've grown to be an essential part of how we build applications on top of the infrastructure, all of which is controlled by APIs.
Source: ProgrammableWeb (July 17, 2019): APIs show Faster Growth Rate in 2019 than Previous Years
Today, APIs are the building blocks of the internet, which has shifted our approach to architecting software. Software is now developed and operated in a totally different way than before cloud computing and APIs.
When an industry grows and evolves, a supply chain forms around it. In industries that create physical goods, manufacturers don't produce every aspect of their product in house. For example, automotive manufacturers like Ford, Toyota, BMW, and others are not making every piece of the vehicle themselves. They buy metals from metal companies, computer chips from semiconductor companies, and tires from a tire company, which in turn buys rubber from yet another company. There are hundreds of suppliers in the chain. This helps the industry be more efficient and productive.
Software companies now have their own supply chain. It allows the industry to segment and for companies to focus on their core competencies. Infrastructure companies, like AWS, are just one of those suppliers. Today’s software supply chain is heavily made up of APIs. Like a car part, we take chunks of code and piece them together to make finished applications that run on componentized infrastructure. As Jeff Lawson, co-founder and CEO of Twilio, has said, "this shift to component software is the next big leap in the industry's evolution. This is the next great era of software—the one that comes after SaaS."
The confluence of these technologies means that the complexity we see in 2021 is different than before. APIs are an abstraction of complicated systems, but just because APIs with great developer and operator experiences exist does not mean complexity doesn’t. Complexity is neither created nor destroyed — it's just shuffled around.
Instead of managing our own infrastructure, we manage the APIs that make cloud computing possible. Today, managing complexity is about managing the relationships between the different vendors, their APIs, and the components we use. These can be internal or external. Internally, often it is about managing microservices. Externally, it is about managing third-party services that we consume. We might be using one or many forms of Infrastructure as a Service, Platform as a Service, Software as a Service, or half a dozen other "as a Service" acronyms.
While cloud computing and APIs may have solved one set of problems we had while managing our own infrastructure end-to-end (such as specific scalability, security, and reliability challenges), they have also created a new set of problems. For example, we no longer host our own SMTP email servers, but now we have to care about what happens if Twilio SendGrid, Amazon SES, or Mailgun has an outage that we have little control over. Or what happens if our CDN (Content Delivery Network), cloud database provider, or ticketing system goes down? Failure will happen, so it becomes more critical that we handle it gracefully and fully understand it. Do we write brittle glue code that is hard to operate? Or do we take into consideration what it means to operate someone else's code?
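To make "handling failure gracefully" a bit more concrete, here is a minimal sketch of glue code that tries a list of email providers in order and records why each one failed. Everything in it is hypothetical: `ProviderError`, `SendResult`, and `send_with_fallback` are illustrative names, and each provider is just a plain callable that stands in for whatever vendor SDK you would actually wrap. It is one possible shape for this kind of code, not a recommended implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error a provider wrapper raises when the vendor call fails."""


@dataclass
class SendResult:
    provider: str        # name of the provider that accepted the message
    failures: List[str]  # providers that failed before it, with the reason


# A provider is a (name, send function) pair. The send function takes
# (to, subject, body) and raises ProviderError on failure. Real code would
# wrap a vendor SDK or HTTP call here; that part is deliberately left out.
Provider = Tuple[str, Callable[[str, str, str], None]]


def send_with_fallback(
    providers: Sequence[Provider], to: str, subject: str, body: str
) -> SendResult:
    """Try each provider in order and return as soon as one succeeds."""
    failures: List[str] = []
    for name, send in providers:
        try:
            send(to, subject, body)
        except ProviderError as exc:
            # Keep the reason so operators can see what went wrong and where.
            failures.append(f"{name}: {exc}")
            continue
        return SendResult(provider=name, failures=failures)
    # Every provider failed: surface all reasons instead of hiding them.
    raise ProviderError("all providers failed: " + "; ".join(failures))
```

The point of the sketch is less the fallback itself than the record it keeps: when a vendor has an outage, the caller still gets a clear account of which provider failed and why, which is what makes someone else's code operable.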
The importance of “vendor engineering” is greater today than before since now we are basically outsourcing code, infrastructure, and operations to vendors. We have to evaluate vendors to not only gauge capability and fit but also balance the cost of integration and operations. As Charity Majors, co-founder and CTO of Honeycomb, says, we have to “learn to manage the true cost of ownership, and to advocate and educate internally for the right solution, particularly by managing up to execs and finance folks.” We cannot just blindly choose a service because of the lower price tag and hope for the best while ignoring the possibly costly impact on our own systems.
Lastly, we have to consider how all of these internal and external services talk to each other. How do they pass data from one service to another? What is our glue code like? Are the services too tightly coupled? Are they orchestrated together, creating a pleasant symphony? Or are they all playing out of tune and trampling on top of each other, creating not only a bad developer and operator experience but a bad user experience too? These are the new challenges we will have to find creative solutions for as organizations continue their shift to a cloud-dominant infrastructure.
I recently gave a keynote at DevOpsDays Texas on this topic and more, including the importance of glue work when using cloud computing and APIs. You can check it out here.