We were focused on building a rudimentary prototype. I wondered: is this the database we want for the foreseeable future?
When I joined Transposit, we were focused on building a rudimentary prototype of our product. We deployed to AWS and chose DynamoDB as a turnkey database solution. Eventually, we looked to transition from prototype to initial release and I wondered, “is DynamoDB the database we want for the foreseeable future?”
We used our database to persist and retrieve user data. DynamoDB satisfied our needs, so I was not inclined to switch. Further, like many startups, we dreamed our product would be popular and we liked that DynamoDB promised to scale with our business.
Still, I researched “how to choose a database”. My efforts proved unfruitful. Amazon pushed me into a decision by summarizing DynamoDB as “a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.” (I’m surprised they don’t mention the kitchen sink.) We ended up trying DynamoDB in our maturing product.
Over the next couple of months, I found that DynamoDB promised scale by under-promising canonical database features. Amazon doesn’t clearly explain this tradeoff in their marketing materials, but their 2007 paper on Dynamo, the internal system that inspired DynamoDB, highlights it as an explicit design goal. The paper summarizes Dynamo as a perfect fit for heavily trafficked services “that have very high reliability requirements and need tight control over the tradeoffs between availability, consistency, cost-effectiveness and performance.”
Amazon offers its shopping cart service as an ideal consumer. It needs Dynamo to handle peak traffic and does not expect consistency at that scale. Instead, the cart service itself handles inconsistencies. If it ever reads a single cart and finds multiple, conflicting versions, it merges them. This strategy ensures a user will never see an added item disappear from their cart, but they might see a removed item still in the cart, or duplicate items.
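To make that tradeoff concrete, here is a simplified sketch of union-based reconciliation. It assumes each cart version is just a set of item IDs. The function `merge_cart_versions` and the data are hypothetical, not Amazon’s actual code.

```python
def merge_cart_versions(versions: list[set[str]]) -> set[str]:
    """Reconcile conflicting cart versions by taking the union of their items.

    A union never drops an item that some version added, but an item
    removed in one version survives if another version still holds it.
    """
    merged: set[str] = set()
    for version in versions:
        merged |= version
    return merged


# Two replicas diverged from the same cart: one added "book",
# the other removed "mug".
base = {"mug", "pen"}
replica_a = base | {"book"}
replica_b = base - {"mug"}

print(sorted(merge_cart_versions([replica_a, replica_b])))  # ['book', 'mug', 'pen']
```

The added `book` survives, and so does the removed `mug`. A set can’t show the duplicate-item case; a quantity-aware merge would need more bookkeeping.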
I took another look at our product and decided we would never have peak traffic like Amazon and never want to handle inconsistencies this way. In fact, our product had grown in complexity and what we actually needed were relational database features. Two months later, we had abandoned DynamoDB for a relational database. Below, I highlight the database features we missed while using DynamoDB and how we now use these features to build Transposit.
DynamoDB provides no means of performing a transactional change to persisted data. It forgoes this feature to better achieve its design goals: implementing transactions would require a level of coordination antithetical to its distributed architecture.
Many applications are willing to sacrifice their ability to perform a transactional change in favor of high reliability at scale. These applications rarely expect to access a single piece of data concurrently and rarely relate multiple pieces of data in a single query. They will tolerate occasional data inconsistencies as an infrequent annoyance. (Amazon’s shopping cart service is a great example.)
We cannot implement Transposit without transactions. We access data concurrently when development teams collaborate using our product. We relate data in our queries to correctly implement complex features. We promise our customers that our product will never behave erratically, and that promise is bolstered by the guarantee that our persisted data is always consistent.
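Here is a minimal sketch of what a transaction buys us. It uses Python’s built-in `sqlite3` module as a stand-in for MySQL. The `projects` table and the `rename_pair` helper are hypothetical. Both updates apply, or neither does.

```python
import sqlite3


# sqlite3 stands in for MySQL; the schema and data are hypothetical.
def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE projects (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        INSERT INTO projects (id, name) VALUES (1, 'alpha'), (2, 'beta');
        """
    )
    return conn


def rename_pair(conn: sqlite3.Connection, name_1: str, name_2: str) -> None:
    """Rename projects 1 and 2 together: both updates apply or neither does."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if any statement raises.
    with conn:
        conn.execute("UPDATE projects SET name = ? WHERE id = 1", (name_1,))
        conn.execute("UPDATE projects SET name = ? WHERE id = 2", (name_2,))


def names(conn: sqlite3.Connection) -> list[str]:
    return [row[0] for row in conn.execute("SELECT name FROM projects ORDER BY id")]


conn = make_db()
try:
    # The second UPDATE violates UNIQUE, so the first is rolled back too.
    rename_pair(conn, "gamma", "gamma")
except sqlite3.IntegrityError:
    pass
print(names(conn))  # ['alpha', 'beta']

rename_pair(conn, "gamma", "delta")
print(names(conn))  # ['gamma', 'delta']
```

Without a transaction, the failed call would have left project 1 renamed and project 2 untouched. A concurrent reader could observe that half-finished state.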
DynamoDB provides no means of ensuring the data it stores matches an expected structure.
For some applications, this lack of structure is a feature. When the structure of data changes, the database does not need to be administered; instead, the application can be written to handle any structure of data it might encounter.
This lack of structure can also be a plague. Every change to data results in more complexity in the application. Even worse, application bugs can silently persist incorrectly structured data. This bad data may cause difficult-to-debug problems in the future and might be impossible to clean up programmatically.
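Here is a rough sketch of that creeping complexity. It assumes a hypothetical `user` item whose shape changed between two releases. Every historical shape needs its own branch in the reader for as long as old items remain.

```python
from typing import Any

# Hypothetical "user" items written by two releases of the application.
# Release 1 stored a single "name"; release 2 split it into two fields.
ITEMS: list[dict[str, Any]] = [
    {"id": "u1", "name": "Ada Lovelace"},
    {"id": "u2", "first": "Alan", "last": "Turing"},
]


def display_name(item: dict[str, Any]) -> str:
    """Read any historical shape of a user item.

    Each change to the data's shape adds another branch here, and
    the old branches can't be deleted while old items remain.
    """
    if "name" in item:
        return item["name"]
    if "first" in item and "last" in item:
        return f"{item['first']} {item['last']}"
    # A buggy writer could have stored anything; we only find out on read.
    raise ValueError(f"unrecognized user item: {item!r}")


print([display_name(item) for item in ITEMS])  # ['Ada Lovelace', 'Alan Turing']
```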
As a growing business, Transposit prioritizes engineering agility. We are willing to administer our database schema if it helps us curb complexity in our code. Further, Transposit customers expect our product to be bug-free. We would rather fail when we try to persist incorrect data than spend hours debugging a downstream failure for a customer.
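Here is a sketch of the fail-fast behavior we wanted. `sqlite3` again stands in for MySQL, and the `users` schema and `insert_user` helper are hypothetical. The database rejects a bad write instead of storing it. SQLite’s column types are looser than MySQL’s, so the sketch relies on `NOT NULL` and `CHECK` constraints.

```python
import sqlite3
from typing import Optional

# sqlite3 stands in for MySQL; the users schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id         TEXT PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        seats      INTEGER NOT NULL CHECK (seats >= 0)
    )
    """
)


def insert_user(user_id: str, first: Optional[str], last: Optional[str], seats: int) -> bool:
    """Persist a user, returning False if the schema rejects the row."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO users (id, first_name, last_name, seats) VALUES (?, ?, ?, ?)",
                (user_id, first, last, seats),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_user("u1", "Ada", "Lovelace", 5))   # True
print(insert_user("u2", "Alan", None, 1))        # False: NOT NULL violated
print(insert_user("u3", "Grace", "Hopper", -1))  # False: CHECK violated
```

The error surfaces at the write that caused it, not weeks later in whatever code happens to read the bad row.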
DynamoDB supports single-table query semantics that include query by key, query via a secondary index, and query via a full table scan. Multi-table query semantics (like join or nested queries) must be implemented by the application.
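Without a join in the database, the application ends up doing the join itself. Here is a minimal sketch of an in-memory hash join. It assumes the items from two hypothetical tables have already been fetched, and it shows no DynamoDB API calls.

```python
from typing import Any

Item = dict[str, Any]

# Hypothetical items, as if already read from two separate tables.
projects: list[Item] = [
    {"project_id": "p1", "name": "alpha"},
    {"project_id": "p2", "name": "beta"},
]
members: list[Item] = [
    {"project_id": "p1", "username": "ada"},
    {"project_id": "p1", "username": "alan"},
    {"project_id": "p2", "username": "grace"},
]


def hash_join(left: list[Item], right: list[Item], key: str) -> list[Item]:
    """Inner-join two lists of items on `key` in application code."""
    # Build phase: index the right-hand items by join key.
    index: dict[Any, list[Item]] = {}
    for right_item in right:
        index.setdefault(right_item[key], []).append(right_item)
    # Probe phase: emit one merged item per matching pair.
    joined: list[Item] = []
    for left_item in left:
        for right_item in index.get(left_item[key], []):
            joined.append({**left_item, **right_item})
    return joined


for row in hash_join(projects, members, "project_id"):
    print(row["name"], row["username"])
# prints, one per line: alpha ada, alpha alan, beta grace
```

This is the easy case. Pagination, filtering, and keeping both reads consistent with each other are all left to the application as well.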
Like many other NoSQL databases, DynamoDB was never designed to handle multi-table queries: the isolated nature of each piece of data permits its distributed design. Because data does not need to be combined, the database runs free of memory pressure and bottlenecks that often beset relational databases under high load.
Transposit frequently iterates on application features that consume our persisted data. Inevitably, these features consume related data in our database. Multi-table query semantics allow us to ship these features quickly without assuming the complexity of implementing joins or nested queries in our code.
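Here is the same combination expressed in SQL, with `sqlite3` standing in for MySQL and the same hypothetical tables. It also includes a nested query that would otherwise need more hand-written application code.

```python
import sqlite3

# sqlite3 stands in for MySQL; the tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (
        project_id TEXT PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE members (
        project_id TEXT NOT NULL,
        username   TEXT NOT NULL
    );
    INSERT INTO projects VALUES ('p1', 'alpha'), ('p2', 'beta');
    INSERT INTO members VALUES ('p1', 'ada'), ('p1', 'alan'), ('p2', 'grace');
    """
)

# A join: the database combines the two tables for us.
pairs = conn.execute(
    """
    SELECT p.name, m.username
    FROM projects AS p
    JOIN members AS m ON m.project_id = p.project_id
    ORDER BY p.name, m.username
    """
).fetchall()
print(pairs)  # [('alpha', 'ada'), ('alpha', 'alan'), ('beta', 'grace')]

# A nested query: projects with more than one member.
busy = conn.execute(
    """
    SELECT name FROM projects
    WHERE project_id IN (
        SELECT project_id FROM members
        GROUP BY project_id
        HAVING COUNT(*) > 1
    )
    """
).fetchall()
print(busy)  # [('alpha',)]
```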
With DynamoDB, we would have had a slew of data query and consistency problems. Instead, we adopted MySQL.
MySQL provides semantics that allow us to avoid costly mistakes and quickly develop new features. We are willing to spend more time administering our database if it means spending less time scratching our heads during development and debugging.
We are not afraid to concede DynamoDB’s ability to scale quickly. If we do reach a size where MySQL constricts our operations, our microservices architecture will allow us to adopt an alternative data store for some subset of our load. (Also, hooray for us getting that many users!)
If Transposit were building a consumer product that persisted lots of data, DynamoDB might be a great choice. It would be able to scale quickly when our product went viral. Consumers would forgive occasionally incorrect data and bugs as long as our product worked well a majority of the time.
If you’re interested in DynamoDB, I suggest you skim Amazon’s 2007 paper. It clarifies the tradeoffs you’ll be embracing in your product.