This made me think.
I understand the point. Dealing with a distributed database will always be harder than dealing with a single SQL one. But I also think there’s some “cost” that Mike is not taking into account.
The cost of non-linear throughput is only one of several variables involved in the decision. Some projects have a higher chance of hitting the limits of a single-machine database like PostgreSQL. Hence here are my thoughts:
- The higher the chances, the more it makes sense to start off with a distributed database from the beginning.
- The harder it is to predict data growth, the more it makes sense to start off with a distributed database.
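
To put a rough number on those chances, here is a minimal back-of-the-envelope sketch in Python. It assumes data grows at a constant compound monthly rate, which real projects rarely do, and the function name and all figures are made up for illustration. The point is only to show how much the answer swings when the growth rate is uncertain.

```python
import math


def months_until_limit(current_gb: float, monthly_growth_rate: float, limit_gb: float) -> float:
    """Estimate months until the data set reaches limit_gb.

    Assumes constant compound growth: size(t) = current_gb * (1 + rate) ** t.
    All names and parameters here are hypothetical, for illustration only.
    """
    if current_gb <= 0 or limit_gb <= 0:
        raise ValueError("sizes must be positive")
    if current_gb >= limit_gb:
        return 0.0  # already at or past the limit
    if monthly_growth_rate <= 0:
        return math.inf  # no growth, so the limit is never reached
    # Solve current_gb * (1 + rate) ** t = limit_gb for t.
    return math.log(limit_gb / current_gb) / math.log(1 + monthly_growth_rate)


if __name__ == "__main__":
    # Hypothetical figures: 50 GB today, a 2000 GB budget on one machine.
    for rate in (0.05, 0.20):
        months = months_until_limit(50, rate, 2000)
        print(f"growth {rate:.0%}/month -> about {months:.0f} months of headroom")
```

With these invented numbers, a guess of 5% monthly growth leaves roughly 76 months of headroom, while 20% leaves roughly 20. If you can't tell which of the two you are closer to, that is the kind of uncertainty the second point is about.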
Databases which are built for scale have different characteristics from the ones built to run on a single operating system. To leverage those characteristics your architecture inevitably has to change. The cost of changing things afterwards may be very high for several reasons. First off, it may not be easy to migrate a huge 600 GB Postgres database with little or no downtime. Secondly, you’ll be deploying an important change to your app’s code at the same time. Not exactly the kind of operation a team dreams of. You have to plan such a change well before it’s actually required.
That being said, I know there are tons of businesses that are profitable with a single (maybe replicated) instance of MySQL or Postgres. If you are lucky enough, you can architect your app to fit in that category. If you are not, then my advice is to pick a distributed database from the beginning only if storing a lot of data is part of your value proposition. Once you’ve coded everything needed to deal with eventual consistency issues and other oddities, you won’t regret it.
Otherwise, if you are starting something new and you don’t exactly know where you are going, then use Postgres or MySQL and you’ll be on track much faster.