“We’re going to store data the way it’s stored naturally in the brain.”
This is a phrase that is being heard more and more often these days. This blog post is inspired by a short rant that Babak Tourani (@2ndhalf_oracle) and I had on Twitter today.
How cool is that!!
This phrase is used by companies like MongoDB or graph database vendors to explain why they choose to store information / data in an unstructured format. It is new, it is cool, hip and happening. All the new compute power and storage techniques enable doing this.
How cool is that!!
Well, it is… for the specific use-cases that can benefit from such techniques. Think of analytical challenges, where individual bits of information basically have no meaning. If you are analyzing a big bunch of captured data coming from a single source, like a machine, a click-stream or social media, one single record basically has no meaning. If that is the case, and you do not really care whether you have and retain every individual bit of information because you are interested in “the bigger picture”, these solutions can really help you!
How cool is it, actually?
When it comes to the other situations where you want to store and process information… where you do care about the individual records (I mean, who wants to repopulate their shopping cart on a web-shop 3 times before all the items stick in the cart?), there are some historical things that you should be aware of.
Back in the day when computers were invented, all information on computers was stored “the way it’s stored naturally in the brain”.
Back in the day when computers were invented, all we had were documents to store information.
This new cool hip and happening tech is, if anything, not new at all…
Sure, things have changed over the last 30 years, and with all the new compute power and storage techniques, the frayed ends of data processing have significantly improved. This makes running data analysis, as described above, actually so much better!! Really, we can do things to data, using these cool new things, that we never dreamt possible 30 years ago.
But these things remain the “frayed ends of data processing”.
If you do have requirements like filling your shopping cart once, and it works all the way through check-out…
If you do have requirements where some kind of “transaction” is required (like buying something, like your bank account, like two actions that are dependent on each other)…
You need transactions…
I know, “transaction” is boring, old-fashioned and a seemingly surpassed entity…
But, I promise you, you will want those things, if you actually have to process something in your application in a way that makes real-world sense.
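To make the shopping cart story a bit more tangible, here is a minimal sketch of what “two actions that are dependent on each other” looks like in code. It uses Python's built-in sqlite3 module purely as a stand-in for a relational engine, so it runs without installing anything. The table names, column names and the open_shop, checkout and snapshot helpers are all made up for this illustration.

```python
import sqlite3


def open_shop(balance, in_stock):
    """Create a throwaway in-memory shop with one account and one item (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute(
        "CREATE TABLE stock (item_id INTEGER PRIMARY KEY, "
        "quantity INTEGER NOT NULL CHECK (quantity >= 0))"
    )
    with conn:  # commit both inserts together
        conn.execute("INSERT INTO accounts VALUES (1, ?)", (balance,))
        conn.execute("INSERT INTO stock VALUES (1, ?)", (in_stock,))
    return conn


def checkout(conn, account_id, item_id, price):
    """Take one item out of stock and pay for it, as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE stock SET quantity = quantity - 1 WHERE item_id = ?",
                (item_id,),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
                (price, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # A CHECK constraint failed (no stock or no money): nothing was changed.
        return False


def snapshot(conn):
    """Return (balance, quantity) for the single demo account and item."""
    balance = conn.execute("SELECT balance FROM accounts").fetchone()[0]
    quantity = conn.execute("SELECT quantity FROM stock").fetchone()[0]
    return balance, quantity
```

If the payment fails, the stock update is rolled back with it, so the item never “half leaves” the shelf. That all-or-nothing behaviour is the boring, old-fashioned thing you end up wanting.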
This was solved ages ago
For that, indeed more than 30 years ago (which is such a long time that most of the cool young dudes and dudettes developing applications today were not even born), relational database theory was invented to solve the inherent issues that document-based databases bring if you want to introduce these transactions to your application.
Document databases brought these issues back in the day… They bring these issues today!!!
Please believe me, they bring these issues today! This is the reason why, contrary to the messages from non-relational database vendors, application developers find that they need to add actual transactional capabilities to their applications, either to make them work in real life or to bring any kind of scalability to them.
Imagine building an application and actually being successful with it! Isn’t that the dream of every application project? How boring is it then, to find that you are unable to meet demand? Not because you are understaffed or because you lack compute resources, but simply because your application, based on these data storage methodologies, cannot keep up? A document database is data storage, not data processing.
For that, you would need the likes of PostgreSQL. Postgres is (also) free, it is Open Source… it is even Community Open Source, how cool is that? No annoying vendor telling fantasy stories about what Postgres can do, unlike MongoDB for instance.
Coming back to the opening phrase: “We’re going to store data the way it’s stored naturally in the brain.”
It is kind of dumb to use a computer to store data like it would be stored in the brain. The human brain is not designed to process YUGE amounts of data, simply because the structure is not designed to accommodate that. Period.
To process large amounts of data, you need structure, either when you store the data or at the moment you want to start doing stuff with it. Structuring data when you store it is by far the cheapest method. Technologies like JSON data storage add sufficient flexibility to that, and engines like Postgres have no trouble whatsoever processing such data.
Finally, the programs these vendors use to “store data the way it’s stored naturally in the brain” are written in computer code, also not “naturally like the brain”. Would we need to revert to medieval clerks to start recording the data in these documents? No, I guess not.
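As a rough illustration of “structure when you store, flexibility where you need it”, here is a small sketch. It again uses Python's built-in sqlite3 module as a stand-in so it runs anywhere. The products table and the helper functions are hypothetical. In this sketch the JSON lives in a plain text column and is decoded in Python, whereas in Postgres you would presumably pick a native JSON column type and query it in SQL.

```python
import json
import sqlite3


def make_catalog():
    """Create a hypothetical products table: fixed columns plus one JSON document."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE products (
            product_id  INTEGER PRIMARY KEY,
            name        TEXT    NOT NULL,
            price_cents INTEGER NOT NULL CHECK (price_cents >= 0),
            attributes  TEXT    NOT NULL  -- free-form JSON, stored as text in this sketch
        )
        """
    )
    return conn


def add_product(conn, name, price_cents, attributes):
    """Store the structured fields as columns and the loose ones as JSON."""
    with conn:
        cur = conn.execute(
            "INSERT INTO products (name, price_cents, attributes) VALUES (?, ?, ?)",
            (name, price_cents, json.dumps(attributes)),
        )
    return cur.lastrowid


def products_cheaper_than(conn, limit_cents):
    """Filter on the structured column; decode the flexible part afterwards."""
    rows = conn.execute(
        "SELECT name, price_cents, attributes FROM products "
        "WHERE price_cents < ? ORDER BY price_cents",
        (limit_cents,),
    ).fetchall()
    return [(name, price, json.loads(attrs)) for name, price, attrs in rows]
```

The things you filter and calculate on (name, price) get structure up front, and the odds and ends that differ per product go into the JSON part.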
Be hip and happening,
Be efficient and scalable,
Use relational database techniques…