After the introductory chapter, chapter two puts first things first. When talking about data-intensive applications, the data itself, and how it is stored, is front and center.
When I first learned databases around the turn of the century, SQL was pretty much the only game in town. Other types of databases existed besides relational ones, but I rarely saw them in the wild. Only within the last decade did I begin encountering NoSQL document databases, so reading this chapter, which covers several non-relational databases, was an eye-opener.
During the design process, much thought has to be given not only to how the data is stored but also to how it will be used. Does all the data live together, and will it be pulled by a key? Then consider a document DB like MongoDB. Is the data a graph, like a Facebook friend list? Then consider a graph DB like Neo4j. No fixed schema? Maybe a document DB, maybe a relational DB with blobs. Will the data be pulled using an SQL query or MapReduce functions? Do you even need a separate database process/server, or can you use an in-process SQLite3 database?
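To make the "stored together and pulled by a key" versus "queried across records" distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The profile data, table names, and helper functions are all made up for illustration, and a JSON blob in SQLite only stands in for a real document DB.

```python
import json
import sqlite3

# Hypothetical profile record, used only for illustration.
profile = {
    "user_id": 1,
    "name": "Ada",
    "positions": [
        {"title": "Engineer", "org": "Acme"},
        {"title": "Analyst", "org": "Initech"},
    ],
}

conn = sqlite3.connect(":memory:")

# Document style: the whole record lives together as one JSON blob.
conn.execute("CREATE TABLE profiles_doc (user_id INTEGER PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT INTO profiles_doc VALUES (?, ?)",
    (profile["user_id"], json.dumps(profile)),
)

# Relational style: the same data split into normalized tables.
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE positions (user_id INTEGER, title TEXT, org TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", (profile["user_id"], profile["name"]))
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?)",
    [(profile["user_id"], p["title"], p["org"]) for p in profile["positions"]],
)


def load_document(user_id):
    """Fetch the whole record by key: one lookup, no joins."""
    row = conn.execute(
        "SELECT body FROM profiles_doc WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0])


def titles_at_org(org):
    """Query across records: natural with normalized tables and a join."""
    return conn.execute(
        "SELECT u.name, p.title FROM users u "
        "JOIN positions p ON p.user_id = u.user_id WHERE p.org = ?",
        (org,),
    ).fetchall()
```

Fetching by key favors the document layout, while a question like "who worked at Acme?" is easier to ask of the normalized tables. Which one fits depends on how the application actually reads the data.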
There are many design considerations because the database isn't a junk drawer; it is an integral part of the system, and it needs to match the algorithms and the overall design.
Many of the performance improvements I have made in my career came from reducing the number of database calls (and their I/O), and from optimizing queries with the help of the query planner and indexes. This affects both back-end and front-end development. You can write the cleverest code imaginable, but if the application spends three seconds retrieving data from the database, your code will seem slow and clunky.
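As a rough sketch of those two ideas, the snippet below uses Python's built-in sqlite3 module with a made-up authors/posts schema. It first compares one query per author against a single join, then asks SQLite for its query plan before and after adding an index. The table, index, and helper names are all hypothetical, and an in-memory database hides the network round trips that make extra calls expensive in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [(i, f"author{i}") for i in range(1, 51)],
)
conn.executemany(
    "INSERT INTO posts (author_id, title) VALUES (?, ?)",
    [(i, f"post by {i}") for i in range(1, 51)],
)

stats = {"queries": 0}  # counts calls made through run()


def run(sql, params=()):
    """Execute a query and count it as one database call."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one():
    """One query for the authors, then one more query per author."""
    result = []
    for author_id, name in run("SELECT id, name FROM authors ORDER BY id"):
        rows = run(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        )
        result.extend((name, title) for (title,) in rows)
    return result


def titles_single_join():
    """The same result from a single call."""
    return run(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    )


stats["queries"] = 0
slow = titles_n_plus_one()
n_plus_one_calls = stats["queries"]

stats["queries"] = 0
fast = titles_single_join()
join_calls = stats["queries"]


def plan(sql):
    """Return SQLite's plan text for a query (the detail column)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


lookup = "SELECT title FROM posts WHERE author_id = 7"
plan_before = plan(lookup)  # no index yet, so a full table scan
conn.execute("CREATE INDEX idx_posts_author ON posts(author_id)")
plan_after = plan(lookup)   # the plan now mentions idx_posts_author
```

Both functions return the same rows, but the loop issues one call per author plus one, while the join issues a single call. Printing `plan_before` and `plan_after` shows whether the planner switched from scanning the table to using the index.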
Which database challenges have you encountered in your projects? What hard lessons have you learned?