NoSQL data stores have grown in popularity as a way to handle the petabytes of data produced by user-generated content, GIS data, and the traffic logs of high-volume websites and apps. By storing data in “documents,” with text encoded in formats like JSON, rather than in related “tables,” NoSQL repositories save up-front development time: there is no need to define database tables in advance, and data can be retrieved very quickly. They can also require less storage space, which helps make them a cost-effective way to store massive amounts of data.
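As a rough illustration of the document model described above, the sketch below stores two JSON documents with different fields in a toy in-memory key-value store. The records, field names, and the `get` helper are hypothetical, and a plain Python dictionary stands in for a real NoSQL product.

```python
import json

# Hypothetical records: two "documents" with different fields.
# No table definition is needed before storing them.
documents = [
    {"id": 1, "title": "Algebra I", "authors": ["A. Author"], "tags": ["math"]},
    {"id": 2, "title": "Traffic log", "hits": 1042,
     "region": {"lat": 40.7, "lon": -74.0}},
]

# Toy in-memory store (an assumption, not a real database):
# each document is kept as JSON text under its id.
store = {doc["id"]: json.dumps(doc) for doc in documents}


def get(doc_id):
    """Fetch one document by key and decode it from JSON."""
    return json.loads(store[doc_id])


# Retrieval is a single key lookup; nested fields come back with the document.
print(get(2)["region"]["lat"])
```

Because each document carries its own structure, adding a new field to one record does not require changing any of the others.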
For major content companies like McGraw-Hill, Pearson, and Houghton-Mifflin, NoSQL databases play an important role in content management because they can:
- Handle petabytes of data efficiently and cost-effectively
- Emulate the features of relational databases (with some effort)
- Enable analysis via Hadoop and MapReduce data processes
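To make the last point more concrete, here is a minimal single-process sketch of the MapReduce pattern applied to log documents. It does not use Hadoop; the log records and the function names `map_phase`, `reduce_phase`, and `map_reduce` are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical traffic-log documents.
logs = [
    {"path": "/home", "status": 200},
    {"path": "/search", "status": 200},
    {"path": "/home", "status": 404},
]


def map_phase(doc):
    """Emit one (key, value) pair per document: the path and a count of 1."""
    yield doc["path"], 1


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single total."""
    return key, sum(values)


def map_reduce(docs):
    # Shuffle step: group the mapped values by key.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_phase(doc):
            groups[key].append(value)
    # Reduce step: one call per distinct key.
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = map_reduce(logs)
print(counts)
```

In a real cluster the map and reduce calls would run in parallel across many machines; the grouping logic is the same idea at a smaller scale.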
While the NoSQL approach has advantages over relational databases, its lack of structure makes it a poor choice for user-oriented search and retrieval interfaces designed to return a small number of pinpoint-accurate records. For instance, with related tables you can include or exclude groups of information related to the primary data through joins in your queries, while in a schema-less repository you need to analyze the document structure to decide what to include or exclude. This makes interface design more complex.
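The difference can be sketched side by side. In the example below, the relational half uses Python's built-in `sqlite3` module and a join, while the document half has to inspect each record for an optional nested field. The `books` and `reviews` tables, the sample documents, and all field names are hypothetical.

```python
import sqlite3

# Relational side: related data is pulled in (or left out) with a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reviews (book_id INTEGER, rating INTEGER);
    INSERT INTO books VALUES (1, 'Algebra I'), (2, 'Biology');
    INSERT INTO reviews VALUES (1, 5), (1, 4);
""")
rows = conn.execute(
    "SELECT b.title, r.rating "
    "FROM books b JOIN reviews r ON r.book_id = b.id "
    "ORDER BY r.rating"
).fetchall()

# Document side: the same data as schema-less records.
# The code must know that "reviews" is optional and nested.
docs = [
    {"title": "Algebra I", "reviews": [{"rating": 5}, {"rating": 4}]},
    {"title": "Biology"},  # no "reviews" field at all
]
pairs = sorted(
    ((doc["title"], review["rating"])
     for doc in docs
     for review in doc.get("reviews", [])),
    key=lambda pair: pair[1],
)

print(rows)
print(pairs)
```

Both halves produce the same title and rating pairs, but the document version encodes knowledge of the record layout in application code, which is the extra burden on interface design noted above.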
Looking into the crystal ball, it seems that NoSQL could eventually play as important a role in data management as XML. NoSQL databases are evolving to support SQL-like query languages for better data analysis. They’re now often referred to as “Not Only SQL” data stores because they play an important role in hybrid data management solutions.