NoSQL / MongoDB Basics
As a software developer or architect, you have probably encountered and used NoSQL. Even if you haven't used it, it is good to have the basics of NoSQL and MongoDB at your fingertips. A couple of times, I faltered while discussing NoSQL / MongoDB; hence this compilation of basic points, as a ready reference.
The basics that you need to have at your fingertips are: when would you choose NoSQL; what is Brewer's Theorem, and how does it help in making this choice; what are the various types of NoSQL databases, with examples of each; what are the basic concepts of MongoDB; what are some unique features of MongoDB; and have you heard any criticism of it?
For the first question, when would you choose NoSQL, a web search turns up a lot of links; the three that I read are:
http://blogs.shephertz.com/2013/06/20/a-developers-dilemma-when-to-use-nosql/
http://www.itworld.com/article/2833291/essential-reading-for-choosing-a-nosql-database.html
http://www.informationweek.com/big-data/big-data-analytics/nosql-newsql-or-rdbms-how-to-choose/a/d-id/1297861
Basically, the main points are high data volumes, horizontal scale-out, and schemaless data. The third link presents the selection criteria in the form of a table:
Table 1: 10 Selection Criteria For Choosing Database Types
Characteristic | RDBMS | NoSQL | NewSQL
ACID compliance (data, transaction integrity) | Yes | No | Yes
OLAP/OLTP | Yes | No | Yes
Data analysis (aggregate, transform, etc.) | Yes | No | Yes
Schema rigidity (strict mapping of model) | Yes | No | Maybe
Data format flexibility | No | Yes | Maybe
Distributed computing | Yes | Yes | Yes
Scale up (vertical)/Scale out (horizontal) | Yes | Yes | Yes
Performance with growing data | Fast | Fast | Very Fast
Performance overhead | Huge | Moderate | Minimal
Popularity/community support | Huge | Growing | Slowly growing
Moving on to the next question: what is Brewer's Theorem (also known as the CAP theorem), and how does it help in making this choice? Essentially, this theorem states that you can guarantee only two of Consistency, Availability and Partition tolerance. More formally:
"it's impossible for a distributed computer system to simultaneously provide all three of these guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (node failures don't prevent survivors from continuing to operate)
- Partition tolerance (no failures less than total network failures cause the system to fail)"
Source: [1]
[Figure: visual depiction of Brewer's Theorem. Source: the itworld article by Matthew Mombrea, the second link given above.]
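To make the trade-off concrete, here is a toy Python model of my own, not taken from the sources: one value kept on two replicas, a flag that simulates a network partition, and a mode switch. In "CP" mode the store refuses writes it cannot replicate (consistent, but not available); in "AP" mode it accepts them and lets the replicas disagree until the partition heals (available, but not consistent). The class names and the naive last-writer-wins reconciliation are assumptions for illustration; real systems use quorums, tunable consistency and smarter conflict resolution.

```python
from typing import Any, List, Optional


class Unavailable(Exception):
    """Raised when a CP-style store refuses a request during a partition."""


class TwoReplicaStore:
    """Toy store keeping one value on two replicas (hypothetical, for illustration)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas: List[Any] = [None, None]  # value held by replica 0 and 1
        self.partitioned = False                  # True while replicas cannot talk
        self.last_writer: Optional[int] = None

    def write(self, replica: int, value: Any) -> None:
        if not self.partitioned:
            # Healthy network: replicate synchronously so both replicas agree.
            self.replicas = [value, value]
        elif self.mode == "CP":
            # Keep consistency, give up availability: refuse rather than diverge.
            raise Unavailable("cannot reach the other replica")
        else:
            # Keep availability, give up consistency: accept locally and diverge.
            self.replicas[replica] = value
        self.last_writer = replica

    def read(self, replica: int) -> Any:
        return self.replicas[replica]

    def heal(self) -> None:
        """End the partition; reconcile with naive last-writer-wins."""
        self.partitioned = False
        if self.last_writer is not None:
            winner = self.replicas[self.last_writer]
            self.replicas = [winner, winner]


cp = TwoReplicaStore("CP")
cp.write(0, "v1")
cp.partitioned = True
try:
    cp.write(0, "v2")
except Unavailable as exc:
    print("CP refused write:", exc)
print(cp.read(0), cp.read(1))  # v1 v1

ap = TwoReplicaStore("AP")
ap.write(0, "v1")
ap.partitioned = True
ap.write(0, "v2")
print(ap.read(0), ap.read(1))  # v2 v1 (replicas disagree)
ap.heal()
print(ap.read(0), ap.read(1))  # v2 v2
```

The point of the toy is that during a partition the only choices are to refuse (CP) or to accept and diverge (AP); which one you want is exactly the question Table 1 and the theorem help you answer.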
Moving on to the next question: what are the various types of NoSQL databases, with examples of each? Here is the answer:
Key-Value stores
Redis, BerkeleyDB, Riak
Document Databases
MongoDB, Couchbase. Similar systems: Lucene, and Solr / Elasticsearch (both built on top of Lucene)
Column-based stores
Cassandra, HBase
Graph databases
Neo4j, OrientDB
XML Databases
MarkLogic, eXist-db, xDB
Source: [2]
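A rough way to see the difference between these categories is to store the same user record in each shape. The sketch below uses plain Python structures as stand-ins; it is illustrative only, the record and helper names are hypothetical, and it does not use any product's actual API.

```python
import json
import xml.etree.ElementTree as ET
from typing import List

# Key-value store (e.g. Redis, Riak): the value is opaque to the database;
# only the key can be looked up, and the client decodes the value.
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document database (e.g. MongoDB): the value is a structured document whose
# fields, including nested ones, are visible to queries.
document = {"_id": 42, "name": "Ada", "address": {"city": "London"}, "tags": ["admin"]}

# Column-family store (e.g. Cassandra, HBase): row key -> column family -> columns.
column_store = {"42": {"profile": {"name": "Ada", "city": "London"},
                       "stats": {"logins": 17}}}

# Graph database (e.g. Neo4j): nodes plus first-class relationships between them.
nodes = {42: {"name": "Ada"}, 7: {"name": "Charles"}}
edges = [(42, "FOLLOWS", 7)]

# XML database (e.g. eXist-db): data stored and queried as XML trees.
xml_doc = ET.fromstring('<user id="42"><name>Ada</name><city>London</city></user>')


def followed_by(user_id: int) -> List[str]:
    """Graph-style traversal: names of the users that user_id follows."""
    return [nodes[dst]["name"] for src, rel, dst in edges
            if src == user_id and rel == "FOLLOWS"]


print(json.loads(kv_store["user:42"])["city"])  # London
print(document["address"]["city"])              # London
print(column_store["42"]["stats"]["logins"])    # 17
print(followed_by(42))                          # ['Charles']
print(xml_doc.find("city").text)                # London
```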
Moving on to the next question: what are the basic concepts of MongoDB?
- A document is the basic unit of data for MongoDB, roughly equivalent to a row in a relational database management system (but much more expressive).
- Similarly, a collection can be thought of as the schema-free equivalent of a table.
- A single instance of MongoDB can host multiple independent databases, each of which can have its own collections and permissions.
- MongoDB comes with a simple but powerful JavaScript shell, which is useful for the administration of MongoDB instances and data manipulation.
- Every document has a special key, "_id", that is unique across the document's collection.
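Here is a minimal in-memory sketch of those concepts, assuming nothing beyond the Python standard library: one server instance holding several databases, each with schema-free collections of documents that get a unique "_id". The Server and Collection classes are hypothetical, and the integer counter stands in for MongoDB's ObjectId. Against a real server you would use the shell or a driver such as PyMongo, whose collections offer similarly named methods like insert_one and find.

```python
import itertools
from typing import Any, Dict, List, Optional


class Collection:
    """Schema-free group of documents; every document gets a unique "_id"."""

    def __init__(self) -> None:
        self._docs: Dict[Any, dict] = {}
        self._ids = itertools.count(1)  # toy stand-in for MongoDB's ObjectId

    def insert_one(self, doc: dict) -> Any:
        doc = dict(doc)  # copy so the caller's dict is not mutated
        doc.setdefault("_id", next(self._ids))
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def find(self, query: Optional[dict] = None) -> List[dict]:
        """Return documents whose top-level fields equal every pair in query."""
        query = query or {}
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in query.items())]


class Server:
    """One instance hosting several independent databases of collections."""

    def __init__(self) -> None:
        self._dbs: Dict[str, Dict[str, Collection]] = {}

    def collection(self, db: str, name: str) -> Collection:
        # Databases and collections are created lazily on first use.
        return self._dbs.setdefault(db, {}).setdefault(name, Collection())


server = Server()
users = server.collection("blog", "users")
users.insert_one({"name": "Ada", "langs": ["Python", "JS"]})
users.insert_one({"name": "Linus", "email": "l@example.com"})  # different shape, same collection
print(users.find({"name": "Ada"}))
```

Note how the two documents in the same collection have different fields; nothing forces them to share a schema.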
Next: what are some unique features of MongoDB?
Indexing
MongoDB supports generic secondary indexes, allowing a variety of fast queries, and provides unique, compound, and geospatial indexing capabilities as well.
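As a sketch of what a secondary index buys you, the hypothetical class below maps the value of one or more fields to the _ids of matching documents, so an equality lookup avoids scanning the whole collection. In the mongo shell you would declare such indexes with calls like db.users.createIndex({email: 1}, {unique: true}) or, for a compound index, db.users.createIndex({city: 1, age: 1}). MongoDB's real indexes are B-trees that also serve range queries and sorts; this hash-based toy only answers exact matches.

```python
from collections import defaultdict
from typing import Any, Dict, Sequence, Set, Tuple


class SecondaryIndex:
    """Toy index from a (possibly compound) key to the _ids of matching documents."""

    def __init__(self, fields: Sequence[str], unique: bool = False) -> None:
        self.fields = tuple(fields)  # one field, or several for a compound index
        self.unique = unique
        self._entries: Dict[Tuple, Set[Any]] = defaultdict(set)

    def _key(self, doc: dict) -> Tuple:
        # A missing field is indexed as None.
        return tuple(doc.get(f) for f in self.fields)

    def add(self, doc: dict) -> None:
        key = self._key(doc)
        if self.unique and self._entries.get(key):
            raise ValueError(f"duplicate key {key!r} for unique index on {self.fields}")
        self._entries[key].add(doc["_id"])

    def lookup(self, *values: Any) -> Set[Any]:
        """Return the _ids whose indexed fields equal values, without a full scan."""
        return set(self._entries.get(tuple(values), set()))


docs = [
    {"_id": 1, "email": "ada@example.com", "city": "London", "age": 36},
    {"_id": 2, "email": "alan@example.com", "city": "London", "age": 41},
    {"_id": 3, "email": "grace@example.com", "city": "New York", "age": 36},
]
by_email = SecondaryIndex(["email"], unique=True)
by_city_age = SecondaryIndex(["city", "age"])  # compound index
for d in docs:
    by_email.add(d)
    by_city_age.add(d)

print(by_email.lookup("alan@example.com"))  # {2}
print(by_city_age.lookup("London", 36))     # {1}
```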
Stored JavaScript
Instead of stored procedures, developers can store and use JavaScript functions and values on the server side.
Aggregation
MongoDB supports MapReduce and other aggregation tools.
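The map/group/reduce flow behind MapReduce can be sketched in a few lines of Python. This is a hypothetical single-process illustration, not MongoDB's mapReduce command: map_fn calls emit(key, value) for each document, emitted values are grouped by key, and reduce_fn folds each group. The example computes the total order amount per customer, the same result an aggregation pipeline stage such as {$group: {_id: "$cust", total: {$sum: "$amount"}}} would give. Note that newer MongoDB releases deprecate mapReduce in favour of the aggregation pipeline.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

Emit = Callable[[Any, Any], None]


def map_reduce(docs: Iterable[dict],
               map_fn: Callable[[dict, Emit], None],
               reduce_fn: Callable[[Any, List[Any]], Any]) -> Dict[Any, Any]:
    """Run map_fn over every document, group emitted values by key, then reduce."""
    groups: Dict[Any, List[Any]] = defaultdict(list)

    def emit(key: Any, value: Any) -> None:
        groups[key].append(value)

    for doc in docs:
        map_fn(doc, emit)  # map phase: zero or more emit() calls per document
    return {key: reduce_fn(key, values) for key, values in groups.items()}


orders = [
    {"cust": "A", "amount": 30},
    {"cust": "B", "amount": 5},
    {"cust": "A", "amount": 12},
]

totals = map_reduce(
    orders,
    map_fn=lambda doc, emit: emit(doc["cust"], doc["amount"]),
    reduce_fn=lambda key, values: sum(values),
)
print(totals)  # {'A': 42, 'B': 5}
```

This sketch calls reduce_fn exactly once per key; MongoDB may call reduce repeatedly on partial results, so a real reduce function must return a value of the same shape as the values that map emits.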
Fixed-size collections
Capped collections are fixed in size and are useful for certain types of data, such as logs.
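A capped collection behaves much like a ring buffer. The sketch below uses a Python deque with maxlen as the stand-in: inserts keep insertion order and, once the cap is reached, the oldest entry is silently dropped. One deliberate simplification: MongoDB caps collections by size in bytes (for example db.createCollection("log", {capped: true, size: 100000}), optionally with a max document count), whereas this hypothetical CappedLog caps by document count only.

```python
from collections import deque
from typing import Deque, List


class CappedLog:
    """Fixed-size, insertion-ordered log: once full, the oldest entry is evicted."""

    def __init__(self, max_docs: int) -> None:
        self._docs: Deque[dict] = deque(maxlen=max_docs)

    def insert(self, doc: dict) -> None:
        self._docs.append(doc)  # deque(maxlen=...) drops the oldest item when full

    def find_all(self) -> List[dict]:
        return list(self._docs)  # insertion order, oldest first


log = CappedLog(max_docs=3)
for i in range(5):
    log.insert({"seq": i, "msg": f"event {i}"})
print([d["seq"] for d in log.find_all()])  # [2, 3, 4]
```

This is why capped collections suit logs: you never have to prune old entries yourself, and reading back in insertion order is cheap.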
File storage
MongoDB supports an easy-to-use protocol for storing large files and file metadata.
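That protocol is GridFS: a file is stored as one metadata document in the fs.files collection plus a series of chunk documents in fs.chunks, each carrying files_id, a sequence number n, and a slice of the bytes in data. The functions below are a hypothetical sketch of that layout, not the GridFS API of any driver; the tiny chunk_size is only for demonstration (the real default is 255 kB), and fields such as uploadDate are omitted.

```python
from typing import List, Tuple


def to_gridfs_docs(file_id: int, filename: str, data: bytes,
                   chunk_size: int = 4) -> Tuple[dict, List[dict]]:
    """Split data into GridFS-style documents: one files doc plus ordered chunks."""
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    chunks = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def from_gridfs_docs(files_doc: dict, chunks: List[dict]) -> bytes:
    """Reassemble the file by concatenating its own chunks in order of n."""
    mine = sorted((c for c in chunks if c["files_id"] == files_doc["_id"]),
                  key=lambda c: c["n"])
    data = b"".join(c["data"] for c in mine)
    if len(data) != files_doc["length"]:
        raise ValueError("missing or truncated chunks")
    return data


meta, chunks = to_gridfs_docs(1, "hello.txt", b"hello, gridfs")
print(meta["length"], len(chunks))          # 13 4
print(from_gridfs_docs(meta, chunks[::-1]))  # b'hello, gridfs'
```

Splitting into chunks is what lets large files sit within MongoDB's per-document size limit and be streamed piece by piece.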
Source: [3]
Finally, if someone asks whether you have heard any criticism of MongoDB, here is a popular article that I keep bumping into in my reading:
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
Sarah writes well; I wish I could write like that. She makes her posts interesting with diagrams and pictures. Curiously, though the article is titled "Why you should never use MongoDB", it also offers advice on when to use MongoDB.
The crux of the article is captured in:
Whether you’re duplicating critical data (ugh), or using references and doing joins in your application code (double ugh), when you have links between documents, you’ve outgrown MongoDB. When the MongoDB folks say “documents,” in many ways, they mean things you can print out on a piece of paper and hold. A document may have internal structure — headings and subheadings and paragraphs and footers — but it doesn’t link to other documents. It’s a self-contained piece of semi-structured data.
If your data looks like that, you’ve got documents. Congratulations! It’s a good use case for Mongo. But if there’s value in the links between documents, then you don’t actually have documents. MongoDB is not the right solution for you. It’s certainly not the right solution for social data, where links between documents are actually the most critical data in the system.
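To make Sarah's point concrete, here is a small hypothetical example (the collections and helpers are mine, not hers): posts that embed a copy of their author versus posts that reference the author by id. Renaming the user shows the cost of each option: with embedding, every duplicated copy must be hunted down and rewritten; with references, a single update suffices but every read needs a join in application code.

```python
from typing import Dict, List

# Option 1: duplicate data by embedding a copy of the author inside each post.
posts_embedded: List[dict] = [
    {"_id": 1, "title": "Hello", "author": {"_id": 42, "name": "Ada"}},
    {"_id": 2, "title": "Again", "author": {"_id": 42, "name": "Ada"}},
]

# Option 2: store a reference to the author and join in application code.
users_by_id: Dict[int, dict] = {42: {"_id": 42, "name": "Ada"}}
posts_referenced: List[dict] = [
    {"_id": 1, "title": "Hello", "author_id": 42},
    {"_id": 2, "title": "Again", "author_id": 42},
]


def rename_user_embedded(user_id: int, new_name: str) -> int:
    """Find and rewrite every duplicated copy; return how many were updated."""
    updated = 0
    for post in posts_embedded:
        if post["author"]["_id"] == user_id:
            post["author"]["name"] = new_name
            updated += 1
    return updated


def posts_with_authors() -> List[dict]:
    """Application-side join: one extra lookup per post to resolve author_id."""
    return [{**post, "author": users_by_id[post["author_id"]]}
            for post in posts_referenced]


print(rename_user_embedded(42, "Ada Lovelace"))  # 2 (copies that had to change)
users_by_id[42]["name"] = "Ada Lovelace"  # a single update here...
print([p["author"]["name"] for p in posts_with_authors()])  # ...but a join on every read
```

With two posts both options look harmless; the article's argument is that in link-heavy data such as social graphs, the duplicated copies or the application-side joins multiply until the document model works against you.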
Sources:
[1] DZone Refcardz, "Getting Started with NoSQL and Data Scalability" by Eugene Ciurana.
[2] https://keefcode.wordpress.com/2013/12/04/nosql-databases-how-to-choose/
[3] MongoDB: The Definitive Guide by Kristina Chodorow and Michael Dirolf. O'Reilly Media, Inc., 2010. ISBN: 978-1-449-38156-1.
Note - answers taken verbatim.