Monday, September 15, 2014

MongoDB: The Next Generation of Databases



I recently got a chance to work with, or rather tinker and play with, MongoDB, a document database that is often described as the next generation of database. The cool thing about this database is that it does not follow the traditional method of storing data in tables, records, and fields. That's why MongoDB is also known as a NoSQL database.

What is a NoSQL database?

NoSQL, or 'Not Only SQL', represents the new class of data management technologies designed to meet the increasing volume, velocity, and variety of data that organizations are storing, processing, and analyzing.

Compared to relational databases, NoSQL databases are generally designed to scale out more easily and can provide better performance for some workloads. NoSQL databases address several needs that the relational model was not designed for, including:

  • Large volumes of structured, semi-structured and unstructured data
  • Agile sprints, quick iteration, and frequent code pushes
  • Flexible, easy to use object-oriented programming
  • Efficient, scale-out architecture instead of expensive, monolithic architecture


Document Databases

When relational databases were introduced in the 1970s, data schemas were fairly simple and straightforward, and it made sense to conceive of objects as sets of relationships. For example, an article object might be related to a category (an object), a tag (another object), a comment (another object), and so on.

Because relationships between different types of data were specified in the database schema, these relational databases could be queried with a standard Structured Query Language, or SQL. But the environment for data, as well as programming, has changed since the development of the SQL database:

The emergence of cloud computing has brought deployment and storage costs down dramatically, but only if data can be spread across multiple servers easily without disruption. In a complex SQL database, this is difficult because many queries require multiple large tables to be joined together to provide a response. Executing distributed joins is a very complex problem in relational databases.

The need to store unstructured data, such as social media posts and multimedia, has grown rapidly. SQL databases are extremely efficient at storing structured information, but workarounds or compromises are necessary for storing and querying unstructured data.

Agile development methods mean that the database schema needs to change rapidly as demands evolve. SQL databases require their structure to be specified in advance, which means any changes to the information schema require time-consuming ALTER statements to be run on a table.
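To make the schema point concrete, here is a minimal sketch using SQLite, chosen only because it ships with Python. The "articles" table and its columns are made up for illustration. It shows that a new field has to be added to the table definition before any row can use it. On a tiny in-memory table the ALTER is instant; the cost described above applies to large production tables.

```python
import sqlite3

# In-memory relational table with a fixed schema (hypothetical "articles" table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO articles (title) VALUES (?)", ("Hello",))

# Storing a new field means changing the table definition first.
conn.execute("ALTER TABLE articles ADD COLUMN tags TEXT")
conn.execute(
    "INSERT INTO articles (title, tags) VALUES (?, ?)",
    ("MongoDB", "nosql,db"),
)

# Column names after the change, and the rows as they are now stored.
columns = [row[1] for row in conn.execute("PRAGMA table_info(articles)")]
rows = conn.execute("SELECT id, title, tags FROM articles ORDER BY id").fetchall()
conn.close()

print(columns)
print(rows)
```

Note that the row inserted before the ALTER has no value for the new column, which is the kind of gap a migration usually has to deal with.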

In response to these changes, new ways of storing data (e.g. NoSQL databases) have emerged that allow data to be grouped together more naturally and logically, and that loosen the restrictions on database schema. One of the most popular ways of storing data is a document data model, where each record and its associated data is thought of as a “document”. In a document database, such as MongoDB, everything related to a database object is encapsulated together. Storing data in this way has the following advantages:

  • Documents are independent units, which makes performance better (related data is read contiguously off disk) and makes it easier to distribute data across multiple servers while preserving its locality.
  • Application logic is easier to write. You don’t have to translate between objects in your application and SQL queries; you can just turn the object model directly into a document.
  • Unstructured data can be stored easily, since a document contains whatever keys and values the application logic requires. In addition, costly migrations are avoided since the database does not need to know its information schema in advance.
  • Document databases generally have very powerful query engines and indexing features that make it easy and fast to execute many different optimized queries. The strength of a document database’s query language is an important differentiator between these databases.
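The idea of a "document" can be sketched in plain Python, with no database involved. The article below, its field names, and the list standing in for a collection are all hypothetical. JSON is used only as a familiar stand-in here; MongoDB itself stores documents in a binary format called BSON.

```python
import json

# A hypothetical article with its category, tags, and comments embedded
# in one document instead of being spread across several tables.
article = {
    "title": "A first look at MongoDB",
    "category": {"name": "Databases"},
    "tags": ["nosql", "mongodb"],
    "comments": [
        {"author": "reader1", "text": "Nice overview"},
    ],
}

# A second document in the same collection may carry different keys,
# because no schema is declared in advance.
draft = {"title": "Untitled draft", "status": "draft"}

# A plain list stands in for a collection in this sketch.
collection = [article, draft]

# The application object round-trips through a JSON-style document unchanged.
serialized = json.dumps(article, sort_keys=True)
restored = json.loads(serialized)

print(serialized)
```

Everything needed to display the article comes back in one read, which is the locality advantage mentioned in the first bullet.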


MongoDB:

MongoDB (from "humongous") is an open-source document database, and the leading NoSQL database. Written in C++, MongoDB features:

Document-Oriented Storage
JSON-style documents with dynamic schemas offer simplicity and power.

Full Index Support
Index on any attribute, just like you're used to.

Replication & High Availability
Mirror across LANs and WANs for scale and peace of mind.

Auto-Sharding
Scale horizontally without compromising functionality.

Querying
Rich, document-based queries.

Fast In-Place Updates
Atomic modifiers for contention-free performance.

Map/Reduce
Flexible aggregation and data processing.

GridFS
Store files of any size without complicating your stack.

MongoDB Management Service
Monitoring and backup designed for MongoDB.

Partner with MongoDB
Reduce cost, accelerate time to market, and mitigate risk with proactive support and enterprise-grade capabilities.
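For readers new to the Map/Reduce item in the list above, here is a rough sketch of the general idea in plain Python. It is not MongoDB's API: the documents, the map_tags and reduce_counts helpers, and the tag-counting task are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical documents; note that one of them has no tags at all.
docs = [
    {"title": "A", "tags": ["nosql", "mongodb"]},
    {"title": "B", "tags": ["nosql"]},
    {"title": "C", "tags": []},
]


def map_tags(doc):
    """Map step: emit one (tag, 1) pair for each tag in a document."""
    return [(tag, 1) for tag in doc["tags"]]


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for a single key."""
    return sum(values)


# Group the emitted pairs by key, then reduce each group to one value.
grouped = defaultdict(list)
for doc in docs:
    for key, value in map_tags(doc):
        grouped[key].append(value)

tag_counts = {key: reduce_counts(key, values) for key, values in grouped.items()}

print(tag_counts)
```

The map step looks at one document at a time and the reduce step looks at one key at a time, which is what lets a database spread this kind of work across several servers.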

In the next post, I will write about installation, some basic commands, and how to write a file using one of the popular packages available in the Python programming language.
