Monday, September 15, 2014

MongoDB: The Next Generation of Databases



I recently got a chance to work, or rather tinker and play, with MongoDB, a document database billed as the next generation of databases. The cool thing about this database is that it does not follow the traditional method of storing data in tables, records and fields. That's why MongoDB is known as a NoSQL database.

What is a NoSQL database?

NoSQL, or 'Not Only SQL', represents the new class of data management technologies designed to meet the increasing volume, velocity, and variety of data that organizations are storing, processing, and analyzing.

Compared to relational databases, NoSQL databases are often easier to scale out and can perform better for certain workloads. They address needs that the relational model handles less well, including:

  • Large volumes of structured, semi-structured and unstructured data
  • Agile sprints, quick iteration, and frequent code pushes
  • Flexible, easy to use object-oriented programming
  • Efficient, scale-out architecture instead of expensive, monolithic architecture


Document Databases

When relational databases were introduced in the 1970s, data schemas were fairly simple and straightforward, and it made sense to conceive of objects as sets of relationships. For example, an article object might be related to a category (an object), a tag (another object), a comment (another object), and so on.

Because relationships between different types of data were specified in the database schema, these relational databases could be queried with a standard Structured Query Language, or SQL. But the environment for data, as well as programming, has changed since the development of the SQL database:

The emergence of cloud computing has brought deployment and storage costs down dramatically, but only if data can be spread across multiple servers easily without disruption. In a complex SQL database, this is difficult because many queries require multiple large tables to be joined together to provide a response. Executing distributed joins is a very complex problem in relational databases.

The need to store unstructured data, such as social media posts and multimedia, has grown rapidly. SQL databases are extremely efficient at storing structured information, but workarounds or compromises are necessary for storing and querying unstructured data.

Agile development methods mean that the database schema needs to change rapidly as demands evolve. SQL databases require their structure to be specified in advance, which means any changes to the information schema require time-consuming ALTER statements to be run on a table.

In response to these changes, new ways of storing data (e.g. NoSQL databases) have emerged that allow data to be grouped together more naturally and logically, and that loosen the restrictions on database schema. One of the most popular ways of storing data is a document data model, where each record and its associated data is thought of as a “document”. In a document database, such as MongoDB, everything related to a database object is encapsulated together. Storing data in this way has the following advantages:

  • Documents are independent units, which makes performance better (related data is read contiguously off disk) and makes it easier to distribute data across multiple servers while preserving its locality.
  • Application logic is easier to write. You don’t have to translate between objects in your application and SQL queries; you can just turn the object model directly into a document.
  • Unstructured data can be stored easily, since a document contains whatever keys and values the application logic requires. In addition, costly migrations are avoided since the database does not need to know its information schema in advance.
  • Document databases generally have very powerful query engines and indexing features that make it easy and fast to execute many different optimized queries. The strength of a document database’s query language is an important differentiator between these databases.
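
The idea of keeping everything related to an object in one document can be sketched in plain Python, without any database at all. This is only an illustration: the article, its field names and the helper titles_with_tag are hypothetical, and a Python list stands in for a collection.

```python
import json

# Hypothetical blog article stored as a single document: the category,
# tags and comments live inside the article instead of in separate tables.
article = {
    "title": "MongoDB: next generation of database",
    "category": {"name": "Databases"},
    "tags": ["nosql", "mongodb"],
    "comments": [
        {"author": "alice", "text": "Nice overview."},
        {"author": "bob", "text": "Waiting for the next post."},
    ],
}

# A second document in the same collection may carry different keys,
# because no schema has to be declared in advance.
short_note = {"title": "Quick note", "tags": ["misc"]}

# A plain list stands in for a collection of documents here.
collection = [article, short_note]


def titles_with_tag(documents, tag):
    """Return the titles of all documents that carry the given tag."""
    return [doc["title"] for doc in documents if tag in doc.get("tags", [])]


# The whole object turns into JSON in one step, with no object-to-SQL mapping.
as_json = json.dumps(article, indent=2)
restored = json.loads(as_json)

print(titles_with_tag(collection, "nosql"))
print(restored == article)
```

A real document database adds indexing, a query language and distribution on top of this; the sketch only shows the shape of the data.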


MongoDB:

MongoDB (from "humongous") is an open-source document database, and the leading NoSQL database. Written in C++, MongoDB features:

  • Document-Oriented Storage: JSON-style documents with dynamic schemas offer simplicity and power.
  • Full Index Support: Index on any attribute, just like you're used to.
  • Replication & High Availability: Mirror across LANs and WANs for scale and peace of mind.
  • Auto-Sharding: Scale horizontally without compromising functionality.
  • Querying: Rich, document-based queries.
  • Fast In-Place Updates: Atomic modifiers for contention-free performance.
  • Map/Reduce: Flexible aggregation and data processing.
  • GridFS: Store files of any size without complicating your stack.
  • MongoDB Management Service: Monitoring and backup designed for MongoDB.
  • Partner with MongoDB: Reduce cost, accelerate time to market, and mitigate risk with proactive support and enterprise-grade capabilities.

In the next post, I will write about installation, some basic commands, and how to write a file using one of the popular packages available for the Python programming language.

Saturday, September 13, 2014

Probability and the problem of plenty

Hi,

I am writing an offbeat thought here. I want to deal with a mathematical concept called probability, which means chance in layman's language, and the problem of plenty, which simply means many in layman's language.

Before you all start guessing what I am going to talk about, let me clarify: the problem of plenty is about applying for job positions at multinational corporations.

Now, don't get me wrong. I have absolutely nothing against the corporations, or against the way people, especially Indians, think. This is just a thought that came to my mind while searching for new jobs.

To begin with, I will explain probability. Please don't get scared, as I am not going to cover the nitty-gritty of this subject, which is itself a paradox; I believe that the probability that probability understands itself is zero. Now, the definition of probability from Wikipedia:

'Probability is used to quantify an attitude of mind towards some proposition of whose truth we are not certain.[2] The proposition of interest is usually of the form "Will a specific event occur?" The attitude of mind is of the form "How certain are we that the event will occur?" The certainty we adopt can be described in terms of a numerical measure and this number, between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty), we call probability.[3] Thus the higher the probability of an event, the more certain we are that the event will occur. A simple example would be the toss of a fair coin. Since the 2 outcomes are deemed equiprobable, the probability of "heads" equals the probability of "tails" and each probability is 1/2 or equivalently a 50% chance of either "heads" or "tails". '


Scared? Confused? Ha ha, don't worry. As I have said, the probability that probability understands itself is zero. I will tell you the simplest thing about probability while dealing with this paradox: "PROBABILITY WILL NEVER BE LESS THAN ZERO OR GREATER THAN ONE".

Just keep zero and one in mind, and your life will be easy. The probability of an event is the number of outcomes in which it occurs divided by the total number of outcomes in the universal set.

So suppose I have a coin, and a coin has two sides, HEADS and TAILS. So:

Total number of possible outcomes, E = 2 (heads and tails)

Number of outcomes that are heads for a 'FAIR' coin, H = 1

Number of outcomes that are tails for a 'FAIR' coin, T = 1

Probability of obtaining HEADS when a 'FAIR' coin is flipped, P(H) = H / E = 1/2 = 50%
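
The same counting can be written as a tiny Python sketch. The helper name probability is made up for this illustration; it simply divides the number of favourable outcomes by the total number of equally likely outcomes.

```python
from fractions import Fraction


def probability(favourable, total):
    """Favourable outcomes divided by all equally likely outcomes."""
    return Fraction(favourable, total)


# A fair coin has two equally likely outcomes.
outcomes = ["HEADS", "TAILS"]

p_heads = probability(outcomes.count("HEADS"), len(outcomes))
p_tails = probability(outcomes.count("TAILS"), len(outcomes))

print(p_heads)               # 1/2
print(float(p_heads) * 100)  # 50.0
print(p_heads + p_tails)     # 1
```

Using Fraction instead of floating point keeps the result exactly 1/2, and the two probabilities add up to exactly 1.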

I guess this is sufficient.

Ahh, now the original matter. I have been applying for new positions as part of my job search, which I guess is considered very normal for an IT guy in India who wants to increase his package. I started thinking about this because people 'ADVISE' me, which is another 'INDIAN' trait, to keep on applying, and then I will get through. To keep on applying here means that I must keep sending my portfolio or resume to various companies; then I will get short-listed by them, and then they will follow an algorithmic procedure to hire me.

Wait, you will ask me what is so wrong with that, that it made me write a blog post, and that too using a topic like probability. I will say, why not apply probability to prove that sending profiles to hordes of companies to seek employment actually reduces our chances of getting employed?

Yes, you have read it RIGHT: the more companies we apply to, the lower our chances of employment.

How?

Now, consider a case. I apply to one company, so my chance of getting into that company or organization will be 1, as there is only one sample event and one event which will occur, so it is 100%. Similarly, the chance of not getting into that organization will be zero, which I guess is fair.

I apply to two companies, so the chance of getting an interview call is (1/2) and the chance of not getting an interview call is (1/2), as in probability the sum over all outcomes is always 1.

This gives rise to interesting cases now. I have got an interview call, I will do research about that firm, and it is possible that I like it and decide to attend the interview.

So the chance of all three of those outcomes would be (1/3) * (1/3) * (1/3) = (1/27), and this is out of the 50% chance, so the overall probability is (1/2) * (1/27) = (1/54).

The probability of this not happening would be 1 - 1/54 = 53/54, which is almost equal to 1.

                                                                     53/54 > 1/54.
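
The arithmetic above can be checked with a few lines of Python. This sketch only reproduces the numbers used in this post; the values 1/2 and 1/3 are the assumptions made here, not measured data, and the variable names are chosen just for this illustration.

```python
from fractions import Fraction

# Assumption from the post: with two applications, the chance of a call is 1/2.
p_call = Fraction(1, 2)

# Assumption from the post: each of the three follow-up outcomes has chance 1/3.
p_three_outcomes = Fraction(1, 3) ** 3

p_all = p_call * p_three_outcomes
p_not = 1 - p_all

print(p_three_outcomes)  # 1/27
print(p_all)             # 1/54
print(p_not)             # 53/54
print(p_not > p_all)     # True
```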


We can clearly see that the chances of getting selected by an organization decrease as we apply to more firms. If the situation gets out of control when applying to two firms, then imagine what the result would be if we applied to more than 10 firms.

But Indians keep on applying for jobs on the portals, thinking that the more the applications, the greater the chances of getting selected.

This was just an attempt on my side to prove a hypothesis which I had thought of, and it is possible that I may be wrong.

Happy Reading!



FizzBuzz Program to test developers


This is not to belittle anyone, but since I am a Software Engineer and love writing code, I get disheartened and feel sad when I find out the truth. I do know it. And also, this has nothing to do with Indian IT people in particular, because this holds true for software engineers across the world.

I recently came across a blog post about the horror that coders cannot program.

And what's more, the real problem is that coders cannot write a simple program.

What is the simple program that the majority of developers struggle to write?

The "Fizz-Buzz test" is an interview question designed to help filter out the 99.5% of programming job candidates who can't seem to program their way out of a wet paper bag. The text of the programming assignment is as follows:
"Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”."

It took me 5 minutes to figure out this program.

In fact, those who dismiss this test as easy, a piece of cake, reveal a negative attitude in how they view the work of coding and of developing/engineering software.

But why is it so hard?

Before trying to describe why it is so hard, here's the solution. It is not scalable, though.

>>> def FizzBuzz():
...     # range(1, 101) covers the numbers 1 to 100 inclusive
...     for i in range(1, 101):
...         # multiples of both 3 and 5 must be checked first
...         if i % 3 == 0 and i % 5 == 0: print("FizzBuzz")
...         elif i % 3 == 0: print("Fizz")
...         elif i % 5 == 0: print("Buzz")
...         else: print(i)
...


I have used a Windows 8 environment. Python must be installed for this.

Now call the FizzBuzz() function from the Python interpreter.

The above solution works as intended.
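
One possible way to make the solution more flexible is to pass the upper limit and the divisor/word pairs as parameters, and to return the lines instead of printing them. This is only a sketch, not the one true answer; the function name fizzbuzz and its parameters limit and rules are chosen here for illustration.

```python
def fizzbuzz(limit=100, rules=((3, "Fizz"), (5, "Buzz"))):
    """Return the FizzBuzz lines for 1..limit as a list of strings."""
    lines = []
    for i in range(1, limit + 1):
        # Join the words of every rule whose divisor divides i;
        # for 15 this gives "Fizz" + "Buzz" = "FizzBuzz".
        word = "".join(name for divisor, name in rules if i % divisor == 0)
        # Fall back to the number itself when no rule matched.
        lines.append(word or str(i))
    return lines


if __name__ == "__main__":
    print("\n".join(fizzbuzz()))
```

With this shape, adding a new rule such as (7, "Bazz") needs no new elif branch, and the result can be checked in a test because nothing is printed inside the function.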

As I have mentioned above, many programmers/coders struggle to write this simple program.

What could be the reason?

As a professional programmer, I have realized that many software engineers develop software which mostly involves calling libraries and routines and using data processing languages such as SQL. Using loops and recursion, solving a problem by designing an algorithm, or using existing data structures such as linked lists and trees does not happen in the majority of software development. Barring the biggies, the rest of software development is either writing an app by calling libraries or customizing an ERP package. And we know how dead an ERP application is; it involves a zilch amount of technical knowledge.

The only way to improve as a programmer is to write code consistently. It is not writing code for a day, then coming back to touch it after a week. No art or craft, or for that matter any subject, can be mastered without practice, and it is perfect practice which is required. Yes, in the beginning write code which sucks, then tinker with it, then learn the material, hit the documentation, ask people, but practice the art of coding.

Unfortunately, coding is termed a dirty job, and if things go wrong, the coders are blamed first. But why is the situation so horrible? Because the bosses of the software industry know the language of business and don't know anything about the technology world. Their only viewpoint about technology is that this particular piece of packaged software is hot, so let us put resources on it and mint money. And by the time the technology becomes obsolete, the bosses will have extracted wealth from the resources, whose skills become outdated in this fast-paced technology world.

Coders must stand up now and fight for their rights. It is coders who bring good things to the IT world. As long as coders devalue themselves, they will always be ruled by bosses who cannot write a 'FizzBuzz' program.

P.S.: If you are a Software Developer/Engineer and stumble upon this post, I request you to solve this FizzBuzz problem. Be honest with yourself, do not look up or cram the solutions, and just introspect, because even good software engineers do struggle with this.