Sunday, 2 December 2012

Apache Hadoop Big Data Analytics Tool and Technology: Definition, Advantages, Disadvantages

In this article, we will discuss Apache Hadoop, a big data tool and technology available to us for managing very large volumes of data. We will also look at the advantages and disadvantages of Apache Hadoop for big data analytics.

What Is the Hadoop Big Data Analytics Tool and Technology?

Hadoop is the best tool available today for processing and storing herculean amounts of big data. Hadoop throws hundreds or thousands of computers at the big data problem, rather than relying on a single computer.

Hadoop makes data mining, analytics, and processing of big data cheap and fast. Hadoop can take most of your big data problems and unlock the answers, because you can keep all your data, including all of your historical data, and get an answer before your children graduate college.

Apache Hadoop is an open-source project inspired by research published by Google (the Google File System and MapReduce papers). Since you were wondering, Hadoop is named after the stuffed toy elephant of the lead programmer's son. This explains the preponderance of pachyderms wherever Hadoop is mentioned.

In Hadoop parlance, the group of coordinated computers is called a cluster, and the individual computers in the cluster are called nodes.

Advantages of Hadoop Big Data Analytics Tool and Technology

1. Hadoop is cheap. Hadoop is an open-source Apache project, which means anybody is free to use it. Hadoop runs on commodity hardware (i.e. normal everyday computers), so you don't have to buy million-dollar specialized database machines.

2. Hadoop is fast. Hadoop can deal with terabytes of data in minutes, and with petabytes in hours. Hadoop is the only way that companies with gigantic amounts of data like Facebook, Twitter, Yahoo, eBay, and Amazon can cost-effectively and quickly make decisions.

3. Hadoop scales to large amounts of big data storage. Need to add more space? Just add more hard drives to a node, or even add more nodes to your cluster. The cluster keeps running while you grow it, so you never have to shut down Hadoop.

4. Hadoop scales to large amounts of big data computation. Is your cluster slow? Just add more nodes to spread out the computation. Hadoop scales almost linearly in many cases - this means you can halve the time it takes to do a job by doubling the number of compute nodes.

5. Hadoop is flexible with types of big data. Are you dealing with structured data? Great. Do you have semi-structured or unstructured (document-oriented) data? Lovely. Hadoop stores and processes any kind of data.

6. Hadoop is flexible with programming languages. Hadoop itself is written in Java, but you can query your data in a SQL-inspired language through Apache Hive. If you want a more procedural language for analysis, there is Apache Pig. If you want to get deep into the framework, you can custom-analyse your data by writing code in Java, or, through Hadoop Streaming, in C/C++, Ruby, Python, C#, QBASIC or anything else that can read standard input and write standard output.
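
To make point 6 a little more concrete, here is a minimal sketch in Python of a word count split into a map step and a reduce step, which is roughly the shape of code you would write for Hadoop Streaming. The names map_line, reduce_sorted and run_local are made up for this illustration, and run_local only imitates the map, sort and reduce phases on a single machine; it does not talk to a real cluster.

```python
from itertools import groupby


def map_line(line):
    """Map step: emit a (word, 1) pair for every word on the line."""
    for word in line.strip().lower().split():
        yield word, 1


def reduce_sorted(pairs):
    """Reduce step: sum the counts per word.

    The pairs must already be sorted by word, which is the job the
    framework does between the map and reduce phases.
    """
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Imitate map -> sort -> reduce on one machine (illustration only)."""
    mapped = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_sorted(sorted(mapped)))


if __name__ == "__main__":
    sample = ["big data big", "Data tools"]
    for word, count in run_local(sample).items():
        # Tab-separated key and value, one pair per line.
        print(f"{word}\t{count}")
```

On a real cluster the map and reduce steps would run as separate programs on many nodes, with Hadoop doing the sorting and shuffling in between.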

Disadvantages of Hadoop Big Data Analytics Tool and Technology

1. Plain Hadoop is hard to set up. Have you tried setting up this thing? Your best bet may be to kidnap some professors and press them into your service.

2. Plain Hadoop is hard to manage. How do you do anything? Where is the graphical user interface? Oh, there is none.

3. Plain Hadoop is hard to keep alive. Hadoop has various single points of failure. When Hadoop collapses, you lose data and you lose time. That hurts.

4. Plain Hadoop is hard to use. Seriously, this is not a joke. Even adding up a list of numbers is painful.

5. Plain Hadoop is not secure. Your files are not secure and users can easily corrupt or steal data. I hope you trust everybody.

6. Plain Hadoop is not optimized for your hardware. Hadoop does not run at full capacity for your hardware, which is like being stuck in second gear.
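
To see why even adding up a list of numbers (point 4) takes more ceremony than you might expect, here is a rough sketch in Python of the same sum expressed in map and reduce style. It runs on one machine and only mimics the idea; the helper names and the chunk_count parameter are assumptions for this example, and a real Hadoop job would also need input and output formats, job configuration and a cluster to run on.

```python
from functools import reduce


def split_into_chunks(numbers, chunk_count):
    """Divide the list into roughly equal chunks, one per pretend node."""
    size = max(1, -(-len(numbers) // chunk_count))  # ceiling division
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


def map_partial_sum(chunk):
    """Map step: each pretend node adds up only its own chunk."""
    return sum(chunk)


def reduce_total(partial_sums):
    """Reduce step: combine the partial sums into one total."""
    return reduce(lambda a, b: a + b, partial_sums, 0)


def distributed_sum(numbers, chunk_count=4):
    """Mimic a map/reduce sum on one machine (illustration only)."""
    chunks = split_into_chunks(numbers, chunk_count)
    partials = [map_partial_sum(chunk) for chunk in chunks]
    return reduce_total(partials)


if __name__ == "__main__":
    print(distributed_sum(list(range(1, 101))))
```

Compare that with a plain sum(numbers) on a single computer, and the complaint about usability starts to make sense.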
