Monday, 6 October 2014

Ubuntuing And Hadooping

Been ubuntuing and hadooping for some days now. But first, a digression. I have this funny, non-scientific correlation: if you go to a place that has a bunch of programmers, be it a startup or a technical conference, and you see a predominance of MacBooks, then that place is mature and on par with Silicon Valley. If you find a predominance of Ubuntu, then it is an ecosystem on the rise. If you find the bunch of techies mostly without laptops, and the few laptops there run Windows, then it is a laid-back place which has potential (read Hyderabad) and where a lot has to be done :)

Now, don't get upset with my statement, or start a flame war. I said funny and unscientific, didn't I? Back to the topic of this post.

I have this Compaq laptop that I had bought way back in 2009, which was lying idle because I spend the bulk of my personal time on my iMac. The laptop came with Windows Vista and I did not care to upgrade it to 7 or 8. Every computer has roles and responsibilities and there is no reason for any device in your house to be idling.

So I decided to return to Ubuntu, with which I had played a few years ago. The latest version is 14.04 and installation was a breeze. All it took was to download the image (ISO) file, download and run the Universal USB Installer from pendrivelinux.com, and boot up the system. The only hiccup was remembering to remove the pen drive from the USB port when the installer prompts for a system restart.

The Universal USB Installer makes a bootable pen drive. The steps are simple: "Simply choose a Live Linux Distribution, the ISO file, your Flash Drive and, Click Install." So, three steps in total, of which the second has three sub-steps. It left me wondering whether the Canonical guys could offer a one-click install that turns your laptop into a dual-boot Windows-Ubuntu system. In this day and age, even three steps seem too long.

The only real problem I faced was that Vimeo videos were not playing in my Firefox browser. I browsed around and ran a lot of sudo apt-get install commands. I don't know which one actually worked, but the last one I ran was sudo apt-get install ubuntu-restricted-extras. That seems to have done the trick.

As I spent more time on Ubuntu, I noticed a smooth and silky feel to using it. Be it the default gradient-purple background for the terminal and desktop, the background options for gedit, or the way Firefox looks, I found myself spending more time on Ubuntu than on my favorite Mac OS. So much so that I wrote my latest short story in LibreOffice Writer. Another cool thing was that my favorite Droid Sans font got transferred to my WordPress page without me having to do any JavaScript tweaking.

Perhaps there's a lesson there. If you want your developers to stick to their laptops, make the whole experience what I call smooth and silky. Once you have awareness and adherence, you can think of productivity.

And then came the convenience of technical productivity. Whether through the Ubuntu Software Center or with the sudo apt-get install command, I could install pretty much all the languages, frameworks and tools that I needed. All this without having to go to a web page, download a zip/tar file, unzip it, set environment variables and all that fuss.

With the tech ecosystem available, what can the laptop best be used for? The in-thing, or one of the in-things, these days is Big Data. Oh ok, it's been in for a little more than two years now, and though it's been predicted that IoT will overtake Big Data in business transaction [1], it's still early days for getting on to the Big Data / Analytics bandwagon. So I decided to play with Hadoop on Ubuntu.

When you want to learn a new framework or language, it's important to have good introductory tutorials that hold your hand and walk you through the initial steps. For frameworks, learning gets easier and adoption rates increase when well-written, correctly paced tutorials and instructions are available. I'd written earlier about the "Level-Zero Tutorial for Getting Started with R", and similarly for Hadoop there is a five-step tutorial available to get you started.

The tutorial by Prithwis Mukherjee [2] is written as a blog post and has five steps:
1. Install Hadoop 2.2 in single-machine cluster mode on a machine running Ubuntu
2. Compile and run the standard WordCount example in Java
3. Compile and run another, non-WordCount, program in Java
4. Use the Hadoop streaming utility to run a WordCount program written in Python, as an example of a non-Java application
5. Compile and run a Java program that actually solves a small but representative Predictive Analytics problem
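
To give a feel for what step 4 involves, here is a minimal sketch of a streaming-style WordCount in Python. It is not the tutorial's code: the function names mapper, reducer and run_locally are my own, and the sort in run_locally only imitates, in a single process, the sort-by-key that Hadoop performs between the map and reduce phases. In a real streaming job, the mapper and the reducer would each be a separate script that reads standard input and writes tab-separated records to standard output.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts per word; records must already be sorted by word."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Imitate map -> sort (shuffle) -> reduce inside one process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Hypothetical sample input, only for trying the sketch out.
    sample = ["the cat the dog", "The end"]
    for record in run_locally(sample):
        print(record)
```

The reducer relies on groupby, which only merges adjacent records with the same key. That is why the sort step matters here, just as the shuffle matters in Hadoop.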

The first step, Hadoop installation, asks you to create a new user. You really don't have to do so. Emilio Coppa shows you how [3]. I followed his procedure.

I faced some minor problems while running the five steps. I added them in the comments section of the blog post, so it may save some time for those who want to try it out. All the commands I ran are available in this gist [4].

In each of the four programs, the map and reduce methods carry out a certain number of steps. I wrote a summary of those steps just to see the coding patterns. The reverse-engineered documentation is available in this file [5]. In fact, there is a book, "MapReduce Design Patterns", which explains these things in greater detail. More on that later, meaning when I get time to lay hands on the book. In the meanwhile, Happy Ubuntuing and Hadooping.

References
[1] Gartner Report: The Internet of Things Takes Over Big Data as the Most Hyped Technology - http://chiefexecutive.net/internet-things-takes-big-data-hyped-technology
[2] http://thoughtshoppe.blogspot.in/2014/05/getting-started-with-mapreduce-and.html
[3] http://www.ercoppa.org/Linux-Install-Hadoop-220-on-Ubuntu-Linux-1304-Single-Node-Cluster.htm
[4] https://gist.github.com/mh-github/25cff3ed12e60a4153b4
[5] http://www.mediafire.com/view/56qzaxh0s7kkknk/Hadoop-five-step-tutorial-pseudo-code.txt
