Wednesday, 25 March 2015

Learning From A Programmer's Guide To Data Mining

Over the past few weeks, I have been reading and learning from the online book A Programmer's Guide to Data Mining by Ron Zacharski. The website is

The book advises right up front that it "follows a learn-by-doing approach. Instead of passively reading the book, I encourage you to work through the exercises and experiment with the Python code I provide. Experimenting around, code hacking, and trying out methods with different data sets is the key to really gaining an understanding of the techniques."

I tried a different approach. My method of learning was to translate the programs from Python to Ruby. The translation journey was a whirlwind tour through various Python keywords and idioms. I started with tuples and then zoomed ahead to file I/O, map, lambda, random, priority queues, unit testing, and finally list comprehensions. It was fun finding out what a particular Python keyword or piece of syntax does, and then figuring out how to do the same in Ruby.

List comprehensions were a bit of a challenge. I had written about them in an earlier blog post. I can unroll a simple list comprehension, but Ron has one with three levels of nested loops in chapter 8. I struggled to write the equivalent ordinary loop, and when I was not making progress, I posted the problem on Stack Overflow (question 28781721). I got answers there. I also emailed Ron about the question, and he joined the discussion and gave me the solution on the thread.
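To show the kind of translation involved, here is a minimal sketch of a three-level nested list comprehension next to its ordinary loop form. This is not Ron's code from chapter 8; the `groups` data and the variable names are made up for illustration. The thing to notice is that the `for` clauses in the comprehension read left to right in the same order as the nested loops.

```python
# Hypothetical data for illustration only; not taken from the book.
groups = [[[1, 2], [3]], [[4], [5, 6]]]

# Three-level nested list comprehension.
# The for clauses run outermost first, exactly like nested for loops.
flat_comprehension = [value for group in groups for row in group for value in row]

# The same logic written as an ordinary nested loop.
flat_loop = []
for group in groups:
    for row in group:
        for value in row:
            flat_loop.append(value)

print(flat_comprehension)
print(flat_loop == flat_comprehension)
```

Once the loop form is written out like this, carrying it over to another language such as Ruby is mostly a matter of swapping in that language's iteration constructs.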

As you read the chapters, you realize that data mining and machine learning have a great many topics in common. Successive chapters also use more advanced Python features, so you learn not only data mining techniques but more of Python itself. The author probably learned more about Python himself as he wrote the book.

This book has a very informal tone, with plenty of pictures and examples, à la the Head First approach. It looks almost like a Head First series book, but thankfully is not one. The early Head First books were good, with a proper balance of text, explanation, pictures and exercises. Some of the later ones were so heavy on pictures that I thought they were cutting down on the subject material.

As a last step, I wrote up a document of the key points from each chapter, and I have uploaded my Ruby code to GitHub. I will publish them in subsequent posts.
