Monday 4 August 2014

Level Zero Post On R

For the things we have to learn before we can do them, we learn by doing them.” -- Aristotle
Tutorials that tell the learner to “do” the tutorial are the ones that I like the most. A lot of regular tutorial give a bunch of steps, examples and even the application to deploy. But that doesn’t aide very much in learning. Sure, if you are working on a project and get stuck on something, you would use the tutorial as a quick reference. But if you are on the beginning of a learning curve of a new language or a framework, you gotta type up the code. When I go through a tutorial, I take prints and actually type the code, then compile and deploy the examples. The time span that you would get as you do the typing and the different ideas you get strengthens the learning process.

During the last 1 - 2 years, I had been having fleeting brushes with R. Sometimes, I tried a few commands. But never went any further. A few days ago, I bumped into this tutorial called ‘A “Level-Zero” Tutorial for Getting Started with R’. It’s authored by Prof. April Gaylardt of The University of Georgia.

Right at the beginning of the tutorial she writes: “Note, do not try to ‘read’ this tutorial. Instead, work through it. This tutorial is written for you to type each command into R as you get to it.” I couldn’t resist it, took a print of the tutorial and in an hour or so, was able to get a grip on the basic R commands. Thank you very much, Prof April, it was quick and fast and well prepared for people like me who want to start fresh on yet another language or framework.

Years ago, in the C++ phase of my career, I had encountered another such “do-it-to-learn” emphasising book. It is Accelerated C++ : Practical Programming by Example’ by Andrew Koenig and Barbara Moo. Right in the Preface they emphasise: “In order to understand how these programs work, there is no substitute for running them on a computer.” Exercises in every chapter start with: “Compile, execute, and test the programs in this chapter.” The book was immensely useful to me.

Back to R. I did want to put the basic knowledge into practice. By generating some data, and plotting some graphs. So I selected four programs and ran them for different input sizes and took the time taken for each input size as the output.

The programs are taken from the book ‘Algorithms, Fourth Edition’ by Robert Sedgewick and Kevin Wayne. They are: TwoSum, ThreeSum, TwoSumFast, ThreeSumFast. What they do is:
TwoSum : Given an array of N integers, it finds how many pairs there are that sum to zero. It’s a brute force algorithm that has two loops, and for each integer, it checks every other integer by adding and comparing with zero. Order of growth of the running time is quadratic.
ThreeSum : Given an array of N integers, it finds how many triples there are that sum to zero. Similar to TwoSum, it’s a brute force algorithm, that has three loops and the order of growth of the running time is N^3.
TwoSumFast : This is an improvement over TwoSum. Sorts the array and for each number, does a binary search if its negative is present in the array. Classic optimisation trick. Order of growth of running time is now NlogN.
ThreeSumFast : This is an improvement over ThreeSum. Sorts the array and for each pair a[i] and a[j] checks if their negative sum -(a[i] + a[j]) is in the array. Running time is now proportional to N^2logN.

I ran each of the four programs with array size of 250 through 16000, doubling the size each time. The time taken on my iMac are given below, so are the graphs I plotted with R. For the graphs, I used ggplot library and the tips given in the stack overflow question number 2564258 as well as saving the plot tips in University of California, Berkeley's Saving Plots in R.
TwoSum
    250,0.0010
    500,0.0000
   1000,0.0010
   2000,0.0010
   4000,0.0050
   8000,0.0220
  16000,0.0810

ThreeSum
    250,0.0100
    500,0.0560
   1000,0.1090
   2000,0.8490
   4000,6.7140
   8000,53.2710
  16000,427.5480

TwoSumFast
    250,0.0010
    500,0.0000
   1000,0.0020
   2000,0.0040
   4000,0.0070
   8000,0.0010
  16000,0.0040

ThreeSumFast
    250,0.0060
    500,0.0080
   1000,0.0340
   2000,0.1430
   4000,0.6220
   8000,2.6600
  16000,11.4710

No comments:

Post a Comment