Sunday, 27 October 2013

Collections Times

As you spend a lot of years in your career, you would be working on preparing and discussing plans, making choices from the available options, reviewing others’ work products and monitoring progress of plans. When I think I am becoming too strategic and too managerial, meaning not doing much myself, I pull myself back and try to go down to lower level, meaning do something myself. Ideally it would mean going all the way to the bare metal level on the computer, but having spent almost all my career in commercial / business application programming, I rarely venture even into the operating system code. But doing stuff with Java, is surely well within the realm of my fluency.

Participating in training of fresh programmers offers one such opportunity. This year, like the last few years, I was overseeing the training program of campus recruits in collaboration with the training department. During a review of the training material, I noticed a bullet point : Use LinkedList if frequent operation is insertion or deletion. My immediate thought was that rather than just repeating theory from book, it’s better if we ourselves back such points with our own performance test data.

So, Swapna Shetty, our training and recruitment executive and I got together and wrote a couple of programs. The first program is that tests HashSet, LinkedHashSet and TreeSet. It has three operations :
1) Add 2,500 integers.
2) Use an iterator and while loop to print the integers out.
3) Use a foreach loop to print the integers out.

The second program is that tests HashMap, LinkedHashMap and TreeMap. It first populates an ArrayList with 1000 random strings, each of 20 characters length. Then it runs three operations :
1) Put these 1000 random strings in a map, with the string as the key and the integer index as the value.
2) Use an iterator and while loop to print the key and value.
3) Use a foreach loop to print the key and value.

Here is the source code :

Using jdk 1.6, we ran the programs on three computers -
1) My Windows laptop : Dell Latitude E6410 with Microsoft Windows 7 on Intel Core i5 M520 processer @2.40 GHz speed. [S1]
2) Swapna’s desktop : Microsft Windows XP Professional with Intel Pentium 4 processor @2.40 GHz speed. [S2]
3) My Apple iMac : OS X 10.8 (Mountain Lion) on Intel Core 2 duo @2.4 GHz speed. [S3]
We ran each program four times and took the averages. The complete results are in the xls on mediafire.

Set ------------ HashSet----------- LinkedHashSet----------- TreeSet
Insertion---------- 2.75 ----------- 5.25 ----------- 22
while-print------- 451.75 ----------- 350.25 ----------- 161.25
foreach-print---- 328 ----------- 239.75 ----------- 161.25

Insertion---------- 7.5 ----------- 12 ----------- 31.25
while-print------- 183.75 ----------- 70.25 ----------- 62.5
foreach-print---- 74 ----------- 90 ----------- 46.75

Insertion---------- 3.75 ----------- 8 ----------- 22
while-print------- 86.75 ----------- 62.75 ----------- 161.25
foreach-print---- 91.75 ----------- 89.25 ----------- 20.75

Map ------------ HashMap----------- LinkedHashMap----------- TreeMap
Insertion---------- 4.5 ---------- 1.75 ----------- 5.25
while-print------- 365 ----------- 129.25 ----------- 140.5
foreach-print---- 147.25 ----------- 192 ----------- 120.25

Insertion---------- 4 ----------- 7.75 ----------- 35.25
while-print------- 155.75 ----------- 66.25 ----------- 281.25
foreach-print---- 82.5 ----------- 74.25 ----------- 101.5

Insertion---------- 1.75 ----------- 2.75 ----------- 7.25
while-print------- 84 ----------- 41.5 ----------- 49.25
foreach-print---- 39.25 ----------- 44.5 ----------- 54.5

So here are the observations. In set operations, HashSet was the fastest for insertion. It took 2.75 ms on S1, my Windows laptop, 7.5 ms on S2, Swapna’s desktop and 3.75 ms on S3, my iMac. LinkedHashSet took twice the time and TreeSet took about four times the time of HashSet.

For iterating and printing the members, TreeSet performed the fastest. It took 161.25 ms on S1 for both while and foreach, 62.5 ms / 46.75 ms (while / foreach) on S2, and 23 / 20.75 ms (while / foreach) on S3. The for-each loop is faster than the while loop in 2 out of the 3 systems.

Since HashSet is not concerned with preserving insertion order or maintaining elements in a sorted order, it performs the fastest for insertion operation. Iteration is the fastest for TreeSet, probably because it is implemented using arrays for which index-based traversal makes our iteration very fast.

For maps, we ran into some confusion. For insertion, HashMap was the fastest on S2 and S3 whereas LinkedHashMap was the fastest on S1. For iteration, LinkedHashMap was the fastest on S2. On S1, while-loop iteration was fastest by LinkedHashMap whereas the foreach iteration was fastest by TreeMap. On S3, HashMap was the fastest for insertion, LinkedHashMap was the fastest with while-loop iteration and HashMap was the fastest with foreach loop iteration. Meaning there are no consistent results across systems.

These results are only indicative, they are in no way helpful in making a choice. You choose the data structure depending on your need. HashSet is the basic data structure implementing the Set interface and it’s what you would use if having unique elements is your criteria. In addition, LinkedHashSet preserves the insertion order. TreeSet maintains the elements in a sorted order. With regard to map, perhaps they are implemented in different ways in different jdks.

No comments:

Post a Comment