On this page:
10.1 Getting started
10.2 Stress  Tests - Timing Tests
10.3 Exploration:
10.4 Implementing Heapsort
10.5 Implementing Husky Heaps
8.12

Lab 10: Stress Tests and Big-O behavior🔗

Related files:
  tester.jar     LabStarter.zip  

Goals: This lab supplies six implementations of sorting algorithms. Your challenge is to identify which type algorithm is being used by each implementation. Additionally, you are to implement the heapsort algorithm, and confirm that its runtime behavior is as expected.

10.1 Getting started🔗

Create a new project for the lab, and add the tester library as usual. Open the LabStarter.zip file, and unzip it into the same directory in your workspace where the src and bin directories for your new project are. Add the Sort.jar library as external JARs. Right-click on your project and select "Refresh". You should now see a package named student inside the src directory, with two files inside it: (Heapsort.java and SortingHeapSort.java). You will be modifying Heapsort.java eventually; leave the other file alone.

Notice that both files in this package start with the declaration package student;. This declaration lets you relate multiple classes in multiple files and combine them into a library. The List data type seen in the classes is an interface that ArrayList implements; it has all the methods you’re used to using (add, get, set, etc. etc.).

Create a run configuration using Main as the main class. You do not need any arguments for this configuration. Try running this configuration, to ensure you have all the pieces of this lab in place. The data sorted for this lab is found in citydb.txt, and it contains data on over 29 thousand cities — names, zip codes, states, and latitude and longitude.

10.2 StressTests - Timing Tests🔗

Your job is now to be an algorithm detective. The program we give you allows you to run any of the six different sorting algorithms on data sets of six different sizes using three different Comparators to define the ordering of the data. When you run the program, the time that each of these algorithms took to complete the task is shown in the console.

Do Now!

How many timing results would you collect if all of these tests ran successfully?

Using the created Run Configuration, run the program. Select an input file, an output file (that must be different than the input!), choose whatever subset of the algorithms you wish to test, and click Run. If you do not select an output file, the output will be printed to the console. Otherwise, the output can be saved as a .csv file that is ready to be opened in a spreadsheet program (Excel, Numbers, or LibreCalc).

Start with just a few small tests, to see how the program behaves, before you decide to run all tests.

The last choice is heapsort, that you have yet to implement. The two files in the student package originally provided only stubs to the stress test program — the method heapsort in the class Heapsort so you can run the stress tests even if you did not implement the heapsort algorithm. The original stub just returns the original unsorted ArrayList. So, if you are having difficulties with implementing heapsort, leave it alone for now, and run the program with the original files in the student package. The running times you get for the heapsort option will be bogus, but you can at least run the program.

You can repeat this as many times as you want, with different choices for algorithms, datset sizes, and comparators. Note that every time you run it, the program will randomize the order of the sorting algorithms; you cannot assume they will run in the order shown.

10.3 Exploration:🔗

Spend at most fifteen minutes trying to answer some of the following questions.

Run the program a few times with small data sizes, to get familiar with what it can do. Then run experiments and try to answer the following questions:

Note: The following algorithms are included in the choices: binary tree sort; insertion sort (for both IList and ArrayList); merge sort; quicksort (for both IList and ArrayList); selection sort. See if you can figure out which one is which.

10.4 Implementing Heapsort🔗

Recall from lecture that a binary heap is a binary tree with two invariants:
  • Structural: It is a complete tree: every layer of the tree is full, except perhaps the bottom layer, and that one must be full from the left.

  • Logical: Every node’s value is greater than the values of both of its children.

We can use these two invariants to implement a sorting algorithm very efficiently.

To represent a binary heap efficiently, we can store its items in an ArrayList in a particular order:
  • The root of the tree is at index 0.

  • The left child of the item at index \(i\) is at index \(2i+1\).

  • The right child of the item at index \(i\) is at index \(2i+2\).

  • The parent of the item at index \(i\) is at index \((i-1)/2\), rounded down.

Now the structural invariant is even easier to state: a complete tree with \(n\) nodes simply uses indices zero through \(n-1\), with no gaps.

Recall from lecture that heapsort takes place in two phases: first, building the heap, and second, repeatedly removing the maximum from the heap and placing it at the end of the ArrayList. We can build a heap in one of two ways; here is the easier one:
  • Starting from the last item in the ArrayList, ensure that it and its children are a valid heap. It has no children, so it must be a valid heap. Go to the next-to-last item in the ArrayList and ensure that it too is a valid heap. It also has no children, so it must be valid. In fact, none of the last \(n/2\) items (where \(n\) is the size of the ArrayList) have any children (why?), so they must all be valid heaps.

  • The \(n/2\) item must have at least one child. If it is smaller than its children, then swap it with the larger of the two, and recursively fixup the heap at the location of that child (because now it might not be ordered properly with respect to its children). This step is called “downheap”.

  • Repeat “downheap” for every item up to and including the root.

NOTE: Java does not use the IComparator<T> interface we have been using so far with the same names we have been using. Instead, it’s called a Comparator, and its method is called compare. But it behaves exactly the same as the IComparator interface we have used; it just has an inconsistent naming convention.

Next, we need to remove the maximum item from a heap. Fortunately, we know where to find it: at the root. To remove the item, we must fill the hole left at the root, without violating our invariants. So we choose the last item in the last layer of the tree and swap them. Now our structural invariant is restored, but the logical invariant is broken. Fortunately, we know how to fix that: just downheap the new value at the root!

Finally, we need to repeat this removal for every item in the heap. Once the heap is empty, the “discards” will actually be the contents of the heap in sorted order.

10.5 Implementing Husky Heaps🔗

Consider a standard binary tree definition, where each node has a value and two tree children, and leaves have no content at all. We want to use this data definition to implement some kind of a heap as above, and we’d like to ensure that the heap operations (get-max-value, insert, remove-max-value) are efficient: \(O(\lg(\#items))\). For this Husky Heap we’ll use a different structural invariant. Instead of asserting that the tree is complete (each layer is completely full, and the last layer is full from the left), we’ll assert that the number of nodes in the left subtree is the same or one greater than the number of nodes in the right subtree, for every node in the heap.

Do Now!

Explain why this size invariant ensures that the tree has height \(O(\lg(\#nodes))\).

Implement the necessary heap operations: inserting a value into a Husky Heap, creating a Husky Heap from a bunch of unordered values, getting the maximum value, and removing the maximum value.

This is tricky! Remember that when you implemented binary heaps above, you had two invariants to maintain (structural and heap-ordering), and the insertion and removal strategy was to preserve the structural invariant in all cases, and fix up the heap-ordering invariant as needed. In this scenario, both invariants might be temporarily invalidated by your solution, and fixed up afterward. It will take some leap of insight here to figure out how to fix the two invariants...