#### Lab 10: Stress Tests and Big-O behavior

Goals: This lab supplies six implementations of sorting algorithms. Your challenge is to identify which type of algorithm each implementation uses. Additionally, you will implement the heapsort algorithm and confirm that its runtime behavior is as expected.

##### 10.1 Getting started

Create a new project for the lab, and add the tester library as usual. Open the LabStarter.zip file and unzip it into the same directory in your workspace as the src and bin directories for your new project. Add the Sort.jar library as an external JAR. Right-click on your project and select "Refresh". You should now see a package named student inside the src directory, with two files inside it: Heapsort.java and SortingHeapSort.java. You will be modifying Heapsort.java eventually; leave the other file alone.

Notice that both files in this package start with the declaration package student;. This declaration lets you relate multiple classes in multiple files and combine them into a library. The List data type seen in the classes is an interface that ArrayList implements; it has all the methods you’re used to using (add, get, set, etc.).

Create a run configuration using Main as the main
class. You do not need any arguments for this configuration.
Try running this configuration, to ensure you have all the pieces of this lab in place.
The data sorted in this lab is found in citydb.txt, which contains data on over 29 thousand cities.

##### 10.2 StressTests - Timing Tests

Your job is now to be an algorithm detective. The program we give you allows you to run any of the six different sorting algorithms on data sets of six different sizes using three different Comparators to define the ordering of the data. When you run the program, the time that each of these algorithms took to complete the task is shown in the console.
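As a reminder of how a Comparator packages up an ordering, here is a small sketch. The City record and its fields are hypothetical stand-ins for the lab's city data; the lab's actual Comparators are provided for you.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ComparatorDemo {
    // Hypothetical record standing in for one row of the city data.
    record City(String name, int population) {}

    public static void main(String[] args) {
        List<City> cities = new ArrayList<>(List.of(
            new City("Boston", 675_647),
            new City("Austin", 961_855),
            new City("Reno", 264_165)));

        // A Comparator defines one possible ordering of the data; the same
        // sorting code can be reused unchanged with a different Comparator.
        Comparator<City> byPopulation = Comparator.comparingInt(City::population);
        cities.sort(byPopulation);

        System.out.println(cities.get(0).name()); // smallest population first: Reno
    }
}
```

Swapping in a different Comparator (say, ordering by name) changes what "sorted" means without touching the sort itself, which is why the program can time each algorithm under three different orderings.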

Do Now!

How many timing results would you collect if all of these tests ran successfully?

Using the created Run Configuration, run the program. Select an input file and an output file (which must be different from the input!), choose whatever subset of the algorithms you wish to test, and click Run. If you do not select an output file, the output is printed to the console. Otherwise, the output is saved as a .csv file that is ready to be opened in a spreadsheet program (Excel, Numbers, or LibreOffice Calc).

Start with just a few small tests, to see how the program behaves, before you decide to run all tests.

The last choice is heapsort, which you have yet to implement. The two files in the student package originally provide only stubs to the stress test program.

You can repeat this as many times as you want, with different choices of algorithms, dataset sizes, and comparators. Note that every time you run it, the program randomizes the order of the sorting algorithms; you cannot assume they will run in the order shown.

##### 10.3 Exploration

Spend at most fifteen minutes trying to answer some of the following questions.

Run the program a few times with small data sizes, to get familiar with what it can do. Then run experiments and try to answer the following questions:

Which algorithms run mostly in quadratic time, i.e. \(O(n^2)\)?

Which algorithms run mostly in \(O(n \log n)\) time?

Which algorithms use the functional style, using Cons lists?

Which algorithm is the selection sort?

Why is there a difference when the algorithms use a different Comparator?

Copy the results into a spreadsheet. You may save the result portion in a text editor with a .csv suffix and open it in Excel (or some other spreadsheet of your choice). You can now study the data and represent the results as charts.

Do so for at least three algorithms, including at least one of each kind: a quadratic algorithm and a linear-logarithmic algorithm.

Note: The following algorithms are included in the choices: binary tree sort; insertion sort (for both IList and ArrayList); merge sort; quicksort (for both IList and ArrayList); selection sort. See if you can figure out which one is which.

##### 10.4 Implementing Heapsort

Heapsort relies on a (max-)heap, a binary tree satisfying two invariants:

Structural: It is a complete tree: every layer of the tree is full, except perhaps the bottom layer, and that one must be full from the left.

Logical: Every node’s value is greater than the values of both of its children.

When the heap is stored in an ArrayList, the tree shape is implicit in the indices:

The root of the tree is at index 0.

The left child of the item at index \(i\) is at index \(2i+1\).

The right child of the item at index \(i\) is at index \(2i+2\).

The parent of the item at index \(i\) is at index \((i-1)/2\), rounded down.
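In code, the index arithmetic above is just a few one-liners (a sketch; the class and method names here are mine, not from the starter files):

```java
public class HeapIndex {
    static int left(int i)   { return 2 * i + 1; }
    static int right(int i)  { return 2 * i + 2; }
    static int parent(int i) { return (i - 1) / 2; } // integer division rounds down

    public static void main(String[] args) {
        // The root (index 0) has children at indices 1 and 2,
        // and each of those points back to 0 as its parent.
        System.out.println(left(0));    // 1
        System.out.println(right(0));   // 2
        System.out.println(parent(2));  // 0
        System.out.println(parent(5));  // 2, since (5 - 1) / 2 = 2
    }
}
```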

To build a heap out of an unordered ArrayList, work backward. Starting from the last item in the ArrayList, ensure that it and its children form a valid heap. It has no children, so it must be a valid heap. Go to the next-to-last item in the ArrayList and ensure that it too is a valid heap. It also has no children, so it must be valid. In fact, none of the last \(n/2\) items (where \(n\) is the size of the ArrayList) have any children (why?), so they must all be valid heaps.

The next item (the one just before those last \(n/2\)) must have at least one child. If it is smaller than either of its children, swap it with the larger of the two, and recursively fix up the heap at the location of that child (because now it might not be ordered properly with respect to its own children). This step is called “downheap”.

Repeat “downheap” for every item up to and including the root.

Next, we need to remove the maximum item from a heap. Fortunately, we know where to find it: at the root. To remove the item, we must fill the hole left at the root, without violating our invariants. So we choose the last item in the last layer of the tree and swap it with the root. Now our structural invariant is restored, but the logical invariant is broken. Fortunately, we know how to fix that: just downheap the new value at the root!

Finally, we need to repeat this removal for every item in the heap. Once the heap is empty, the “discards” will actually be the contents of the heap in sorted order.
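Putting the build and remove-max steps together, an in-place heapsort over a List might look like the following sketch. It is generic over a Comparator, as in the lab, but the class and method names are my own and need not match the stubs in Heapsort.java.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class HeapsortSketch {
    // Restore the heap-ordering invariant for the item at index i,
    // treating only indices below size as part of the heap.
    static <T> void downheap(List<T> a, int i, int size, Comparator<T> cmp) {
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, largest = i;
            if (left < size && cmp.compare(a.get(left), a.get(largest)) > 0) largest = left;
            if (right < size && cmp.compare(a.get(right), a.get(largest)) > 0) largest = right;
            if (largest == i) return;          // already a valid heap here
            Collections.swap(a, i, largest);   // fix this node...
            i = largest;                       // ...then continue at the child
        }
    }

    static <T> void heapsort(List<T> a, Comparator<T> cmp) {
        int n = a.size();
        // Build phase: the last n/2 items are childless, hence already heaps;
        // downheap everything before them, up to and including the root.
        for (int i = n / 2 - 1; i >= 0; i--) downheap(a, i, n, cmp);
        // Removal phase: swap the max (root) with the last item still in the
        // heap, shrink the heap by one, and downheap the new root. The
        // "discards" accumulate at the end of the list in sorted order.
        for (int size = n; size > 1; size--) {
            Collections.swap(a, 0, size - 1);
            downheap(a, 0, size - 1, cmp);
        }
    }

    public static void main(String[] args) {
        List<Integer> xs = new ArrayList<>(List.of(5, 1, 4, 2, 3));
        heapsort(xs, Comparator.naturalOrder());
        System.out.println(xs); // [1, 2, 3, 4, 5]
    }
}
```

Note that shrinking the heap while the discards pile up at the end of the same ArrayList is what makes the sort in-place.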

##### 10.5 Implementing Husky Heaps

Consider a standard binary tree definition, where each node has a value and two tree children, and leaves have no content at all. We want to use this data definition to implement some kind of a heap as above, and we’d like to ensure that the heap operations (get-max-value, insert, remove-max-value) are efficient: \(O(\lg(\#items))\). For this Husky Heap we’ll use a different structural invariant. Instead of asserting that the tree is complete (each layer is completely full, and the last layer is full from the left), we’ll assert that the number of nodes in the left subtree is the same or one greater than the number of nodes in the right subtree, for every node in the heap.
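One possible starting point is a plain binary-tree data definition with a cached size. All names here are hypothetical, the heap operations themselves are left to you, and your own class design (including how you track subtree sizes) may differ.

```java
// A sketch of the tree shape described above: a node holds a value and two
// subtrees, and a leaf holds nothing. Caching the size at each node makes
// checking the size invariant O(1) per node.
public abstract class HuskyTree {
    public abstract int size();

    public static final class Leaf extends HuskyTree {
        public int size() { return 0; }
    }

    public static final class Node extends HuskyTree {
        public final int value;
        public final HuskyTree left, right;
        private final int size;

        public Node(int value, HuskyTree left, HuskyTree right) {
            this.value = value;
            this.left = left;
            this.right = right;
            this.size = 1 + left.size() + right.size(); // computed once, reused
        }

        public int size() { return size; }

        // The structural invariant from the text: the left subtree has the
        // same number of nodes as the right subtree, or exactly one more.
        public boolean sizeInvariantHolds() {
            int l = left.size(), r = right.size();
            return l == r || l == r + 1;
        }
    }
}
```

The invariant is a query method rather than a constructor check on purpose: as noted below, your operations may need to violate both invariants temporarily and restore them afterward.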

Do Now!

Explain why this size invariant ensures that the tree has height \(O(\lg(\#nodes))\).

Implement the necessary heap operations: inserting a value into a Husky Heap, creating a Husky Heap from a bunch of unordered values, getting the maximum value, and removing the maximum value.

This is tricky! Remember that when you implemented binary heaps above, you had two invariants to maintain (structural and heap-ordering), and the insertion and removal strategy was to preserve the structural invariant in all cases and fix up the heap-ordering invariant as needed. In this scenario, your solution may temporarily invalidate both invariants and fix them up afterward. It will take a leap of insight to figure out how to restore the two invariants...