Lecture 21: ArrayLists
Working with Java’s built-in mutable lists
In the last lecture, we determined how to remove items from a mutable list,
and saw that carefully considering all the behaviors, including the
subtle special case of removing the first item from the list, led to
revising our data definitions to make that case not so special. From there,
we began designing an interface for our “ideal” generic mutable list.
As it turns out, such a construction is so ubiquitously useful that Java
defines a class that implements almost precisely this interface. Let’s start
working with this new class, and see what new challenges it brings.
21.1 Introducing ArrayLists
In the last lecture we described an IMutableList interface as follows:
interface IMutableList<T> { |
void addToFront(T t); |
void addToEnd(T t); |
void remove(T t); |
void remove(IPred<T> whichOne); |
T get(int index); |
void set(int index, T t); |
void insert(int index, T t); |
int size(); |
} |
This interface does not actually exist, in the sense that it is not predefined for us
by name, and we will not actually define it or any classes that implement it explicitly.
Instead, Java defines a class for us that provides essentially these (and other) methods anyway: the ArrayList<T> class.
(Note: to use ArrayLists in code, we have to include the following import statement at the top
of each file that uses them:
import java.util.ArrayList; |
We have seen these import statements before, though we have not remarked on them: they are essentially like Racket’s
require, and include predefined libraries into our code. The naming convention for libraries reads from left to right,
and descends from large, general categories to smaller libraries and down to individual classes. Here, we can see from the name that the designers of Java’s
libraries deemed ArrayLists to be of great utility to all of Java!)
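As a rough guide, each method of the IMutableList interface above has a close counterpart on ArrayList<T>. The sketch below walks through that correspondence on a small list of strings. The class name ArrayListCorrespondence and the method demo are made up for this illustration, and note that removeIf takes Java's own Predicate (written here as a lambda) rather than our IPred.

```java
import java.util.ArrayList;

// Hypothetical class, for illustration only
class ArrayListCorrespondence {
  // Builds a small list using the ArrayList counterpart of each IMutableList method
  static ArrayList<String> demo() {
    ArrayList<String> items = new ArrayList<String>();
    items.add("b");                     // addToEnd(t)      ~ add(t)
    items.add(0, "a");                  // addToFront(t)    ~ add(0, t)
    items.add("d");                     // list is now a, b, d
    items.add(2, "c");                  // insert(index, t) ~ add(index, t): a, b, c, d
    items.set(3, "z");                  // set(index, t)    ~ set(index, t): a, b, c, z
    items.remove("b");                  // remove(t)        ~ remove(Object): a, c, z
    items.removeIf(s -> s.equals("z")); // remove(IPred)    ~ removeIf(Predicate): a, c
    return items;
  }

  public static void main(String[] args) {
    ArrayList<String> items = demo();
    System.out.println(items.size());   // size()     ~ size()
    System.out.println(items.get(0));   // get(index) ~ get(index)
  }
}
```

The names do not line up exactly (there is no addToFront, for instance), but every behavior we asked for is available.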
Because Java defines this class for us, several consequences immediately follow. Unlike our IList<T>
and its attendant ConsList<T> and MtList<T> classes, we do not control the implementation
of ArrayList<T>. We cannot add methods to it; we cannot use dynamic dispatch to recur over its structure;
we actually have no visibility into how the class is implemented at all. And we shouldn’t have to! It suffices
to know that we can manipulate an ArrayList<T> using its methods, rather than knowing how
those methods are implemented.
This is a powerful form of abstraction, known as representation independence: client code
(i.e. users of ArrayList<T>) should neither know nor care about how libraries (such as ArrayList<T>)
are implemented, but should simply interact with them through their interfaces. Until now, we have
been both the author of the library code and the client code, so the distinction was not so crisp.
In larger-scale programs, this distinction is a crucial part of software engineering.
Which class have we seen in the homework assignments that is most akin to the ArrayList<T> class?
(Hint: which class essentially implements the interface above?)
If we cannot rely on knowledge of the implementation to guide our methods, how can we interact with an ArrayList<T>, and
what might we want to do with one? Presumably, it should still be list-like, which means we ought to be able to
get items from it, obtain its length, map functions over it, sort it into order, etc.
21.2 Obtaining items from an ArrayList
With ILists, we could implement a method getFirst that would return this.first
from a ConsList class, and throw an exception from the MtList class. With ArrayLists,
we simply use the get method, and access items by their index, meaning their position in the list (counting from 0):
ArrayList<String> someStrings = ...; |
someStrings.get(0) |
Of course, if the list is in fact empty, then there is no first item available, and the get method
will throw an IndexOutOfBoundsException signaling the error.
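To see this behavior outside of the tester library, here is a minimal sketch in plain Java. The class EmptyGetDemo and its method are hypothetical names chosen for this example. It relies only on the type of the exception, since the exact wording of the exception's message may differ between Java versions.

```java
import java.util.ArrayList;

// Hypothetical class, for illustration only
class EmptyGetDemo {
  // Returns true if asking an empty list for index 0 throws IndexOutOfBoundsException
  static boolean getOnEmptyThrows() {
    ArrayList<String> noStrings = new ArrayList<String>();
    try {
      noStrings.get(0);
      // If we reach this line, get did not signal an error
      return false;
    }
    catch (IndexOutOfBoundsException e) {
      // We check only the exception's type, not its message text
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(getOnEmptyThrows());
  }
}
```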
Indeed, get is more general than just getting the first item of a list; it can get any item of the list,
assuming it is given a valid index:
class ExampleArrayLists { |
void testGet(Tester t) { |
ArrayList<String> someStrings = new ArrayList<String>(); |
t.checkException(new IndexOutOfBoundsException("Index: 0, Size: 0"), |
someStrings, "get", 0); |
someStrings.add("First string"); |
someStrings.add("Second string"); |
t.checkExpect(someStrings.get(0), "First string"); |
t.checkExpect(someStrings.get(1), "Second string"); |
t.checkException(new IndexOutOfBoundsException("Index: 3, Size: 2"), |
someStrings, "get", 3); |
} |
} |
This example also demonstrates adding items to the list: the new items are added to the end of the list,
as evidenced by the indices of the middle two tests. If we want to insert an item at a given index,
we can use another form of add that takes an index as its first parameter:
class ExampleArrayLists { |
void testAdd(Tester t) { |
ArrayList<String> someStrings = new ArrayList<String>(); |
someStrings.add("First string"); |
someStrings.add("Second string"); |
t.checkExpect(someStrings.get(0), "First string"); |
t.checkExpect(someStrings.get(1), "Second string"); |
|
someStrings.add(1, "Squeezed in"); |
t.checkExpect(someStrings.get(0), "First string"); |
t.checkExpect(someStrings.get(1), "Squeezed in"); |
t.checkExpect(someStrings.get(2), "Second string"); |
} |
} |
21.3 Manipulating items via indices: moving two items
Adding items to the end of a list and getting items by index are easy enough, but what else
can we accomplish? For example, how can we move items around in a list?
Suppose we had the following list, but mistakenly initialized its items in the wrong order:
void testSwap(Tester t) { |
ArrayList<String> someStrings = new ArrayList<String>(); |
someStrings.add("Second string"); |
someStrings.add("First string"); |
|
|
t.checkExpect(someStrings.get(0), "First string"); |
t.checkExpect(someStrings.get(1), "Second string"); |
} |
We need to exchange the two items somehow, by changing the items at each index. We could write code to do
this every time we needed to, or we could define a helper method. But unlike with ILists, where we’d just add a new method
to the interface, then implement it on ConsList and MtList, we do not have that luxury here.
Instead, we must resort once again to defining a utility class, within which we define our new method:
class ArrayUtils { |
<T> void swap(ArrayList<T> arr, int index1, int index2) { |
?? |
} |
} |
Implement this swap method. What new method on ArrayLists will be needed?
The counterpart to get, which gets the item at a given index, is set, which sets
the item at the given index to be the newly given value. So the clearest implementation of swap
is:
<T> void swap(ArrayList<T> arr, int index1, int index2) { |
T oldValueAtIndex1 = arr.get(index1); |
T oldValueAtIndex2 = arr.get(index2); |
|
arr.set(index2, oldValueAtIndex1); |
arr.set(index1, oldValueAtIndex2); |
} |
This code is perfectly correct. Still, it is worthwhile practice to see how to streamline this code a bit. First, notice that we don’t need
both variables oldValueAtIndex1 and oldValueAtIndex2. When we invoke set with index2,
notice that the value at index1 has not yet been modified. So we can eliminate one variable, and write
<T> void swap(ArrayList<T> arr, int index1, int index2) { |
T oldValueAtIndex2 = arr.get(index2); |
|
arr.set(index2, arr.get(index1)); |
arr.set(index1, oldValueAtIndex2); |
} |
(We can actually eliminate the other variable as well, thanks to a convenient feature of set.
The set method is defined to return a T value: specifically,
it returns the old value at the index being modified. Accordingly, we can write the following version,
which is quite terse and “clever”:
<T> void swap(ArrayList<T> arr, int index1, int index2) { |
arr.set(index2, arr.set(index1, arr.get(index2))); |
} |
Reading from the inside out, this code gets the value at index2, sets the value at index1
to be that value and then returns the old value at index1, which is then used to set the value at index2.
Perfectly clear, no? No! All three versions of swap behave identically, and the terse one offers no practical performance benefit, so writing
code this clever is simply asking for trouble. Just because we can write such code doesn’t mean we should.)
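As a quick check, the sketch below runs the two-variable version of swap on the out-of-order list from the start of this section, and compares the result against Collections.swap, a method that Java's standard library already provides for this purpose. The class SwapDemo and the helper outOfOrder are hypothetical names, and swap is written as a static method here only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;

// Hypothetical class, for illustration only
class SwapDemo {
  // The two-variable swap from the text, made static for this sketch
  static <T> void swap(ArrayList<T> arr, int index1, int index2) {
    T oldValueAtIndex1 = arr.get(index1);
    T oldValueAtIndex2 = arr.get(index2);

    arr.set(index2, oldValueAtIndex1);
    arr.set(index1, oldValueAtIndex2);
  }

  // Builds the list whose items were mistakenly added in the wrong order
  static ArrayList<String> outOfOrder() {
    ArrayList<String> someStrings = new ArrayList<String>();
    someStrings.add("Second string");
    someStrings.add("First string");
    return someStrings;
  }

  public static void main(String[] args) {
    ArrayList<String> byHand = outOfOrder();
    swap(byHand, 0, 1);

    ArrayList<String> byLibrary = outOfOrder();
    Collections.swap(byLibrary, 0, 1);

    // Both lists should now hold the items in the intended order
    System.out.println(byHand.get(0) + ", " + byHand.get(1));
    System.out.println(byHand.equals(byLibrary));
  }
}
```

Writing swap ourselves is still good practice with get and set, but in everyday code the library method is usually the better choice.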
21.4 Transforming ArrayLists with map: introducing for-each loops
21.4.1 Mapping via recursion
Now that we have some practice with indices, let’s try implementing map: our goal is
to produce a new ArrayList containing the results of applying a given function to every
item in the original list:
<T, U> ArrayList<U> map(ArrayList<T> arr, IFunc<T, U> func) { |
??? |
} |
Try implementing map. What helpers, if any, do you need to define?
We know we have to iterate through all the items in the list, and the only way we currently have
to access items is via their indices. How can we implement this method? We don’t have ConsList and MtList classes to which we can
dynamically dispatch. Instead we need to keep track of the current index, so that we know which item of the
list to process. Additionally, note that adding an item to an ArrayList mutates the list rather than returning a new one: if we want to preserve
all the transformed items, we’ll need to pass the result list as a second accumulator parameter:
<T, U> ArrayList<U> mapHelp(ArrayList<T> source, IFunc<T, U> func, |
int curIdx, ArrayList<U> dest) { |
if (curIdx >= source.size()) { |
return dest; |
} |
else { |
dest.add(func.apply(source.get(curIdx))); |
return this.mapHelp(source, func, curIdx + 1, dest); |
} |
} |
Our base case now occurs when our index has run past the last valid index of the source
list, that is, when it is at least the size of the list (since indices start counting from 0).
Our recursive case is straightforward: it gets the current item, transforms it, and adds it
to the destination list, then recurs at the next index.
All that remains is to kick off the recursion in our original map method. We
need to call the helper, starting at index 0, and passing in a new ArrayList of the
appropriate element type for use as the accumulator:
<T, U> ArrayList<U> map(ArrayList<T> arr, IFunc<T, U> func) { |
ArrayList<U> result = new ArrayList<U>(); |
return this.mapHelp(arr, func, 0, result); |
} |
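To try this out, the sketch below puts map and mapHelp together with a small function object. It assumes IFunc has a single apply method, as in earlier lectures. The classes StringLength and MapDemo are hypothetical names introduced just for this example.

```java
import java.util.ArrayList;

// Assumed shape of the IFunc interface: one method that transforms a T into a U
interface IFunc<T, U> {
  U apply(T t);
}

// Hypothetical function object: transforms a String into its length
class StringLength implements IFunc<String, Integer> {
  public Integer apply(String s) {
    return s.length();
  }
}

class ArrayUtils {
  // Starts the recursion at index 0 with an empty accumulator
  <T, U> ArrayList<U> map(ArrayList<T> arr, IFunc<T, U> func) {
    ArrayList<U> result = new ArrayList<U>();
    return this.mapHelp(arr, func, 0, result);
  }

  // Transforms the items from curIdx onward, adding each result to dest
  <T, U> ArrayList<U> mapHelp(ArrayList<T> source, IFunc<T, U> func,
                              int curIdx, ArrayList<U> dest) {
    if (curIdx >= source.size()) {
      return dest;
    }
    else {
      dest.add(func.apply(source.get(curIdx)));
      return this.mapHelp(source, func, curIdx + 1, dest);
    }
  }
}

// Hypothetical driver class, for illustration only
class MapDemo {
  // Maps the list ("a", "bcd") to the list of its items' lengths
  static ArrayList<Integer> lengths() {
    ArrayList<String> words = new ArrayList<String>();
    words.add("a");
    words.add("bcd");
    return new ArrayUtils().map(words, new StringLength());
  }

  public static void main(String[] args) {
    System.out.println(lengths());
  }
}
```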
There is something unsatisfying about this implementation of map, though: all this messing
about with indices seems unnecessarily distracting. Recall the very first description we gave
of map in Fundies I: it returns the result of applying a function to each item of a list.
Nowhere in that description did we have to mention indices. Can we improve our current Java code?
21.4.2 Using for-each loops
In fact, the ideas in this section are applicable to more than just ArrayLists.
We’ll see in Lecture 25: Iterator and Iterable the details of how this construct works, and how we can
make it work with our own datatypes.
The notion of “doing something to each item of a list” is so fundamental that Java actually
defines syntax for making it more convenient: a for-each loop. A for-each loop does
just what it sounds like: it executes some block of statements for each item in a list.
We introduce the syntax of a for-each loop by rewriting map to use one:
<T, U> ArrayList<U> map(ArrayList<T> arr, IFunc<T, U> func) { |
ArrayList<U> result = new ArrayList<U>(); |
for (T t : arr) { |
result.add(func.apply(t)); |
} |
return result; |
} |
Read the code aloud as, “For each item t (of type T) in the list arr,
apply func to t and add the answer to result.” In general,
code that uses a for-each loop looks like the following sketch:
... myMethod(ArrayList<X> xs) { |
...setup... |
for (X x : xs) { |
...body... |
} |
...use results... |
} |
Every loop consists of a header that declares the loop variable x (and its type)
and specifies the list to be iterated over, and a body consisting of a block of statements
to be executed every iteration of the loop, which happens once per item of the list.
Since the body consists of statements, if we want any output from the loop, we must use side effects to do so.
Therefore, before the loop begins we must set up any accumulators we want to use within the loop,
so that once the loop completes we will have the accumulated answer available. In our map
example, the setup step created the result ArrayList, and once the loop completed
we merely needed to return it.
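The same setup, loop, and use-results pattern works when the accumulator is not a list. As a minimal sketch, the hypothetical method totalLength below (in a made-up class LoopSketch) accumulates an int instead of an ArrayList.

```java
import java.util.ArrayList;

// Hypothetical class, for illustration only
class LoopSketch {
  // Computes the sum of the lengths of all the strings in the given list
  static int totalLength(ArrayList<String> strings) {
    int total = 0;                  // setup: the accumulator starts at 0
    for (String s : strings) {      // header: s stands for each item in turn
      total = total + s.length();   // body: update the accumulator
    }
    return total;                   // use results: the accumulated answer
  }

  public static void main(String[] args) {
    ArrayList<String> words = new ArrayList<String>();
    words.add("ab");
    words.add("cde");
    System.out.println(totalLength(words));
  }
}
```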
Note that when the ArrayList is empty, the body of the loop will execute zero times. We do not need to explicitly check
for empty lists as a base case, but we do need to make sure that our setup will yield the correct answer
even when the loop never runs. In the case of map, the setup creates an empty result
ArrayList, which is exactly the correct answer when the input ArrayList was empty.
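We can confirm this behavior directly. The sketch below repeats the for-each version of map (as a static method, for brevity) and applies it to an empty list. It assumes IFunc has a single apply method; StringLength, ForEachMapDemo, and lengthsOf are hypothetical names used only here.

```java
import java.util.ArrayList;

// Assumed shape of the IFunc interface: one method that transforms a T into a U
interface IFunc<T, U> {
  U apply(T t);
}

// Hypothetical function object: transforms a String into its length
class StringLength implements IFunc<String, Integer> {
  public Integer apply(String s) {
    return s.length();
  }
}

// Hypothetical class, for illustration only
class ForEachMapDemo {
  // The for-each version of map from the text, made static for this sketch
  static <T, U> ArrayList<U> map(ArrayList<T> arr, IFunc<T, U> func) {
    ArrayList<U> result = new ArrayList<U>();
    for (T t : arr) {
      result.add(func.apply(t));
    }
    return result;
  }

  // Maps each string in the given list to its length
  static ArrayList<Integer> lengthsOf(ArrayList<String> strings) {
    return map(strings, new StringLength());
  }

  public static void main(String[] args) {
    // The loop body never runs, so the freshly created result is returned as-is
    ArrayList<String> noStrings = new ArrayList<String>();
    System.out.println(lengthsOf(noStrings).isEmpty());
  }
}
```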
Implement foldr and foldl for ArrayLists, first using recursion
and then again using for-each loops. (Hint: one of these will be much harder than the other, when using for-each loops.)