Lecture 14: Iterators
14.1 Traversing over a collection of data
In Lecture 6: Recursive unions: Lists we designed a list, and generalized it in Lecture 7: Lists continued. All the operations on it were designed recursively, as it fit our recursive union data representation. However many of these operations traversed the list, processing each element in it exactly once. A simple example of this would be counting the size of the list, but many other general-purpose operations can be implemented by traversing the list. Subsequently, we discussed Java’s list implementations, and specifically a non-recursive way in which they can be traversed. This traversal materialized in one of two ways: a counting-for loop, and a for-each loop. An example is shown below:
List<String> list = new ArrayList<String>(); //or LinkedList<String>
...
//counting for loop
for (int i = 0; i < list.size(); i++) {
  String str = list.get(i);
  ...
}

//for-each loop
for (String str : list) {
  ...
}
For a linked list, the for-each loop is more efficient (when it is applicable). This is because getting the element at a given index in a linked list involves starting at the beginning (or end) and advancing until we reach the desired position, and the counting loop repeats this walk for every index. Array lists and arrays do not have this overhead, because they support random access.
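The cost difference can be made concrete with a small experiment. The sketch below is not Java's LinkedList: HopCountingList is a hypothetical, minimal singly-linked list written only to count how many links each style of loop follows. Its names and the compare helper are assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical minimal singly-linked list that counts node-to-node hops.
// It is an illustration only, not how java.util.LinkedList is written.
class HopCountingList implements Iterable<String> {
  private static class Node {
    final String data;
    Node next;
    Node(String data) { this.data = data; }
  }

  private Node head;
  private Node tail;
  private int size;
  long hops; // number of times any traversal followed a "next" link

  void addBack(String s) {
    Node n = new Node(s);
    if (head == null) { head = n; } else { tail.next = n; }
    tail = n;
    size++;
  }

  int size() { return size; }

  // Indexed access: walks from the head on every call.
  String get(int index) {
    Node cur = head;
    for (int i = 0; i < index; i++) {
      cur = cur.next;
      hops++;
    }
    return cur.data;
  }

  // The iterator remembers its position, so each step costs one hop.
  @Override
  public Iterator<String> iterator() {
    return new Iterator<String>() {
      private Node cur = head;

      @Override public boolean hasNext() { return cur != null; }

      @Override public String next() {
        if (cur == null) { throw new NoSuchElementException(); }
        String d = cur.data;
        cur = cur.next;
        hops++;
        return d;
      }
    };
  }

  // Returns {hops used by the counting loop, hops used by the for-each loop}.
  static long[] compare(int n) {
    HopCountingList list = new HopCountingList();
    for (int i = 0; i < n; i++) { list.addBack("item" + i); }

    for (int i = 0; i < list.size(); i++) { list.get(i); }
    long counting = list.hops;

    list.hops = 0;
    for (String s : list) { /* visit s */ }
    long forEach = list.hops;

    return new long[] {counting, forEach};
  }
}
```

For a list of n items the counting loop follows 0 + 1 + ... + (n-1) links in this sketch, while the for-each loop follows n, so the gap grows quadratically with the size of the list.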
However this metaphor of traversing through a collection of data is not limited to sequential lists. The for-each loop applies to many other Java classes that are used to store data.
Set<String> setOfStrings = new TreeSet<String>(); //or HashSet<String>
...
for (String str : setOfStrings) {
  ...
}

Map<Integer,String> zipcodes = new TreeMap<Integer,String>(); //or HashMap<Integer,String>
...
//traverse all keys
for (Integer zip : zipcodes.keySet()) {
  String city = zipcodes.get(zip); //get the city corresponding to this zip code
  ...
}

//traverse all values
for (String city : zipcodes.values()) {
  ...
}

//traverse all (key,value) entries
for (Map.Entry<Integer,String> e : zipcodes.entrySet()) {
  Integer zip = e.getKey();
  String city = e.getValue();
  ...
}
Note the remarkable design of the above examples: the syntax to traverse the data (visiting each data item exactly once) is virtually the same (a for-each loop) irrespective of how the data is internally stored (as a list, tree, hash table, etc.). This is a powerful and useful abstraction.
Contrast this with our data representations. In the ListADT class in Lecture 7: Lists continued we defined every operation recursively, implicitly repeating the same traversal algorithm: do X to this data, and continue to rest. Every new operation would require a different way of processing data, but largely the same traversal algorithm. Moreover we had the freedom (and burden) of defining a new operation by adding methods to the internal node classes. We see the same pattern in the TreeNode classes in Lecture 9: Generic trees and other data structures. Thus, while our data representations forced us to implement each operation internally by implementing the traversal ourselves, it would be desirable to give the client the ability to implement operations externally as data traversals: Traverse the data internally, with an opportunity to do X to each item.
Traverse the data internally, and do X to each item sounds like a higher-order map operation. However a map is a specific operation: start from a data collection and produce a data collection of identical structure but possibly different type of data. Traversal may have other goals and may be used to devise more drastic transformations that neither map, fold nor filter can cleanly accomplish. Just as we formulated actual operations in terms of higher-order functions, we can formulate operations using a traversal theme.
However, an attempt to formulate the counting operation on our lists creates the following problem:
GenericListADTNode current; for (current-starts-at-head;current-is-not-end;advance-current) { }
In our data representation, there is no way to identify whether we have reached the end of the list without resorting to determining the type of node (current instanceof GenericEmptyNode). This goes against our design principle (see Dynamic dispatch) of not explicitly determining type to decide which operation to perform.
The traversal-by-looping metaphor is even harder to imagine with hierarchical representations. While lists have a clear sequential order that is amenable to looping, hierarchies have no such structure. The overall problem seems to be to offer a uniform way to visit all data items once, irrespective of how they are internally organized. Since there are many ways to organize the data, we must define a custom solution for each such organization that creates an illusion of linear traversal. The Iterator design pattern solves this problem.
14.2 The Iterator pattern
An iterator is an object that “manages” the iteration (traversal) through a data structure. Because each collection organizes its data differently, each must provide its own iterator object. A basic iterator offers three operations:
Return the item at the current iteration
Go to the next item
Are there more items?
The Iterator<T> interface in Java has a method to remove the current element from the collection, but it is not supported by all its implementations.
The Iterator<T> interface in Java has the following two methods for the above operations (next combines the first two):
public T next(); //advance and return the next element
public boolean hasNext(); //is there more in this iteration?
Every class that supports iteration implements the Iterable<T> interface. This interface has one abstract method, iterator(), that returns an Iterator<T> object. In other words, every data collection that implements the Iterable<T> interface announces that it can be iterated, by using an Iterator<T> object that it provides.
The following code can be used to iterate over a collection:
//get the iterator
Iterator<T> it = iterableObject.iterator();
//traverse the iterator
while (it.hasNext()) {
  T data = it.next();
  ...
}
The for-each loop achieves the same result as the above code, perhaps in a more concise way:
for (T data:iterableObject) { ... }
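To see the two interfaces working together, here is a minimal sketch. WordBag is a hypothetical collection invented for this example (it simply wraps an array), and the two join helpers are likewise assumptions. They traverse the same Iterable once with an explicit iterator and once with a for-each loop.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical collection used only for illustration: it wraps an array and
// announces, via Iterable<String>, that it can hand out an iterator.
class WordBag implements Iterable<String> {
  private final String[] words;

  WordBag(String... words) { this.words = words.clone(); }

  @Override
  public Iterator<String> iterator() {
    return new Iterator<String>() {
      private int position = 0; // index of the next word to return

      @Override public boolean hasNext() { return position < words.length; }

      @Override public String next() {
        if (!hasNext()) { throw new NoSuchElementException(); }
        return words[position++];
      }
    };
  }

  // Traversal written out with an explicit iterator.
  static String joinWithWhile(Iterable<String> items) {
    StringBuilder sb = new StringBuilder();
    Iterator<String> it = items.iterator();
    while (it.hasNext()) {
      sb.append(it.next()).append(' ');
    }
    return sb.toString();
  }

  // The same traversal written as a for-each loop.
  static String joinWithForEach(Iterable<String> items) {
    StringBuilder sb = new StringBuilder();
    for (String s : items) {
      sb.append(s).append(' ');
    }
    return sb.toString();
  }
}
```

Both helpers accept any Iterable<String>, so they work unchanged on a WordBag, an ArrayList<String> or a TreeSet<String>.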
14.3 Iterating through a list
In Lecture 7: Lists continued we created a generic list as an abstract data type. We now add an iterator to this list.
14.3.1 Tests
As we will follow Java’s implementation of the iterator, we can test it by verifying a for-each loop on our list.
public class ListADTImplTest {
  private ListADT<String> stringList;

  @Before
  public void setup() {
    stringList = new ListADTImpl<String>();
  }

  @Test
  public void testIterable() {
    //add the words of a sentence to the list, then rebuild the sentence
    //by iterating over the list
    String sentence = "The quick brown fox jumps over the lazy dog";
    String[] words = sentence.split("\\s+");
    for (String w : words) {
      stringList.addBack(w);
    }
    StringBuilder sb = new StringBuilder();
    //for-each loop on *our* list!
    for (String s : stringList) {
      sb.append(s + " ");
    }
    assertEquals("The quick brown fox jumps over the lazy dog ", sb.toString());
  }
}
14.3.2 Changes to the List ADT
To make our ListADT implementations iterable, we make the ListADT interface extend the Iterable<T> interface.
/**
 * This interface represents a generic list.
 *
 * We represent the type of data that this list works with
 * as a generic parameter T. This is a placeholder for the
 * actual data type.
 *
 * This interface is also iterable.
 */
public interface ListADT<T> extends Iterable<T> {
  ...
}
The implementation now adds one method:
/**
 * This is the implementation of a generic list.
 * Specifically it implements the listadt.ListADT
 * interface. It represents a singly-linked list of data
 * elements, where every data node stores one piece of
 * data and a reference to the next item in the list. The
 * list ends with a sentinel empty node.
 */
public class ListADTImpl<T> implements ListADT<T> {
  private GenericListADTNode<T> head;
  ...
  @Override
  public Iterator<T> iterator() {
    ...
  }
}
14.3.3 The Iterator object
We now define an Iterator object specifically for the ListADTImpl class. Because the iterator is customized for a class and is tightly coupled with it, we make it an inner class.
public class ListADTImpl<T> implements ListADT<T> {
  private GenericListADTNode<T> head;
  ...
  @Override
  public Iterator<T> iterator() {
    ...
  }

  //an inner class shares the T of ListADTImpl, so it declares no type
  //parameter of its own (a second <T> here would shadow the outer one)
  private class ListADTIterator implements Iterator<T> {
    @Override
    public boolean hasNext() {
      ...
    }

    @Override
    public T next() {
      ...
    }
  }
}
The iterator starts at the head of the list, and tracks the node it is currently referring to. We must define operations to:
Return the data at the current node. This is possible only when the current node is a GenericElementNode object.
Return true if there is anything remaining in the list (i.e. the current node is a GenericElementNode object), false otherwise.
Accordingly, we add two methods to the GenericListADTNode<T> interface. The advance() method returns the next node in the list. Since this behavior is conceptually undefined for empty nodes, the specification states that an empty node returns itself. The canAdvance() method reports whether there is anything left in the list.
public interface GenericListADTNode<T> {
  ...
  /**
   * Given this node, advance the iteration by one step by returning the next
   * node in the list. If there is no next node, return this node itself.
   * @return the next node in the list, or this node if there is none
   */
  GenericListADTNode<T> advance();

  /**
   * Return whether the list beginning at this node has any data left.
   * @return true if there is data at or after this node, false otherwise
   */
  boolean canAdvance();
}
If it feels like the canAdvance method is really revealing the type of node, you are not completely wrong. However this operation has more meaning in the context of the list, and so its implementation revealing the type of node is a helpful byproduct rather than its primary purpose. In other words its name and usage blend better in context (iteration).
Strictly speaking it is inappropriate to have a method called advance in the GenericEmptyNode class. We could have it throw an UnsupportedOperationException object. Alternatively we could emphasize type safety and add this method only to the GenericElementNode class. However doing this would require knowing the type of node in the iterator, so we avoid it in this case. Refer to Adding an employee: addEmployee and addContractEmployee for a more general discussion of type-safety vs uniformity in union types.
We implement these methods as follows:
//In GenericElementNode<T>
@Override
public GenericListADTNode<T> advance() {
  return this.rest;
}

@Override
public boolean canAdvance() {
  return true;
}

//In GenericEmptyNode<T>
@Override
public GenericListADTNode<T> advance() {
  return this;
}

@Override
public boolean canAdvance() {
  return false;
}
14.3.4 Implementation of the Iterator
With the above methods, we can now implement the iterator.
public class ListADTImpl<T> implements ListADT<T> {
  private GenericListADTNode<T> head;
  ...
  @Override
  public Iterator<T> iterator() {
    return new ListADTIterator(head);
  }

  //an inner class shares the T of ListADTImpl, so it declares no type
  //parameter of its own
  private class ListADTIterator implements Iterator<T> {
    private GenericListADTNode<T> current;

    private ListADTIterator(GenericListADTNode<T> head) {
      current = head;
    }

    @Override
    public boolean hasNext() {
      return current.canAdvance();
    }

    @Override
    public T next() {
      T element = current.get(0); //get the data
      current = current.advance(); //advance
      return element;
    }
  }
}
We create an iterator by passing the head of the list to its private constructor. We then use the new methods added in the GenericListADTNode interface to implement the iterator methods. We use the get method in the nodes to retrieve the data at the current node (the data at the current node is the data at index 0 in the list beginning at the current node).
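The pieces of this section can be assembled into one runnable sketch. The classes below are pared-down, hypothetical stand-ins (MiniNode, MiniElementNode, MiniEmptyNode, MiniList) rather than the ListADT classes from the earlier lectures, which are not reproduced here. In particular, the data() method is an assumed helper that plays the role of get(0).

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for GenericListADTNode: only what the iterator needs.
interface MiniNode<T> {
  MiniNode<T> advance();        // next node, or this node if there is none
  boolean canAdvance();         // true for element nodes, false for the sentinel
  T data();                     // assumed helper, plays the role of get(0)
  MiniNode<T> addBack(T item);  // returns the head of the resulting list
}

class MiniElementNode<T> implements MiniNode<T> {
  private final T item;
  private MiniNode<T> rest;

  MiniElementNode(T item, MiniNode<T> rest) {
    this.item = item;
    this.rest = rest;
  }

  @Override public MiniNode<T> advance() { return rest; }
  @Override public boolean canAdvance() { return true; }
  @Override public T data() { return item; }
  @Override public MiniNode<T> addBack(T newItem) {
    rest = rest.addBack(newItem);
    return this;
  }
}

class MiniEmptyNode<T> implements MiniNode<T> {
  @Override public MiniNode<T> advance() { return this; }
  @Override public boolean canAdvance() { return false; }
  @Override public T data() { throw new NoSuchElementException("empty node"); }
  @Override public MiniNode<T> addBack(T newItem) {
    return new MiniElementNode<T>(newItem, this);
  }
}

class MiniList<T> implements Iterable<T> {
  private MiniNode<T> head = new MiniEmptyNode<T>();

  void addBack(T item) { head = head.addBack(item); }

  @Override
  public Iterator<T> iterator() { return new MiniListIterator(head); }

  // Inner class: it shares the enclosing list's type parameter T.
  private class MiniListIterator implements Iterator<T> {
    private MiniNode<T> current;

    private MiniListIterator(MiniNode<T> head) { current = head; }

    @Override public boolean hasNext() { return current.canAdvance(); }

    @Override public T next() {
      T element = current.data();   // throws if we are at the sentinel
      current = current.advance();
      return element;
    }
  }

  // Counting, written by the client as a traversal instead of as a new
  // recursive method added to the node classes.
  static <E> int count(Iterable<E> items) {
    int n = 0;
    for (E ignored : items) { n++; }
    return n;
  }
}
```

Note that count needs no knowledge of the nodes at all: it works on anything Iterable, which is exactly the freedom the recursive design did not give the client.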
14.4 Iterating through a hierarchy
Iterating through a non-sequential hierarchy is more involved, as the iterator cannot be advanced linearly as in the case of a list.
We implemented a generic hierarchy for an organization in Lecture 9: Generic trees and other data structures. We now make this hierarchy iterable. The overall design changes are similar, but the implementation of the iterator itself is different.
14.4.1 Tests
If our hierarchy is iterable, we can write a for-each loop over it. This illustrates the power of abstraction of the iterator pattern.
public class TreeNodeTest {
  IterableOrganization ccis;
  IterableOrganization startup;

  @Before
  public void setup() {
    //create the hierarchy
    ...
  }

  @Test
  public void testIterator() {
    List<String> names = new ArrayList<String>();
    //for-each loop on *our* hierarchy!
    for (Employee e : ccis) {
      names.add(e.getName());
    }
    Set<String> actual = new HashSet<String>(names);
    Set<String> expected = new HashSet<String>(ccis.allEmployees());
    assertEquals(expected, actual);
  }
}
Specifically we rely on the allEmployees method that we have tested before to test our iterator. We create a list of all employee names by iterating over the hierarchy. We then compare its contents with the list returned by the allEmployees method. As the order of iteration may differ from the order of employees in that list, we compare sets instead of using list equality.
14.4.2 Iterator design changes
We make the following changes to the existing design (similar to the list iterator above):
We extend the Organization interface to create an IterableOrganization interface that also extends Iterable<Employee>.
public interface IterableOrganization extends Organization, Iterable<Employee> {
  ...
}
We make the OrganizationImpl class implement the IterableOrganization interface.
public class OrganizationImpl implements IterableOrganization {
  private TreeNode<Employee> root;
  ...
  @Override
  public Iterator<Employee> iterator() {
    return new OrganizationIterator(root);
  }
}
We define the iterator as a nested class called OrganizationIterator. It will begin iterating with the root of the hierarchy, passed to its constructor.
private static class OrganizationIterator implements Iterator<Employee> {
  public OrganizationIterator(TreeNode<Employee> root) {
    ...
  }

  @Override
  public boolean hasNext() {
    ...
  }

  @Override
  public Employee next() {
    ...
  }
}
We add two methods to the TreeNode<T> interface:
List<TreeNode<T>> children(): this method returns a list of all children of the current node.
T getData(): this method gets the data at the node. Recall that this implementation has data at every node.
public interface TreeNode<T> {
  ...
  /**
   * Return all the children of this node. If the node has no children, an
   * empty list is returned.
   * @return a list of all children of this node
   */
  List<TreeNode<T>> children();

  /**
   * Get the data at this node.
   * @return the data stored at this node
   */
  T getData();
}
14.4.3 Implementing the Iterator
The iterator must begin at the root of the tree. However it must traverse each node exactly once. This requires a traversal through the hierarchy, one step at a time.
Many types of traversals in a hierarchy are possible, as discussed in A list of all names: allEmployees: depth-first (top-down) and breadth-first approaches. The allEmployees method implements a top-down traversal of the hierarchy, but recursively. Recursion does not let us proceed one step at a time in a clean way.
Therefore we implement the top-down traversal in a non-recursive way. We start at the root. At every step, we return the data at the current node. Then we advance the iteration to its children. We can implement this logic using a stack. As a stack is a last-in-first-out (LIFO) structure, we push the children in reverse order so that the first child is popped from the stack first.
private static class OrganizationIterator implements Iterator<Employee> {
  Stack<TreeNode<Employee>> stack;

  private OrganizationIterator(TreeNode<Employee> root) {
    stack = new Stack<TreeNode<Employee>>();
    stack.push(root);
  }

  @Override
  public boolean hasNext() {
    //if the stack is empty, there are no more nodes to explore
    return !stack.isEmpty();
  }

  @Override
  public Employee next() {
    if (stack.isEmpty()) {
      //the Iterator contract asks for this exception rather than null
      throw new NoSuchElementException();
    }
    TreeNode<Employee> curr = stack.pop();
    Employee e = curr.getData();
    //copy the list before reversing it: children() may return the node's
    //own list, and reversing that would reorder the hierarchy itself
    List<TreeNode<Employee>> children =
        new ArrayList<TreeNode<Employee>>(curr.children());
    //as we want to visit the children in order we insert them in
    //reverse order in the stack
    Collections.reverse(children);
    for (TreeNode<Employee> c : children) {
      stack.push(c);
    }
    return e;
  }
}
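The same stack-based idea can be tried in isolation. The sketch below uses a hypothetical SimpleTree class in place of the TreeNode<Employee> hierarchy, which is not reproduced here. It also swaps java.util.Stack for an ArrayDeque used as a stack, and pushes children by walking the list backward instead of reversing a copy. These are choices made for this example, not the design used in the lecture code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical minimal tree with data at every node.
class SimpleTree<T> implements Iterable<T> {
  private final T data;
  private final List<SimpleTree<T>> children = new ArrayList<>();

  SimpleTree(T data) { this.data = data; }

  // Adds a child and returns this node, so calls can be chained.
  SimpleTree<T> addChild(SimpleTree<T> child) {
    children.add(child);
    return this;
  }

  @Override
  public Iterator<T> iterator() { return new PreorderIterator<T>(this); }

  private static class PreorderIterator<E> implements Iterator<E> {
    // Nodes that have been discovered but not yet returned.
    private final Deque<SimpleTree<E>> stack = new ArrayDeque<>();

    PreorderIterator(SimpleTree<E> root) { stack.push(root); }

    @Override public boolean hasNext() { return !stack.isEmpty(); }

    @Override public E next() {
      if (stack.isEmpty()) { throw new NoSuchElementException(); }
      SimpleTree<E> curr = stack.pop();
      // Push children right-to-left so the leftmost child is popped first.
      // Walking by index leaves the tree's own child list untouched.
      for (int i = curr.children.size() - 1; i >= 0; i--) {
        stack.push(curr.children.get(i));
      }
      return curr.data;
    }
  }

  // Collects the iteration order into a list, for demonstration.
  static <E> List<E> toList(Iterable<E> items) {
    List<E> out = new ArrayList<>();
    for (E e : items) { out.add(e); }
    return out;
  }
}
```

For a root A with children B and C, where B has children D and E, this iterator visits A, B, D, E, C. Iterating a second time gives the same order, because the traversal never modifies the tree.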
Do Now!
Change the above iterator so that it produces a bottom-up traversal. How about a level-order traversal?
As we can see, there are multiple ways to traverse a hierarchy, each resulting in a different ordering of the data items. When a hierarchy is itself ordered, its iterator may reveal something about that ordering. For example, the iterator of the TreeSet class iterates through all the data in ascending order (corresponding to an in-order traversal of the underlying binary search tree). It is important to note that the iterator pattern itself makes no promises about the order in which data items will be traversed, just the guarantee that each data item will be reached exactly once in a complete iteration. Individual implementations may go above and beyond to impose a specific ordering.
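The TreeSet behavior mentioned above is easy to observe. In this small sketch the class and method names (OrderedIterationDemo, treeSetOrder) are made up for the example. Only TreeSet and its iteration order come from the Java library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class OrderedIterationDemo {
  // Returns the given words in the order in which a TreeSet's iterator
  // visits them: ascending (natural String order), duplicates dropped.
  static List<String> treeSetOrder(List<String> words) {
    Set<String> sorted = new TreeSet<String>(words);
    List<String> visited = new ArrayList<String>();
    for (String w : sorted) {
      visited.add(w);
    }
    return visited;
  }
}
```

Given the words pear, apple, fig, apple, the loop visits apple, fig, pear, regardless of the order in which they were inserted.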
14.5 Iterator as a general metaphor
Although the iterator pattern is typically used to provide a uniform abstraction for traversing a data collection, its use in loops allows this design to be applied in other situations. A simple example is to treat a generative sequence as if it were an iterable list.
14.5.1 A sequence of number squares
Consider the following sequence: \(1^2, 2^2, 3^2, ...\). We can express an object that generates this sequence one step at a time as an iterator. We can also customize the positions at which this sequence starts and ends, so that the iterator has a finite end point. We can further customize how it advances in the sequence, by incrementing in user-defined steps. For example, \(3^2,5^2,7^2,9^2\) is a sequence that starts at \(3\), ends at \(9\), and advances in steps of \(2\).
14.5.2 Testing the sequence
Assuming that we have a complete implementation, we can test the sequence as follows:
@Test
public void testSquaresSequence() {
  SquaresSequence seq = new SquaresSequence(
      BigInteger.ZERO,               //start at 0
      new BigInteger("1000000000"),  //go until this count
      BigInteger.valueOf(100));      //in increments of 100
  BigInteger counter = BigInteger.ZERO; //to compute the expected results
  for (BigInteger result : seq) { //the sequence as an iterator
    assertEquals(counter.multiply(counter), result);
    counter = counter.add(BigInteger.valueOf(100));
  }
}
We use the BigInteger class to support arbitrarily long sequences without worrying about overflow.
14.5.3 Implementation
We start by defining a class that represents such a sequence. This object will provide us with an iterator.
public class SquaresSequence implements Iterable<BigInteger> {
  private BigInteger start, end, increment;

  public SquaresSequence(BigInteger start, BigInteger end, BigInteger increment) {
    this.start = start;
    this.end = end;
    this.increment = increment;
  }

  public Iterator<BigInteger> iterator() {
    return new SequenceIterator(start, end, increment);
  }

  private class SequenceIterator implements Iterator<BigInteger> {
    private SequenceIterator(BigInteger start, BigInteger end,
                             BigInteger increment) {
      ...
    }

    @Override
    public BigInteger next() {
      ...
    }

    @Override
    public boolean hasNext() {
      ...
    }
  }
}
The iterator maintains the current position in the sequence, which starts from the start value provided to its constructor. In next we compute the square of the current number, and then advance the current number by the provided increment. The hasNext method checks that the current number has not gone past the specified end value.
private class SequenceIterator implements Iterator<BigInteger> {
  private BigInteger counter;
  private BigInteger end;
  private BigInteger increment;

  private SequenceIterator(BigInteger start, BigInteger end,
                           BigInteger increment) {
    counter = start;
    this.end = end;
    this.increment = increment;
  }

  @Override
  public BigInteger next() {
    BigInteger result = counter.multiply(counter);
    counter = counter.add(increment);
    return result;
  }

  @Override
  public boolean hasNext() {
    return counter.compareTo(end) <= 0;
  }
}
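As a quick check of the example sequence \(3^2,5^2,7^2,9^2\), here is a compact, self-contained restatement. The class name Squares, the long-based constructor and the asStrings helper are assumptions made so that the snippet runs by itself. The sketch also assumes a positive increment, since a zero or negative one would never reach the end value.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Compact restatement of the squares sequence under a hypothetical name.
// Assumes increment > 0; otherwise the iteration would never finish.
class Squares implements Iterable<BigInteger> {
  private final BigInteger start;
  private final BigInteger end;
  private final BigInteger increment;

  Squares(long start, long end, long increment) {
    this.start = BigInteger.valueOf(start);
    this.end = BigInteger.valueOf(end);
    this.increment = BigInteger.valueOf(increment);
  }

  @Override
  public Iterator<BigInteger> iterator() {
    return new Iterator<BigInteger>() {
      // Every iterator gets its own position, so the sequence can be
      // traversed any number of times.
      private BigInteger counter = start;

      @Override public boolean hasNext() { return counter.compareTo(end) <= 0; }

      @Override public BigInteger next() {
        if (!hasNext()) { throw new NoSuchElementException(); }
        BigInteger result = counter.multiply(counter);
        counter = counter.add(increment);
        return result;
      }
    };
  }

  // Collects the generated values as strings, for easy comparison.
  static List<String> asStrings(Iterable<BigInteger> seq) {
    List<String> out = new ArrayList<String>();
    for (BigInteger b : seq) { out.add(b.toString()); }
    return out;
  }
}
```

Calling Squares.asStrings(new Squares(3, 9, 2)) yields 9, 25, 49, 81. No list of squares is ever stored: each value is computed only when the loop asks for it.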
14.6 Other possible operations on iterators
Iterators can support several other operations.
14.6.1 Backward iteration
For some data collections (such as ordered collections) it makes sense to iterate “forward” or “backward”. For lists, this operation is usually easy to add, if it is possible to navigate backwards through the list. If backward navigation is not directly supported by a list implementation (e.g. our ListADTImpl class) then a possible way to implement such an iterator is to copy the contents into a list that does support it, and iterate backward through the copy. A similar idea can be used to support backward iteration through non-sequential, but nevertheless ordered collections (such as binary search trees).
The ListIterator<T> interface in Java supports backward iteration.
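A short sketch of backward iteration with Java's ListIterator follows. The wrapper class BackwardIterationDemo and its reversedCopy method are hypothetical. The listIterator(int), hasPrevious and previous calls are the standard java.util API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

class BackwardIterationDemo {
  // Returns a new list holding the items of the given list, last to first.
  static <T> List<T> reversedCopy(List<T> items) {
    List<T> out = new ArrayList<T>();
    // Place the cursor just past the last element, then walk backward.
    ListIterator<T> it = items.listIterator(items.size());
    while (it.hasPrevious()) {
      out.add(it.previous());
    }
    return out;
  }
}
```

The original list is only read here. For a linked list this also avoids the repeated walks that calling get(i) with a decreasing index would incur.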
14.6.2 Removal
Iterators can be used to remove the “current” data item from the underlying collection. In our ListADT iterator, this may be done by tracking the “previous” node, so that the current node may be removed from the list. This operation is more involved if the iteration is done by using a secondary data structure (such as the Stack in our OrganizationIterator).
The Iterator<T> interface in Java does have a remove() method. Its default implementation throws an UnsupportedOperationException object. If your iterable collection wishes to support removing through iterators, you should override this method accordingly.
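Both behaviors can be seen with the standard library. In the sketch below, RemovalDemo and its two methods are hypothetical names. The first method removes through an iterator that supports removal (the one provided by an ArrayList, for instance), and the second shows the default remove() inherited by an iterator that does not override it.

```java
import java.util.Iterator;
import java.util.List;

class RemovalDemo {
  // Removes, in place, every word shorter than minLength. The list's
  // iterator must support remove(), as the ArrayList iterator does.
  static void removeShort(List<String> words, int minLength) {
    Iterator<String> it = words.iterator();
    while (it.hasNext()) {
      String w = it.next();
      if (w.length() < minLength) {
        it.remove(); // removes the element most recently returned by next()
      }
    }
  }

  // An iterator that does not override remove() inherits the default
  // implementation, which throws UnsupportedOperationException.
  static boolean defaultRemoveThrows() {
    Iterator<String> it = new Iterator<String>() {
      private boolean done = false;
      @Override public boolean hasNext() { return !done; }
      @Override public String next() { done = true; return "only"; }
    };
    it.next();
    try {
      it.remove();
      return false;
    } catch (UnsupportedOperationException expected) {
      return true;
    }
  }
}
```

Removing through the iterator is the safe way to delete while traversing: the iterator stays consistent with the collection because it performs the removal itself.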
14.7 External vs. Internal Iterators
The iterators we wrote above allow us to specify a traversal externally (“Start from the beginning, advance when I say so, and return me a data item so that I can operate upon it”). The iterator is an object that is external to the data collection it iterates over. Also, the client explicitly writes code that invokes the iterator and directs it to advance. Finally, the client “extracts” the data using the iterator and operates on it. The iterator takes care of the semantics of how to advance and how to return data items. This may be termed an “external” iterator.
public class Foo<T> implements Iterable<T> {
  // I support the iterator pattern
  ...
  public Iterator<T> iterator() {...}
}

private class FooIterator<T> implements Iterator<T> {
  //you can use me to direct and control your iteration
  ...
  public boolean hasNext() {...}
  public T next() {...}
}

//usage: loop over the object and operate on data
for (T data : fooObject) {
  doOperation(data);
}

void doOperation(T data) {
  ...
}
Contrast this with how a map higher-order function worked (“Traverse through the collection in a way that I cannot control, just apply the given function to each data item”). The map does not provide direct access to the underlying data item, rather processes it and returns the result in a collection of identical structure. We can generalize this into another higher-order function.
public class Foo<T> implements InternalIterable<T> {
  //I support internal iteration
  ...
  //mandated by InternalIterable<T>
  public void iterate(Consumer<T> c) {
    //tell me what to do to every T and I will do it
    ...
  }
}

//usage: no loop
fooObject.iterate(obj -> doOperation(obj)); //same doOperation as above
Which method is better? The internal iterator seems easier to use, because it completely hides the iteration semantics and allows us to focus on the data processing part. However it does not allow drastic transformations, such as removing elements (the filter higher-order function may be thought of as an example of this, and it does not generalize to non-linear collections). The external iterator makes the client write more code, but it allows more types of operations.
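The trade-off can be seen by supporting both styles on one collection. In this sketch, InternalIterable is written out as a hypothetical one-method interface, since the notes name it but do not define it. NumberBag and its helper methods are likewise invented for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical definition of the interface named in the text.
interface InternalIterable<T> {
  void iterate(Consumer<T> action);
}

// Hypothetical collection that offers both kinds of iteration.
class NumberBag implements InternalIterable<Integer>, Iterable<Integer> {
  private final List<Integer> items = new ArrayList<Integer>();

  NumberBag add(int n) {
    items.add(n);
    return this;
  }

  // External iteration: hand the client an iterator and let it drive.
  @Override
  public Iterator<Integer> iterator() { return items.iterator(); }

  // Internal iteration: the collection drives and calls the client's action.
  @Override
  public void iterate(Consumer<Integer> action) {
    for (Integer n : items) { action.accept(n); }
  }

  static int sumExternally(NumberBag bag) {
    int total = 0;
    for (int n : bag) { total += n; }
    return total;
  }

  static int sumInternally(NumberBag bag) {
    // A lambda may only capture effectively final variables, so the
    // running total lives in a one-element array.
    int[] total = {0};
    bag.iterate(n -> total[0] += n);
    return total[0];
  }

  // With an external iterator the client can stop early; the iterate
  // method sketched above always visits every item.
  static int firstGreaterThan(NumberBag bag, int limit) {
    for (int n : bag) {
      if (n > limit) { return n; }
    }
    return -1; // no such item (assumes the bag holds non-negative numbers)
  }
}
```

Summing works equally well either way, although the internal version needs a small workaround to accumulate a result. Stopping at the first match is natural only in the external version, which illustrates the extra control the client gains in exchange for writing the loop.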