Lecture 16: Visitors
Visitors as generic function objects over union data
Introduction
Over the last few lectures, we have seen two powerful abstraction mechanisms:
function objects allow us to abstract over behavior, giving us the flexibility of
higher-order functions in an object-oriented setting, while
generics allow us to abstract over types, defining once-and-for-all families of related types like IList<T>.
We can even combine the two, defining interfaces like IFunc<A, R> that describe all function objects
that take an argument of type A and return a result of type R.  We saw examples of how
to define function objects over simple data: for instance, IFunc<Book, String>
is the interface for a function object whose apply method takes a Book and returns a String.
We could then take those function objects and map() them across lists.  So far, so good.
But now we have to ask, how well do these techniques work for more complex data types?  In particular,
what happens if we try to write a function object that takes a value of a union data type?
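For reference, the pieces recalled above might be sketched as follows.  This is only a minimal reconstruction: the class names MtList and ConsList, and the one-field Book, are assumptions made for illustration and may differ from the definitions used in earlier lectures.

```java
// A sketch of the interfaces recalled above. MtList, ConsList, and this
// one-field Book are assumptions for illustration only.
interface IFunc<A, R> {
  R apply(A arg);
}

interface IList<T> {
  // Applies func to every element, producing a list of the results
  <U> IList<U> map(IFunc<T, U> func);
}

class MtList<T> implements IList<T> {
  public <U> IList<U> map(IFunc<T, U> func) {
    return new MtList<U>();
  }
}

class ConsList<T> implements IList<T> {
  T first;
  IList<T> rest;
  ConsList(T first, IList<T> rest) {
    this.first = first;
    this.rest = rest;
  }
  public <U> IList<U> map(IFunc<T, U> func) {
    return new ConsList<U>(func.apply(this.first), this.rest.map(func));
  }
}

class Book {
  String title;
  Book(String title) {
    this.title = title;
  }
}

// A function object over simple data: takes a Book, returns a String
class BookTitle implements IFunc<Book, String> {
  public String apply(Book book) {
    return book.title;
  }
}
```

With these definitions, calling map(new BookTitle()) on an IList<Book> produces an IList<String> of the titles, in the same order.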
16.1 Flawed attempt 1: Processing shapes with existing types of function objects
Recall our example of shapes from Lecture 4.  Let’s simplify the definitions, though, by
getting rid of all the methods for now, and let’s temporarily include only Circle and Rect.
| interface IShape { } | 
| class Circle implements IShape { | 
| int x, y; | 
| int radius; | 
| String color; | 
| Circle(int x, int y, int r, String color) { | 
| this.x = x; | 
| this.y = y; | 
| this.radius = r; | 
| this.color = color; | 
| } | 
| } | 
| class Rect implements IShape { | 
| int x, y, w, h; | 
| String color; | 
| Rect(int x, int y, int w, int h, String color) { | 
| this.x = x; | 
| this.y = y; | 
| this.w = w; | 
| this.h = h; | 
| this.color = color; | 
| } | 
| } | 
Suppose now we have an IList<IShape>, and we want to obtain the list of areas of
all the shapes in the list.  (Notice we got rid of all methods on IShape, above!)  Recall the signature of map():
| <U> IList<U> map(IFunc<T, U> func); | 
Accordingly, for our IList<IShape>, map() must take a function object argument
of type IFunc<IShape, Double>.  But if we try to implement this function object,
we are stuck:
| class ShapeArea implements IFunc<IShape, Double> { | 
| public Double apply(IShape shape) { | 
|  | 
| } | 
| } | 
The IShape interface currently has no methods, and certainly has no fields!  So how can we
implement this functionality?
16.2 Key observation: we’ve seen this problem before
The problem we’re running into in ShapeArea’s apply method is that we don’t
know which specific type of shape we’re given as our argument.  As a result, our
template is empty, and there’s nothing we can do.  We’ve encountered this problem before
with shapes: when we tried to determine when two IShapes were the sameShape.
In that setting, there were two techniques we could use: casting, and double-dispatch.
16.2.1 Flawed attempt 2: Casting
Try writing ShapeArea using casting.  What works well in this approach, and what doesn’t?
We might reason that since an IShape is either a Circle or a Rect, the following code should work:
| class ShapeArea implements IFunc<IShape, Double> { | 
| public Double apply(IShape shape) { | 
| if (shape instanceof Circle) { | 
| return Math.PI * ((Circle)shape).radius * ((Circle)shape).radius; | 
| } | 
| else { | 
| return 1.0 * ((Rect)shape).w * ((Rect)shape).h; // 1.0 * makes this a double: an int cannot be returned as a Double | 
| } | 
| } | 
| } | 
This code does indeed implement the formulas for the area of circles and rectangles.
But it is quite ugly having all those casts cluttering up the code.
And worse, it is badly brittle: what happens when we add a Square class as another
IShape?  Our code will still compile, but a Square is not a Circle, so it falls into the else branch,
where the cast to Rect crashes at runtime with a ClassCastException.
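To see the failure concretely, here is a minimal sketch.  The shapes are stripped down to just the fields the area formulas need (an assumption made to keep the example short), and CastingArea and CastingDemo are hypothetical names; the casting logic itself matches the attempt above.

```java
interface IShape { }

class Circle implements IShape {
  int radius;
  Circle(int radius) { this.radius = radius; }
}

class Rect implements IShape {
  int w, h;
  Rect(int w, int h) { this.w = w; this.h = h; }
}

// The new kind of shape that the casting code never anticipated
class Square implements IShape {
  int size;
  Square(int size) { this.size = size; }
}

class CastingArea {
  // Same reasoning as the cast-based attempt: "not a Circle" is assumed to mean Rect
  static double area(IShape shape) {
    if (shape instanceof Circle) {
      return Math.PI * ((Circle) shape).radius * ((Circle) shape).radius;
    } else {
      return ((Rect) shape).w * ((Rect) shape).h;
    }
  }
}

class CastingDemo {
  public static void main(String[] args) {
    System.out.println(CastingArea.area(new Rect(3, 4))); // prints 12.0
    try {
      CastingArea.area(new Square(5)); // compiles fine, fails when run
    } catch (ClassCastException e) {
      System.out.println("Square was cast to Rect: ClassCastException");
    }
  }
}
```

Nothing in the compiler's output hints that area is now incomplete; the mistake only shows up once a Square actually reaches the else branch.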
16.3 Specific functions on shapes
Let’s take a step back from making everything too generic, and let’s instead try to design a
an IShape2DoubleFunc interface that represents a function that takes an IShape and returns a Double.
(There are no type parameters involved here, yet.)  Now we have the flexibility to change this interface
as needed.
When we were working with sameShape, we started by writing simpler methods like sameCircle or sameRect.
Let’s try the same notion here: instead of having one apply(IShape) method, we’ll have simpler helper methods.
| interface IShape2DoubleFunc { | 
| double applyToCircle(Circle circle); | 
| double applyToRect(Rect rect); | 
| } | 
Now each of these methods should be easier to implement, because we have arguments with specific types instead of
the general IShape interface:
| class ShapeArea implements IShape2DoubleFunc { | 
| public double applyToCircle(Circle circle) { | 
| return Math.PI * circle.radius * circle.radius; | 
| } | 
| public double applyToRect(Rect rect) { | 
| return rect.w * rect.h; | 
| } | 
| } | 
The only remaining trick is to figure out how to use this object!  We manage this by adding one method
to the IShape interface and implementing it on each shape class.  If the purpose of the methods in the
IShape2DoubleFunc is to “apply a function to a shape”, then the purpose of this new method in the IShape
interface is to “be applied to by some function”:
| interface IShape { | 
| double beAppliedToBy(IShape2DoubleFunc func); | 
| } | 
| // In Circle: | 
| public double beAppliedToBy(IShape2DoubleFunc func) { | 
| return func.applyToCircle(this); | 
| } | 
| // In Rect: | 
| public double beAppliedToBy(IShape2DoubleFunc func) { | 
| return func.applyToRect(this); | 
| } | 
Compare this implementation to the implementation of sameShape, and
figure out what methods are the analogues of sameShape, sameCircle and sameRect,
and what roles the IShape and IShape2DoubleFunc interfaces play.
Notice that this code is substantially cleaner than the version with casts above, and it cannot possibly fail
at runtime with a ClassCastException.  Even better, if we need to add new kinds of IShapes,
our code will still not crash at runtime!  Instead, it will fail to compile, because we won’t have
any way to implement beAppliedToBy for the new class.  This is much better: it means that we’re
using Java’s type system to prevent us from forgetting parts of our implementation.
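Putting the pieces of this section together, a complete version might look like the sketch below.  The shapes again carry only the fields needed for areas (an assumption for brevity), and DispatchDemo is a hypothetical name added so the example can be run.

```java
interface IShape2DoubleFunc {
  double applyToCircle(Circle circle);
  double applyToRect(Rect rect);
}

interface IShape {
  double beAppliedToBy(IShape2DoubleFunc func);
}

class Circle implements IShape {
  int radius;
  Circle(int radius) { this.radius = radius; }
  // Second dispatch: inside Circle, "this" is statically known to be a Circle
  public double beAppliedToBy(IShape2DoubleFunc func) {
    return func.applyToCircle(this);
  }
}

class Rect implements IShape {
  int w, h;
  Rect(int w, int h) { this.w = w; this.h = h; }
  public double beAppliedToBy(IShape2DoubleFunc func) {
    return func.applyToRect(this);
  }
}

class ShapeArea implements IShape2DoubleFunc {
  public double applyToCircle(Circle circle) {
    return Math.PI * circle.radius * circle.radius;
  }
  public double applyToRect(Rect rect) {
    return rect.w * rect.h;
  }
}

class DispatchDemo {
  public static void main(String[] args) {
    IShape shape = new Rect(3, 4);
    // First dispatch: Java picks Rect's beAppliedToBy based on the object itself,
    // even though the variable only has type IShape
    System.out.println(shape.beAppliedToBy(new ShapeArea())); // prints 12.0
  }
}
```

No instanceof test or cast appears anywhere: the choice between circle and rectangle is made entirely by ordinary method dispatch.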
Suppose we add the following class:
| class Square implements IShape { | 
| int x, y, size; | 
| String color; | 
| Square(int x, int y, int size, String color) { | 
| this.x = x; | 
| this.y = y; | 
| this.size = size; | 
| this.color = color; | 
| } | 
| } | 
What changes do you need to make to extend the ShapeArea class to handle this new case?
| // In IShape2DoubleFunc: | 
| double applyToSquare(Square square); | 
| // In ShapeArea: | 
| public double applyToSquare(Square square) { | 
| return square.size * square.size; | 
| } | 
| // In Square: | 
| public double beAppliedToBy(IShape2DoubleFunc func) { | 
| return func.applyToSquare(this); | 
| } | 
Do the same thing, this time trying to compute the perimeters of the shapes.
16.4 Introducing the Visitor pattern
Now, as with our initial implementation of sameShape via double dispatch, the names of the methods
above are not the typical names.  Also, we can take another look at where parameterizing our data types might be
worthwhile.  Having a specialized set of methods for IShapes seems like something we’ll need to keep,
but the return type of double seems like something that’s easily made generic.
The proper name for this pattern of double-dispatch with function objects is the visitor pattern.
Suppose we have an interface IFoo, and classes X, Y and Z that implement this interface.
We define a visitor for this interface:
| interface IFooVisitor<R> { | 
| R visitX(X x); | 
| R visitY(Y y); | 
| R visitZ(Z z); | 
| } | 
In the IFoo interface, we need to add one method to accept the visitor:
| <R> R accept(IFooVisitor<R> visitor); | 
Finally, in each class, we implement this method in the “obvious” way by matching up names:
| public <R> R accept(IFooVisitor<R> visitor) { return visitor.visitX(this); } | 
| public <R> R accept(IFooVisitor<R> visitor) { return visitor.visitY(this); } | 
| public <R> R accept(IFooVisitor<R> visitor) { return visitor.visitZ(this); } | 
Rewrite the ShapeArea class to implement IShapeVisitor<Double>, instead of
the IShape2DoubleFunc interface.
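To see what the type parameter R buys us, here is a runnable sketch of the IFoo pattern above with two visitors that return different types.  The visitors FooName and FooRank, and the values they return, are hypothetical and exist purely for illustration.

```java
interface IFooVisitor<R> {
  R visitX(X x);
  R visitY(Y y);
  R visitZ(Z z);
}

interface IFoo {
  <R> R accept(IFooVisitor<R> visitor);
}

class X implements IFoo {
  public <R> R accept(IFooVisitor<R> visitor) { return visitor.visitX(this); }
}

class Y implements IFoo {
  public <R> R accept(IFooVisitor<R> visitor) { return visitor.visitY(this); }
}

class Z implements IFoo {
  public <R> R accept(IFooVisitor<R> visitor) { return visitor.visitZ(this); }
}

// Hypothetical visitor producing a String
class FooName implements IFooVisitor<String> {
  public String visitX(X x) { return "X"; }
  public String visitY(Y y) { return "Y"; }
  public String visitZ(Z z) { return "Z"; }
}

// Hypothetical visitor producing an Integer, using the very same accept methods
class FooRank implements IFooVisitor<Integer> {
  public Integer visitX(X x) { return 1; }
  public Integer visitY(Y y) { return 2; }
  public Integer visitZ(Z z) { return 3; }
}
```

Given IFoo foo = new Y(), the call foo.accept(new FooName()) has type String, while foo.accept(new FooRank()) has type Integer.  The classes X, Y and Z did not have to change to support the second visitor.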
16.5 Discussion
Visitors are nothing more than the natural answer to the question, “how do we make function
objects work with union data types?”  That answer combines double dispatch, generics, and function
objects, which are all concepts we’ve seen already: this pattern is just a subtle, clever combination
of those pieces.
When would you need this pattern?  Whenever you are defining a union data type as part of a library,
whose ultimate uses you can’t completely envision.  Suppose for example someone were designing a
library for manipulating HTML content.  They’d have an interface IHTMLTag, and roughly
90 classes, one for each of the various tag types: IATag, IBrTag, IDivTag, etc.
But the sheer variety of “ways to manipulate HTML” means that the library author cannot
predict all those possibilities and implement them all as methods.  Instead,
the author can supply the accept method that takes an IHTMLTagVisitor, and leave
this interface available for clients of the library to implement however they choose.
One tiny method’s worth of advance planning by the library author (the accept method) means
the library is flexible and easy to use in ways the library author could not anticipate.
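As a rough sketch of this scenario, imagine a drastically cut-down version of such a library with only three tag classes.  Everything here is hypothetical: the class names ATag, BrTag and DivTag, their fields, the restriction of a div to exactly two children, and the client-side CountLinks visitor are all assumptions for illustration, not part of any real HTML library.

```java
// ---- Library side (hypothetical) ----
interface IHTMLTagVisitor<R> {
  R visitA(ATag a);
  R visitBr(BrTag br);
  R visitDiv(DivTag div);
}

interface IHTMLTag {
  <R> R accept(IHTMLTagVisitor<R> visitor);
}

class ATag implements IHTMLTag {
  String href;
  ATag(String href) { this.href = href; }
  public <R> R accept(IHTMLTagVisitor<R> visitor) { return visitor.visitA(this); }
}

class BrTag implements IHTMLTag {
  public <R> R accept(IHTMLTagVisitor<R> visitor) { return visitor.visitBr(this); }
}

// Simplification: a div with exactly two children
class DivTag implements IHTMLTag {
  IHTMLTag first, second;
  DivTag(IHTMLTag first, IHTMLTag second) {
    this.first = first;
    this.second = second;
  }
  public <R> R accept(IHTMLTagVisitor<R> visitor) { return visitor.visitDiv(this); }
}

// ---- Client side: an operation the library author never wrote ----
class CountLinks implements IHTMLTagVisitor<Integer> {
  public Integer visitA(ATag a) { return 1; }
  public Integer visitBr(BrTag br) { return 0; }
  public Integer visitDiv(DivTag div) {
    // Recur into both children by handing this same visitor to each of them
    return div.first.accept(this) + div.second.accept(this);
  }
}
```

The client adds link counting without touching, or even seeing the source of, the tag classes.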
16.6 Wait — but what about map()?
At the beginning of this lecture, we wanted to implement a function object for IShapes that we
could successfully map across a list of shapes.  Instead we’ve created this IShapeVisitor<T> interface,
but that is not an IFunc<IShape, T> that we can use with map!  Recall, we wanted this:
| class ShapeArea implements IFunc<IShape, Double> { | 
| public Double apply(IShape shape) { | 
| ... | 
| } | 
| } | 
But the problem was we had nothing available in our template that would let us implement this method.
At this point, we should take advantage of another feature of Java interfaces that we have not yet needed
to explore: a class is permitted to implement more than one interface.  In particular, while we
currently have
| class ShapeArea implements IShapeVisitor<Double> { | 
| public Double visitCircle(Circle c) { | 
| return Math.PI * c.radius * c.radius; | 
| } | 
| public Double visitRect(Rect r) { | 
| return 1.0 * r.w * r.h; // 1.0 * makes this a double: an int cannot be returned as a Double | 
| } | 
| public Double visitSquare(Square s) { | 
| return 1.0 * s.size * s.size; // 1.0 * makes this a double: an int cannot be returned as a Double | 
| } | 
| } | 
We can actually revise the class declaration to be
| class ShapeArea implements IShapeVisitor<Double>, IFunc<IShape, Double> { | 
| ... | 
| } | 
The meaning of this declaration is simply that the ShapeArea class promises to implement
all the methods from all the interfaces it claims it implements.  We already have
implementations for all the visitor methods; now we just need to add the methods for IFunc<IShape, Double>:
| class ShapeArea implements IShapeVisitor<Double>, IFunc<IShape, Double> { | 
| ... | 
| public Double apply(IShape shape) { | 
| } | 
| } | 
But this is precisely the method we didn’t know how to implement earlier!
Implement this method.  You should need only a single method call.
If only there were a way to delegate to the visitor methods we have handy in this class... and now there is: that’s
precisely the purpose of the accept method on IShape!
| class ShapeArea implements IShapeVisitor<Double>, IFunc<IShape, Double> { | 
| ... | 
| public Double apply(IShape shape) { | 
| return shape.accept(this); | 
| } | 
| } | 
Think carefully through the types here.  ShapeArea promises to implement IFunc<IShape, Double>,
so it must have a method Double apply(IShape shape).  IShapes know how to accept
an IShapeVisitor<T> (for any T), and ShapeArea promises that it can be an IShapeVisitor<Double>, too.
So the method call shape.accept(this) will typecheck.  And so will passing a ShapeArea object to
map() over an IList<IShape>, because of the IFunc promise.  And now we truly do have function objects over union data,
as we claimed above.
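Here is the whole chain in one runnable sketch, from the list down to the individual visit methods.  The list classes MtList and ConsList are assumed names and may not match earlier lectures, and the shapes are reduced to Circle and Rect with only the fields needed for areas.

```java
interface IFunc<A, R> {
  R apply(A arg);
}

interface IList<T> {
  <U> IList<U> map(IFunc<T, U> func);
}

class MtList<T> implements IList<T> {
  public <U> IList<U> map(IFunc<T, U> func) { return new MtList<U>(); }
}

class ConsList<T> implements IList<T> {
  T first;
  IList<T> rest;
  ConsList(T first, IList<T> rest) { this.first = first; this.rest = rest; }
  public <U> IList<U> map(IFunc<T, U> func) {
    return new ConsList<U>(func.apply(this.first), this.rest.map(func));
  }
}

interface IShapeVisitor<R> {
  R visitCircle(Circle c);
  R visitRect(Rect r);
}

interface IShape {
  <R> R accept(IShapeVisitor<R> visitor);
}

class Circle implements IShape {
  int radius;
  Circle(int radius) { this.radius = radius; }
  public <R> R accept(IShapeVisitor<R> visitor) { return visitor.visitCircle(this); }
}

class Rect implements IShape {
  int w, h;
  Rect(int w, int h) { this.w = w; this.h = h; }
  public <R> R accept(IShapeVisitor<R> visitor) { return visitor.visitRect(this); }
}

class ShapeArea implements IShapeVisitor<Double>, IFunc<IShape, Double> {
  public Double visitCircle(Circle c) { return Math.PI * c.radius * c.radius; }
  // 1.0 * forces a double result, since Java will not turn an int into a Double
  public Double visitRect(Rect r) { return 1.0 * r.w * r.h; }
  // map() calls apply, apply calls accept, and accept calls the right visit method
  public Double apply(IShape shape) { return shape.accept(this); }
}
```

Mapping new ShapeArea() over a list holding a 3-by-4 Rect followed by a Circle of radius 1 yields a list of Doubles whose first element is 12.0 and whose second is Math.PI.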
16.6.1 One last refinement: extending interfaces
It is mildly annoying that IShapeVisitor<T> and IFunc<IShape, T> are completely independent interfaces.  Because of that,
it’s possible to design a class that implements IShapeVisitor<T> and forgets to also implement IFunc<IShape, T>.
This is especially annoying because every visitor can implement IFunc; there’s really no reason not to implement them both.
Remember that we’ve been saying that “a visitor is a function object for union data”.  Let’s add just a bit of emphasis to that statement:
“a visitor is-a function object for union data”.  We have a mechanism for expressing when something is-a specialized kind of something else: the extends
keyword.  Java uses extends to express when one class is a subclass of another; it also allows us to express when one interface
is an enhanced version of another interface.  Any class that promises to implement the derived interface implicitly promises to implement every method in that
interface, and also every method from the base interface.  If we declare that our visitor interface extends the IFunc interface,
then there is no longer a way to implement just the visitor interface without also remembering to implement the IFunc interface.
So the final version of our code will be:
| interface IShapeVisitor<R> extends IFunc<IShape, R> { | 
| R visitCircle(Circle c); | 
| R visitSquare(Square s); | 
| R visitRect(Rect r); | 
| } | 
|  | 
| class ShapeArea implements IShapeVisitor<Double> { | 
| public Double visitCircle(Circle c) { return Math.PI * c.radius * c.radius; } | 
| public Double visitSquare(Square s) { return 1.0 * s.size * s.size; } | 
| public Double visitRect(Rect r) { return 1.0 * r.w * r.h; } | 
|  | 
| public Double apply(IShape s) { return s.accept(this); } | 
| } |
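One consequence of the extends declaration is that any visitor can be handed directly to code that only knows about IFunc.  The sketch below illustrates this with a hypothetical second visitor, ShapeColor, and a hypothetical helper, FuncUtils.applyTo, which stands in for any method (such as map) that expects an IFunc.  The shapes are reduced to a color field for brevity.

```java
interface IFunc<A, R> {
  R apply(A arg);
}

// Every shape visitor is-a function object from IShape to R
interface IShapeVisitor<R> extends IFunc<IShape, R> {
  R visitCircle(Circle c);
  R visitRect(Rect r);
}

interface IShape {
  <R> R accept(IShapeVisitor<R> visitor);
}

class Circle implements IShape {
  String color;
  Circle(String color) { this.color = color; }
  public <R> R accept(IShapeVisitor<R> visitor) { return visitor.visitCircle(this); }
}

class Rect implements IShape {
  String color;
  Rect(String color) { this.color = color; }
  public <R> R accept(IShapeVisitor<R> visitor) { return visitor.visitRect(this); }
}

// Hypothetical visitor: leaving out apply here would be a compile error,
// because IShapeVisitor inherits apply from IFunc
class ShapeColor implements IShapeVisitor<String> {
  public String visitCircle(Circle c) { return c.color; }
  public String visitRect(Rect r) { return r.color; }
  public String apply(IShape s) { return s.accept(this); }
}

// Hypothetical helper that knows nothing about shapes or visitors
class FuncUtils {
  static <A, R> R applyTo(IFunc<A, R> func, A arg) {
    return func.apply(arg);
  }
}
```

Both IFunc<IShape, String> f = new ShapeColor(); and FuncUtils.applyTo(new ShapeColor(), someShape) typecheck, with no mention of IFunc in ShapeColor's own declaration.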