Lecture 17: Mutation
Creating cyclic data, hazards of working with mutation
17.1 Motivation
When you go to a bookstore and ask a sales clerk to look up a book whose author you remember but whose title you have forgotten, the clerk goes to a computer, types in the name of the author, and retrieves the list of books that the author has written. If you remember the title of the book but not the author, the clerk enters the title of the book and retrieves the author’s name. Even though it is feasible for the program to maintain two copies of all the information about books, it is much more natural to think of a data representation in which books and authors directly refer to each other in a circular manner.
17.2 First failed attempt
+---------------------------------------------+
|                      +------------+         |
V                      |            V         |
+--------------+       |   +---------------+  |
| Author       |       |   | Book          |  |
+--------------+       |   +---------------+  |
| String first |       |   | String title  |  |
| String last  |       |   | int price     |  |
| int yob      |       |   | int quantity  |  |
| Book book    |-------+   | Author author |--+
+--------------+           +---------------+
// Represents authors of books
class Author {
  String first;
  String last;
  int yob;
  Book book;

  Author(String fst, String lst, int yob, Book bk) {
    this.first = fst;
    this.last = lst;
    this.yob = yob;
    this.book = bk;
  }
}

// Represent books
class Book {
  String title;
  int price;
  int quantity;
  Author author;

  Book(String title, int price, int quantity, Author ath) {
    this.title = title;
    this.price = price;
    this.quantity = quantity;
    this.author = ath;
  }
}
Do Now!
Try creating an example of the following classic text in computer science, using the data representation above:
Donald E. Knuth. The Art of Computer Programming (volume 1).
Addison Wesley, Reading, Massachusetts. 1968.
(Knuth was born in 1938.) Can you do it?
// In ExamplesBooks class
Author knuth = new Author("Donald", "Knuth", 1938,
    new Book("The Art of Computer Programming (volume 1)", 100, 2, ???));
Author knuth = new Author("Donald", "Knuth", 1938,
    new Book("The Art of Computer Programming (volume 1)", 100, 2,
        new Author("Donald", "Knuth", 1938, ???)));
Book taocp = new Book("The Art of Computer Programming (volume 1)", 100, 2,
    new Author("Donald", "Knuth", 1938, ???));
17.3 Second failed attempt
Author knuth = new Author("Donald", "Knuth", 1938,
    new Book("The Art of Computer Programming (volume 1)", 100, 2, this.knuth));
boolean testBookAuthorCycle(Tester t) {
  return t.checkExpect(knuth.book.author, knuth);
}
17.4 Third flawed-but-successful attempt
// In ExamplesClass
boolean testBookAuthorCycle(Tester t) {
  // Creates an Author whose book is **null**...
  Author knuth = new Author("Donald", "Knuth", 1938, null);
  // Creates a Book whose author is ok, but the author's book is still null...
  Book taocp = new Book("The Art of Computer Programming (volume 1)", 100, 2, knuth);
  // Now *change* the author's book field to be our newly created book...
  knuth.book = taocp;
  // This now passes!
  return t.checkExpect(knuth.book.author, knuth);
}
There are two new features in this code. In the first line, we initialize the Author’s book field to null, a new value that we have never (deliberately) seen before. Null values have fairly strange behavior: any variable of any class or interface type can be initialized to null, but we cannot invoke any methods on null values. We’ll explain more about null shortly; for now, think of it as a “wildcard” value.
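To make the behavior of null concrete, here is a minimal sketch using cut-down versions of Author and Book. The getTitle method and the NullDemo class are hypothetical names introduced only for this illustration. Storing null in a field of class type is allowed, but invoking a method through that field fails at run time with a NullPointerException.

```java
// Cut-down versions of the lecture's classes, just enough to show null.
class Book {
  String title;

  Book(String title) {
    this.title = title;
  }

  // Hypothetical helper method, not part of the lecture's Book class
  String getTitle() {
    return this.title;
  }
}

class Author {
  String last;
  Book book;

  Author(String last, Book book) {
    this.last = last;
    this.book = book;
  }
}

class NullDemo {
  // Returns true if invoking a method on the author's book fails
  // because the book field is null
  static boolean callingMethodOnBookFails(Author a) {
    try {
      a.book.getTitle();
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Author knuth = new Author("Knuth", null);  // null is accepted here
    System.out.println(callingMethodOnBookFails(knuth));
    knuth.book = new Book("The Art of Computer Programming (volume 1)");
    System.out.println(callingMethodOnBookFails(knuth));
  }
}
```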
The key line here is the third line, which actually modifies the book field of knuth. We call this an assignment statement, and its meaning is to change the value of the field or variable on the left side of the equals sign to the result of evaluating the expression on the right-hand side. Just like initialization statements, the right-hand side is evaluated completely before the left-hand side gets modified and set to the result.
Incidentally, assignment statements are why we call variables variables. Without assignment statements, variables never actually vary; they just remain fixed at whatever value they were initialized to be. (This is why, up until now, we have been very careful to refer to “identifiers” and not to “variables”.)
Syntactically, this is very similar to how we initialize fields in the constructors of objects. But don’t be fooled: assignment statements are very different! Initializing fields lets us “define the field” to be equal to the given value, and in that sense initializations are at least somewhat like mathematical equations that assert two things to be equal. But assignment statements do not assert such an equality; they change the value that a field or variable already has.
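A small sketch can show why an assignment is not an equation. Read as mathematics, x = x + 1 has no solution; read as an assignment, it evaluates the right-hand side using the old value of x and then stores the result. The AssignmentDemo class is a hypothetical name used only for this illustration.

```java
class AssignmentDemo {
  // Returns the value of x before and after one assignment
  static int[] run() {
    int x = 5;        // initialization: x starts out as 5
    int before = x;
    x = x + 1;        // assignment: evaluate 5 + 1 first, then change x to 6
    int after = x;
    return new int[] { before, after };
  }

  public static void main(String[] args) {
    int[] r = run();
    System.out.println(r[0] + " then " + r[1]);  // prints "5 then 6"
  }
}
```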
17.5 Interlude: local variables
There is a third salient point in the code above: we defined knuth and taocp as new variables inside a method, instead of as fields of a class or as parameters to a method. These local variables exist only within the body of the method, and disappear again when we return from the method. They are analogous to (local (...) ...) definitions in Racket.
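As a rough sketch of the difference in lifetime between fields and local variables (LocalsDemo and its members are hypothetical names for illustration only): the field below persists between calls, while the local variable is created afresh on each call and is gone once the method returns.

```java
class LocalsDemo {
  int calls = 0;  // a field: it lives as long as the object does

  int twice(int n) {
    int doubled = n + n;          // a local variable: exists only during this call
    this.calls = this.calls + 1;  // the field remembers across calls
    return doubled;
  }
  // Outside of twice, the name `doubled` is not in scope;
  // referring to it here would not compile.

  public static void main(String[] args) {
    LocalsDemo d = new LocalsDemo();
    System.out.println(d.twice(3));  // prints 6
    System.out.println(d.twice(4));  // prints 8
    System.out.println(d.calls);     // prints 2
  }
}
```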
17.6 Interlude: Statements versus expressions
In the section above, we talk about statements and expressions, but what exactly is the difference between them? We know from Fundies 1 that an expression is a piece of a program that can evaluate to a value. Expressions can be composed to form bigger expressions, and these too evaluate to values.
Return statements, return someExpression;, which exit a method and cause the method overall to evaluate to someExpression’s value.
If statements, if (cond) { ... } else { ... }, which mean “If cond is true then execute the first block of statements, otherwise execute the second block.”
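The two kinds of statements listed here can be seen side by side in a short sketch (StatementDemo and larger are hypothetical names for illustration). The comparison a > b is an expression, since it evaluates to a value; the if and return constructs around it are statements, which are executed rather than evaluated.

```java
class StatementDemo {
  // Returns the larger of the two given ints
  static int larger(int a, int b) {
    // `a > b` is an expression: it evaluates to a boolean value
    if (a > b) {   // the if statement chooses which block to execute
      return a;    // a return statement exits the method with a's value
    } else {
      return b;
    }
  }

  public static void main(String[] args) {
    System.out.println(larger(3, 7));  // prints 7
  }
}
```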
17.7 Warning: Side effects may vary
The net effect of an assignment statement, namely a change to a variable, is known as mutation. Since statements on their own do not evaluate to values, the only way we can observe what they’ve done is by their side effects. This has some fairly drastic consequences for our programs.
17.7.1 Non-termination
Do Now!
What do you think will happen? Why?
We defined sameBook to check whether the fields of this Book are the same as the corresponding fields of the given Book. We’ve constructed taocp2 to have the same title as taocp, so we need to check the authors, knuth and knuth2, for sameness.
We defined sameAuthor to check whether the fields of this Author are the same as the corresponding fields of the given Author. We’ve constructed knuth2 to have the same name and age as knuth, so we need to check the books, taocp and taocp2, for sameness.
We defined sameBook to check whether the fields of this Book are the same as the corresponding fields of the given Book...
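The reasoning above can be sketched in code. This is an assumed reconstruction of field-by-field sameness methods, not necessarily the lecture's exact definitions, and CycleDemo is a hypothetical helper class. It builds two separate author/book cycles using the mutation technique from the third attempt and compares them. In Java, the mutual recursion does not literally run forever: it exhausts the call stack and ends with a StackOverflowError.

```java
class Author {
  String first;
  String last;
  int yob;
  Book book;

  Author(String first, String last, int yob, Book book) {
    this.first = first;
    this.last = last;
    this.yob = yob;
    this.book = book;
  }

  // Field-by-field sameness, including the book
  boolean sameAuthor(Author that) {
    return this.first.equals(that.first)
        && this.last.equals(that.last)
        && this.yob == that.yob
        && this.book.sameBook(that.book);
  }
}

class Book {
  String title;
  int price;
  int quantity;
  Author author;

  Book(String title, int price, int quantity, Author author) {
    this.title = title;
    this.price = price;
    this.quantity = quantity;
    this.author = author;
  }

  // Field-by-field sameness, including the author
  boolean sameBook(Book that) {
    return this.title.equals(that.title)
        && this.price == that.price
        && this.quantity == that.quantity
        && this.author.sameAuthor(that.author);
  }
}

class CycleDemo {
  // Builds an author and a book that refer to each other
  static Author makeKnuth() {
    Author knuth = new Author("Donald", "Knuth", 1938, null);
    Book taocp = new Book("The Art of Computer Programming (volume 1)", 100, 2, knuth);
    knuth.book = taocp;
    return knuth;
  }

  // Returns true if comparing two separately built cycles overflows the stack
  static boolean comparisonNeverFinishes() {
    Author knuth = makeKnuth();
    Author knuth2 = makeKnuth();
    try {
      knuth.sameAuthor(knuth2);
      return false;
    } catch (StackOverflowError e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(comparisonNeverFinishes());
  }
}
```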
Exercise
The last line of the code above invokes checkExpect, and passes it two Authors...and yet the test terminates and passes! How do you think the tester library manages this, when our definitions of sameness (so far) would result in our program running forever?
// In Author
// Computes whether the given author has the same name and year of birth
// as this author (i.e., we're ignoring their books)
boolean sameAuthor(Author that) {
  return this.first.equals(that.first)
      && this.last.equals(that.last)
      && this.yob == that.yob;
}
17.7.2 Non-determinism
;; In Racket
(check-expect (some-function arg-1 arg-2 arg-3)
              (some-function arg-1 arg-2 arg-3))
// In Java
t.checkExpect(anObj.aMethod(arg1, arg2, arg3),
              anObj.aMethod(arg1, arg2, arg3));
Do Now!
Which one?
But how does random-number generation actually work? We don’t invoke random with any arguments, and yet it produces different outputs. That means it must have some extra information hidden away inside its implementation, which it uses to “keep track” of the previous random values. And in particular, it updates that information every time it’s called, which is how it can produce different values despite not having any obvious inputs.
This is a subtle and powerful point: the ability to modify local state (e.g. by assigning to local variables or fields) means that methods may no longer be deterministic, and may not produce equal answers for equal inputs. In other words, they’re no longer functions! This means that our program behavior is no longer obviously predictable.
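To illustrate the idea of hidden state, here is a toy generator. It is only a sketch: TinyRandom is a hypothetical class, the constants are chosen for illustration, and this is not how any particular library implements its random-number generator. The next method takes no arguments, yet each call updates the state field, so repeated calls typically return different values.

```java
class TinyRandom {
  long state;  // the hidden information, updated on every call

  TinyRandom(long seed) {
    this.state = seed;
  }

  // Returns a number in [0, 100) and, as a side effect, updates the state
  int next() {
    // One step of a linear congruential generator (illustrative constants)
    this.state = (this.state * 1103515245L + 12345L) % 2147483648L;
    return (int) (this.state % 100);
  }

  public static void main(String[] args) {
    TinyRandom r = new TinyRandom(42);
    // Same method, same (empty) argument list, but the state differs each time
    System.out.println(r.next());
    System.out.println(r.next());
    System.out.println(r.next());
  }
}
```

Two generators created with the same seed go through the same sequence of states, so the non-determinism here comes entirely from the mutation of state between calls.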
17.7.3 Non-testable code
class Counter {
  int val;

  Counter() {
    this(0);
  }

  Counter(int initialVal) {
    this.val = initialVal;
  }

  int get() {
    int ans = this.val;
    this.val = this.val + 1;
    return ans;
  }
}

class ExamplesCounter {
  boolean testCounter(Tester t) {
    Counter c1 = new Counter();
    Counter c2 = new Counter(2);
    // What should these tests be?
    return t.checkExpect(c1.get(), ???)              // Test 1
        && t.checkExpect(c2.get(), ???)              // Test 2
        && t.checkExpect(c1.get() == c1.get(), ???)  // Test 3
        && t.checkExpect(c2.get() == c1.get(), ???)  // Test 4
        && t.checkExpect(c2.get() == c1.get(), ???)  // Test 5
        && t.checkExpect(c1.get() == c1.get(), ???)  // Test 6
        && t.checkExpect(c2.get() == c1.get(), ???); // Test 7
  }
}
Do Now!
Fill in the ??? in the tests above.
We initialize c1 to a new counter, with a default initial value of 0.
We initialize c2 to a new counter with an initial value of 2.
In test 1, we get c1’s value, which is currently 0, and c1 updates its internal value to 1.
In test 2, we get c2’s value, which is currently 2, and c2 updates its internal value to 3.
In test 3, we get c1’s value, which is now 1, and then we immediately get it again, and it’s now 2! So this equality test evaluates to false: this method’s return value is not even equal to itself.
In test 4, we get c2’s value, which is now 3, and then we get c1’s value, which is also now 3, so this equality test happens to be true.
Just to be sure, we try it again in test 5. This time, c2 and c1 both evaluate to 4, so the test still is true.
Something seems fishy, so we try tests 3 and 4 again. In test 6, we get c1’s value (which is 5), then get it again (now it’s 6), so this test evaluates to false.
Finally, in test 7, we try getting c2’s value (which is 5), and c1’s value (which is now 7), so this test, which was true twice, is now false.
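The walkthrough above can be checked without the tester library. The sketch below repeats the Counter class and evaluates the seven expressions in the same order, recording each result; CounterWalkthrough is a hypothetical name for this illustration. It relies on Java evaluating the operands of == and the elements of an array initializer from left to right.

```java
class Counter {
  int val;

  Counter() {
    this(0);
  }

  Counter(int initialVal) {
    this.val = initialVal;
  }

  // Returns the current value, then increments it as a side effect
  int get() {
    int ans = this.val;
    this.val = this.val + 1;
    return ans;
  }
}

class CounterWalkthrough {
  // Evaluates the seven test expressions in order and records each result
  static String[] run() {
    Counter c1 = new Counter();
    Counter c2 = new Counter(2);
    return new String[] {
      String.valueOf(c1.get()),              // Test 1: 0
      String.valueOf(c2.get()),              // Test 2: 2
      String.valueOf(c1.get() == c1.get()),  // Test 3: 1 == 2, false
      String.valueOf(c2.get() == c1.get()),  // Test 4: 3 == 3, true
      String.valueOf(c2.get() == c1.get()),  // Test 5: 4 == 4, true
      String.valueOf(c1.get() == c1.get()),  // Test 6: 5 == 6, false
      String.valueOf(c2.get() == c1.get())   // Test 7: 5 == 7, false
    };
  }

  public static void main(String[] args) {
    for (String result : run()) {
      System.out.println(result);
    }
  }
}
```

Reordering any two of these expressions changes the answers, which is exactly what makes code like this hard to test.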
17.8 Discussion
With all these potential hazards, what are mutation and side effects actually good for? Let’s not forget that mutation is essential for creating these cyclic data structures, and we’ll see that cyclic data comes up naturally over and over again.
Do Now!
If side effects are so convenient for interacting with the outside world, how does (big-bang ...) work? We certainly had no mutation in Racket, yet big-bang could draw to the screen and get input from the user.
A final word of caution: in the third example above, we finally managed to construct a cycle between our two objects. But we only accomplished it by mutating a field of one of those objects, from within the examples class. This is bad programming practice: surely we ought to be able to work with our data in the rest of our program, too, and not just in the somewhat constrained setting of our examples and tests? In the next lecture, we will see a much better approach for hiding the particular details of what assignment statements we need to execute, and we’ll see in more detail how to test code that deals with mutation.