Lecture 5: Java Safari
1 What the heck does static mean?
1.1 What is it?
1.2 How do we use it?
1.2.1 Static fields
1.2.2 Static methods
1.2.3 Static classes
1.3 When should we use static?
2 Arrays
2.1 What are they?
2.2 How can we use them?
2.2.1 Varargs
2.2.2 Array gotchas
2.3 When should we use them?
3 Characters
3.1 What are they?
3.2 How can we use them?
3.3 When should we use them?
4 Primitive types versus reference types
4.1 What are they?
4.1.1 All the primitive types
4.1.2 Boxed types
4.2 How can we use them?
5 Equality:   physical and logical
5.1 Equality and autoboxing
5.2 Equality and hashing
6 Unlearning Fundies 2
6.1 null is bad
6.2 == is bad
6.3 instanceof and casts are bad
7 The switch statement
8 Enumerations
9 Exceptions
10 Generics, or raw types considered harmful
6.5

Lecture 5: Java Safari

There are a number of Java features we’ll use in this course that you likely haven’t seen or fully understood before. In this lecture we’ll introduce several of them.

1 What the heck does static mean?

1.1 What is it?

In programming generally, static describes things that happen or are determined at compile time, and dynamic describes things that happen or are determined at run time. In object-oriented programming, and in Java in particular, static means that some member—a field, method, or nested class—is part of its class, whereas a non-static member is associated with every object of the class. (Note that a class and its static members is created at compile time, whereas objects are created dynamically.)

1.2 How do we use it?

Field, class, and method members of classes can be declared static. (Enumerations and interfaces are always static and need not be declared as such.) In each case the idea of being associated with the classes versus its instances must be interpreted slightly differently.

1.2.1 Static fields

An instance (non-static) field is a separate slot in each object of a class, whereas a static field has one slot for the whole class. For example, consider a class with one of each kind of field:

class Widget {
  int widgetId;
  static int widgetIdCounter = 3;
}

For each Widget w we create, there’s a separate w.widgetId variable. Whereas there is only one Widget.widgetIdCounter variable, which we refer to as a member of the class Widget rather than a particular object w:

assertEquals( 3, Widget.widgetIdCounter );

Widget one = new Widget();
one.widgetId = 1;

Widget two = new Widget();
two.widgetId = 2;

assertEquals( 1, one.widgetId );
assertEquals( 2, two.widgetId );
assertEquals( 3, Widget.widgetIdCounter );

one.widgetId = 11;
two.widgetId = 12;
Widget.widgetIdCounter = 13;

assertEquals( 11, one.widgetId );
assertEquals( 12, two.widgetId );
assertEquals( 13, Widget.widgetIdCounter );

In a sense, static fields are Java’s version of global variables, and like globals, they should be used sparingly. Public, static, non-final fields warrant extra suspicion.

1.2.2 Static methods

Unlike dynamic methods, static methods do not require an instance of a class to operate on; as with static fields, the method is treated as a member of the class. For example:

class Widget {
  public void getWidgetId() { return widgetId; }
  int widgetId;

  public static void resetWidgetIdCounter() { widgetIdCounter = 0; }
  static int widgetIdCounter = 3;
}

Note that it would be a static (compile-time) error for resetWidgetIdCounter to refer to widgetId, because widgetId is a non-static field, and static methods don’t operate on an individual instance. In order to call non-static method getWidgetId(), we need an instance of the widget class to call it on, because a non-static method can use the non-static fields of the object:

assertEquals( 11, one.getWidgetId() );
assertEquals( 12, two.getWidgetId() );

Bus static methods are called on the class rather than an instance, and thus don’t have an instance (this) to work on:

assertEquals( 13, Widget.getWidgetId() );

1.2.3 Static classes

When a nested class is static, it behaves like an ordinary class, just nested in its enclosing classes namespace.1We defer discussion of non-static nested classes for now. If Nested is a static member of Outer, then we refer to it as Outer.Nested when writing code that’s outside of Outer. Furthermore, nested classes share the same private scope with their enclosing class. Thus, Outer can see all of Outer.Nested’s private members, and Outer.Nested can see all of Outer’s private members.

As an example, rather than use a static field widgetIdCounter to generate widget IDs, we can manage the counter in a factory object:

class Widget {
  private Widget(...) { ... }

  public static class Factory {
    public Factory() { ... }

    public Widget create(...) {
      ...
      ++widgetIdCounter;
      ...
    }
    private int widgetIdCounter;
  }
}

In order to create Widgets, we first must create a Widget.Factory, which instantiates its own counter. Then we can use the factory to create Widget objects numbered using that factory’s counter:

Widget.Factory factObj = Widget.factory();

Widget w1 = factObj.create();
Widget w2 = factObj.create();

An advantage to this approach is that now we can create multiple independent factories, perhaps to work with multiple concurrent versions of some client system. Of course, if we actually want only one global numbering, then we can apply the singleton pattern.

1.3 When should we use static?

Use a static field when you want on variable for the whole class rather than one per object. Sometimes a class with do this privately to cache some kind of information or to keep track of some information about its instances, but for the most part static fields are rare except for constants. A constant should be a public static final field whose contents are immutable, and its name should be in all caps.

Use a static method when you want to associate some method with a class that doesn’t depend on having an instance. The most common case for static methods are static factory methods, which produce objects of a class (and don’t require that we already have an object to do so). It’s also common to factor out implementation functionality into private static helper methods.

Use a static class when you want a helper class that’s strongly associated with the enclosing class, especially when the helper doesn’t make sense on its own. Nesting a helper class also allows the outer class to see its private members and vice versa, which can be helpful when the two classes are tightly coupled. For example, it makes sense to nest a class implementing an iterator for a collection class inside the collection class, because the iterator probably doesn’t make sense without the collection class, and it is often useful for the iterator to be able to see into the collection objects.

2 Arrays

2.1 What are they?

In Java, an array of type τ[] (where τ is any Java type) is a mutable, fixed-length, constant-time–indexed sequence of values of type τ. Let’s unpack that, from back to front:

2.2 How can we use them?

To create a new array, use the special array form of the new operator, which comes in two main variants:

As a special case, when used to initialize a variable as part of its declaration everything before the curly braces can be omitted:

int[] intArray = {2, 4, 6, 8};

The main operation on arrays is indexing, for both observing and updating the array. It’s also common to find out the length of an array using its length field.

assertEquals(4, intArray[1]);
assertEquals(8, intArray[3]);

intArray[3] = 17;

assertEquals(17, intArray[3]);

assertEquals(4, intArray.length);

The class Arrays (doc) provides a large number of static methods for working with arrays, such as searching, sorting, filling, and copying. It also provides a method for turning an array of any type T into a List (doc) of T: <T> List<T> asList(T... a).3Note that List is an interface, and the venerable ArrayList is one class that implements it. Ah, but what does that T... mean? It’s a special form of array. Read on...

2.2.1 Varargs

Suppose we want a method that takes an arbitrary number of strings. Of course arrays are good for this, since they represent arbitrary length sequences:

void setPlayers(String[] newPlayers);

However, passing an array is ugly and inconvenient when in the common case we don’t have an array already and want to list the elements directly in the method call:

setPlayers(new String[] {"Crosby", "Stills", "Nash", "Young"});

Instead, Java provides a way to declare a method that takes a variable number of arguments:

void setPlayers(String... newPlayers);

The ... parameter always comes last in the argument list, though there may be a number of ordinary parameters that come before it.

Now when we call the method, it looks like it takes a variable number of string arguments:

setPlayers("Crosby", "Stills", "Nash", "Young");

Within the method, parameter newPlayers is an array just like before:

void setPlayers(String... newPlayers) {
    for (String player : newPlayers) {
        addToGame(player);
    }
}

If we already have an array ready to go, we can pass it directly:

setPlayers(someArrayOfStrings);

Whether we pass an existing array or use the varargs method call syntax, the callee receives an array.

2.2.2 Array gotchas

An array is represented as a reference to a chunk of memory, and the value of the array, so far as Java is concerned, is the reference itself, not the chunk of memory. This means that re-assigning or passing an array results in aliasing, having more than one name for the same thing:

int[] anotherArray = intArray;
anotherArray[0] = -9;
assertEquals(-9, intArray[0]);

Not only does == for arrays compare references rather than contents, but equals(Object) does as well. This means, for example, that this JUnit test will fail:

assertEquals(new int[] {3, 6}, new int[] {3, 6}); // fails!

Yes, the arrays have the same contents, but not the same physical identity. To test for equality on the contents of two arrays, use assertArrayEquals:

assertArrayEquals(new int[] {3, 6}, new int[] {3, 6});

In order to compare arrays by their values yourself, you should use the Arrays.equals(Object[], Object[]) method, which compares the elements of the arrays using their equals(Object) methods. Of course, if the contents of the arrays you are comparing are in turn arrays, those will each be compared by physical identity. If you have several layers of nested arrays and want to compare by the contents all the way down, use Arrays.deepEquals(Object[], Object[]). (Note that these methods can take Object[]s because in Java every array type is a subtype of Object[]. This design choice is actually a big problem, but we won’t get into it right now.)

2.3 When should we use them?

Mainly, you will use arrays because various existing APIs require them.

You can use an array when you want a sequence of values that you can look up and update efficiently, addressed by their positions in the sequence. However, if the length of the sequence needs to change, you probably want to use a List instead (e.g., ArrayList or LinkedList). You can simulate a variable-length array by allocating new arrays and copying the elements over as needed, and this is in fact how ArrayList works (though there are some subtleties to avoiding a large number of inefficient copies).

Since Lists can do everything that arrays can, why would we ever use an array? One reason would be if you know the length that you need up front, and you want to guarantee that you can never accidentally change it. Another reason is for efficiency, since higher-level sequences such as ArrayList build on top of arrays with additional checking and indirection. Usually this small, constant-factor change in resource efficiency isn’t worth worrying about, but if you are implementing a new data structure on top of a sequence or sequences, you may want to use an array directly to avoid an additional layer that every client will then have to pay for.

3 Characters

3.1 What are they?

The character type, char, represents single graphemes or symbols for writing text. This includes letters ('a', 'b', 'A', 'B', 'α', 'β', etc.), digits ('0', '1', '١', '٢', etc.), punctuation ('.', '-', '«', etc.), and whitespace (' ', '\n', '\t', '\v', etc.).

3.2 How can we use them?

The Character class provides, among other things, static methods for working with characters, such as tests for character categories:

assertTrue(Character.isLetter('a'));
assertTrue(Character.isLetter('β'));
assertFalse(Character.isLetter('8'));

assertFalse(Character.isLowercase('Z'));
assertTrue(Character.isLowercase('z'));

Strings are sequences of characters, and the Java String class provides methods for working with them as such. The method charAt(int) lets us treat a string as an (immutable) array by allowing us to look up characters by position:

assertEquals('a' , "abcde".charAt(0));
assertEquals('c' , "abcde".charAt(2));

Additionally, we can search for characters in strings4There’s a twist, though: some methods, such as String.indexOf(int), take “characters” represented as type int rather than type char, because it turns out that the Java char type doesn’t have enough bits to represent every Unicode code point. However, because chars are implicitly converted to ints, you can use a character where an integer is expected with no trouble. (In the other direction it requires a cast, which is lossy because some int values don’t fit in char), and convert between strings and arrays of characters:

assertEquals(0, "abcde".indexOf('a'));
assertEquals(2, "abcde".indexOf('c'));
assertEquals(-1, "abcde".indexOf('f'));

char[] abc = {'a', 'b', 'c'};
assertArrayEquals(abc, "abc".toCharArray());
assertEquals("abc", String.valueOf(abc));

3.3 When should we use them?

Characters are really for only two things:

4 Primitive types versus reference types

4.1 What are they?

Java makes a distinction between primitive types and reference (or pointer) types. Understanding this distinction will make a variety of other features and quirks of Java make sense. Let’s consider what Java variables really mean.5Not just variables, because everything below applies to method parameters and results as well. In this diagram, there are six variables, each of which is represented as a box in the left column that contains the variable’s value:

The first thing to notice about the six variables is that none has a compound value, composed of multiple components—each contains a single, simple value, which may be an immediate number, a reference to something else, or null.

Variables a and b contain values of primitive (meaning “built-in”) numeric types int (four bytes) and long (eight bytes); for each of these the value is directly in the variable, with no references to anything else. The prefix 0x on a number is the syntax in many programming languages, including Java, for hexadecimal integer literals; writing the numbers this way makes it clear how many bits each is represented with. Every bit in each of these types (and all primitive types) is part of the representation of the number, and there’s no room for a distinguished null bit pattern; hence, primitive types do not include null as a value.

The other four variables have reference types, of which there are two subdivisions, object types and array types. Variables c and d both have the object type Posn. (Assume a class with two int fields x and y.) Java variables cannot hold objects directly—objects are compound data structures—so instead they must hold a reference6What is a reference? Most likely it’s just the memory address of the object, like a pointer in C or C++—though there are optimizations that can make the situation less simple. to an object. Variable c contains a reference to a Posn object, so that, for example, c.x == 3. Note that the fields of an object are a kind of variable, which means that they, too, can hold only primitives or references, not actual objects. Variable d currently does not hold a reference, so its value is null, which is a distinguished value that indicates the absence of a reference.

Variables e and f have array types, which means that each hold a reference to an array (or null), not the array itself. Whereas e refers to an array of primitive int values, f refers to an array of Posn object references. Three of the four elements contain references to Posn objects, and the fourth contains null. Note that while there are only two Posn objects in the diagram, there are four Posn object references—multiple references can point to the same object. (This is aliasing as discussed in the array section above.)

The reasons why objects and arrays need to be accessed via references is that their types do not determine how much space they take up. In particular, an array type int[] does not determine the length of the array, and object type Posn could include subclasses with additional fields.

4.1.1 All the primitive types

In total, Java has eight primitive types that can be the immediate values of variables:

Type

  

Size

  

Description

  

Range

boolean

  

1 bit

  

truth value

  

true or false

byte

  

8 bits

  

signed integer

  

\(-2^7\) to \(2^7 - 1\)

short

  

16-bits

  

signed integer

  

\(-2^{15}\) to \(2^{15} - 1\)

char

  

16-bits

  

Unicode character

  

\(0\) to \(2^{16} - 1\)

int

  

32-bits

  

signed integer

  

\(-2^{31}\) to \(2^{31} - 1\)

float

  

32-bits

  

floating point number

  

complicated, see here

long

  

64-bits

  

signed integer

  

\(-2^{63}\) to \(2^{63} - 1\)

double

  

64-bits

  

floating point number

  

complicated, see here

These have different sizes, which means that variables have different sizes. But each one has a known size, and each is a single value rather than some combination of values.

4.1.2 Boxed types

Every primitive type in Java has a corresponding object type: int has Integer (doc), char has Character (doc), double has Double (doc), and so on. (Only Integer and Character have names that differ in more than their capitalization from their primitive counterparts; the other six are the same except for the capitalized first letter.)

In each case, the uppercase object type is a class with a field containing the corresponding lowercase primitive type. For example, let’s compare the short primitive type with the Short object type:

Variable g of type short contains a short value directly. Compare this to variable h of type Short, which contains a reference to a Short object that has a field containing its value. Variable i also has type Short, but instead of containing a reference to an object, it contains null. Note that short cannot be null but Short can, double cannot be null but Double can, and so on for the other six boxed types.

Variable j contains a reference to any array of primitive short values, which are stored directly in the array. Variable k is reference to an array of Shorts, that is, an array of object references.

Note that these types are boxed because each is a reference to a box (object) containing the primitive type. For the most part, you shouldn’t have to convert between primitive types and their object versions, because Java automatically inserts box and unbox operations where needed.

4.2 How can we use them?

You have been, and for the most part you know how. But there’s one thing worth knowing about that you may not: How to perform Object operations such as equality and hashing.

5 Equality: physical and logical

Equality seems like such a simple proposition: two things are either “the same” or they aren’t. Except of course the preceding sentence has two key terms left undefined: “things” and “sameness”. Now that we have a clearer picture of the difference between reference types and value types, there are clearly two different kinds of things: things that contain an arrow, or things that don’t. We therefore have to refine our notion of sameness: For value types, there’s really only one thing to be done: compare the values themselves and see if they are equal. But for references:

These last two are known as “physical” and “logical” equality, respectively. But this is a very operational definition: instead of saying what we want the sameness operation to mean, we are saying how to compute the result.

An alternate way of phrasing these two would be to say that physical equality is when two values are aliases of each other—that modifying one would modify the other. Conversely, logical equality simply says that two values have the same shape, with the same values in corresponding places.

In Fundies 2, we called these notions “intensional” and “extensional” equality. They mean the same thing as “physical” and “logical” equality, except we didn’t need to know exactly how references worked in order to define them!

To make equality convenient to use, Java defines the boolean equals(Object other) method, on the Object class, so that any two objects can be compared. By default, this operation is defined as

boolean equals(Object that) { return this == that; }

so that the default operation is simply intensional, physical equality. However we are free to override this method on our own classes, so that instances of our classes can support logical equality instead. Typically, if we design a class such that all of its fields are private final, then that class is probably meant to behave as a value type, rather than a reference type, despite the fact that all object types are reference types. 7Be careful! If any fields of that class, despite being final, are reference types that transitively contain values that aren’t final, then perhaps this type isn’t actually a value type. Or, perhaps the design of that non-final class should be revisited... In these cases it’s a good idea to override equals.

Logical equality can be more sophisticated than merely recursively comparing all fields for equality. Consider a Fraction class:

final class Fraction {
  private final int num, den; // represents the number (num/den)
  ...
  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof Fraction)) return false;
    Fraction that = (Fraction)obj;

    return this.num == that.num && this.den == that.den; // Oops!
  }
}

assertEquals(new Fraction(1, 2), new Fraction(1, 2)); // good...
assertEquals(new Fraction(1, 2), new Fraction(2, 4)); // Fails!

There are multiple ways of representing the “same” fraction, so we need a more general equivalence:

@Override
public boolean equals(Object obj) {
  if (!(obj instanceof Fraction)) return false;
  Fraction that = (Fraction)obj;

  return this.num * that.den == that.num * this.den;
}

Now both tests above pass.

5.1 Equality and autoboxing

Comparing an object to a primitive value seems like a silly thing to do: of course they can’t ever be equal, right? But in the presence of generics (see Generics below), Java will automatically box primitives into their boxed object forms. And at that point, looking for, say, an int in a List<Integer> makes a lot of sense. Accordingly, Java ensures that the equals method on box types coincides with the == operation on the primitive values: if any two values are ==, their boxed forms are equals():

long val_x1 = 7L;
long val_x2 = 7L;

Long box_x1 = 7L;
Long box_x2 = 7L;

assertTrue(val_x1 == val_x2); // primitive equality
assertTrue(box_x1.equals(box_x2)); // logical equality of boxes
assertTrue(box_x1.equals(val_x1)); // logical equality with auto-boxing

Unfortunately, the == operator will not perform the same way for box types as is does for primitives. Consider: there are a lot of 64-bit long numbers, so allocating an object for each one is exorbitantly expensive. So the following will occur:

Long x1 = 7L;
Long x2 = 7L;

Long y1 = 720_233_830_121_456L;
Long y2 = 720_233_830_121_456L;

assertTrue ( x1.equals(x2) ); // for small enough numbers, Java
assertTrue ( x1 == x2 ); // will ensure physical equality of boxed values

assertTrue ( y1.equals(y2) ); // but when they get large enough,
assertFalse( y1 == y2 ); // there is no such guarantee.
Similar caveats apply for other primitive types. This behavior is documented, though it might be confusing at first. Stick to equals() whenever possible, to avoid this confusion.

5.2 Equality and hashing

Equality is intended to mean that two variables “behave the same” in any scenario we care about. This implies that any equality implementation had better respect the following three rules:

(It should be apparent that == obeys these three rules too.)

Java includes one additional scenario in which we can observe objects: we can stick them inside hash tables, in which case Java relies on the int hashCode() method. Hashcodes have to obey the following consistency rules with respect to equality:

In other words, the point of a hashcode is to quickly decide when two objects are not equal. If that quick check can’t tell them apart, then the full equals() method is needed.

Always override hashCode when you override equals, or you’re practically guaranteed to violate these rules.

As corollary of these rules, the hashCode() method should be a function of only the same fields that are used by equals() or else it’s trivial to violate compatibility. (Consider an implementation of Posn whose equals() method only checked its x-coordinate, but whose hashCode() method used its y-coordinate as well.)

Fortunately, Java provides several static methods to make constructing hash codes simple. In particular, each of the box types provides a static hashCode method, as shown here for doubles:

double val_x = 314.1592;
Double box_x = 314.1592;

assertTrue(box_x.hashCode() == Double.hashCode(val_x));

This takes care of most individual fields or variables. Additionally, there is a utility class named Objects that includes a convenience method

int hash(Object... args):

final class Posn {
  private final int x, y;
  ...
  @Override
  public int hashCode() {
    return Objects.hash(x, y); // boxes x and y to Integers,
    // then gets their hashCode()s, and combines them into a single result
  }
}

The actual mathematics of good hash functions are an interesting sub-domain of algorithms; for our purposes, it’s enough to know Java has good built-in defaults that we can use as needed.

Of course, if our particular equality operation is more sophisticated than simply comparing fields, then this hashing approach won’t work. In particular, if we used the same approach for Fraction as we just used for Posn,

assertEquals(new Fraction(1, 2).hashCode(),
             new Fraction(2, 4).hashCode()); // Fails

We need a hashcode that respects our equality relation. (Try implementing this one!)

6 Unlearning Fundies 2

Fundies 2 laid out a set of guides for how to write Java code that was easy to test and relatively clear, and the rules were fairly easy to follow. However, a few of those rules were too simplistic.

6.1 null is bad

Not entirely true. More accurately, null is a pain to deal with. Unlike every other value that can be placed in a reference-typed variable, you can’t call methods on null values. Forgetting to check for the possibility of null leads to unintended NullPointerExceptions that crash your program at unanticipated times. Far cleaner to create a sentinel-value type (as we did with Empty lists) and use that instead.

It turns out that null values were a historical accident, because they were simply easy to implement. Their inventor, Sir Tony Hoare (a Turing-award winner, and creator of many ubiquitous algorithms) calls null his “billion-dollar mistake”.

One of the few good use cases for null is when intending to create cyclic data. Here, use null to indicate that the cycle hasn’t been formed; document well by what point the cycle should be formed, and check for it early and fail quickly if an unexpected null value appears.

6.2 == is bad

Not exactly. With our more sophisticated understanding of the distinction between value types and reference types, just remember that == compares the immediate contents of variables and so provides physical equality comparisons. Use this operator sparingly but appropriately.

6.3 instanceof and casts are bad

This one’s still true.

The only legitimate use for it is when implementing custom equals() overriden methods.

7 The switch statement

8 Enumerations

9 Exceptions

10 Generics, or raw types considered harmful

1We defer discussion of non-static nested classes for now.

2Well...this is a convenient approximation, really. The actual behavior for sufficiently long arrays is not so constant, and depends upon the actual architectural details of the computer.

3Note that List is an interface, and the venerable ArrayList is one class that implements it.

4There’s a twist, though: some methods, such as String.indexOf(int), take “characters” represented as type int rather than type char, because it turns out that the Java char type doesn’t have enough bits to represent every Unicode code point. However, because chars are implicitly converted to ints, you can use a character where an integer is expected with no trouble. (In the other direction it requires a cast, which is lossy because some int values don’t fit in char)

5Not just variables, because everything below applies to method parameters and results as well.

6What is a reference? Most likely it’s just the memory address of the object, like a pointer in C or C++—though there are optimizations that can make the situation less simple.

7Be careful! If any fields of that class, despite being final, are reference types that transitively contain values that aren’t final, then perhaps this type isn’t actually a value type. Or, perhaps the design of that non-final class should be revisited...