Lecture 1: The Essence of Objects
1.1 Overview
Organization and manipulation of structured data
The object-oriented structure of code
Examples and Testing with JUnit
Visual representation: class diagrams
Do Now!
A Do Now! exercise is strongly recommended for you to do now, without reading ahead in the lecture notes. You should think through the question being asked, and then see if your understanding matches the concepts the lecture notes suggest.
Exercise
An Exercise is strongly recommended for you to do, now or when finished reading these notes. These exercises may suggest further applications of the concepts in the lecture notes.
1.2 Introduction
Computer code in any language consists of only two aspects: representation of relevant data and ways of manipulating that data. That is, nouns (things) and verbs (actions). And yet, there are multiple ways to organize data and their related functionality to solve a given problem. Programming languages often better support some types of design than others, but the design has a profound impact (often independent of the programming language used to implement it) on what can or cannot be done easily and elegantly. A simple example helps to illustrate this.
1.3 Example problem: Publications
Suppose we need to represent two kinds of publications: books and journal articles. Each type of publication has different relevant attributes, as follows:
A book has a title, an author, a publisher, the publisher’s location and the year of publication.
An article has a title, an author, the journal’s name, volume number, issue number and year of publication.
Publications are cited at various places, in standard formats. We would like to produce citations in two well-known styles: MLA and APA. Each citation style prescribes a specific format to cite the details of a publication, and is different for each type of publication (i.e., there are two separate format specifications to cite a book and an article in APA style, and similarly for MLA).
1.3.1 Design 1: data and functions operating on data (externally)
One way to design the above would be to start by representing the data. We can represent the attributes of a book and article as two separate compound data types ("nouns"), informally as follows:
Book(title, author, publisher, location, year)
Article(title, author, name, volume, issue, year)
We can then write a function (verb) that produces a formatted string for the MLA citation style for a publication given to it, and another function that does the same for the APA citation style.
string function citeMLA(publication)
string function citeAPA(publication)
Example 1: C
One can define these structures in C as follows:
// A structure that defines a book
struct Book {
    char *title;
    char *author;
    char *publisher;
    char *location;
    int year;
};

// A structure that defines a journal article
struct Article {
    char *title;
    char *author;
    char *journal_name;
    int volume;
    int issue;
    int year;
};
We can create instances of these structures as follows:
// These statements belong inside a function such as main.
// strdup (declared in <string.h>) copies a string into newly allocated memory.
struct Book rushdie;
rushdie.title = strdup("Midnight's Children");
rushdie.author = strdup("Salman Rushdie");
rushdie.publisher = strdup("Jonathan Cape");
rushdie.location = strdup("London");
rushdie.year = 1980;

struct Article turing;
turing.title = strdup("Computing machinery and intelligence");
turing.author = strdup("A. M. Turing");
turing.journal_name = strdup("Mind");
turing.volume = 59;
turing.issue = 236;
turing.year = 1950;
Since we have defined books and articles as two unrelated structures, we have no choice but to write two functions each for MLA and APA styles:
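#include <stdio.h>
#include <stdlib.h>

// Returns a newly allocated string that the caller must free().
// (Returning a local array would leave a dangling pointer.)
char *citeMLABook(struct Book b) {
    char *answer = malloc(100); // approximating max length
    if (answer != NULL)
        snprintf(answer, 100, "%s. %s. %s: %s, %d.",
                 b.author, b.title, b.location, b.publisher, b.year);
    return answer;
}
char *citeMLAArticle(struct Article a) { ... }
char *citeAPABook(struct Book b) { ... }
char *citeAPAArticle(struct Article a) { ... }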
This implementation is quite clunky. Part of the reason is that C is statically typed; the other part is that books and articles are represented as two unrelated structures. Books and articles can be combined into one type by using a C union, typically paired with a field that records which kind of publication is stored. This allows a different implementation: one function each for APA and MLA, each of which distinguishes internally between books and articles.
Example 2: Python
In Python we can represent books and articles as dictionary records:
# create a basic dictionary for a publication
def createPublication(type):
    """This function creates a dictionary that represents a generic
    publication. It has only one key called "type"
    """
    publication = {'type': type}
    return publication

# create a dictionary that represents a book
def createBook(title, author, publisher, location, year):
    """This function takes in the various attributes for a book and returns
    a dictionary with keys title, author, publisher, location, year with
    their respective values equal to whatever was passed to this function.
    It also has a key called "type" that has the value "book"
    """
    record = createPublication("book")
    record['title'] = title
    record['author'] = author
    record['publisher'] = publisher
    record['location'] = location
    record['year'] = year
    return record

# create a dictionary that represents a journal article
def createArticle(title, author, journalName, volume, issue, year):
    """This function takes in the various attributes for a journal article
    and returns a dictionary with keys title, author, journalName, volume,
    issue, year with their respective values equal to whatever was passed
    to this function. It also has a key called "type" that has the value
    "article"
    """
    record = createPublication("article")
    record['title'] = title
    record['author'] = author
    record['journalName'] = journalName
    record['volume'] = volume
    record['issue'] = issue
    record['year'] = year
    return record
We can instantiate them as follows:
# create an instance that represents the famous book "Midnight's Children"
rushdie = createBook("Midnight's Children", "Salman Rushdie", "Jonathan Cape", "London", 1980)
# create an instance that represents a journal article by Turing
turing = createArticle("Computing machinery and intelligence", "A. M. Turing", "Mind", 59, 236, 1950)
We can then implement the APA and MLA citation as functions that take a publication as an argument.
# check if the given parameter is a valid book
def isBook(pub):
    return ((type(pub) is dict) and ('type' in pub) and (pub['type'] == "book")
            and ('title' in pub) and ('author' in pub) and ('publisher' in pub)
            and ('location' in pub) and ('year' in pub))

# check if the given parameter is a valid journal article
def isArticle(pub):
    return ((type(pub) is dict) and ('type' in pub) and (pub['type'] == "article")
            and ('title' in pub) and ('author' in pub) and ('journalName' in pub)
            and ('volume' in pub) and ('issue' in pub) and ('year' in pub))

# function that creates a formatted string for a publication that conforms to the APA style.
# This function determines what type of publication is passed, and acts accordingly
def citeApa(pub):
    if (isBook(pub)):
        return "{} ({}). {}. {}: {}.".format(pub['author'], pub['year'], pub['title'],
                                             pub['location'], pub['publisher'])
    elif (isArticle(pub)):
        return "{} ({}). {}. {}, {}({}).".format(pub['author'], pub['year'], pub['title'],
                                                 pub['journalName'], pub['volume'], pub['issue'])
    raise ValueError("Not a valid publication")

# function that creates a formatted string for a publication that conforms to the MLA style.
# This function determines what type of publication is passed, and acts accordingly
def citeMla(pub):
    if (isBook(pub)):
        return "{}. {}. {}: {}, {}.".format(pub['author'], pub['title'], pub['location'],
                                            pub['publisher'], pub['year'])
    elif (isArticle(pub)):
        return "{}. \"{}.\" {} {}.{} ({}).".format(pub['author'], pub['title'], pub['journalName'],
                                                    pub['volume'], pub['issue'], pub['year'])
    raise ValueError("Not a valid publication")
We write helper functions (isBook and isArticle) that verify whether the publication passed to citeApa or citeMla is a book or an article, that is, a dictionary containing the expected keys.
The string method format substitutes each {} placeholder with the corresponding argument, here the attributes of the publication. This is similar to printf in C, or String.format in Java.
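Here is a small, self-contained illustration of str.format. It uses the same placeholder pattern as the book case of citeApa above. The printf-style % operator appears alongside only for comparison with C. The variable names are illustrative.

```python
# str.format replaces each {} placeholder, left to right, with the next argument.
template = "{} ({}). {}. {}: {}."
citation = template.format("Salman Rushdie", 1980, "Midnight's Children",
                           "London", "Jonathan Cape")
print(citation)  # Salman Rushdie (1980). Midnight's Children. London: Jonathan Cape.

# The older printf-style % operator does the same job with C-like conversion codes.
printf_style = "%s (%d). %s." % ("Salman Rushdie", 1980, "Midnight's Children")
print(printf_style)  # Salman Rushdie (1980). Midnight's Children.
```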
We can use these functions as follows: calling citeApa(rushdie) produces the string 'Salman Rushdie (1980). Midnight's Children. London: Jonathan Cape.', and calling citeMla(turing) produces the string 'A. M. Turing. "Computing machinery and intelligence." Mind 59.236 (1950).'
Do Now!
Is this Python design "better" or worse than the C design? How so?
1.3.2 Extending the design
Consider the situation of supporting an additional type of publication: web pages. A web page is represented using its title, URL, and the date of download. How can this type of publication be supported in this type of design?
We can represent a web page using another compound data type.
Webpage(title, URL, date)
In Python, this would translate to
# create a dictionary that represents a webpage
def createWebpage(title, url, date):
    """This function takes in the various attributes for a web page and
    returns a dictionary with keys title, url, date with their respective
    values equal to whatever was passed to this function. It also has a key
    called "type" that has the value "webpage"
    """
    record = createPublication("webpage")
    record['title'] = title
    record['url'] = url
    record['date'] = date
    return record

# create an instance that represents a web page
ccis = createWebpage("CCIS at Northeastern University",
                     "https://www.ccis.northeastern.edu/", "10th August 2018")
But this is not enough, because our functions to generate APA and MLA styles do not support web pages. The Python implementation would be changed as follows:
# check if the given parameter is a valid web page
def isWebpage(pub):
    return ((type(pub) is dict) and ('type' in pub) and (pub['type'] == "webpage")
            and ('title' in pub) and ('url' in pub) and ('date' in pub))

# function that creates a formatted string for a publication that conforms to the APA style.
# This function determines what type of publication is passed, and acts accordingly
def citeApa(pub):
    if (isBook(pub)):
        ...
    elif (isArticle(pub)):
        ...
    elif (isWebpage(pub)):
        return "{}. Retrieved {}, from {}.".format(pub['title'], pub['date'], pub['url'])
    raise ValueError("Not a valid publication")

# function that creates a formatted string for a publication that conforms to the MLA style.
# This function determines what type of publication is passed, and acts accordingly
def citeMla(pub):
    if (isBook(pub)):
        ...
    elif (isArticle(pub)):
        ...
    elif (isWebpage(pub)):
        return "\"{}.\" Web. {} <{}>.".format(pub['title'], pub['date'], pub['url'])
    raise ValueError("Not a valid publication")
We can continue using them as before: calling citeMla(ccis) produces the string '"CCIS at Northeastern University." Web. 10th August 2018 <https://www.ccis.northeastern.edu/>.'
We see that, although possible, supporting an additional type of publication requires changes to existing code: a new case inside every existing citation function. It is reasonable to expect newer forms of data to be supported as an application evolves, so having to modify existing functions each time is a major limitation of this design.
1.3.3 Design 2: Combining data and functions
An alternative organization of data and functions is to pair the data with all the relevant manipulations on it. In other words, we give each publication the responsibility of citing itself.
One way to do this would be to add two functions, citeApa() and citeMla(), to each type of publication.
class Book:  # a structure that represents a book
    """This class represents a book-type publication. A book has the
    following attributes: title, author, publisher, location and year."""

    def __init__(self, title, author, publisher, location, year):  # to instantiate books
        self.title = title
        self.author = author
        self.publisher = publisher
        self.location = location
        self.year = year

    # function for a book to cite itself in APA format
    def citeApa(self):
        return "{} ({}). {}. {}: {}.".format(self.author, self.year, self.title,
                                             self.location, self.publisher)

    # function for a book to cite itself in MLA format
    def citeMla(self):
        return "{}. {}. {}: {}, {}.".format(self.author, self.title, self.location,
                                            self.publisher, self.year)

class Article:  # a structure that represents a journal article
    """This class represents a journal article-type publication. A journal
    article has the following attributes: title, author, name of journal,
    volume number, issue number and year."""

    def __init__(self, title, author, journalName, volume, issue, year):
        self.title = title
        self.author = author
        self.journalName = journalName
        self.volume = volume
        self.issue = issue
        self.year = year

    # function for an article to cite itself in APA format
    def citeApa(self):
        return "{} ({}). {}. {}, {}({}).".format(self.author, self.year, self.title,
                                                 self.journalName, self.volume, self.issue)

    # function for an article to cite itself in MLA format
    def citeMla(self):
        return "{}. \"{}.\" {} {}.{} ({}).".format(self.author, self.title, self.journalName,
                                                    self.volume, self.issue, self.year)
We can create instances of these classes as follows:
# create an instance that represents the famous book "Midnight's Children"
rushdie = Book("Midnight's Children", "Salman Rushdie", "Jonathan Cape", "London", 1980)
# create an instance that represents a journal article by Turing
turing = Article("Computing machinery and intelligence", "A. M. Turing", "Mind", 59, 236, 1950)
Now that each instance contains the citeApa and citeMla functions inside it, we call rushdie.citeApa() to produce the APA-style citation of a book, and so on.
Note the different ways in which a citation is produced. In the first design citeApa(rushdie) can be described as "Function citeApa, take this book and cite it in APA style". In the second design rushdie.citeApa() can be described as "Book, cite yourself in APA style".
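The following minimal sketch makes "Book, cite yourself" concrete. It repeats condensed copies of the Book and Article classes above so that it runs on its own, then cites a mixed list of publications. The list name publications is illustrative. Unlike citeApa in Design 1, which needed isBook and isArticle, the loop never asks what kind of publication it holds.

```python
class Book:
    """Condensed copy of the Design 2 Book class."""
    def __init__(self, title, author, publisher, location, year):
        self.title, self.author = title, author
        self.publisher, self.location, self.year = publisher, location, year

    def citeApa(self):
        return "{} ({}). {}. {}: {}.".format(self.author, self.year, self.title,
                                             self.location, self.publisher)

    def citeMla(self):
        return "{}. {}. {}: {}, {}.".format(self.author, self.title, self.location,
                                            self.publisher, self.year)


class Article:
    """Condensed copy of the Design 2 Article class."""
    def __init__(self, title, author, journalName, volume, issue, year):
        self.title, self.author, self.journalName = title, author, journalName
        self.volume, self.issue, self.year = volume, issue, year

    def citeApa(self):
        return "{} ({}). {}. {}, {}({}).".format(self.author, self.year, self.title,
                                                 self.journalName, self.volume, self.issue)

    def citeMla(self):
        return "{}. \"{}.\" {} {}.{} ({}).".format(self.author, self.title, self.journalName,
                                                    self.volume, self.issue, self.year)


publications = [
    Book("Midnight's Children", "Salman Rushdie", "Jonathan Cape", "London", 1980),
    Article("Computing machinery and intelligence", "A. M. Turing", "Mind", 59, 236, 1950),
]

# No type checks: each object chooses its own APA format ("cite yourself").
citations = [pub.citeApa() for pub in publications]
for c in citations:
    print(c)
```

Python only checks that an object has a citeApa method at the moment it is called (duck typing). An object of any other class that provides citeApa could therefore join this list without any change to the loop.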
Both designs achieve the same outcomes: representing publications and producing their citations. But they model the problem in fundamentally different ways.
1.3.4 Extending the design: Take two
How does the second design fare in the same situation of supporting a new kind of publication: web pages? Since each type of publication is self-contained, we can add a Webpage class as before, and implement the citeApa() and citeMla() functions for it.
class Webpage:  # a structure that represents a web page
    """This class represents a webpage-type publication. A web page has the
    following attributes: title, URL and the date of download."""

    def __init__(self, title, url, date):
        self.title = title
        self.url = url
        self.date = date

    # function for a web page to cite itself in APA format
    def citeApa(self):
        return "{}. Retrieved {}, from {}.".format(self.title, self.date, self.url)

    # function for a web page to cite itself in MLA format
    def citeMla(self):
        return "\"{}.\" Web. {} <{}>.".format(self.title, self.date, self.url)
That’s it! No changes to existing code are required.
1.3.5 Which design is "better"?
Design 1 (functions external to data) separates the data from functions that manipulate it. Such design is commonly found in implementations using functional programming. But it is not uncommon in implementations using procedural languages (such as C) as well. Design 2 is "classic object-oriented design." It embodies the basic OO principle of encapsulation: data and its functions are encapsulated (in a class). The resulting objects are self-contained and capable: they represent data (as attributes, often referred to as state) and offer relevant operations (as functions, often referred to as behavior).
Can we tweak Design 2 so that it is easier to add a new citation style? Similarly can we tweak Design 1 so that it is easier to add a new kind of publication? The answers to both are yes, and we will see later in the course how we can achieve this.
Which design is better? In the above situation, incorporating a new kind of publication required only isolated changes in Design 2, so it is tempting to conclude that Design 2 is superior to Design 1. However, consider another situation: adding a new citation style. In Design 1, this would require writing one new function for the new style that supports all existing types of publications, and no existing code would change. In Design 2, however, one would have to add a new method to every existing publication class.
In real-life situations, new features are often demanded after a design and implementation are complete, and they cannot always be anticipated accurately. Thus which design is "better" depends on which future changes it supports more easily. There is no universally superior design paradigm. Consequently, one must be aware of, and open to, using different design paradigms as suitable, possibly within the same application. Most modern programming languages support multiple design paradigms to varying degrees, as the two Python implementations above illustrate, so the choice of design paradigm is not solely determined by the implementation language. This fact makes program design both complicated and interesting.
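The Design 1 side of this argument can be sketched by adding a new citation style as a single new function. The style, citeShort, is hypothetical and is not MLA, APA or any real standard. createBook and createArticle are condensed versions of the constructors above that skip createPublication.

```python
# Condensed Design 1 constructors: dictionaries tagged with a "type" key.
def createBook(title, author, publisher, location, year):
    return {'type': 'book', 'title': title, 'author': author,
            'publisher': publisher, 'location': location, 'year': year}


def createArticle(title, author, journalName, volume, issue, year):
    return {'type': 'article', 'title': title, 'author': author,
            'journalName': journalName, 'volume': volume, 'issue': issue, 'year': year}


# HYPOTHETICAL new citation style, added as one new function.
# No existing function (createBook, createArticle, citeApa, citeMla) has to change.
def citeShort(pub):
    """Cite a publication as 'Author (year): Title' (plus journal for articles)."""
    if isinstance(pub, dict) and pub.get('type') == 'book':
        return "{} ({}): {}".format(pub['author'], pub['year'], pub['title'])
    elif isinstance(pub, dict) and pub.get('type') == 'article':
        return "{} ({}): {}, {}".format(pub['author'], pub['year'], pub['title'],
                                        pub['journalName'])
    raise ValueError("Not a valid publication")


print(citeShort(createBook("Midnight's Children", "Salman Rushdie",
                           "Jonathan Cape", "London", 1980)))
```

In Design 2, by contrast, the same style would mean adding a citeShort method to Book, Article and Webpage.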
1.4 From Python to Java
1.4.1 Interfaces
We start with the notion of a publication. What is common to all the above publications? The commonalities are in their behavior: each of them is able to cite itself in APA and MLA styles. We define a Java interface to represent this:
/**
 * Specifies operations for formatting citations from bibliographic data.
 */
public interface Publication {
  /**
   * Formats a citation in APA style.
   *
   * @return the formatted citation
   */
  String citeApa();

  /**
   * Formats a citation in MLA style.
   *
   * @return the formatted citation
   */
  String citeMla();
}
A Java interface is a list of method signatures (Java calls functions "methods"). It is a specification: objects that implement this interface are mandated to implement these methods, such that any code in possession of such an object anywhere can expect to call these methods using the object.
It is good practice to separate the method signatures from method bodies (interface vs. implementation). Java allows this using interfaces and classes. C++ allows this with header (*.h) and source (*.cpp) files. It is good practice to ensure that every method of a class that can be called from outside it originates from some interface that is implemented by the class.
Interfaces in Java allow us to write a specification of what behavior must be supported, without the details of how this behavior is implemented. The closest equivalent in Python is an abstract base class (from the standard abc module), and many other languages use abstract classes similarly. Unlike Python, however, Java is statically typed: it requires us to declare the types of method arguments and return values. Among other things, this helps the compiler catch more errors before the program runs.
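For comparison, here is a minimal sketch of the Publication interface written as a Python abstract base class with the standard abc module. The class Incomplete is hypothetical and only shows what happens when a class omits a required method. Note one difference from Java. Python reports the missing method with a TypeError when the program tries to create an object, whereas the Java compiler rejects such a class before the program ever runs.

```python
from abc import ABC, abstractmethod


class Publication(ABC):
    """Specifies operations for formatting citations from bibliographic data."""

    @abstractmethod
    def citeApa(self):
        """Formats a citation in APA style and returns it."""

    @abstractmethod
    def citeMla(self):
        """Formats a citation in MLA style and returns it."""


class Webpage(Publication):
    """A web page: title, URL and date of download. Implements Publication."""

    def __init__(self, title, url, date):
        self.title = title
        self.url = url
        self.date = date

    def citeApa(self):
        return "{}. Retrieved {}, from {}.".format(self.title, self.date, self.url)

    def citeMla(self):
        return "\"{}.\" Web. {} <{}>.".format(self.title, self.date, self.url)


class Incomplete(Publication):
    """Hypothetical class that forgets citeMla, so it cannot be instantiated."""

    def citeApa(self):
        return "incomplete"


page = Webpage("CCIS at Northeastern University",
               "https://www.ccis.northeastern.edu/", "10th August 2018")
print(page.citeMla())
print(isinstance(page, Publication))  # True

try:
    Incomplete()  # raises TypeError because citeMla is still abstract
except TypeError as err:
    print("TypeError:", err)
```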
We now define books and articles as classes implementing this interface.
/**
 * The {@code Book} class represents bibliographic information for
 * books.
 */
public class Book implements Publication {
  private final String title, author, publisher, location;
  private final int year;

  // The constructor and methods are shown below.
}
The keyword implements relates the Book class to the Publication interface. This means the Book class must provide implementations of all methods declared in the Publication interface: the code will not compile without it! This adds a check at compile time, and mandates the Book class to have citeApa and citeMla methods. That is, it ensures that the Book class fulfills all expectations of a publication.
The rest of the class can be completed as follows:
  /**
   * Constructs a {@code Book} object.
   *
   * @param title     the title of the book
   * @param author    the author of the book
   * @param publisher the publisher of the book
   * @param location  the location of the publisher
   * @param year      the year of publication
   */
  public Book(String title, String author, String publisher,
              String location, int year) {
    this.title = title;
    this.author = author;
    this.publisher = publisher;
    this.location = location;
    this.year = year;
  }

  @Override
  public String citeApa() {
    return author + " (" + year + "). " + title + ". "
        + location + ": " + publisher + ".";
  }

  @Override
  public String citeMla() {
    return author + ". " + title + ". " + location + ": "
        + publisher + ", " + year + ".";
  }
Explicitly initializing all fields in the constructor provides a single place to look for default values, and is good practice.
The first method is the constructor. It is the counterpart of the __init__ function in the Python code above, except a Java constructor shares its name with its class and does not explicitly take self as an argument. References to the attributes use the keyword this similar to how the Python implementation used self. The job of a constructor is to initialize all the attributes (called instance variables or fields) of this object.
We can create a book object as follows:
Publication rushdie = new Book("Midnight's Children", "Salman Rushdie",
    "Jonathan Cape", "London", 1980);
Note that we declare the variable rushdie with type Publication. We could have used Book instead, but there are advantages to using interface types for variables. Such a variable can hold an object of any class that implements that interface, and it can be used to call only the methods declared in the Publication interface. Thus any use of this variable, anywhere in the code, depends only on the general interface and not on a specific implementation, because it does not need to.
The Article and Webpage classes can be implemented similarly.
1.5 Testing
How do we know that our implementation works? One could write a program that creates several instances of a class and prints useful information about them. Running this program and reading what it prints could be taken as evidence that the code works. Because such tests exercise a single class (a unit), they are called unit tests.
Such manual testing, although conceptually correct, becomes cumbersome when the number of classes and their complexity increases. Printed information becomes verbose quickly and one can easily make a mistake in reading. If the program is interactive, then manual testing includes manually typing inputs which is error-prone and slow. Good tests are automatic so that they are easy to run multiple times, and verification of correctness is part of the test instead of relying on manual verification. Such automation requires methodical test design and a framework to facilitate the automation. We use the JUnit framework to write tests in Java.
1.5.1 How to write tests
Writing a correct test involves answering two questions: what are we testing (testing objective) and how will we test it. In this case our test objective is to verify that the citeApa method works correctly in the Book class. To write this test, we
Create a Book instance with a known title, author, publisher, location and year.
Create a string that represents the expected citation of this book.
Call the citeApa method of the instance and compare what it returns with the expected string. If they match, declare the test as a success.
Note that this test, if run, directly reports "success" or "failure." No manual reading or verification is necessary.
We create a separate tester class and write this test as follows:
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class PublicationTest {
  private Publication rushdie;

  @Before
  public void setup() {
    rushdie = new Book("Midnight's Children", "Salman Rushdie",
        "Jonathan Cape", "London", 1980);
  }

  @Test
  public void testCiteApa() {
    String expectedOutput =
        "Salman Rushdie (1980). Midnight's Children. London: Jonathan Cape.";
    assertEquals(expectedOutput, rushdie.citeApa());
  }
}
We import the relevant classes and methods from the JUnit 4 package org.junit. This allows us to refer to the methods and classes directly, such as assertEquals(...) instead of org.junit.Assert.assertEquals(...).
We declare fields of this class to store various objects that we will test.
We initialize these objects in the setup method; in this case we instantiate the Book object. We annotate this method with @Before.
Each test is written as a public method in this class. It is annotated with @Test above it to distinguish it from other helper methods. A test method does not return anything, and does not take any arguments.
We create an expected citation. This is the ground truth, and our actual output must match this.
We use the assertEquals method to verify that the expected and actual outputs match. If they are not equal, the test stops and is reported as a failure. If they are equal, testCiteApa runs to completion and the test passes.
The @Before methods are executed before each @Test method, ensuring that objects are set up before they are tested.
The equivalent tests in Python (using the unittest module) are included at the end of both versions of the Python code attached to these notes.
Note that the testCiteApa method has a single objective: test whether the citeApa method in the Book class works correctly. This keeps the test small and simple. Avoid writing test methods that test many things at once. A test method may, however, use multiple assert methods in service of its one objective.
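The Python equivalent of PublicationTest might look like the following sketch, which uses the standard unittest module. It is not the Python test code attached to these notes. It includes a condensed copy of the Design 2 Book class so that it runs on its own. setUp plays the role of JUnit's @Before method, and assertEqual plays the role of assertEquals. Each test method keeps to a single objective.

```python
import unittest


class Book:
    """Condensed copy of the Design 2 Book class, so this file runs on its own."""

    def __init__(self, title, author, publisher, location, year):
        self.title = title
        self.author = author
        self.publisher = publisher
        self.location = location
        self.year = year

    def citeApa(self):
        return "{} ({}). {}. {}: {}.".format(self.author, self.year, self.title,
                                             self.location, self.publisher)

    def citeMla(self):
        return "{}. {}. {}: {}, {}.".format(self.author, self.title, self.location,
                                            self.publisher, self.year)


class PublicationTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, like a JUnit @Before method.
        self.rushdie = Book("Midnight's Children", "Salman Rushdie",
                            "Jonathan Cape", "London", 1980)

    def testCiteApa(self):
        # One objective: the APA citation of a book.
        expected = "Salman Rushdie (1980). Midnight's Children. London: Jonathan Cape."
        self.assertEqual(expected, self.rushdie.citeApa())

    def testCiteMla(self):
        # One objective: the MLA citation of a book.
        expected = "Salman Rushdie. Midnight's Children. London: Jonathan Cape, 1980."
        self.assertEqual(expected, self.rushdie.citeMla())
```

If this is saved as, say, test_publication.py (a hypothetical file name), it can be run with python -m unittest test_publication, which reports each test as passed or failed.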
1.6 Miscellaneous aspects of Java code
1.6.1 Naming Conventions
It is standard practice for all class and interface names in Java to start with an upper case letter. Primitive types (int, float, etc.) begin with a lower case letter. Method names should begin with a lower case letter, and in case the name is multi-worded, it should be in camel case (nameOfAMethod instead of name_of_a_method).
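Normally a source code file (*.java) contains a single class or interface. A file may contain at most one public top-level class or interface, and its name must exactly match the name of the file (for example, public class Book lives in Book.java).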
1.6.2 Commenting
It is a really good idea to write comments explaining your design and purpose. This allows you and anybody else using your code to understand what it is doing, how to use it and how it has been designed and implemented.
Above the class definition, explain in 1-2 sentences what this class represents. This explanation should include both semantic details (e.g. what the class represents from the problem statement) and technical details (e.g. useful for a fellow designer/programmer). Also mention any details that a user of this class may need to know to use it appropriately.
Before each method (including the constructor and getters) write a brief explanation of what this method accomplishes (purpose statement). Also include a list of any arguments along with what they represent, and what the method returns (the contract).
Within the method body mention any details that you think are relevant to what that method is doing.
If a method declared in the interface is implemented in the class and the documentation in the interface accurately and completely describes its implementation, there is no need to rewrite the documentation in the class.
A good rule of thumb is to assume that the audience for your comments is not you, but other designers/programmers who will use your code. If the language is such that only you can understand it fully (because you implemented it) revise the comments.
1.6.3 Documentation
Documentation is critical to making code understandable, readable and usable. As programmers we are both producers and consumers of documentation. Any good code base comes with copious and helpful documentation. For example "official" Java documentation is available online.
Java comes with a built-in tool, called Javadoc, that generates HTML web pages like these from documentation written in your Java code. Javadoc "reads" your comments and converts them into HTML. You can use specific formatting commands (tags) within your comments to produce well-formatted documentation.
All comments beginning with /** are Javadoc-style comments. These are the only comments that the Javadoc tool will read.
By convention, every line of such a comment begins with * followed by a space.
@param denotes information about a method argument.
@return denotes information about whatever the method returns.
Note again that since everything is within Java comments, this documentation does not affect how your code compiles or runs. We will see how to use Javadoc to produce HTML documentation from this file in the lab.
We strongly recommend developing the habit of writing comments in Javadoc style to facilitate creating neat documentation.
Again, Javadoc-style documentation is "public-facing." That is, this documentation is meant for others, not just yourself.
Expressing design in UML
It is important to be able to express design in the standard UML notation. The complete design of the Publication interface and the Book, Article and Webpage classes can be expressed in a class diagram as follows:
The various notations are:
Each class or interface is represented by a box. The top section contains its name.
+ denotes public and - denotes private.
A dotted arrow with a hollow triangular arrowhead denotes "implements" and points towards the interface.