7.5

## Assignment 5: Beyond gOOD: A Simple Spreadsheet, Part 1

#### Due: Thu 10/31 at 8:59pm; self-evaluation due Fri 11/01 at 9:59pm

Starter files: code.zip

### 1Working in pairs

Note: This assignment is to be completed with a partner. You will sign up with your partner on the handin server from this assignment onwards. You will not be able to submit subsequent assignments until you are part of a team on the handin server.

Note: If you do not know (or remember) how to request teams, follow these instructions. Please request teams no later than Wednesday night Oct 23 at 9pm if you have not requested a teammate by then, we will randomly assign you one.

Although the expected work is not simply double of what it was in earlier assignments, the assignment behooves mutual cooperation and pair programming. Please make a plan, meet frequently and tackle the assignment in a systematic manner. Dividing the work and then meeting directly the day before it is due is a recipe for disaster. This assignment requires you to self-learn some things, so enforce each other’s efforts and learning.

#### 1.1Submission logistics

You will need to work with your partner to submit the self-eval. As with the code, one partner will submit a single self-eval for the team. However, you will likely need to sit with your partner or be on the phone as you should agree on all the responses. Recall that once you have opened the self-eval, you may not resubmit the code.

### 2Context

All of us have encountered spreadsheets before: grids of cells whose contents are either blank, a value, or a formula that evaluates some function over inputs from other cells. Fancier spreadsheets can include graphs, images, formatted text, etc. — but we don’t need to consider such details right now.

Spreadsheets are actually a fairly sophisticated functional programming environment: formulas can contain functions of a fixed number of arguments, or of a range of values; they can contain conditionals to select among options; by creating a bunch of similar, consecutive rows they can even simulate fixed-length iteration. The only thing they prohibit is cyclic reference: no cell can depend upon its own value as one of its inputs. Moreover, if we edit any of the cells, every cell that depends upon it will automatically get re-evaluated to incorporate the new input value. (This is a small example of what is known as dataflow programming, and it’s a very powerful and useful programming model.)

In the next few assignments, we will build an application that lets us load, edit and save simple spreadsheets. We may start from a blank file, or from some input (that may have been created by our application or by some other, unrelated one).

A (professional) spreadsheet contains one or more worksheets, each of which consists of a collection of cells. In this project, we will only need to support one worksheet per file, so “spreadsheet” and “worksheet” will be considered synonyms. Cell names are written as <column name><row index>, where column names follow the pattern

• Columns 1 through 26 get names A through Z

• Columns 27 through 52 get names AA through AZ

• Columns 53 through 78 get names BA through BZ ...

• Column 703 gets name AAA ...

Row indices are also numbered starting from 1...but fortunately, we just use numbers for that instead of letters. (In this naming scheme, there is no row or column with index 0.)

We have provided the Coord class for you to describe cell coordinates in a spreadsheet, if you wish. It also contains two static methods to convert column indices to names and vice versa.

#### 2.2Cell contents

• be blank

• contain a value

• contain a formula

If a cell contains a value, then at minimum your spreadsheets must support

• boolean values

• double values

• String values

A formula is one of

• a value

• a reference to a rectangular region of cells in the spreadsheet

• a function applied to one or more formulas as its arguments

Consider the following tiny spreadsheet, that takes four values (in cells A1 through D1), interprets them as two pairs of (x, y) coordinates, and computes the distance between them, placing the result into cell A3. It then determines whether that distance is less than 10, and places the result in B3.

We write the instructions down in plain English here; we’ll give a more concrete syntax for our program to use later.

Let A1 equal 3 Let B1 equal 4 Let C1 equal 9 Let D1 equal 12 Let A2 equal the squared difference of C1 and A1 Let B2 equal the squared difference of D1 and B1 Let A3 equal the square root of the sum of A2 and B2 Let B3 equal TRUE if A3 is less than 10, otherwise FALSE

#### 2.3Evaluating cells

Cells that contain values (as described above) just evaluate to themselves. As described above, a formula can take as inputs references to other cells (or groups of cells). Evaluating a formula therefore may require evaluating other formulas as well.

No formula is permitted to refer to itself, though, either directly or indirectly, since that would lead to an infinite regress. (One simple direct example would be “Let A1 equal A1 plus 1”. One simple indirect example would be “Let A1 equal B1, and let B1 equal A1 plus 1”.) Such an error can easily happen, so detecting and preventing it is important. Moreover, any formula that depends on one of these erroneous formulas must itself be in error.

### 3Beyond gOOD: A Simple Spreadsheet Application

We will build this application progressively using the classic Model-View-Controller architecture. In this assignment we will focus on the model of this application. There is almost no starter code for this assignment: you must use the above description and broad ideas to design the interface and implementation of the model. You are deliberately not being told about the fine details of what the controller and views will need or do. You are also not being told about how exactly the application will receive a description from the user (it would not be relevant to the model).

Here are some aspects to think about:

• What does the model represent?

• The model should be able to support at least the three types of values described above, though you might support additional ones.

• The model should be able to support references to cells and to finite rectangular regions of cells, though you might try to support additional features.

• The model should be able to support various formulas, although we’ve only seen sums, differences, products, square roots, and comparisons. Note that some formulas (like addition or multiplication) can take references of any size, while others (like subtraction or comparison) can only take references to single cells. You should consider carefully how to support and how to enforce this.

• The model should be able to support editing cells’ contents, regardless of a cell’s location or prior contents.

• Remember to think from not only the implementors’ perspective (the people that are implementing the model) but also the client’s perspective (the people or classes that are using the model). What operations might they reasonably want to perform? What observations might they reasonably want to make?

#### 3.1How is the spreadsheet seen?

Spreadsheets intrinsically can be viewed in several ways:

• As a graphical grid of cells, showing their evaluated values

• As a graphical grid of cells, showing their raw contents (e.g. the text of the formulas or values in each cell)

• As a text file describing their contents

• ...

Below we show a text file describing the spreadsheet example above, along with one additional formula (just for demonstration purposes). Each row consists of a cell name, a space, and then the contents of the cell as the remainder of the line. (Lines beginning with a # are comments.) Cells A1 through D1 contain values; cells A2 through A4 contain formulas.

# Creates the four coordinates A1 3 B1 4 C1 9 D1 12 # Computes delta-x^2 A2 =(PRODUCT (SUB C1 A1) (SUB C1 A1)) # Computes delta-y^2 B2 =(PRODUCT (SUB D1 B1) (SUB D1 B1)) # Computes the rest of the distance formula A3 =(SQRT (SUM A2:B2)) B3 =(< A3 10) # Computes the distance formula all in one step A4 =(SQRT (SUM (PRODUCT (SUB C1 A1) (SUB C1 A1)) (PRODUCT (SUB D1 B1) (SUB D1 B1))))

Formulas are written with a leading =, and are followed by an s-expression describing the formula. To review, an s-expression is one of:

• A number (written like a Java double)

• A string (written like a Java String, with quotes as usual)

• A boolean (written as either true or false)

• A symbol (written as any sequence of letters, digits or symbols, with no spaces)

• A list of s-expressions, surrounded by parentheses and separated by spaces

Any function names or cell references that appear in a formula will be parsed as symbols in the s-expression. A cell reference can be either a single cell name (e.g. A5), or two cell names separated by a colon (i.e. C42:D53) where the second cell’s column and row are no smaller than the first cell’s column and row.

We will provide you with a parser for s-expressions; you do not have to figure out how to parse them yourselves. The parser will take in a String containing the expression to be read (e.g. everything after the = sign) and return a value of type SExp corresponding to that expression (or will throw an IllegalArgumentException if the string could not be correctly parsed).

We will also provide you with a parser for the spreadsheet-as-text file format above, though that one will be slightly more involved. Because we do not know how you are choosing to implement spreadsheets, we will provide you with a class with the following signature:

public final class WorksheetReader {
}

This method is parameterized by whatever type you choose to implement your spreadsheets. It will read the contents of the provided Readable, and determine the contents of your spreadsheet. In order to actually build the spreadsheet, though, it relies on a builder that you must implement and supply. The WorksheetBuilder interface has only two methods on it:

public interface WorksheetBuilder<T> {
WorksheetBuilder<T> createCell(int col, int row, String contents);

T createWorksheet();
}

The first method takes in a column and row index (both counting from 1) and the raw contents of the cell. Your implementation of this method should determine what kind of data this cell should contain, and record this information in your model somehow.

The second method returns the worksheet you have filled in. If you need to do any additional processing to fully construct your model before returning it, do so here.

In design-pattern terms, WorksheetReader is a factory: if you provide it with a builder of worksheets, it can construct a model worksheet for you.

Hint: While your builder creates a worksheet whose contents have been filled in from somewhere, the resulting model should still be editable. Make sure you do not inadvertently build in assumptions about the size or contents of your model based on what the builder produces.

### 4What to do

1. Design a model to represent a spreadsheet. This may consist of one or more interfaces, abstract classes, concrete classes, enums, etc. Consider carefully what operations it should support, what invariants it assumes, etc. You may assume that the number of rows or columns in a spreadsheet can each be represented by a positive int, but you may not make any additional assumptions about them.

2. Document your model well. Be sure to document clearly what each type and method does, what purpose it serves and why it belongs in the model. If your model assumes any invariants, explain them clearly and concisely.

3. Implement your model. If some method of your model cannot be implemented because it requires details you think may be revealed in future assignments, you may leave the method body blank for now — but leave a comment inside the method body explaining why it’s empty and what details you’re waiting for. If your implementation makes additional assumptions about invariants (beyond those asserted by the interface), document them.

4. Implement at least four functions:

• A SUM function, that adds up all the values of all of its arguments. Its arguments could be references of arbitrary size, and the sum should cover all the values in the region being referenced. Any arguments whose contents are blank or not numeric should be ignored. If there are no numeric arguments at all, the default value is zero.

• A PRODUCT function, that multiplies all the values of all of its arguments. Its arguments could be references of arbitrary size. Any arguments whose contents are blank or not numeric should be ignored. If there are no numeric arguments at all, the default value is zero. (Yes, zero: I don’t know why professional spreadsheet programs chose this value instead of one, but they did.)

• A < function, that takes exactly two values and returns whether the first is less than the second. If either value is missing or is not a number, the function should error.

• Another function of your choosing, whose return type is a string.

The names given above can be used in formulas, like =(< (SUM A1:A4) (PRODUCT B2:C5 D8)). We will use the three specific functions above to test your implementation, but you are welcome to implement additional functions.

5. Ensure that (direct or indirect) cyclic references between formulas are detected and prohibited in your model implementation. Hint: Doing this well is trickier than it may seem at first.

6. Ensure that you can evaluate cells. Hint: Doing this well is trickier than it may seem at first. We’re not extremely concerned with efficiency, but your code should not be egregiously slow.

8. Create a class edu.cs3500.spreadsheets.BeyondGood with a main method for your program. Your main method should expect a command-line of the following form:

-in some-filename -eval some-cellname

For example, one valid command line would be -in posn-distances.gOOD -eval A3.

The first two arguments specify a filename to read in. Your program must read in the file and construct a model from it. The second two arguments specify a cell whose contents you want to evaluate.

• If a cell has a problem, then print out a one-line message "Error in cell Z42: ...", where Z42 is whatever cell is broken, and where the additional message can give whatever details you want. If multiple cells have problems, then print a one-line message for each error.

• If no cells are in error, then you must fully evaluate the requested cell, and print its final value. (Numbers should simply be printed using String.format("%f", ...). Strings should be printed in double-quotes, and any quotes within the string should be escaped with a backslash...and any backslashes should be escaped with another backslash. For instance, "Jack says \"hi\". Jill has one backslash \\ here.")

If the command-line is malformed, print an error and exit.

9. Again: Test your model thoroughly.

We will not have thorough unit tests for your programs on this assignment, since we have no idea at all what your classes will be named or how your code will be designed. Instead, the only tests we will run will be through your main method and using example spreadsheet files. Make sure that your main class and main method exist and work properly!

### 5What to submit

Reminder: This assignment is to be completed with a partner. You will sign up with your partner on the handin server from this assignment onwards. You will not be able to submit subsequent assignments until you are part of a team on the handin server.

If you do not know (or remember) how to request teams, follow these instructions. Please request teams no later than Wednesday night at 9pm if you have not requested a teammate by then, we will randomly assign you one.

• Submit all Java files you created or used in this assignment

• A text README file explaining your design. Your README file should give the graders an overview of what the purposes are for every class, interface, etc. that you include in your model, so that they can quickly get a high-level overview of your code. It does not replace the need for proper Javadoc!

• At least three input files that your model can process. Two of them should work properly, while the third should trigger an evaluation error. The two working files should be subtantively different from each other, and should demonstrate all the features of your model that you support.

• As with previous assignments, please submit a zip containing only the src/ and test/ directories with no surrounding directories, so that the autograder recognizes your package structure. Please do not include your output/ or .idea/ directories — they’re not useful!

For this assignment, you will be graded on

• the design of your model interface(s), in terms of clarity, flexibility, and how plausibly it will support needed functionality;

• the appropriateness of your representation choices for the data, and the adequacy of any documented class invariants (please comment on why you chose the representation you did in the code);

• the forward thinking in your design, in terms of its flexibility, use of abstraction, etc.

• the correctness and style of your implementation;

• the thoroughness of your example files in demonstrating all the features of your model; and,

• the comprehensiveness and correctness of your test coverage.