Assignment 5: Beyond gOOD: A Simple Spreadsheet, Part 1
Due: Thu 10/31 at 8:59pm; self-evaluation due Fri 11/01 at 9:59pm
Starter files: code.zip
1 Working in pairs
Note: This assignment is to be completed with a partner. You will sign up with your partner on the handin server from this assignment onwards. You will not be able to submit subsequent assignments until you are part of a team on the handin server.
Note: If you do not know (or remember) how to request teams, follow
these
instructions. Please request teams no later than Wednesday night Oct 23 at
9pm —
Although the expected work is not simply double of what it was in earlier assignments, the assignment behooves mutual cooperation and pair programming. Please make a plan, meet frequently and tackle the assignment in a systematic manner. Dividing the work and then meeting directly the day before it is due is a recipe for disaster. This assignment requires you to self-learn some things, so enforce each other’s efforts and learning.
1.1 Submission logistics
You will need to work with your partner to submit the self-eval. As with the code, one partner will submit a single self-eval for the team. However, you will likely need to sit with your partner or be on the phone as you should agree on all the responses. Recall that once you have opened the self-eval, you may not resubmit the code.
2 Context
All of us have encountered spreadsheets before: grids of cells whose contents
are either blank, a value, or a formula that evaluates some function over
inputs from other cells. Fancier spreadsheets can include graphs, images,
formatted text, etc. —
Spreadsheets are actually a fairly sophisticated functional programming environment: formulas can contain functions of a fixed number of arguments, or of a range of values; they can contain conditionals to select among options; by creating a bunch of similar, consecutive rows they can even simulate fixed-length iteration. The only thing they prohibit is cyclic reference: no cell can depend upon its own value as one of its inputs. Moreover, if we edit any of the cells, every cell that depends upon it will automatically get re-evaluated to incorporate the new input value. (This is a small example of what is known as dataflow programming, and it’s a very powerful and useful programming model.)
In the next few assignments, we will build an application that lets us load, edit and save simple spreadsheets. We may start from a blank file, or from some input (that may have been created by our application or by some other, unrelated one).
2.1 Description of spreadsheets
A (professional) spreadsheet contains one or more worksheets, each of
which consists of a collection of cells. In this project, we will only need to
support one worksheet per file, so “spreadsheet” and “worksheet” will be
considered synonyms. Cell names are written as <column name><row index>
,
where column names follow the pattern
Columns 1 through 26 get names
A
throughZ
Columns 27 through 52 get names
AA
throughAZ
Columns 53 through 78 get names
BA
throughBZ
...Column 703 gets name
AAA
...
Row indices are also numbered starting from 1...but fortunately, we just use numbers for that instead of letters. (In this naming scheme, there is no row or column with index 0.)
We have provided the Coord
class for you to describe cell coordinates in
a spreadsheet, if you wish. It also contains two static
methods to
convert column indices to names and vice versa.
2.2 Cell contents
An individual spreadsheet cell may:
be blank
contain a value
contain a formula
If a cell contains a value, then at minimum your spreadsheets must support
boolean
valuesdouble
valuesString
values
A formula is one of
a value
a reference to a rectangular region of cells in the spreadsheet
a function applied to one or more formulas as its arguments
Consider the following tiny spreadsheet, that takes four values (in cells
A1
through D1
), interprets them as two pairs of (x, y) coordinates,
and computes the distance between them, placing the result into cell
A3
. It then determines whether that distance is less than 10, and
places the result in B3
.
We write the instructions down in plain English here; we’ll give a more concrete syntax for our program to use later.
Let A1 equal 3
Let B1 equal 4
Let C1 equal 9
Let D1 equal 12
Let A2 equal the squared difference of C1 and A1
Let B2 equal the squared difference of D1 and B1
Let A3 equal the square root of the sum of A2 and B2
Let B3 equal TRUE if A3 is less than 10, otherwise FALSE
2.3 Evaluating cells
Cells that contain values (as described above) just evaluate to themselves. As described above, a formula can take as inputs references to other cells (or groups of cells). Evaluating a formula therefore may require evaluating other formulas as well.
No formula is permitted to refer to itself, though, either directly or
indirectly, since that would lead to an infinite regress. (One simple direct
example would be “Let A1
equal A1
plus 1”. One simple indirect
example would be “Let A1
equal B1
, and let B1
equal
A1
plus 1”.) Such an error can easily happen, so detecting and
preventing it is important. Moreover, any formula that depends on one of these
erroneous formulas must itself be in error.
3 Beyond gOOD: A Simple Spreadsheet Application
We will build this application progressively using the classic Model-View-Controller architecture. In this assignment we will focus on the model of this application. There is almost no starter code for this assignment: you must use the above description and broad ideas to design the interface and implementation of the model. You are deliberately not being told about the fine details of what the controller and views will need or do. You are also not being told about how exactly the application will receive a description from the user (it would not be relevant to the model).
Here are some aspects to think about:
What does the model represent?
The model should be able to support at least the three types of values described above, though you might support additional ones.
The model should be able to support references to cells and to finite rectangular regions of cells, though you might try to support additional features.
The model should be able to support various formulas, although we’ve only seen sums, differences, products, square roots, and comparisons. Note that some formulas (like addition or multiplication) can take references of any size, while others (like subtraction or comparison) can only take references to single cells. You should consider carefully how to support and how to enforce this.
The model should be able to support editing cells’ contents, regardless of a cell’s location or prior contents.
Remember to think from not only the implementors’ perspective (the people that are implementing the model) but also the client’s perspective (the people or classes that are using the model). What operations might they reasonably want to perform? What observations might they reasonably want to make?
3.1 How is the spreadsheet seen?
Spreadsheets intrinsically can be viewed in several ways:
As a graphical grid of cells, showing their evaluated values
As a graphical grid of cells, showing their raw contents (e.g. the text of the formulas or values in each cell)
As a text file describing their contents
...
Below we show a text file describing the spreadsheet example above, along with
one additional formula (just for demonstration purposes). Each row consists of
a cell name, a space, and then the contents of the cell as the remainder of the
line. (Lines beginning with a #
are comments.) Cells A1 through D1
contain values; cells A2 through A4 contain formulas.
# Creates the four coordinates
A1 3
B1 4
C1 9
D1 12
# Computes delta-x^2
A2 =(PRODUCT (SUB C1 A1) (SUB C1 A1))
# Computes delta-y^2
B2 =(PRODUCT (SUB D1 B1) (SUB D1 B1))
# Computes the rest of the distance formula
A3 =(SQRT (SUM A2:B2))
B3 =(< A3 10)
# Computes the distance formula all in one step
A4 =(SQRT (SUM (PRODUCT (SUB C1 A1) (SUB C1 A1)) (PRODUCT (SUB D1 B1) (SUB D1 B1))))
Formulas are written with a leading =
, and are followed by an
s-expression describing the formula. To review, an s-expression is one
of:
A number (written like a Java
double
)A string (written like a Java
String
, with quotes as usual)A boolean (written as either
true
orfalse
)A symbol (written as any sequence of letters, digits or symbols, with no spaces)
A list of s-expressions, surrounded by parentheses and separated by spaces
Any function names or cell references that appear in a formula will be parsed
as symbols in the s-expression. A cell reference can be either a single cell
name (e.g. A5
), or two cell names separated by a colon
(i.e. C42:D53
) where the second cell’s column and row are no smaller
than the first cell’s column and row.
3.2 How is a spreadsheet read?
We will provide you with a parser for s-expressions; you do not have to figure
out how to parse them yourselves. The parser will take in a String
containing the expression to be read (e.g. everything after the =
sign)
and return a value of type SExp
corresponding to that expression (or
will throw an IllegalArgumentException
if the string could not be
correctly parsed).
We will also provide you with a parser for the spreadsheet-as-text file format above, though that one will be slightly more involved. Because we do not know how you are choosing to implement spreadsheets, we will provide you with a class with the following signature:
public final class WorksheetReader {
public static <T> T read(WorksheetBuilder<T> builder, Readable readable) { ... }
}
This method is parameterized by whatever type you choose to implement your
spreadsheets. It will read the contents of the provided Readable
, and
determine the contents of your spreadsheet. In order to actually build
the spreadsheet, though, it relies on a builder that you must
implement and supply. The WorksheetBuilder
interface has only two
methods on it:
public interface WorksheetBuilder<T> {
WorksheetBuilder<T> createCell(int col, int row, String contents);
T createWorksheet();
}
The first method takes in a column and row index (both counting from 1) and the raw contents of the cell. Your implementation of this method should determine what kind of data this cell should contain, and record this information in your model somehow.
The second method returns the worksheet you have filled in. If you need to do any additional processing to fully construct your model before returning it, do so here.
In design-pattern terms, WorksheetReader
is a factory: if you
provide it with a builder of worksheets, it can construct a model
worksheet for you.
Hint: While your builder creates a worksheet whose contents have been filled in from somewhere, the resulting model should still be editable. Make sure you do not inadvertently build in assumptions about the size or contents of your model based on what the builder produces.
4 What to do
Design a model to represent a spreadsheet. This may consist of one or more interfaces, abstract classes, concrete classes, enums, etc. Consider carefully what operations it should support, what invariants it assumes, etc. You may assume that the number of rows or columns in a spreadsheet can each be represented by a positive
int
, but you may not make any additional assumptions about them.Document your model well. Be sure to document clearly what each type and method does, what purpose it serves and why it belongs in the model. If your model assumes any invariants, explain them clearly and concisely.
Implement your model. If some method of your model cannot be implemented because it requires details you think may be revealed in future assignments, you may leave the method body blank for now —
but leave a comment inside the method body explaining why it’s empty and what details you’re waiting for. If your implementation makes additional assumptions about invariants (beyond those asserted by the interface), document them. Implement at least four functions:
A
SUM
function, that adds up all the values of all of its arguments. Its arguments could be references of arbitrary size, and the sum should cover all the values in the region being referenced. Any arguments whose contents are blank or not numeric should be ignored. If there are no numeric arguments at all, the default value is zero.A
PRODUCT
function, that multiplies all the values of all of its arguments. Its arguments could be references of arbitrary size. Any arguments whose contents are blank or not numeric should be ignored. If there are no numeric arguments at all, the default value is zero. (Yes, zero: I don’t know why professional spreadsheet programs chose this value instead of one, but they did.)A
<
function, that takes exactly two values and returns whether the first is less than the second. If either value is missing or is not a number, the function should error.Another function of your choosing, whose return type is a string.
The names given above can be used in formulas, like
=(< (SUM A1:A4) (PRODUCT B2:C5 D8))
. We will use the three specific functions above to test your implementation, but you are welcome to implement additional functions.Ensure that (direct or indirect) cyclic references between formulas are detected and prohibited in your model implementation. Hint: Doing this well is trickier than it may seem at first.
Ensure that you can evaluate cells. Hint: Doing this well is trickier than it may seem at first. We’re not extremely concerned with efficiency, but your code should not be egregiously slow.
Test your model thoroughly.
Create a class
edu.cs3500.spreadsheets.BeyondGood
with amain
method for your program. Yourmain
method should expect a command-line of the following form:-in some-filename -eval some-cellname
For example, one valid command line would be
-in posn-distances.gOOD -eval A3
.The first two arguments specify a filename to read in. Your program must read in the file and construct a model from it. The second two arguments specify a cell whose contents you want to evaluate.
If a cell has a problem, then print out a one-line message
"Error in cell Z42: ..."
, whereZ42
is whatever cell is broken, and where the additional message can give whatever details you want. If multiple cells have problems, then print a one-line message for each error.If no cells are in error, then you must fully evaluate the requested cell, and print its final value. (Numbers should simply be printed using
String.format("%f", ...)
. Strings should be printed in double-quotes, and any quotes within the string should be escaped with a backslash...and any backslashes should be escaped with another backslash. For instance,"Jack says \"hi\". Jill has one backslash \\ here."
)
If the command-line is malformed, print an error and exit.
Again: Test your model thoroughly.
We will not have thorough unit tests for your programs on this
assignment, since we have no idea at all what your classes will be named or how
your code will be designed. Instead, the only tests we will run will be
through your main
method and using example spreadsheet files. Make sure
that your main class and main method exist and work properly!
5 What to submit
Reminder: This assignment is to be completed with a partner. You will sign up with your partner on the handin server from this assignment onwards. You will not be able to submit subsequent assignments until you are part of a team on the handin server.
If you do not know (or remember) how to request teams, follow
these
instructions. Please request teams no later than Wednesday night at
9pm —
Submit all Java files you created or used in this assignment
A text README file explaining your design. Your README file should give the graders an overview of what the purposes are for every class, interface, etc. that you include in your model, so that they can quickly get a high-level overview of your code. It does not replace the need for proper Javadoc!
At least three input files that your model can process. Two of them should work properly, while the third should trigger an evaluation error. The two working files should be subtantively different from each other, and should demonstrate all the features of your model that you support.
As with previous assignments, please submit a zip containing only the
src/
andtest/
directories with no surrounding directories, so that the autograder recognizes your package structure. Please do not include youroutput/
or.idea/
directories —they’re not useful!
6 Grading standards
For this assignment, you will be graded on
the design of your model interface(s), in terms of clarity, flexibility, and how plausibly it will support needed functionality;
the appropriateness of your representation choices for the data, and the adequacy of any documented class invariants (please comment on why you chose the representation you did in the code);
the forward thinking in your design, in terms of its flexibility, use of abstraction, etc.
the correctness and style of your implementation;
the thoroughness of your example files in demonstrating all the features of your model; and,
the comprehensiveness and correctness of your test coverage.
7 Submission
Please submit your homework to https://handins.ccs.neu.edu/ by the above deadline. Then be sure to complete your self evaluation by its due date.