6.6

Homework 10

home work!

Programming Language ISL+

Due Date Friday, November 15 at 6pm

Purpose To practice working with tree-structured and graph-structured data.

Expectations
  • You should submit a single .rkt file containing your responses to all exercises via the Handin Server. We accept NO email submissions. Failure to submit a .rkt file will result in a 0.

  • You are only allowed to use the language specified at the top of this page: failure to do so will result in a 0.

  • Your code MUST conform to the guidelines outlined in the style guide on the course website. The style guide will be updated as the semester progresses so please remember to read it before submitting each assignment.

  • You must follow all the steps of the design recipe when completing this assignment.

  • Please be sure to look at the feedback for assignment 8 before submitting, as we will be grading you more harshly on things we have warned you about before.

  • You must submit this assignment with your partner.

Exercise 1 Consider the following data definitions:
(define-struct node [l r])
; A LeafyTree is one of:
; - "leaf"
; - (make-node LeafyTree LeafyTree)
; Interpretation: a binary tree that ends in a leaf
 
; An LR is one of:
; - "left"
; - "right"
; Interpretation: directions one can take in a LeafyTree
 
; A Path is a [List-of LR]
Design the function paths that given a LeafyTree outputs the list of all Paths that represent the ways to go from the root of the tree to its leaves. Do not forget to first finish designing the supplied data definitions.

Exercise 2 Consider the following data definitions:
; An Operation is one of:
; - "+"
; - "*"
; Interpretation: a commutative operation
 
; An AExp (Arithmetic Expression) is one of:
; - Number
; - (cons Operation [List-of AExp])
Design the function eval that consumes an AExp and evaluates it to a single number. Do not forget to first finish designing the supplied data definitions.

Exercise 3 It is now time to complete your project for this semester.

First, review Homework 5 to remind yourself of the ideas, terms, and data definitions involved.

Next, download this file which contains two folders ("test" and "train") of input files. Each file name has the following format: d_<id>_<digit>.txt where <digit> is the correct label for the file and <id> depends on the folder. In the test folder the id was related to the original dataset and can be ignored. In the train folder the ids are sequential from 1-30, meaning that you have 30 examples for each digit.

Your overall goal will be to design the mnist world program, which takes a Number and a String. The number represents how many training files there are per digit, and the string is the path to the testing file that should be used. This function should then execute the nearest-neighbor algorithm: compute all the neighbors (by comparing the indicated testing to each of the training), find the "best" (the one with the smallest distance), and run the interactive visual program described in Homework 5. In this program the training image is shown, as well as the nearest neighbor and predicted label, and the user can scroll left/right through all the other neighbors using the keyboard, viewing the image as well as the distance. Finally, mnist should return the digit of the nearest neighbor (i.e., the prediction of the classification algorithm).

This probably seems like a lot of work, but you’ve actually done most of the work in earlier homework assignments. Here are some reminders...

  • Reading in files: look to Homework 8 for training-fnames, which produces a list of file names based upon the pattern described above. Also, look to fname->label for getting the digit from the filename. In Homework 9, you designed read-lolon which will be useful for getting the contents of the files into a list of lists of numbers. And Homework 8 flatten is useful for converting from a Bitmap to an Instance. Finally, Homework 8 had bitmap->image, which might be useful for producing an image from a Bitmap.

  • Distance between Training/Testing: for distance, you should use Euclidean distance, which is basically Pythagorean distance but can take coordinates with more than 2 dimensions. The formula for Euclidean distance looks like this: distance = (sqrt (+ (sqr (- x1 x2)) (sqr (- y1 y2)) (sqr (- z1 z2)) ...))

    If you look back to Homework 9, map-2list can be really handy in computing this.

  • Nearest neighbor: given a list of neighbors, the nearest is the one with the smallest distance. Look to smallest-of-list-by-f in Homework 8 for a great start.

  • Scrolling left/right in a list: in Homework 8, look to next-index and prev-index to help with this.

  • Returning the prediction: in Homework 8, look to return-former as one way to make sure mnist returns what you want, which may or may not be returned by big-bang.

Warning: when you are designing your world state, avoid putting all the neighbors in your representation - DrRacket will be very slow. Instead, see if you can’t just keep track of where your are in the list.

In case you are curious, the image shown in HW5 was produced using the following call: (mnist 10 "test/d_55555_9.txt")