Assignment 12
Due Date Monday 10/29 at 9pm
Purpose To process large amounts of data by making use of I/O.
You should submit a single .rkt file containing your responses to all exercises via the Handin Server. We accept NO email submissions.
You are only allowed to use the language specified at the top of this page: failure to do so will result in a 0.
All steps of the design recipe are required unless otherwise specified. However, at this point we expect you are familar with the [List-of X] data definition and do not require it or its template.
You are expected to use pre-defined abstractions when stated and/or when appropriate.
Words, Words, Words
For the first part of this homework, we will ask you to do some work with H.P. Lovecraft’s Ex Oblivione. Save this file locally; read it if you want to, but it is not necessary.
Exercise 1 Design the function remove-punctuation which accepts a String and removes 1 instance of ",", ";", or "." from the end of the input String if it is there.
Exercise 2 Use the function compose and string-downcase to design a function which will first remove-punctuation from a String and then convert it to its lowercase variant.
Exercise 3 Design the function compound-word? which determines if the given String contains "-". Hint: the string-contains? function will be useful here.
The following code bins a list of elements into groups where they match. Do your best to understand it.
; An [NEList-of X] (Non-Empty List of X) is one of: ; - (cons X '()) ; - (cons X [NEList-of X]) ; bin : [X X -> Boolean] [List-of X] -> [List-of [NEList-of X]] ; Bin lox by matches? (check-expect (bin = '()) '()) (check-expect (bin = (list 2 3 4 2)) (list (list 2 2) (list 4) (list 3))) (define (bin matches? lox) (local [; find-spot-for-x : X [List-of [NEList-of X]] -> [List-of [NEList-of X]] ; Find the spot for x (define (find-spot-for-x x bins) (find-spot x matches? bins))] (foldr find-spot-for-x '() lox))) ; find-spot : X [X X -> Boolean] [List-of [NEList-of X]] -> [List-of [NEList-of X]] ; Find where x belongs and place it in the appropriate list (or give it its own) (check-expect (find-spot 2 = '()) (list (list 2))) (check-expect (find-spot 2 = (list (list 3) (list 2) (list 4))) (list (list 3) (list 2 2) (list 4))) (define (find-spot x matches? bins) (cond [(empty? bins) (list (list x))] [(cons? bins) (if (matches? x (first (first bins))) (cons (cons x (first bins)) (rest bins)) (cons (first bins) (find-spot x matches? (rest bins))))]))
Exercise 4 Design the function unique-compound-words which, given a [List-of String], produces the unique compound words in the Strings. Be sure to ignore the Strings’ capitalization and punctuation, use pre-defined abstractions (filter, etc) where appropriate, make use of the functions you designed above, and to filter data earlier rather than later to avoid unneeded computation.
bin will be helpful in ensuring uniqueness (although you will have to do something with its output to get a flat list).
Then, place (require 2htdp/batch-io) at the top of your file. Use read-words and unique-compound-words to define a constant, UNIQUE-COMPOUND-WORDS, which will be a list of the unique compound words found in Ex Oblivione.
Line Segments... In Space!
For the second part of this homework, we will ask you to work with a dataset of line segments in 3d space. Specifically, we are going to ask how many pairs of line segments share a point at their ends.
Save this file locally. In this file are rows of 7 numbers, and what they mean are described below:
; A RawInput is a (list String String String String String String String) ; and represents the x1, y1, z1, x2, y2, z2, and confidence level of a line in 3d space ; A NumericInput is a (list Number Number Number Number Number Number [0, 1]) ; and represents the x1, y1, z1, x2, y2, z2, and confidence level in the accuracy of the data ; raw->numeric : RawInput -> NumericInput ; Convert raw input to numeric input (check-expect (raw->numeric (list "0" "1" "2" "3" "4" "5" ".4")) (list 0 1 2 3 4 5 0.4)) (define (raw->numeric raw) (map string->number raw)) ; read-in-segments : String -> [List-of NumericInput] ; Read in the line segments contained in s and map to numeric input (define (read-in-segments s) (read-csv-file/rows s raw->numeric))
All of the line segments are guaranteed to have non-0 length (the first three values will never be equal to the second three values) and all of the line segments are unique.
Exercise 5 Design a function which filters a [List-of NumericInput] and returns only those which have a confidence level of 0.2 or higher.
Exercise 6 Design a structured data definition LineSegment, which is two points in 3d space. You do not need to provide templates or examples, but they will come in handy if you do.
Exercise 7 Design a function which converts a NumericInput into a LineSegment.
Exercise 8 Design a function which "flips" a LineSegment by swapping its two points. Note that this still represents the same line segment.
Exercise 9 Design a function attached-at-first-point? that accepts two LineSegments and determines if they have the same first point. Note that this should not return #true for two LineSegments with the same second point; only those that match in the first point should return #true.
Exercise 10 Design a function bin-linesegs which accepts a [List-of LineSegment] and bins them by their first point. Note that since LineSegments can connect at either end, the [List-of LineSegment] you should give to bin is the passed-in [List-of LineSegment] appended to a copy of this list with all of its LineSegments "flipped."
Exercise 11 Design a function which, given a [List-of X], computes the number of pairs of elements. Recall from math class that a list of size N can generate N*(N-1)/2 pairs.
Exercise 12 Design a function which, given a [List-of LineSegment], bins them, counts the pairs in each bin, and returns the sum of all the number of pairs in all the bins.
Exercise 13 Use your above functions to define the constant NUMBER-OF-PAIRS, which tells us how many pairs of line segments found in the original file share a point (whose confidence level is 0.2 or more).
Note that this computation will likely take a minute or two to run. In future classes you will learn both techniques to analyze why this is the case and how to make it run faster.