6.6

Assignment 17

home work!

Programming Language ISL+

Due Date Monday 04/08 at 9pm

Purpose To get practice using graphs.

Expectations
  • You should submit a single .rkt file containing your responses to all exercises via the Handin Server. We accept NO email submissions. Failure to submit a .rkt file will result in a 0.

  • You are only allowed to use the language specified at the top of this page: failure to do so will result in a 0.

  • Your code MUST conform to the guidelines outlined in the style guide on the course website. The style guide will be updated as the semester progresses so please remember to read it before submitting each assignment.

  • You must follow all the steps of the design recipe when completing this assignment.

  • Please be sure to look at the feedback for assignment 12 before submitting, as we will be grading you more harshly on things we have warned you about before.

  • You must submit this assignment with your partner. Please make sure you can submit with your partner before 5pm on Thursday or we cannot guarantee that you will be able to submit your homework before the deadline. If the course staff are unable to assist you after 5pm and this causes your homework to be late we will not grant an extension.

PageRank

PageRank is a famous algorithm that largely led to Google’s early success as a web search engine. In brief, PageRank is an algorithm that is able to determine the important pages in the web only using the structure of the links. Put simply, PageRank simulates a user browsing, and determines the probability that a user who is randomly clicking will end up on a given page. Pages that are linked to often (e.g., nytimes.com) will end up with a much higher score than pages that are not linked to often.

We recommend that you read the Wikipedia article on PageRank to get an overview of the algorithm. If you’re curious about more details, you can read the original paper as well. Today, we’ll be using a slightly simplified version of the algorithm.

Let’s begin with a data definition for pages in a web graph:

(define-struct page [name links])
 
; A WebPage is a (make-page String [NEList-of String])
; Interpretation: A page in the web, with its name (think of this as the "URL") and its
; non-empty list of outgoing links.
(define A (make-page "A" (list "B" "C" "D")))
(define B (make-page "B" (list "A")))
(define C (make-page "C" (list "A")))
(define D (make-page "D" (list "A")))
 
; A Web is a [List-of WebPage]
; Interpretation: All of the pages in the Web
(define WEB (list A B C D))

IMPORTANT NOTES: For this problem we will assume that there are no dangling links in the Web (i.e., all pages that are linked to exist in the Web). We will also assume that all pages have at least one outgoing link (hence the NEList-of String for the links).

Exercise 1 Design the function invert that accepts a Web and returns a Web with all of the links inverted. In other words, if A linked to B in the original Web, B should link to A in the returned Web. This is a good warmup problem and will be useful later.

We’re now going to work towards implementing the PageRank algorithm. Let us define a the score of a single page as a

(define-struct rank [name score])
 
; A PageRank is a (make-rank String Number)
; Interpretation: The pagerank score of a single page, with its name and score.
; The score must be a value between 0 and 1.

And the set of all PageRanks in a given Web:

; A Rank is a [List-of PageRank]
; Interpretation: A list of the scores of all pages in a Web.  All of the scores
; should sum to 1.
(define RANK-1 (list (make-rank "A" 0.25) (make-rank "B" 0.25) (make-rank "C" 0.25) (make-rank "D" 0.25)))
(define RANK-2 (list (make-rank "A" 0.67) (make-rank "B" 0.11) (make-rank "C" 0.11) (make-rank "D" 0.11)))

Exercise 2 Design the function rank-diff that accepts two Ranks and returns the absolute value of the differences of each rank. In other words, it calculates the sum of the absolute value of the differences in rank for each page.

For example, the rank-diff when applied to RANK-1 and RANK-2 would return

|0.25-0.67| + |0.25-0.11| + |0.25-0.11| + |0.25-0.11|

which should evaluate to 0.84. Be sure to remember that the PageRanks in two Ranks may not necessarily be in the same order. You may assume that the two PageRanks have ranks for the same set of pages.

Now that we can compare ranks, we need a function to update the pagerank. The updated pagerank P’ of a single page is given by the simplified algorithm with a dampening factor:

P’(A) = (1-d)/N + d*(P(B)/L(B) + P(C)/L(C) + P(D)/L(D))

Where d is a static dampening factor of 0.85, N is the number of pages in the Web, P(X) is the previous pagerank of page X, and L(X) is the number of outgoing links from page X.

In other words, if page A is linked to by pages B, C, and D, then the updated pagerank of A is given by (a) static "reset" value of (1-d)/N and (b) the sum of the old pagerank of each the pages that point to A, divided by their number of outgoing links, times the dampening factor.

Exercise 3 Design the function update that accepts a Web and a Rank. It returns the updated Rank by applying the formula above. You may find some of your previous functions helpful in solving this problem.

For this problem, we will use a constant dampening factor d of 0.85.

Exercise 4 Design the function pagerank. It accepts a Web and a threshold Number. It creates an initial Rank that assigns pages have the same initial score, where the total score sums to 1. It then repeatedly calls update until the absolute value of the differences between the original and update Rank is less than the threshold. It then returns that Rank. For example, if called on the Web above defined as WEB and a threshold of 0.01, pagerank should return a Rank close to (list (make-rank "A" 0.481) (make-rank "B" 0.174) (make-rank "C" 0.174) (make-rank "D" 0.174)).