Assigned: Monday, September 15
  Due: Email TAs with subject "CS6200 HW1" by Friday, September 26, 11:59 p.m.
[5 points] Document filtering is an application that stores a large number of queries or user profiles and compares these profiles to every incoming document on a feed. Documents that are sufficiently similar to a profile are forwarded to the corresponding user via email or some other mechanism.
Explain the major differences between a filtering application and a search engine. Consider issues such as specific efficiency problems and the usefulness of ranking in a filtering application.
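To make the filtering setup concrete, here is a minimal sketch in Python of the loop described above: profiles are stored once, and each incoming document is compared against all of them. The whitespace tokenizer, the cosine similarity measure, the threshold of 0.3, and the names `tokenize`, `cosine`, `route`, and `profiles` are all assumptions made for illustration; the assignment does not prescribe any of them.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def route(document, profiles, threshold=0.3):
    """Return the users whose stored profile is similar enough to the document.

    The threshold is an assumed value, not one given in the assignment.
    """
    doc_vec = Counter(tokenize(document))
    return [user for user, vec in profiles.items()
            if cosine(doc_vec, vec) >= threshold]

# Hypothetical profiles, built once and reused for every incoming document.
profiles = {
    "alice": Counter(tokenize("information retrieval search ranking")),
    "bob": Counter(tokenize("football league scores")),
}

matched = route("new ranking methods for information retrieval", profiles)
```

In this sketch each document yields a yes/no decision per profile, and the cost of handling one document grows with the number of stored profiles, since every profile is scanned.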
Implement your own web crawler, with the following properties:
Start from the seed page http://en.wikipedia.org/wiki/Gerard_Salton, the Wikipedia article on Gerard Salton, an important early researcher in information retrieval.
Only follow links with the prefix http://en.wikipedia.org/wiki/. In other words, do not follow links to non-English articles or to non-Wikipedia pages.
Do not follow links to the main page, http://en.wikipedia.org/wiki/Main_Page.

Hand in your code and instructions on how to (compile and) run it. In addition, hand in two lists of URLs:
What proportion of the total pages were retrieved by the focused crawler for "information retrieval"? Keep in mind that this will be a significant overestimate of the prevalence of Wikipedia articles on information retrieval.
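The link rules stated for the crawler (stay under the English Wikipedia article prefix, and skip the main page) can be sketched as a small predicate. This is only an illustration under stated assumptions: the helper names `normalize` and `should_follow` are hypothetical, and dropping `#fragment` suffixes so that the same article is not counted twice is an added assumption, not a requirement taken from the assignment.

```python
from urllib.parse import urljoin, urldefrag

SEED = "http://en.wikipedia.org/wiki/Gerard_Salton"
PREFIX = "http://en.wikipedia.org/wiki/"
MAIN_PAGE = "http://en.wikipedia.org/wiki/Main_Page"

def normalize(base_url, href):
    """Resolve a possibly relative href against the page it came from.

    Dropping the #fragment is an assumption made here to avoid duplicates.
    """
    absolute = urljoin(base_url, href)
    return urldefrag(absolute).url

def should_follow(url):
    """Keep only English Wikipedia article links, excluding the main page."""
    return url.startswith(PREFIX) and url != MAIN_PAGE

# Example: links as they might appear in the seed page's HTML.
candidates = [
    "/wiki/Information_retrieval",
    "/wiki/Main_Page",
    "//fr.wikipedia.org/wiki/Gerard_Salton",
    "#cite_note-1",
]
followed = [u for u in (normalize(SEED, h) for h in candidates)
            if should_follow(u)]
```

A full crawler would still need a frontier, a set of visited URLs, and the page fetching itself; this sketch covers only the decision of which links to keep.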