CS6200: Information Retrieval

Homework 3

Assigned: Thursday, 29 October 2015
Due: Monday, 9 November 2015, 11:59 p.m.

Indexing and Retrieval

Implement a small search engine. The main steps are:

Tokenized Document Collection

Building an Inverted Index: The following data structures are required for BM25 computation:
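
As an illustration only, here is a minimal sketch of one possible index layout. It assumes the tokenized collection is available as a mapping from document ID to a list of tokens; the names `build_index`, `index`, and `doc_len` are hypothetical and not prescribed by the assignment.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Build a simple inverted index.

    docs: dict mapping document ID -> list of tokens (assumed input format).
    Returns (index, doc_len), where index maps term -> {doc_id: term frequency}
    and doc_len maps doc_id -> number of tokens in that document.
    """
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, tokens in docs.items():
        doc_len[doc_id] = len(tokens)
        # Counter gives the within-document frequency of each term.
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return dict(index), doc_len


# Toy collection, invented for this example.
docs = {
    "d1": ["parallel", "algorithm", "parallel"],
    "d2": ["oper", "system"],
}
index, doc_len = build_index(docs)
```

Storing document lengths next to the postings keeps the length normalization in BM25 a dictionary lookup at query time.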

BM25 Ranking

  1. Retrieve all inverted lists corresponding to terms in a query.
  2. Compute BM25 scores for documents in the lists.
  3. Make a score list for documents in the inverted lists.
  4. Accumulate scores for each term in a query on the score list.
  5. Assume that no relevance information is available, so the relevance counts in the BM25 formula are zero ($r_i = 0$, $R = 0$).
  6. For parameters, use $k_1 = 1.2$, $b = 0.75$, $k_2 = 100$.
  7. Sort the documents by the BM25 scores.
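
The steps above can be sketched as follows. This is a minimal illustration, not a reference solution. It assumes the usual BM25 form in which, without relevance information, each query term $i$ contributes $\log\frac{N - n_i + 0.5}{n_i + 0.5} \cdot \frac{(k_1 + 1) f_i}{K + f_i} \cdot \frac{(k_2 + 1)\, qf_i}{k_2 + qf_i}$ with $K = k_1\left((1 - b) + b \cdot dl / avdl\right)$. It also assumes an index shaped as term -> {doc ID: term frequency} plus a table of document lengths. The function name `bm25_rank` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

K1, B, K2 = 1.2, 0.75, 100  # parameter values given in the assignment


def bm25_rank(query_terms, index, doc_len):
    """Return (doc_id, score) pairs sorted by descending BM25 score.

    index:   term -> {doc_id: term frequency}   (assumed layout)
    doc_len: doc_id -> document length in tokens (assumed layout)
    """
    N = len(doc_len)
    avdl = sum(doc_len.values()) / N
    scores = defaultdict(float)  # the score list (accumulators)
    for term, qf in Counter(query_terms).items():
        postings = index.get(term, {})  # inverted list for this term
        n = len(postings)
        if n == 0:
            continue  # term does not occur in the collection
        # With no relevance information the first factor reduces to this.
        idf = math.log((N - n + 0.5) / (n + 0.5))
        query_part = (K2 + 1) * qf / (K2 + qf)
        for doc_id, f in postings.items():
            K = K1 * ((1 - B) + B * doc_len[doc_id] / avdl)
            scores[doc_id] += idf * (K1 + 1) * f / (K + f) * query_part
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for this example.
index = {"parallel": {"d1": 2, "d2": 1}, "algorithm": {"d1": 1}}
doc_len = {"d1": 3, "d2": 3, "d3": 4, "d4": 2, "d5": 3}
ranked = bm25_rank(["parallel", "algorithm"], index, doc_len)
```

Only documents that appear in at least one inverted list receive a score, which matches accumulating term by term on the score list. Note that the logarithmic factor becomes negative for a term that occurs in more than half of the documents.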

Test Queries: Use the following stemmed test queries, also provided in the file queries.txt:
Query ID  Query Text
1         portabl oper system
2         code optim for space effici
3         parallel algorithm
4         distribut comput structur and algorithm
5         appli stochast process
6         perform evalu and model of comput system
7         parallel processor in inform retriev