Assigned: Thursday, 20 November 2014
  Due: Tuesday, 2 December 2014, 11:59 p.m.
Implement a small search engine. The main steps are:
The document collection is tccorpus.txt, inside tccorpus.zip.
	This is an early standard collection of abstracts from
	the Communications of the ACM.
The indexer is invoked as:
	indexer tccorpus.txt index.out
The ranker is invoked as:
	bm25 index.out queries.txt 100 > results.eval
Each line of results.eval has the format:
	query_id Q0 doc_id rank BM25_score system_name
The string Q0 is a literal used by the evaluation script.  You can use any space-free token for your system_name.
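The result-line format can be sketched as below. This is a minimal illustration, not the required implementation: the function name `format_results`, the default `system_name`, ranks starting at 1, and the reading of the `100` on the command line as a per-query cap on returned documents are all assumptions.

```python
def format_results(query_id, scored_docs, system_name="my_bm25", limit=100):
    """Return result lines for one query, highest score first.

    scored_docs maps doc_id -> BM25 score. The name format_results, the
    default system_name, and limit=100 are assumptions for illustration.
    """
    # Sort by descending score; break ties by doc_id so output is stable.
    ranked = sorted(scored_docs.items(), key=lambda kv: (-kv[1], kv[0]))
    lines = []
    # Ranks are assumed to start at 1.
    for rank, (doc_id, score) in enumerate(ranked[:limit], start=1):
        lines.append(f"{query_id} Q0 {doc_id} {rank} {score} {system_name}")
    return lines
```

Printing the returned lines to standard output lets the shell redirection `> results.eval` collect them.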
Tokenized Document Collection
The tccorpus.txt file is in the following format:
	# 1
	this is a tokenized line for document 1
	this is also a line of document 1
	# 2
	from here lines for document 2 begin
	...
	...
	# 3
	...
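A reader for this format might look like the sketch below. It assumes each document starts with a header line of the form `# <id>` on its own line, that IDs are integers, and that tokens are separated by whitespace; `parse_corpus` is a hypothetical name.

```python
def parse_corpus(lines):
    """Group tokens by document ID from lines in the '# <id>' format.

    Assumes a header line is exactly '#' followed by an integer ID;
    every other non-empty line belongs to the most recent header.
    """
    docs = {}
    current = None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if len(tokens) == 2 and tokens[0] == "#" and tokens[1].isdigit():
            current = int(tokens[1])
            docs[current] = []
        elif current is not None:
            docs[current].extend(tokens)
    return docs
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `parse_corpus(open("tccorpus.txt"))`.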
Building an Inverted Index: The following data structures are required for BM25 computation:
	word -> (docid, tf), (docid, tf), ...
BM25 Ranking
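As a rough sketch, the `word -> (docid, tf)` postings and a BM25 scorer over them could be written as follows. The scoring function is one common form of Okapi BM25 and may differ from the variant the assignment expects; `k1=1.2` and `b=0.75` are widely used defaults, not values taken from this handout. The function names and the in-memory layout are also assumptions.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build word -> [(docid, tf), ...] postings plus document lengths.

    docs maps doc_id -> list of tokens (a hypothetical in-memory layout).
    """
    index = defaultdict(list)
    doc_len = {}
    for doc_id in sorted(docs):
        tokens = docs[doc_id]
        doc_len[doc_id] = len(tokens)
        for word, tf in Counter(tokens).items():
            index[word].append((doc_id, tf))
    return dict(index), doc_len


def bm25_scores(query_tokens, index, doc_len, k1=1.2, b=0.75):
    """Score documents with one common BM25 form (k1 and b are assumed)."""
    n_docs = len(doc_len)
    if n_docs == 0:
        return {}
    avdl = sum(doc_len.values()) / n_docs  # average document length
    scores = defaultdict(float)
    for word in query_tokens:
        postings = index.get(word, [])
        n = len(postings)  # number of documents containing the word
        if n == 0:
            continue
        # Classic IDF; it goes negative for words in over half the documents.
        idf = math.log((n_docs - n + 0.5) / (n + 0.5))
        for doc_id, tf in postings:
            # Length normalization: longer documents are penalized.
            norm = k1 * ((1 - b) + b * doc_len[doc_id] / avdl)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return dict(scores)
```

Storing document lengths alongside the postings is what makes the length normalization possible at query time without rereading the corpus.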
Test Queries: Use the following stemmed test queries, also provided in the file queries.txt:
      
| Query ID | Query Text | 
|---|---|
| 1 | portabl oper system | 
| 2 | code optim for space effici | 
| 3 | parallel algorithm | 
| 4 | distribut comput structur and algorithm | 
| 5 | appli stochast process | 
| 6 | perform evalu and model of comput system | 
| 7 | parallel processor in inform retriev |