CS 6200 | Information Retrieval

Date	Topics	Lecture	Reading	Assignments & Demos
R 1/11	IR Intro IR intro Queries and documents	Slides: IR vs. SEs Video: IR Intro	Background: Linear Algebra Background Programming: Java or Python [CMS] Chapter 1 As We May Think, Vannevar Bush, 1945. The History of Information Retrieval, Croft and Sanderson, IEEE Xplore, 2012.
	SE Architecture	Slides: Architecture of Search Engines Video: SE architecture	[CMS] Chapter 2	Demo: ElasticSearch | ES(JAVA)
R 1/18
	Retrieval Models Boolean Retrieval TF, tf.idf	Slides: Boolean & VSMs Video: Boolean & VSM (Part 1) Notes: Retrieval Models	[CMS] Chapter 7 [MRS] Chapter 1: Boolean Retrieval [MRS] Chapter 6: Vector Space Models
R 1/25	Retrieval Models Vector Space Model: cosine AP89 data	Slides: Boolean & VSMs Video: VSMs (Part 2) | HW1 (In-class demo)	[CMS] Chapter 7 [MRS] Chapter 6: Vector Space Models	Demo: HW1-Scoring Demo: trec_eval
	Query Transformation Query Expansion Relevance Feedback co-occurrence; bi-grams	Slides: Query Transformation Video: Query Expansion	[CMS] Chapter 6 [MRS] Chapter 9
R 2/1	Retrieval Models: Prob. Models BIM, idf BM25	Slides: BM25 Video: RF + BM25	[CMS] Chapter 7 [MRS] Chapter 11: Probabilistic Information Retrieval [PAPER] Document Ranking and the Vector-Space Model, by Lee et al., 1997.
	Retrieval Models: LM Language generative Models Query Likelihood Model Divergence Smoothing Relevance Feedback	Slides: Language Models Video: Language Models Retrieval Model toy example	Background: Probabilities [CMS] Chapter 7, Language Models [MRS] Chapter 12: Language Models for Information Retrieval [PAPER] A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval by Chengxiang Zhai and John Lafferty	HW1 due (F-2/2)
R 2/8	Indexing: Index pre-processing Stopping, stemming Inverted Index	Slides: Indexing Video: Indexing (intro)	[CMS] Chapter 4 [MRS] Chapter 2 Zobel's Inverted files paper
	Indexing: Index Construction Inverted Index Index Construction Ngrams, Skipgrams, Finding Blurbs Co-occurrence	Slides: Indexing Video: Index Construction Notes: Indexing Finding blurbs (minimum spans)	[CMS] Chapter 5 [MRS] Chapter 4: Index construction [MRS] Chapter 7: Computing Scores	Demo: HW2 - Index Construction (Python) | JAVA
R 2/15	Indexing: Distributed Indexes Distributed Indexes	Slides: Indexing Video: Distributed Indexing	[CMS] Chapter 5 [MRS] Chapter 4
	Indexing: Query Processing Query Optimization	Slides: Query Processing Video: Query Optimization	[CMS] Chapter 5 [MRS] Chapter 7
R 2/22	Indexing: Storage Compression Zipf's and Heaps' laws Index Storage	Slides: Index Compression Text Statistics Video: Text Stats | Compression	[CMS] Chapter 5 [MRS] Chapter 5: Index compression A summary of the WSDM talk by Greg Linden Wikipedia: Delta Encoding; Elias Delta Encoding, Elias Gamma Encoding Notes: Lempel Ziv	HW2 due (F-2/23)
	Crawling Crawling Basics HTTP Links Graph BFS recap Frontier/Queue Duplicates	Slides: Crawling (Part1) Video: Crawling-1 Notes: Crawling	[CMS] Chapter 3 [MRS] Chapter 20 Web Crawler - Wikipedia Web Crawling Tutorial by Christopher Olston and Marc Najork inr-017.dvi
R 2/29	Crawling Crawling Basics Frontier/Queue Vertical Search	Slides: Crawling (Part2) HW3-tips Video: Crawling-2	[CMS] Chapter 3 [MRS] Chapter 20	Demo: HW3 (Crawling)
	Crawling: Merging Link Graph Freshness vs. Coverage Vertical Search	Slides: Crawling (Part3) Video: Crawling-3 Note: Merging indexes in ES	Paper: Crawl Longevity Paper: IRLbot Paper: Near Duplicates Paper: Powerlaw Internet Topology
M-F 3/4 - 3/8	Spring Break - NO CLASS
R 3/14	Link Graph Citation Analysis PageRank	Slides: Link Analysis Video: Link Analysis-1 (Part1) Notes: Webgraph Recap: Markov Chains PageRank Explained Paper: PageRank	PageRank Examples Paper: Graph Structure in the Web PageRank basic pseudocode	Demo: HW3 - Merging
	Link Graph PageRank Topical PageRank Hubs / Authorities	Slides: Link Analysis Video: Link Analysis-1 (Part2)
R 3/21	Link Graph HITS SALSA	Slides: HITS / SALSA Video: HITS / SALSA Paper: HITS Paper: SALSA	HITS Lecture from Cornell Wikipedia: SALSA Paper: HITS vs SALSA HITS vs SALSA visual comparison tool, cdf file	HW3 due (F:3/15)
	IR Evaluation: Measures IR ranking performance: Set measures: Precision, Recall, F1, Accuracy, ROC, confusion matrix Ranking measures: R-prec, AP, nDCG, Reciprocal Rank	Slides: IR Performance (Part1) Video: Evaluation-1 Notes: IR Evaluation	Paper: Classification of IR measures as of year 2000 Paper: IR metrics, tests (slides) Code: Diversity measures Paper: IR Metrics vs. users Wikipedia: ROC
R 3/28	IR Evaluation Relevance Assessments Significance tests Diversity eval using subtopics Assessors, Crowdsourcing, Cost Assessor Interface	Slides: IR Performance (Part2) Video: Evaluation-2		HW4 due (F-3/29)
	Machine Learning / Features Document understanding Features Extracting Query Features Similarity How to measure ML ML algorithms	Slides: ML-Classification Video: ML Intro Notes: Machine Learning for Text	[CMS] Chapter 9 [MRS] Chapter 13 Andrew Ng Coursera ML course Wikipedia: Machine Learning LibLinear library
R 4/4	Machine Learning / Algorithms ML algorithms: Naive Bayes	Slides: Naive Bayes Video: Naive Bayes	Wikipedia: Naive Bayes	HW5 due (F-4/12)
	ML / Algorithms / Ranking Text Classification with unigrams Sparse matrix Learning to Rank How/Why Learning Works What to expect LambdaMart Pairwise Models	Slides: Decision Trees | Regression Video: Decision Trees | Regression Notes: ML Decision Trees Notes: ML Linear Regression Notes: ML Logistic Regression Cheng's note, sparse format, Learning Code for HW7	Paper: From RankNet to LambdaRank to LambdaMart Wikipedia: Learning To Rank Paper: Yahoo Learning to Rank challenge Paper: AdaBoost and Rankboost Paper: RankNet Paper: LambdaMart, LambdaMart2 LSTM
R 4/11	Guest Lecture Machine Learning overview A day in the life of a Machine Learning Scientist Overview of Machine Learning in industrial research and applications			Demo: HW6 Python | Java HW6 due (F:4/19)
				Demo - HW7 Python | Java Bingyu's demo for HW7 HW7 due (F:4/24)