* Schedule and materials subject to change
Date Topics Lecture Reading Assignments & Demos
R   1/11  
  • IR Intro
  • IR intro
  • Queries and documents
  • SE Architecture
  • [CMS] Chapter 2
R    1/18 
  • Retrieval Models
  • Boolean Retrieval
  • TF, tf.idf
R   1/25  
  • Retrieval Models
  • Vector Space Model: cosine
  • AP89 data
  • Query Transformation
  • Query Expansion
  • Relevance Feedback
  • Co-occurrence; bigrams
  • [CMS] Chapter 6
  • [MRS] Chapter 9
R   2/1 
  • Retrieval Models: Prob. Models
  • BIM, idf
  • BM25
  • Retrieval Models: LM
  • Generative language models
  • Query Likelihood
  • Model Divergence
  • Smoothing
  • Relevance Feedback

R    2/8 
  • Indexing: Index pre-processing
  • Stopping, stemming
  • Inverted Index

  • Indexing: Index Construction
  • Inverted Index
  • Index Construction
  • Ngrams, Skipgrams, Finding Blurbs
  • Co-occurrence


R    2/15 
  • Indexing: Distributed Indexes
  • Distributed Indexes
  • [CMS] Chapter 5
  • [MRS] Chapter 4
  • Indexing: Query Processing
  • Query Optimization
  • [CMS] Chapter 5
  • [MRS] Chapter 7
R    2/22 
  • Indexing: Storage
  • Compression
  • Zipf's and Heaps' laws
  • Index Storage

Notes: Lempel-Ziv

  • HW2 due (F-2/23)
  • Crawling
  • Crawling Basics
  • HTTP Links
  • Graph BFS recap
  • Frontier/Queue
  • Duplicates
R   2/29 
  • Crawling
  • Crawling Basics
  • Frontier/Queue
  • Vertical Search
  • [CMS] Chapter 3
  • [MRS] Chapter 20
  • Crawling: Merging
  • Link Graph
  • Freshness vs. Coverage
  • Vertical Search

M-F    3/4 - 3/8  Spring Break - NO CLASS
R    3/14 
  • Link Graph
  • Citation Analysis
  • PageRank
  • Link Graph
  • PageRank
  • Topical PageRank
  • Hubs / Authorities
R    3/21  
  • Link Graph
  • HITS
  • SALSA


  • HW3 due (F-3/15)
  • IR Evaluation: Measures
  • IR ranking performance:
  • Set measures: Precision, Recall, F1, Accuracy, ROC, confusion matrix
  • Ranking measures: R-prec, AP, nDCG, Reciprocal Rank

R    3/28  
  • IR Evaluation
  • Relevance Assessments
  • Significance tests
  • Diversity eval using subtopics
  • Assessors, Crowdsourcing, Cost
  • Assessor Interface
  • HW4 due (F-3/29)
  • Machine Learning / Features
  • Document understanding
  • Features
  • Extracting Query Features
  • Similarity
  • How to measure ML
  • ML algorithms

R    4/4 
  • Machine Learning / Algorithms
  • ML algorithms: Naive Bayes

  • HW5 due (F-4/12)
  • ML / Algorithms / Ranking
  • Text Classification with unigrams
  • Sparse matrix
  • Learning to Rank
  • How/Why Learning Works
  • What to expect
  • LambdaMART
  • Pairwise Models

Cheng's note, sparse format, Learning Code for HW7

Paper: From RankNet to LambdaRank to LambdaMART
Wikipedia: Learning To Rank
Paper: Yahoo Learning to Rank challenge

Paper: AdaBoost and RankBoost
Paper: RankNet
Paper: LambdaMART, LambdaMART2


LSTM
R    4/11 
  • Guest Lecture
  • Machine Learning overview
  • A day in the life of a Machine Learning Scientist
  • Overview of Machine Learning in industrial research and applications