* Schedule and materials subject to change
Date Topics Lecture Reading Assignments & Demos
M   1/8  
  • IR Intro
  • IR intro
  • Queries and documents
R   1/11  
  • SE Architecture
  • [CMS] Chapter 2
M    1/15  Martin Luther King Jr. Day - NO CLASS
R    1/18  Lab Day - Installing/Setting up Elasticsearch and Kibana
M   1/22 
  • Retrieval Models
  • Boolean Retrieval
  • TF, tf.idf
R   1/25  
  • Retrieval Models
  • Vector Space Model: cosine
  • AP89 data
M   1/29  
  • Query Transformation
  • Query Expansion
  • Relevance Feedback
  • co-occurrence; bigrams
  • [CMS] Chapter 6
  • [MRS] Chapter 9
R   2/1 
  • Retrieval Models: Prob. Models
  • BIM, idf
  • BM25
M    2/5  
  • Retrieval Models: LM
  • Generative Language Models
  • Query Likelihood
  • Model Divergence
  • Smoothing
  • Relevance Feedback

R    2/8 
  • Indexing: Index pre-processing
  • Stopping, stemming
  • Inverted Index

M    2/12 
  • Indexing: Index Construction
  • Inverted Index
  • Index Construction
  • N-grams, Skip-grams, Finding Blurbs
  • Co-occurrence


R    2/15 
  • Indexing: Distributed Indexes
  • Distributed Indexes
  • [CMS] Chapter 5
  • [MRS] Chapter 4
M    2/19  Presidents' Day - NO CLASS
R    2/22 
  • Indexing: Query Processing
  • Query Optimization
  • [CMS] Chapter 5
  • [MRS] Chapter 7
M    2/26  
  • Indexing: Storage
  • Compression
  • Zipf's and Heaps' laws
  • Index Storage

Notes: Lempel-Ziv

R   2/29 
  • Crawling
  • Crawling Basics
  • HTTP Links
  • Graph BFS recap
  • Frontier/Queue
  • Duplicates
  • HW2 due (F-3/1)
M-F    3/4 - 3/8  Spring Break - NO CLASS
M   3/11 
  • Crawling
  • Crawling Basics
  • Frontier/Queue
  • Vertical Search
  • [CMS] Chapter 3
  • [MRS] Chapter 20
R    3/14 
  • Crawling: Merging
  • Link Graph
  • Freshness vs. Coverage
  • Vertical Search

M    3/18 
  • Link Graph
  • Citation Analysis
  • PageRank
R    3/21 
  • Link Graph
  • PageRank
  • Topical PageRank
  • Hubs / Authorities
M    3/25  
  • Link Graph
  • HITS
  • SALSA


    HW3 due (W-3/27)
R    3/28  
  • IR Evaluation: Measures
  • IR ranking performance:
  • Set measures: Precision, Recall, F1, Accuracy, ROC, confusion matrix
  • Ranking measures: R-prec, AP, nDCG, Reciprocal Rank

M    4/1  
  • IR Evaluation
  • Relevance Assessments
  • Significance tests
  • Diversity eval using subtopics
  • Assessors, Crowdsourcing, Cost
  • Assessor Interface
    HW4 due (W-4/3)
R    4/4 
  • Machine Learning / Features
  • Document understanding
  • Features
  • Extracting Query Features
  • Similarity
  • How to measure ML
  • ML algorithms

  • HW5 due (F-4/10)
M    4/8 
  • ML / Algorithms / Ranking
  • ML algorithms: Naive Bayes
  • Text Classification with unigrams
  • Sparse matrix
  • Learning to Rank
  • How/Why Learning Works
  • What to expect
  • LambdaMart
  • Pairwise Models

Cheng's note, sparse format, Learning Code for HW7

Paper: From RankNet to LambdaRank to LambdaMart
Wikipedia: Learning To Rank
Paper: Yahoo Learning to Rank challenge

Paper: AdaBoost and RankBoost
Paper: RankNet
Paper: LambdaMart, LambdaMart2


LSTM
R    4/11 
  • Guest Lecture
  • Machine Learning overview
  • A day in the life of a Machine Learning Scientist
  • Overview of Machine Learning in industrial research and applications

M    4/15  Patriots' Day - NO CLASS