CS6200: Information Retrieval

Fall 2015 Syllabus

This schedule is subject to change. Check back as the class progresses.

CMS refers to Search Engines by Croft, Metzler, and Strohman; MRS refers to Introduction to Information Retrieval by Manning, Raghavan, and Schütze.

Overview of Information Retrieval (10 Sept. 2015)
- Reading: CMS chap. 1
Architecture of a Search Engine (10 Sept. 2015)
- Reading: CMS chap. 2
Acquiring Data (10, 17 Sept. 2015)
- Reading: CMS chap. 3; MRS chap. 19 and 20
- Crawling the Web
- Document Conversion
- Storing the Documents
- Detecting Duplicates
- Noise Detection and Removal
Processing Text
- Reading: CMS chap. 4; MRS chap. 2 and 21
- Text Statistics
- Document Parsing
Ranking with Indexes
- Reading: CMS chap. 5; MRS chap. 4-5
- Abstract Model of Ranking
- Inverted Indexes
- MapReduce
- Query Processing
  - Document-at-a-Time Evaluation
  - Term-at-a-Time Evaluation
  - Optimization Techniques
  - Structured Queries
  - Distributed Evaluation
  - Caching
Queries and Interfaces
- Reading: CMS chap. 6
- Information Needs and Queries
- Query Transformation and Refinement
  - Stopping and Stemming Revisited
  - Spell Checking and Query Suggestions
  - Query Expansion
  - Relevance Feedback
  - Context and Personalization
- Displaying the Results
  - Result Pages and Snippets
  - Advertising and Search
  - Clustering the Results
  - Translation
- User Behavior Analysis
Retrieval Models
- Reading: CMS chap. 7; MRS chap. 11-12 (for background, MRS chap. 1 and 6)
- Overview of Retrieval Models
  - Boolean Retrieval
  - The Vector Space Model
- Probabilistic Models
  - Information Retrieval as Classification
  - The BM25 Ranking Algorithm
- Ranking based on Language Models
  - Query Likelihood Ranking
  - Relevance Models and Pseudo-Relevance Feedback
- Complex Queries and Combining Evidence
  - The Inference Network Model
  - The Galago Query Language
- Models for Web search
- Machine Learning and Information Retrieval: Learning to Rank (LeToR)
- Topic Models
Evaluating Search Engines
- Reading: CMS chap. 8; MRS chap. 8
- Test Collections
- Query Logs
- Effectiveness Metrics
  - Recall and Precision
  - Averaging and Interpolation
  - Focusing on the Top Documents
- Training, Testing, and Statistics
  - Significance Tests
  - Setting Parameter Values
Classification and Clustering (see also the additional slides on clustering and classification)
Social Search: Networks of People and Search Engines
- User Tagging
- Searching within Communities
- Filtering and Recommending
- Metasearch
Beyond Bag of Words
- Feature-Based Retrieval Models
- Term Dependence Models
- Question Answering
- Pictures, Pictures of Words, etc.
- XML Retrieval
- Dimensionality Reduction and Latent Semantic Indexing (LSI)
Review