CS6200: Information Retrieval
Fall 2013 Syllabus
This schedule is subject to change. Check back as the class progresses.
CMS refers to Search Engines by Croft, Metzler, and Strohman; MRS refers to Introduction to Information Retrieval by Manning, Raghavan, and Schütze.
- Overview of Information Retrieval (5 Sept. 2013)
- Architecture of a Search Engine (5 Sept. 2013)
- Acquiring Data (12 Sept. 2013)
  - Reading: CMS chap. 3; MRS chap. 19 and 20
  - Crawling the Web
  - Document Conversion
  - Storing the Documents
  - Detecting Duplicates
  - Noise Detection and Removal
- Processing Text (12, 19 Sept. 2013)
  - Reading: CMS chap. 4; MRS chap. 2 and 21
  - Text Statistics
  - Document Parsing
- Ranking with Indexes (26 Sept.)
  - Reading: CMS chap. 5; MRS chap. 4-5
  - Abstract Model of Ranking
  - Inverted indexes
  - MapReduce
  - Query Processing
    - Document-at-a-time evaluation
    - Term-at-a-time evaluation
    - Optimization techniques
    - Structured queries
    - Distributed evaluation
    - Caching
- Queries and Interfaces (11 Oct.)
  - Reading: CMS chap. 6
  - Information Needs and Queries
  - Query Transformation and Refinement
    - Stopping and Stemming Revisited
    - Spell Checking and Query Suggestions
    - Query Expansion
    - Relevance Feedback
    - Context and Personalization
  - Displaying the Results
    - Result Pages and Snippets
    - Advertising and Search
    - Clustering the Results
  - Translation
  - User Behavior Analysis
- Retrieval Models (10, 17 Oct.)
  - Reading: CMS chap. 7; MRS chap. 11-12; for background, MRS chap. 1 and 6
  - Overview of Retrieval Models
    - Boolean Retrieval
    - The Vector Space Model
  - Probabilistic Models
    - Information Retrieval as Classification
    - The BM25 Ranking Algorithm
  - Ranking based on Language Models
    - Query Likelihood Ranking
    - Relevance Models and Pseudo-Relevance Feedback
  - Complex Queries and Combining Evidence
    - The Inference Network Model
    - The Galago Query Language
  - Models for Web search
  - Machine Learning and Information Retrieval
- Evaluating Search Engines (24, 31 Oct.)
  - Test collections
  - Query logs
  - Effectiveness Metrics
    - Recall and Precision
    - Averaging and interpolation
    - Focusing on the top documents
  - Training, Testing, and Statistics
    - Significance tests
    - Setting parameter values