IS4200/CS6200: Information Retrieval
Fall 2012 Syllabus
This schedule is subject to change. Check back as the class progresses.
CMS refers to Search Engines by Croft, Metzler, and Strohman; MRS refers to Introduction to Information Retrieval by Manning, Raghavan, and Schütze.
- Overview of Information Retrieval (6 Sept. 2012)
- Architecture of a Search Engine (6 & 13 Sept. 2012)
- Acquiring Data (13 Sept. 2012)
- Reading: CMS chap. 3; MRS chap. 19 and 20
- Crawling the Web
- Document Conversion
- Storing the Documents
- Detecting Duplicates
- Noise Detection and Removal
- Processing Text (20 Sept. 2012)
- Reading: CMS chap. 4; MRS chap. 2 and 21
- Text Statistics
- Document Parsing
- Ranking with Indexes (4 Oct.)
- Abstract Model of Ranking
- Inverted indexes
- MapReduce
- Query Processing
- Document-at-a-time evaluation
- Term-at-a-time evaluation
- Optimization techniques
- Structured queries
- Distributed evaluation
- Caching
- Queries and Interfaces (11 Oct.)
- Information Needs and Queries
- Query Transformation and Refinement
- Stopping and Stemming Revisited
- Spell Checking and Query Suggestions
- Query Expansion
- Relevance Feedback
- Context and Personalization
- Displaying the Results
- Result Pages and Snippets
- Advertising and Search
- Clustering the Results
- Translation
- User Behavior Analysis
- Retrieval Models (18 & 25 Oct.)
- Overview of Retrieval Models
- Boolean Retrieval
- The Vector Space Model
- Probabilistic Models
- Information Retrieval as Classification
- The BM25 Ranking Algorithm
- Ranking based on Language Models
- Query Likelihood Ranking
- Relevance Models and Pseudo-Relevance Feedback
- Complex Queries and Combining Evidence
- The Inference Network Model
- The Galago Query Language
- Models for Web search
- Machine Learning and Information Retrieval
- Evaluating Search Engines (25 Oct.)
- Test collections
- Query logs
- Effectiveness Metrics
- Recall and Precision
- Averaging and interpolation
- Focusing on the top documents
- Training, Testing, and Statistics
- Significance tests
- Setting parameter values
- Classification and Clustering (see also last term's further slides on clustering and classification) (1 Nov.)
- User Modeling (8 Nov.)
- Social Search (8 Nov.)
- User tagging
- Searching within Communities
- Filtering and recommending
- Beyond Bag of Words (15 Nov.)
- Feature-Based Retrieval Models
- Term Dependence Models
- Question Answering
- Pictures, Pictures of Words, etc.
- XML Retrieval, Distributed IR, and Metasearch (29 Nov.)