CS6200: Information Retrieval
Fall 2015 Syllabus
This schedule is subject to change. Check back as the class progresses.
CMS refers to Search Engines by Croft, Metzler, and Strohman; MRS refers to Introduction to Information Retrieval by Manning, Raghavan, and Schütze.
- Overview of Information Retrieval (10 Sept. 2015)
- Architecture of a Search Engine (10 Sept. 2015)
- Acquiring Data (10, 17 Sept. 2015)
- Reading: CMS chap. 3; MRS chap. 19 and 20
- Crawling the Web
- Document Conversion
- Storing the Documents
- Detecting Duplicates
- Noise Detection and Removal
- Processing Text
- Reading: CMS chap. 4; MRS chap. 2 and 21
- Text Statistics
- Document Parsing
- Tokenizing
- Stopping
- Stemming
- Phrases
- Document Structure
- Link Extraction
- More detail on PageRank
- Feature Extraction and Named Entity Recognition
- Internationalization
- Ranking with Indexes
- Reading: CMS chap. 5; MRS chap. 4-5
- Abstract Model of Ranking
- Inverted indexes
- MapReduce
- Query Processing
- Document-at-a-time evaluation
- Term-at-a-time evaluation
- Optimization techniques
- Structured queries
- Distributed evaluation
- Caching
- Queries and Interfaces
- Reading: CMS chap. 6
- Information Needs and Queries
- Query Transformation and Refinement
- Stopping and Stemming Revisited
- Spell Checking and Query Suggestions
- Query Expansion
- Relevance Feedback
- Context and Personalization
- Displaying the Results
- Result Pages and Snippets
- Advertising and Search
- Clustering the Results
- Translation
- User Behavior Analysis
- Retrieval Models
- Reading: CMS chap. 7; MRS chap. 11-12 (for background, see also MRS chap. 1 and 6)
- Overview of Retrieval Models
- Boolean Retrieval
- The Vector Space Model
- Probabilistic Models
- Information Retrieval as Classification
- The BM25 Ranking Algorithm
- Ranking based on Language Models
- Query Likelihood Ranking
- Relevance Models and Pseudo-Relevance Feedback
- Complex Queries and Combining Evidence
- The Inference Network Model
- The Galago Query Language
- Models for Web search
- Machine Learning and Information Retrieval: Learning to Rank (LeToR)
- Topic Models
- Evaluating Search Engines
- Reading: CMS chap. 8; MRS chap. 8
- Test collections
- Query logs
- Effectiveness Metrics
- Recall and Precision
- Averaging and interpolation
- Focusing on the top documents
- Training, Testing, and Statistics
- Significance tests
- Setting parameter values
- Classification and Clustering
(see also further slides on clustering and classification)
- Social Search: Networks of People and Search Engines
- User tagging
- Searching within Communities
- Filtering and recommending
- Metasearch
- Beyond Bag of Words
- Feature-Based Retrieval Models
- Term Dependence Models
- Question Answering
- Pictures, Pictures of Words, etc.
- XML Retrieval
- Dimensionality Reduction and LSI
- Review