Midterm Exam Topic List
Note: These topics are also included in the final exam. Additional topics for the final exam can be found here.
Linear Regression
- Problem definition
- Ordinary Least Squares, Pseudo-inverse
- Implementation
- Computational complexity
- Least Mean Squares, Gradient descent
- Implementation
- Effect of step size
- Computational Complexity
- Regularization
- Ridge regression (L2-regularization)
- LASSO (L1-regularization)
- Probabilistic interpretation
- MSE minimization as maximum likelihood estimation
- Regularized regression as maximum a posteriori estimation
- Polynomial / basis function regression
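
For concreteness, a minimal NumPy sketch of the OLS pseudo-inverse solution, ridge regression, and the LMS gradient-descent update listed above; the toy data, step size `eta`, and regularization strength `lam` are illustrative assumptions, not course code:

```python
import numpy as np

# Toy data: y = 1 + 2*x + noise (illustrative only)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0, 1, 50)])   # bias column + one feature
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.standard_normal(50)

# Ordinary Least Squares via the pseudo-inverse
w_ols = np.linalg.pinv(X) @ y

# Ridge regression (L2 regularization): solve (X^T X + lam*I) w = X^T y
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Least Mean Squares: gradient descent on the mean squared error
w, eta = np.zeros(2), 0.1        # too large a step size diverges, too small converges slowly
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)   # gradient of the MSE objective
    w -= eta * grad
```
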
Model Selection
- Curse of Dimensionality
- Number of points in center vs periphery as a function of number of dimensions
- Bias-variance trade-off
- Overfitting vs Underfitting
- Learning Curve
- Cross validation
- Model evidence / marginal likelihood
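
A small sketch of k-fold cross validation for estimating generalization error; the helper name `k_fold_cv_mse` and the toy data are illustrative assumptions:

```python
import numpy as np

# Toy regression data (illustrative only)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(60), rng.uniform(0, 1, 60)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.standard_normal(60)

def k_fold_cv_mse(X, y, fit, predict, k=5, seed=0):
    """Estimate test MSE by k-fold cross validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])                                   # fit on k-1 folds
        errors.append(np.mean((predict(model, X[val]) - y[val]) ** 2))    # score on held-out fold
    return float(np.mean(errors))

# Example: least-squares fit via the pseudo-inverse as the model being evaluated
cv_error = k_fold_cv_mse(X, y,
                         fit=lambda X, y: np.linalg.pinv(X) @ y,
                         predict=lambda w, X: X @ w)
```
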
Probability
- Measure theory (high level)
- Sample space, event space, probability measures
- Axioms of probability
- Probability mass functions vs probability density functions
- Expected values, Mean, Variance, Covariance
- Joint, conditional and marginal probabilities
- Sum and Product rules
- Bayes Rule
- Conditional independence
- Independent and identically distributed variables
- Central limit theorem
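
A tiny worked example of the sum rule, product rule, and Bayes' rule; the numbers (sensitivity, specificity, prevalence) are made up for illustration:

```python
# Bayes rule: P(A|B) = P(B|A) P(A) / P(B), with P(B) obtained from the sum rule.
# Illustrative numbers: a test with 95% sensitivity, 90% specificity, 1% prevalence.
p_pos_given_sick = 0.95
p_pos_given_healthy = 0.10
p_sick = 0.01

# Sum rule (marginalization): P(pos) = P(pos|sick)P(sick) + P(pos|healthy)P(healthy)
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# Bayes rule: posterior probability of being sick given a positive test
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(p_sick_given_pos)   # ~0.088: low prevalence keeps the posterior small
```
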
Distributions / Densities
- Bernoulli
- Binomial
- Discrete
- Multinomial
- Normal
- Multivariate Normal
- Spherical
- Diagonal
- General relationship between Σ and shape (in 2D)
- Beta
- Dirichlet
- Conjugacy
- Beta and Binomial
- Dirichlet and Multinomial
- Gaussian likelihood with unknown mean and Gaussian prior
- Entropy
- Mutual Information
- KL Divergence
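
A short sketch of Beta-Binomial conjugacy and a discrete KL divergence; the prior pseudo-counts and example distributions are illustrative:

```python
import numpy as np

# Beta-Binomial conjugacy: a Beta(a, b) prior on a coin's heads probability,
# combined with h heads and t tails, gives a Beta(a + h, b + t) posterior.
a, b = 2.0, 2.0            # prior pseudo-counts (illustrative)
h, t = 7, 3                # observed data
a_post, b_post = a + h, b + t
posterior_mean = a_post / (a_post + b_post)   # here 9 / 14

# KL divergence between two discrete distributions p and q (same support)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
kl = np.sum(p * np.log(p / q))
```
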
Data pre-processing
- Normalization via Min-max, Z-score, and Scaling
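
A minimal sketch of min-max and Z-score normalization, assuming features are the columns of a NumPy array:

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each feature (column) to the range [0, 1]."""
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def z_score_normalize(X):
    """Center each feature to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```
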
Distance Metrics
- Euclidean, Manhattan and Minkowski distance
- Hamming distance
- Edit distance
- Properties of a distance measure
- Relationship to inner products
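
A small sketch of the Minkowski family (Manhattan for p=1, Euclidean for p=2) and the Hamming distance; function names are illustrative:

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def hamming(x, y):
    """Hamming distance: number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))
```
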
Optimization (high level)
- Gradient descent
- Stochastic Gradient descent
- Batch size
- Step size schedule
- Robbins-Monro conditions
- Convex Optimization
- Convex sets and functions
- Lagrange duality
- Lagrangian
- Generalized Lagrangian
- Primal problem, dual problem, duality gap
- KKT conditions at optimum, dual complementarity
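
A sketch of stochastic gradient descent with a decaying step-size schedule eta0 / t, which satisfies the Robbins-Monro conditions (the step sizes sum to infinity while their squares sum to a finite value); the interface (`grad`, `data`, batch size) is an assumption for illustration:

```python
import numpy as np

def sgd(grad, w0, data, n_epochs=10, batch_size=1, eta0=0.1, seed=0):
    """Mini-batch SGD with an eta0 / t step-size schedule (Robbins-Monro compliant)."""
    rng = np.random.default_rng(seed)
    w, t = np.array(w0, dtype=float), 1
    for _ in range(n_epochs):
        idx = rng.permutation(len(data))              # reshuffle the data each epoch
        for start in range(0, len(data), batch_size):
            batch = data[idx[start:start + batch_size]]
            w -= (eta0 / t) * grad(w, batch)          # step size decays over iterations
            t += 1
    return w

# Example gradient for linear least squares, with rows of `data` laid out as [x..., y]:
# grad = lambda w, b: b[:, :-1].T @ (b[:, :-1] @ w - b[:, -1]) / len(b)
```
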
Kernels (high level)
- Definition of Hilbert space and inner product
- Sum rule, product rule, mapping rule, Taylor series kernels
- Polynomial, Radial Basis Function, Squared Exponential, Automatic Relevance Determination
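
Illustrative polynomial and RBF / squared-exponential kernel functions, plus a sum of kernels as an example of the combination rules; the parameter defaults are arbitrary:

```python
import numpy as np

def polynomial_kernel(x, z, c=1.0, d=3):
    """k(x, z) = (x . z + c)^d"""
    return (x @ z + c) ** d

def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function / squared exponential kernel: exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Sums and products of valid kernels are again valid kernels (sum/product rules)
def sum_kernel(x, z):
    return polynomial_kernel(x, z) + rbf_kernel(x, z)
```
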
Classification
- Problem definition
- k-Nearest Neighbors
- Definition
- Time and Storage complexity
- Performance vs dimensionality of feature space
- Linear classifiers
- Perceptron
- Logistic regression
- Definition
- Gradient of objective
- Loss functions
- Square loss, logistic loss, hinge loss
- Relationship to perceptron, LDA, logistic regression and SVM
- Generative Learning Algorithms
- Naive Bayes
- Maximum likelihood parameter estimation
- Maximum posterior parameter estimation (Smoothing)
- Linear Discriminant Analysis
- Problem definition
- Probabilistic interpretation
- Maximum likelihood estimates for parameters
- Maximum posterior interpretation for class prediction
- Implicit assumptions about feature distributions and failure modes
- Extension to quadratic discriminant analysis
- Relationship to logistic regression and SVMs
- Similarities and differences
- Advantages and disadvantages
- Support Vector Machines
- Margin of a classification boundary
- Relationship to hinge loss
- Solution as convex optimization problem
- Dual problem, support vectors
- Extension to non-separable data
- Relationship to logistic regression, LDA
- Extension to nonlinear case
- Feature map
- Kernel trick
- Effect on computational complexity
- Dual problem with kernels
- Decision trees
- Algorithm for construction
- Split criteria
- Entropy, information gain
- Gini index
- Bias-variance trade-off in relation to depth
- Relationship to k-Nearest Neighbors
- Ensemble methods
- Bagging vs Boosting
- Random Forests
- Gradient Boosting
- Performance measures
- Confusion matrix
- True/False Positives/Negatives
- Decision-theoretic interpretation of biasing the odds in terms of a loss matrix
- Precision, Recall, F1 Score, Specificity
- Precision recall and ROC curve
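
A compact sketch tying a few of the items above together: logistic regression fit by gradient descent on the negative log likelihood, and precision, recall, and F1 computed from confusion-matrix counts; the function names and learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, eta=0.1, n_iters=1000):
    """Logistic regression via gradient descent on the average negative log likelihood.
    Gradient of the objective: X^T (sigmoid(Xw) - y) / n."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w -= eta * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 from confusion-matrix counts (labels in {0, 1})."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```
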
Clustering
- Hierarchical (linkage-based methods)
- Dendrogram
- Agglomerative vs Divisive Clustering
- Single, Average and Complete Linkage
- Strengths and weaknesses
- K-means and K-medoids (centroid-based methods)
- Algorithm definition
- Sum of Squared Errors (SSE)
- Time complexity
- Initialization and local minima
- Choosing K, the elbow method
- Limitations: Different sizes, different densities, non-globular shapes
- DBSCAN (density-based methods)
- Method definition
- Eps and MinPts parameters
- Core points, border points, noise points
- Density-reachable points, density-connected points
- Time and space complexity
- Methodology for determining Eps and MinPts
- Strengths and weaknesses
- Gaussian Mixture Models (distribution-based methods)
- Generative model
- Relationship to LDA and QDA
- Expectation Maximization
- Calculation of posterior weights
- Calculation of maximum likelihood parameters
- Lower bound on log likelihood
- Advantages and disadvantages
- Quality Criteria
- External vs Internal
- Mutual information (External)
- Definition
- Minimum, maximum, and properties under label permutation
- Scatter criteria (Internal)
- Within cluster and between cluster covariances
- Trace criteria
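
A minimal k-means sketch showing random initialization, the alternating assignment/update steps, and the SSE criterion; it is illustrative only (e.g. it simply keeps the old centroid if a cluster becomes empty, and a real run would restart several times to escape poor local minima):

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and mean updates."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random initialization
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    sse = np.sum((X - centroids[labels]) ** 2)   # sum of squared errors to assigned centroids
    return labels, centroids, sse
```
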