Midterm Exam Topic List
Note: These topics are also included in the final exam. Additional topics for the final exam can be found here.
Linear Regression
- Problem definition
- Ordinary Least Squares, Pseudo-inverse
- Implementation
- Computational complexity
- Least Mean Squares, Gradient descent
- Implementation
- Effect of step size
- Computational Complexity
- Regularization
- Ridge regression (L2-regularization)
- LASSO (L1-regularization)
- Probabilistic interpretation
- MSE minimization as maximum likelihood estimation
- Regularized regression as maximum a posteriori estimation
- Polynomial / basis function regression
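
For concreteness, a minimal NumPy sketch of the OLS pseudo-inverse solution, ridge regression, and the LMS gradient-descent update listed above; the toy data, step size `eta`, and regularization strength `lam` are illustrative assumptions, not course code:

```python
import numpy as np

# Toy data: y = 1 + 2*x + noise (illustrative only)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0, 1, 50)])   # bias column + one feature
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.standard_normal(50)

# Ordinary Least Squares via the pseudo-inverse
w_ols = np.linalg.pinv(X) @ y

# Ridge regression (L2 regularization): solve (X^T X + lam*I) w = X^T y
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Least Mean Squares: gradient descent on the mean squared error
w, eta = np.zeros(2), 0.1        # too large a step size diverges, too small converges slowly
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)   # gradient of the MSE objective
    w -= eta * grad
```
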
Model Selection
- Curse of Dimensionality
- Number of points in center vs periphery as a function of number of dimensions
- Bias-variance trade-off
- Overfitting vs Underfitting
- Learning Curve
- Cross validation
- Model evidence / marginal likelihood
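
A small sketch of k-fold cross validation for estimating generalization error; the helper name `k_fold_cv_mse` and the toy data are illustrative assumptions:

```python
import numpy as np

# Toy regression data (illustrative only)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(60), rng.uniform(0, 1, 60)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.standard_normal(60)

def k_fold_cv_mse(X, y, fit, predict, k=5, seed=0):
    """Estimate test MSE by k-fold cross validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])                                   # fit on k-1 folds
        errors.append(np.mean((predict(model, X[val]) - y[val]) ** 2))    # score on held-out fold
    return float(np.mean(errors))

# Example: least-squares fit via the pseudo-inverse as the model being evaluated
cv_error = k_fold_cv_mse(X, y,
                         fit=lambda X, y: np.linalg.pinv(X) @ y,
                         predict=lambda w, X: X @ w)
```
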
Probability
- Measure theory (high level)
- Sample space, event space, probability measures
- Axioms of probability
- Probability mass functions vs probability density functions
- Expected values, Mean, Variance, Covariance
- Joint, conditional and marginal probabilities
- Sum and Product rules
- Bayes Rule
- Conditional independence
- Independent and identically distributed variables
- Central limit theorem
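
A tiny worked example of the sum rule, product rule, and Bayes' rule; the numbers (sensitivity, specificity, prevalence) are made up for illustration:

```python
# Bayes rule: P(A|B) = P(B|A) P(A) / P(B), with P(B) obtained from the sum rule.
# Illustrative numbers: a test with 95% sensitivity, 90% specificity, 1% prevalence.
p_pos_given_sick = 0.95
p_pos_given_healthy = 0.10
p_sick = 0.01

# Sum rule (marginalization): P(pos) = P(pos|sick)P(sick) + P(pos|healthy)P(healthy)
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# Bayes rule: posterior probability of being sick given a positive test
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(p_sick_given_pos)   # ~0.088: low prevalence keeps the posterior small
```
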
Distributions / Densities
- Bernoulli
- Binomial
- Discrete
- Multinomial
- Normal
- Multivariate Normal
- Spherical
- Diagonal
- General relationship between Σ and shape (in 2D)
- Beta
- Dirichlet
- Conjugacy
- Beta and Binomial
- Dirichlet and Multinomial
- Gaussian likelihood with unknown mean and Gaussian prior
- Entropy
- Mutual Information
- KL Divergence
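
A short sketch of Beta-Binomial conjugacy and a discrete KL divergence; the prior pseudo-counts and example distributions are illustrative:

```python
import numpy as np

# Beta-Binomial conjugacy: a Beta(a, b) prior on a coin's heads probability,
# combined with h heads and t tails, gives a Beta(a + h, b + t) posterior.
a, b = 2.0, 2.0            # prior pseudo-counts (illustrative)
h, t = 7, 3                # observed data
a_post, b_post = a + h, b + t
posterior_mean = a_post / (a_post + b_post)   # here 9 / 14

# KL divergence between two discrete distributions p and q (same support)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
kl = np.sum(p * np.log(p / q))
```
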
Data pre-processing
- Normalization via Min-max, Z-score, and Scaling
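
A minimal sketch of min-max and Z-score normalization, assuming features are the columns of a NumPy array:

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each feature (column) to the range [0, 1]."""
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def z_score_normalize(X):
    """Center each feature to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```
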
Distance Metrics
- Euclidean, Manhattan and Minkowski distance
- Hamming distance
- Edit distance
- Properties of a distance measure
- Relationship to inner products
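
A small sketch of the Minkowski family (Manhattan for p=1, Euclidean for p=2) and the Hamming distance; function names are illustrative:

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def hamming(x, y):
    """Hamming distance: number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))
```
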
Optimization (high level)
- Gradient descent
- Stochastic Gradient descent
- Batch size
- Step size schedule
- Robbins-Monro conditions
- Convex Optimization
- Convex sets and functions
- Lagrange duality
- Lagrangian
- Generalized Lagrangian
- Primal problem, dual problem, duality gap
- KKT conditions at optimum, dual complementarity
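
A sketch of stochastic gradient descent with a decaying step-size schedule eta0 / t, which satisfies the Robbins-Monro conditions (the step sizes sum to infinity while their squares sum to a finite value); the interface (`grad`, `data`, batch size) is an assumption for illustration:

```python
import numpy as np

def sgd(grad, w0, data, n_epochs=10, batch_size=1, eta0=0.1, seed=0):
    """Mini-batch SGD with an eta0 / t step-size schedule (Robbins-Monro compliant)."""
    rng = np.random.default_rng(seed)
    w, t = np.array(w0, dtype=float), 1
    for _ in range(n_epochs):
        idx = rng.permutation(len(data))              # reshuffle the data each epoch
        for start in range(0, len(data), batch_size):
            batch = data[idx[start:start + batch_size]]
            w -= (eta0 / t) * grad(w, batch)          # step size decays over iterations
            t += 1
    return w

# Example gradient for linear least squares, with rows of `data` laid out as [x..., y]:
# grad = lambda w, b: b[:, :-1].T @ (b[:, :-1] @ w - b[:, -1]) / len(b)
```
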
Kernels (high level)
- Definition of Hilbert space and inner product
- Sum rule, product rule, mapping rule, Taylor series kernels
- Polynomial, Radial Basis Function, Squared Exponential, Automatic Relevance Determination
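
Illustrative polynomial and RBF / squared-exponential kernel functions, plus a sum of kernels as an example of the combination rules; the parameter defaults are arbitrary:

```python
import numpy as np

def polynomial_kernel(x, z, c=1.0, d=3):
    """k(x, z) = (x . z + c)^d"""
    return (x @ z + c) ** d

def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function / squared exponential kernel: exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Sums and products of valid kernels are again valid kernels (sum/product rules)
def sum_kernel(x, z):
    return polynomial_kernel(x, z) + rbf_kernel(x, z)
```
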
Classification
- Problem definition
- k-Nearest Neighbors
- Definition
- Time and Storage complexity
- Performance vs dimensionality of feature space
- Linear classifiers
- Perceptron
- Logistic regression
- Definition
- Gradient of objective
- Loss functions
- Square loss, logistic loss, hinge loss
- Relationship to perceptron, LDA, logistic regression and SVM
- Generative Learning Algorithms
- Naive Bayes
- Maximum likelihood parameter estimation
- Maximum posterior parameter estimation (Smoothing)
- Linear Discriminant Analysis
- Problem definition
- Probabilistic interpretation
- Maximum likelihood estimates for parameters
- Maximum posterior interpretation for class prediction
- Implicit assumptions about feature distributions and failure modes
- Extension to quadratic discriminant analysis
- Relationship to logistic regression and SVMs
- Similarities and differences
- Advantages and disadvantages
- Support Vector Machines
- Margin of a classification boundary
- Relationship to hinge loss
- Solution as convex optimization problem
- Dual problem, support vectors
- Extension to non-separable data
- Relationship to logistic regression, LDA
- Extension to nonlinear case
- Feature map
- Kernel trick
- Effect on computational complexity
- Dual problem with kernels
- Decision trees
- Algorithm for construction
- Split criteria
- Entropy, information gain
- Gini index
- Bias-variance trade-off in relation to depth
- Relationship to k-Nearest Neighbors
- Ensemble methods
- Bagging vs Boosting
- Random Forests
- Gradient Boosting
- Performance measures
- Confusion matrix
- True/False Positives/Negatives
- Decision-theoretic interpretation of biasing the odds in terms of a loss matrix
- Precision, Recall, F1 Score, Specificity
- Precision recall and ROC curve
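
A compact sketch tying a few of the items above together: logistic regression fit by gradient descent on the negative log likelihood, and precision, recall, and F1 computed from confusion-matrix counts; the function names and learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, eta=0.1, n_iters=1000):
    """Logistic regression via gradient descent on the average negative log likelihood.
    Gradient of the objective: X^T (sigmoid(Xw) - y) / n."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w -= eta * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 from confusion-matrix counts (labels in {0, 1})."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```
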
Clustering
- Hierarchical (linkage-based methods)
- Dendrogram
- Agglomerative vs Divisive Clustering
- Single, Average and Complete Linkage
- Strengths and weaknesses
- K-means and K-medoids (centroid-based methods)
- Algorithm definition
- Sum of Squared Errors (SSE)
- Time complexity
- Initialization and local minima
- Choosing K, the elbow method
- Limitations: Different sizes, different densities, non-globular shapes
- DBSCAN (density-based methods)
- Method definition
- Eps and MinPts parameters
- Core points, border points, noise points
- Density-reachable points, density-connected points
- Time and space complexity
- Methodology for determining Eps and MinPts
- Strengths and weaknesses
- Gaussian Mixture Models (distribution-based methods)
- Generative model
- Relationship to LDA and QDA
- Expectation Maximization
- Calculation of posterior weights
- Calculation of maximum likelihood parameters
- Lower bound on log likelihood
- Advantages and disadvantages
- Quality Criteria
- External vs Internal
- Mutual information (External)
- Definition
- Minimum, maximum, and properties under label permutation
- Scatter criteria (Internal)
- Within cluster and between cluster covariances
- Trace criteria
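
A minimal k-means sketch showing random initialization, the alternating assignment/update steps, and the SSE criterion; it is illustrative only (e.g. it simply keeps the old centroid if a cluster becomes empty, and a real run would restart several times to escape poor local minima):

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and mean updates."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random initialization
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    sse = np.sum((X - centroids[labels]) ** 2)   # sum of squared errors to assigned centroids
    return labels, centroids, sse
```
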