Principles of Machine Learning (STATS 302 / COMPSCI 309)

Fall 2024-2025 / Session 1 (7 weeks, 35 + 8.75 hours)

Course Period: August 19 - October 10, 2024

  • Lectures: Monday / Tuesday / Wednesday / Thursday @ 08:30-09:45 (Classroom: LIB 1123 + Zoom Recording)
  • Recitations / Labs: Tuesday @ 13:15-14:30 (Classroom: LIB 1123 + Zoom Recording)
Instructor: Mustafa MISIR (Office: WDR 2106), mustafa.misir [at] dukekunshan.edu.cn
Recitation / Lab Instructors (Teaching Assistants):
Machine Learning (ML) is a popular, interdisciplinary field drawing on Computer Science, Mathematics, and Statistics. ML aims at learning from data and experience without being explicitly programmed. Its target applications are complex tasks that are challenging, impractical, or unrealistic to program directly. ML can address sophisticated activities that humans or animals perform routinely, such as speech recognition, image understanding, and driving, as well as tasks requiring capabilities beyond human capacities in terms of speed and memory.

This course will identify the major ML problems while introducing the fundamental ML algorithms to solve them. To be specific, the topics to be covered are maximum likelihood estimation, linear discriminant analysis, logistic regression, support vector machines, decision trees, linear regression, Bayesian inference, unsupervised learning, and semi-supervised learning. The course requires basic programming skills (Python) and introductory-level knowledge of probability and statistics, and will also draw on certain linear algebra concepts.

By the end of this course, you will be able to:
  1. specify a given learning task as an ML problem
  2. determine the appropriate ML algorithms for addressing an ML problem
  3. prepare the data for a learning task so that the preferred ML algorithm can be applied effectively
  4. construct generalizable ML models that address a given ML problem on unseen data
  5. analyze the performance of ML algorithms and reveal their shortcomings with reference to the nature of the data
  6. build complete ML workflows in Python with the relevant libraries / frameworks / tools, and effectively communicate your methods and results using Jupyter notebooks
Follow Canvas for announcements and discussions   |   Academic Calendar


The chart on the right shows how STATS 302 / COMPSCI 309 fits into the DKU curriculum, where the abbreviations indicate the course types, i.e. D: Divisional, DF: Divisional Foundation, ID: Interdisciplinary, and E: Elective. Refer to the DKU Undergraduate Bulletin (2024-2025) for more details.

Pre-requisites

  • MATH 201: Multivariable Calculus
  • MATH 202: Linear Algebra
  • MATH 205 / 206: Probability and Statistics
  • COMPSCI 201: Introduction to Programming and Data Structures

Anti-requisites

  • MATH 405: Mathematics of Data Analysis and Machine Learning





There is no official textbook for this course, but the following books can be used as references.

Reference Books



Lecture Notes / Slides

  • Week 0 (Reviews)

  • Week 1   [19/08 - 22/08]   (Keywords: History, Terminology and Basics; Supervised Learning; Regression Problem; Gradient Descent)
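As a preview of the Week 1 keywords, here is a minimal sketch of gradient descent for a one-dimensional linear regression (the data and learning rate below are made-up illustration values, not course materials):

```python
# Gradient descent for 1-D linear regression: fit y = w*x + b by
# repeatedly stepping opposite the gradient of the mean squared error.
def fit_linear(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # gradients of MSE = (1/n) * sum_i (w*x_i + b - y_i)^2
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# synthetic data generated from y = 2x + 1; the fit should recover w≈2, b≈1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = fit_linear(xs, ys)
```

The learning rate must be small enough for the iteration to converge; choosing it is one of the practical issues the lectures address.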

  • REMINDER [22/08, Thursday]: Drop/add ends for first 7-week undergraduate session.    { Source: Academic Calendar }


  • Week 2   [26/08 - 29/08]   (Keywords: Supervised Learning; Classification Problem; Regression Problem)
    • Logistic Regression
      • External Lecture Notes: CS229: Machine Learning (Andrew Ng, Stanford U) [Part II] - Logistic Regression
      • External Video: CS229: Machine Learning (Andrew Ng, Stanford U) - Logistic Regression
    • k-Nearest Neighbors (kNN)
      • External Lecture Notes: CS4780: Machine Learning for Intelligent Systems (Kilian Weinberger, Cornell U) - kNN
      • External Video: CS4780: Machine Learning for Intelligent Systems (Kilian Weinberger, Cornell U) - kNN
    • Naive Bayes
      • Review: Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville, Chapter 3 - Probability
      • Article: Hand, D.J. and Yu, K., 2001. Idiot's Bayes - not so stupid after all? International Statistical Review, 69(3), pp.385-398
      • External Lecture Notes: CS4780: Machine Learning for Intelligent Systems (Kilian Weinberger, Cornell U) - Bayes Classifier and Naive Bayes
      • External Video: CS4780: Machine Learning for Intelligent Systems (Kilian Weinberger, Cornell U) - Naive Bayes
    • Recitation / Lab: Pandas, Logistic Regression, kNN, Naive Bayes
    • Homework 1: TBA
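Of the Week 2 classifiers, k-nearest neighbors is the simplest to sketch from scratch. The points and labels below are hypothetical toy data, not from the course:

```python
# Minimal k-nearest-neighbors classifier: sort training points by
# squared Euclidean distance to the query, then take a majority vote
# over the k closest labels.
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label) pairs
    by_dist = sorted(train, key=lambda p: sum((a - b) ** 2
                                              for a, b in zip(p[0], query)))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), 'A'), ((0.1, 0.2), 'A'), ((0.2, 0.1), 'A'),
         ((1.0, 1.0), 'B'), ((0.9, 1.1), 'B')]
pred = knn_predict(train, (0.15, 0.1))  # all three nearest neighbors are 'A'
```

Note that kNN has no training phase at all; the entire dataset is the model, which is why its prediction cost grows with the training set size.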

  • Week 3   [02/09 - 05/09] Artificial Neural Networks  (Keywords: Supervised Learning; Feed-forward Neural Networks; Regression Problem; Classification Problem)

  • Week 4   [09/09 - 12/09]   (Keywords: Supervised Learning; Regression Problem; Classification Problem)
    • Decision Trees
      • Book Chapter: Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, Rob Tibshirani (2014), Chapter 8 - Tree-based Methods
    • Ensembles: Bagging and Boosting
      • Book Chapter: Pattern Recognition and Machine Learning by Christopher Bishop (2006), Chapter 14 - Combining Models
      • Book Chapter: Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, Rob Tibshirani (2017), Chapter 8.2 - Bagging, Random Forests, Boosting
    • Support Vector Machines (SVM)
      • External Lecture Notes: CS229: Machine Learning (Andrew Ng, Stanford U) [Part VI] - SVM
      • External Video: CS229: Machine Learning (Andrew Ng, Stanford U) - SVM
      • External Lecture Notes: CS4780: Machine Learning for Intelligent Systems (Kilian Weinberger, Cornell U) - SVM
      • External Video: CS4780: Machine Learning for Intelligent Systems (Kilian Weinberger, Cornell U) - SVM
    • Recitation / Lab: Decision Trees, Ensembles, SVM
    • Homework 3: TBA
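The simplest member of the Week 4 family is a depth-1 decision tree (a "stump"), which is also the usual base learner in boosting. A minimal sketch on made-up 1-D data:

```python
# A decision stump: scan every candidate threshold on a single feature,
# in both directions, and keep the split with the best training accuracy.
def fit_stump(xs, ys):
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            preds = [1 if sign * (x - t) >= 0 else 0 for x in xs]
            acc = sum(p == y for p, y in zip(preds, ys)) / len(ys)
            if best is None or acc > best[0]:
                best = (acc, t, sign)
    return best  # (training accuracy, threshold, direction)

xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
acc, t, sign = fit_stump(xs, ys)
```

Real decision-tree learners use impurity measures such as Gini or entropy instead of raw accuracy, and recurse on each side of the split; bagging and boosting then combine many such trees.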

    MIDTERM EXAM [10/09, Tuesday, 18:30-20:30, LIB 1123]


    HOLIDAY [14/09, Saturday - 17/09, Tuesday]: Mid-Autumn Festival (NO CLASSES) - Classes resume on [18/09, Wednesday]; this week's lectures end on [21/09, Saturday]    { Source: Academic Calendar }


  • Week 5   [18/09 - 21/09]   (Keywords: Unsupervised Learning; Dimensionality Reduction; Clustering: +Evaluation and Analysis)
    • Dimensionality Reduction
    • Principal Component Analysis (PCA)
      • External Lecture Notes: CS229: Machine Learning (Andrew Ng, Stanford U) - PCA
      • External Lecture Notes: CS4786: Machine Learning for Data Science (Karthik Sridharan, Cornell U) - PCA
    • Clustering: k-means and k-medoids
    • Recitation / Lab: PCA, k-means
    • Homework 4: TBA
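The Week 5 clustering topic can be sketched with Lloyd's algorithm for k-means; the points and initial centroids below are hypothetical toy values:

```python
# Lloyd's algorithm for k-means on 2-D points: alternate between
# (1) assigning each point to its nearest centroid and
# (2) moving each centroid to the mean of its assigned points.
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[j])))
            clusters[nearest].append(p)
        # recompute centroids; keep the old one if a cluster went empty
        centroids = [tuple(sum(c) / len(cluster) for c in zip(*cluster))
                     if cluster else centroids[j]
                     for j, cluster in enumerate(clusters)]
    return centroids

points = [(0.0, 0.0), (0.4, 0.5), (0.2, 0.1),    # blob near (0.2, 0.2)
          (5.0, 5.0), (5.5, 4.5), (4.8, 5.2)]    # blob near (5.1, 4.9)
centroids = kmeans(points, [(0.0, 0.0), (1.0, 1.0)])
```

k-means converges to a local optimum that depends on the initial centroids, which is one reason the week's keywords include evaluation and analysis of clustering results.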

    REMINDER [21/09, Saturday]: Last day to withdraw with a W grade of first 7-week classes; Last day to change grading basis of first 7-week classes.    { Source: Academic Calendar }


  • Week 6   [23/09 - 26/09]   (Keywords: Graphical Models)
    • Bayesian Networks
    • Markov Random Fields (MRF)
      • Book Chapter: Pattern Recognition and Machine Learning by Christopher Bishop (2006), Chapter 8.3 - Bayesian Networks
      • External Lecture Notes: CS228 - Probabilistic Graphical Models (Stefano Ermon, Stanford U) - MRF
    • Factor Graphs
      • Book Chapter: Pattern Recognition and Machine Learning by Christopher Bishop (2006), Chapter 8.4.3 - Factor Graphs
    • Recitation / Lab: Graphical Models
    • Homework 5: TBA
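The central idea of Week 6 is that a graphical model factorizes a joint distribution. A toy three-node Bayesian network (Cloudy → Sprinkler, Cloudy → Rain; all conditional probability tables below are hypothetical):

```python
# Bayesian network factorization: P(C, S, R) = P(C) * P(S | C) * P(R | C).
# Queries are answered by summing the joint over unobserved variables.
P_C = {True: 0.5, False: 0.5}
P_S_given_C = {True: {True: 0.1, False: 0.9},   # sprinkler rarely on if cloudy
               False: {True: 0.5, False: 0.5}}
P_R_given_C = {True: {True: 0.8, False: 0.2},   # rain likely if cloudy
               False: {True: 0.2, False: 0.8}}

def joint(c, s, r):
    return P_C[c] * P_S_given_C[c][s] * P_R_given_C[c][r]

# marginalize: P(Rain=True) = sum over Cloudy and Sprinkler
p_rain = sum(joint(c, s, True) for c in (True, False) for s in (True, False))
```

The same factorized joint underlies Markov random fields and factor graphs; they differ in whether the factors are conditional probabilities or unnormalized potentials.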

    HOLIDAY [01/10, Tuesday - 07/10, Monday]: National Day Holiday (NO CLASSES) - Classes resume on [08/10, Tuesday]; this week's lectures end on [10/10, Thursday]    { Source: Academic Calendar }


  • Week 7   [08/10 - 10/10]   (Keywords: Sequential Decision Making)
    • Hidden Markov Models (HMMs)
      • Book Chapter: Pattern Recognition and Machine Learning by Christopher Bishop (2006), Chapter 13 - Sequential Data
      • Book Chapter: Pattern Recognition and Machine Learning by Christopher Bishop (2006), Chapter 13.2 - HMMs
    • Recitation / Lab: HMMs
    • Homework 6: TBA
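The key HMM computation in Week 7 is the forward algorithm, which evaluates the likelihood of an observation sequence by dynamic programming. The two-state parameters below are hypothetical toy values:

```python
# Forward algorithm for a 2-state HMM: alpha_t(j) is the probability of
# observing the first t symbols and being in state j at time t.
states = (0, 1)
start = [0.6, 0.4]                  # initial state distribution
trans = [[0.7, 0.3], [0.4, 0.6]]    # trans[i][j] = P(next = j | current = i)
emit = [[0.9, 0.1], [0.2, 0.8]]     # emit[j][o]  = P(obs = o | state = j)

def forward(obs):
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in states) * emit[j][o]
                 for j in states]
    return sum(alpha)  # P(observation sequence), marginalized over states

likelihood = forward([0, 0, 1])
```

Summing over states at each step keeps the cost linear in the sequence length, instead of enumerating all exponentially many state paths.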

    PROJECT PRESENTATIONS [10/10, Thursday, 08:30-09:45 (regular lecture time)] (NO LECTURE)

    FINAL EXAM [14/10, Monday, 15:30-18:30, LIB 2123]



Grading

  • Homework: 20%
    • Mathematical, Conceptual, or Programming related
    • Submit on Canvas; 6 in total, the lowest score is dropped
  • Weekly Journal: 10%
    • Each week, write a page or so about what you have learned
    • Submit on Canvas; 2 points off for each missing journal, capped at 10
  • Midterm: 20%
  • Final: 30%
  • Project: 20%
    • Report Rubric (TBA)
    • Presentation Rubric (TBA)


Reference Courses



Sample Projects



Other Books

Quick / Easy Reads:
Python Programming:
Python Programming for Data Science / Analytics:
Data Visualization:

Other Materials / Resources