Elements of Machine Learning (COMPSCI 309)

Spring 2023-2024 / Session 4 (7 weeks, 35 hours)

Course Period: March 18 - May 3, 2024

  • Lectures: Tuesday / Thursday @ 14:45-17:15 (Classroom: IB 1056 + Zoom)
Instructor: Mustafa MISIR (Office: WDR 2106), mustafa.misir [at] dukekunshan.edu.cn


Machine Learning (ML) is a popular, interdisciplinary field that draws on Computer Science, Mathematics and Statistics. ML aims at learning from data and experience without being explicitly programmed. Its target applications are complex tasks that are challenging, impractical or unrealistic to program by hand. ML can be used to address sophisticated activities that humans or animals perform routinely, such as speech recognition, image understanding and driving. ML also concentrates on learning functions that require capabilities beyond human capacity in terms of speed and memory.

This course covers theoretical and practical issues in modern machine learning techniques. Topics include statistical foundations, supervised and unsupervised learning, decision trees, hidden Markov models, neural networks, and reinforcement learning. The course requires basic programming skills (Python) and introductory-level knowledge of calculus, probability and statistics; familiarity with certain linear algebra concepts is also beneficial.

By the end of this course, you will be able to:
  1. specify a given learning task as an ML problem
  2. determine the appropriate ML algorithms for addressing an ML problem
  3. manipulate the data of a given learning task so that the preferred ML algorithm can be effectively applied
  4. construct generalizable ML models that can address a given ML problem on unseen data
  5. analyze the performance of ML algorithms and reveal their shortcomings with respect to the nature of the data
  6. build complete ML workflows in Python with the relevant libraries / frameworks / tools, and effectively communicate your methods and results using Jupyter notebooks
Follow Canvas and Ed Discussions for announcements and discussions.


The chart on the right shows how COMPSCI 309 fits into the DKU curriculum. The abbreviations indicate the course types: D: Divisional, DF: Divisional Foundation, ID: Interdisciplinary and E: Elective. Refer to the DKU Undergraduate Bulletin for more details.

Pre-requisites

  • MATH 201: Multivariable Calculus
  • MATH 202: Linear Algebra
  • MATH 205 / 206: Probability and Statistics
  • COMPSCI 201: Introduction to Programming and Data Structures

Anti-requisites

  • MATH 405: Mathematics of Data Analysis and Machine Learning
  • STATS 302: Principles of Machine Learning





There is no official textbook for this course. Still, the following books can be used as references.

Reference Books



Lecture Notes / Slides

  • Week 0 (Reviews)

  • Week 1   [18/03 - 21/03]   (Keywords: History, Terminology and Basics; Supervised Learning; Regression Problem; Gradient Descent)

  • Week 2   [25/03 - 28/03]   (Keywords: Supervised Learning; Classification Problem; Regression Problem)
    • Logistic Regression
      • External Lecture Notes: CS229: Machine Learning (Andrew Ng, Stanford U) [Part II] - Logistic Regression
      • External Video: CS229: Machine Learning (Andrew Ng, Stanford U) - Logistic Regression
    • k-Nearest Neighbors (kNN)
      • External Lecture Notes: CS4780: Machine Learning for Intelligent Systems (Kilian Weinberger, Cornell U) - kNN
      • External Video: CS4780: Machine Learning for Intelligent Systems (Kilian Weinberger, Cornell U) - kNN
    • Naive Bayes
      • Review: Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville, Chapter 3 - Probability
      • Article: Hand, D.J. and Yu, K., 2001. Idiot's Bayes - not so stupid after all? International Statistical Review, 69(3), pp. 385-398
      • External Lecture Notes: CS4780: Machine Learning for Intelligent Systems (Kilian Weinberger, Cornell U) - Bayes Classifier and Naive Bayes
      • External Video: CS4780: Machine Learning for Intelligent Systems (Kilian Weinberger, Cornell U) - Naive Bayes
    • Recitation / Lab: Pandas, Logistic Regression, kNN, Naive Bayes
    • Homework 1: TBA

    HOLIDAY [04/04, Thursday]: Qing Ming - Tomb Sweeping Day (NO CLASSES) - Continue on [05/04, Friday]


  • Week 3   [01/04 - 05/04; 04/04 lecture is moved to 05/04] Artificial Neural Networks  (Keywords: Supervised Learning; Feed-forward Neural Networks; Regression Problem; Classification Problem)

  • Week 4   [08/04 - 11/04]   (Keywords: Supervised Learning; Regression Problem; Classification Problem)
    • Decision Trees
      • Book Chapter: Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, Rob Tibshirani (2014), Chapter 8 - Tree-based Methods
    • Ensembles: Bagging and Boosting
      • Book Chapter: Pattern Recognition and Machine Learning by Christopher Bishop (2006), Chapter 14 - Combining Models
      • Book Chapter: Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, Rob Tibshirani (2017), Chapter 8.2 - Bagging, Random Forests, Boosting
    • Support Vector Machines (SVM)
      • External Lecture Notes: CS229: Machine Learning (Andrew Ng, Stanford U) [Part VI] - SVM
      • External Video: CS229: Machine Learning (Andrew Ng, Stanford U) - SVM
      • External Lecture Notes: CS4780: Machine Learning for Intelligent Systems (Kilian Weinberger, Cornell U) - SVM
      • External Video: CS4780: Machine Learning for Intelligent Systems (Kilian Weinberger, Cornell U) - SVM
    • Recitation / Lab: Decision Trees, Ensembles, SVM
    • Homework 3: TBA
    • MIDTERM (Date: TBA)

  • Week 5   [15/04 - 18/04]   (Keywords: Unsupervised Learning; Dimensionality Reduction; Clustering (+ Evaluation and Analysis))
    • Dimensionality Reduction
    • Principal Component Analysis (PCA)
      • External Lecture Notes: CS229: Machine Learning (Andrew Ng, Stanford U) - PCA
      • External Lecture Notes: CS4786: Machine Learning for Data Science (Karthik Sridharan, Cornell U) - PCA
    • Clustering: k-means and k-medoids
    • Recitation / Lab: PCA, k-means
    • Homework 4: TBA

  • Week 6   [22/04 - 25/04]   (Keywords: Sequential Data Models)
    • Markov Chains (MCs)
      • Book Chapter: Pattern Recognition and Machine Learning by Christopher Bishop (2006), Chapter 13.1 - Markov Models
      • Book Chapter: Speech and Language Processing by Dan Jurafsky, James H. Martin (2023), Chapter 8.4.1 - Markov Chains
    • Hidden Markov Models (HMMs)
      • Book Chapter: Pattern Recognition and Machine Learning by Christopher Bishop (2006), Chapter 13.2 - HMMs
      • Book Chapter: Speech and Language Processing by Dan Jurafsky, James H. Martin (2023), Chapter 8.4.2 - The Hidden Markov Model
    • Markov Decision Processes (MDPs)
    • Recitation / Lab: HMMs
    • Homework 5: TBA

    HOLIDAY [01/05, Wednesday]: International Labor Day


  • Week 7   [29/04 - 02/05]   (Keywords: Reinforcement Learning)
    • Reinforcement Learning (RL)
      • Book Chapter: Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto (2020), Chapter 1 - Introduction
      • Book Chapter: Applying Reinforcement Learning on Real-World Data with Practical Examples in Python by P Osborne, K Singh, ME Taylor (2022), Chapter 1 - Basics and Definitions
      • External Video: CS234: Reinforcement Learning (Emma Brunskill, Stanford U) - Introduction [ Lecture Slides ]
    • Value and Policy Iterations
      • Book Chapter: Applying Reinforcement Learning on Real-World Data with Practical Examples in Python by P Osborne, K Singh, ME Taylor (2022), Chapter 2.2.2 - Policy Improvement
    • Model-based RL
      • Book Chapter: Applying Reinforcement Learning on Real-World Data with Practical Examples in Python by P Osborne, K Singh, ME Taylor (2022), Chapter 2.2 - Model-based Methods
    • Value Function Approximation
    • Recitation / Lab: RL
    • Homework 6: TBA
    • Project Presentations (Date: TBA)
    • FINAL (Date: TBA)


Grading

  • Homework: 20%
    • Mathematical, conceptual, or programming-related
    • Submit on Sakai; 6 in total, with the lowest score dropped
  • Weekly Journal: 10%
    • Each week, write a page or so about what you have learned
    • Submit on Sakai; 2 points off for each missing journal, with the deduction capped at 10 points
  • Midterm: 20%
  • Final: 30%
  • Project: 20%
    • Report Rubric (TBA)
    • Presentation Rubric (TBA)
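
The grading scheme above can be sketched as a short calculation. This is only an illustration, not the official grading procedure: the function name `course_grade`, the 0-100 scale for each component, and the equal weighting of the five homework scores that remain after dropping the lowest are all assumptions made for this example.

```python
def course_grade(homework, missing_journals, midterm, final, project):
    """Weighted course grade on a 0-100 scale (hypothetical helper).

    homework: list of six homework scores, each assumed to be 0-100
    missing_journals: number of weekly journals not submitted
    midterm, final, project: scores, each assumed to be 0-100
    """
    if len(homework) != 6:
        raise ValueError("expected six homework scores")

    # Drop the lowest homework score, then average the remaining five
    # (equal weighting per homework is an assumption).
    kept = sorted(homework)[1:]
    hw_avg = sum(kept) / len(kept)

    # Journal component: 10 points, minus 2 per missing journal,
    # with the deduction capped at 10.
    journal_points = 10 - min(2 * missing_journals, 10)

    # Weights: homework 20%, midterm 20%, final 30%, project 20%;
    # the journal contributes its points directly (10% of the grade).
    return (0.20 * hw_avg
            + journal_points
            + 0.20 * midterm
            + 0.30 * final
            + 0.20 * project)


# Example with made-up scores: the homework score of 60 is dropped,
# and one missing journal costs 2 of the 10 journal points.
grade = course_grade([90, 80, 70, 100, 60, 85], missing_journals=1,
                     midterm=75, final=80, project=90)
print(round(grade, 2))
```

In this made-up example the components contribute 17 (homework), 8 (journal), 15 (midterm), 24 (final) and 18 (project) points, for a total of 82.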


Reference Courses



Sample Projects



Other Books

Quick / Easy Reads:
Python Programming:
Python Programming for Data Science / Analytics:
Data Visualization:

Other Materials / Resources