Introduction to Data Science (STATS 102)

Spring 2021-2022 / Session 3 (7 weeks, 35 + 8.75 | 8.75 hours)

Course Period: January 10 - March 3, 2022

  • Lectures: Monday / Tuesday / Wednesday / Thursday @ 11:45-13:00 (Classroom: AB 2107 + Zoom)
  • Labs: Thursday @ 14:45-16:00 (102-001L); 16:15-17:30 (102-002L) (Classroom: AB 2107 + Zoom)
Instructor: Mustafa MISIR (Office: CC 3019), mustafa.misir [at] dukekunshan.edu.cn   /   mm940 [at] duke.edu
Teaching Assistant: Shiyi Liu (Office: TBA), shiyi.liu [at] duke.edu

As an introductory course in data science, this course will show you not only the big picture of data science but also the essential skills of loading, cleaning, manipulating, visualizing, analyzing and interpreting data, through hands-on programming experience. You will be able to adopt these techniques and tools to identify, formulate and solve simple practical data analysis problems. This is an elective course open to everyone, and no specific prerequisites are required. Successfully completing this course will also bring new concepts and tools to your future courses in the social sciences, arts and humanities, and natural sciences.

Beyond the detailed knowledge and skills of data science, we also want you to develop high-level capabilities closely related to DKU's animating principles, such as collaborative problem-solving, research and practice, and lucid communication. In this course, group activities and group projects encourage collaborative problem-solving, and weekly lab sessions encourage group discussion on mini-projects. In the final project, you will have the opportunity to collect real-world data on a topic of interest and apply your knowledge and skills to handle that data. Moreover, the group discussions and project presentations will help you strengthen your capability for lucid communication.

By the end of this course, you will be able to:
  1. use fundamental Python programming techniques
  2. employ frequently used Python libraries, such as NumPy, SciPy, pandas, Matplotlib and scikit-learn
  3. design project topics and collect valid data for those topics
  4. apply simple data analysis methods to real data
  5. evaluate whether the selected methods and experimental results are reasonable
  6. plan and manage the progress of the final project efficiently
  7. develop written and oral presentation skills
>> Follow Sakai for announcements and discussions

Prerequisites

  • None. Not open to students who have credit for COMPSCI 101: Introduction to Computer Science

The following chart shows how STATS 102 fits into the DKU curriculum, where the abbreviations indicate the course types, i.e. D: Divisional, DF: Divisional Foundation, ID: Interdisciplinary and E: Elective. Refer to the DKU Undergraduate Bulletin for more details.




Reference Books

There is no official textbook for this course. Still, the following books can be used as references.

Python Programming:
Python Programming for Data Science / Analytics:
Probability & Statistics:

Lecture Notes / Slides



Grading

  • Homework: 20%
    • Mathematical, conceptual, or programming-related
    • Submit on Sakai; 6 in total, the lowest score is dropped
  • Weekly Journal: 10%
    • Each week, write a page or so about what you have learned
    • Submit on Sakai; 2 points off for each missing journal, capped at 10
  • Midterm: 20%
  • Final: 30%
  • Project: 20%
    • Report Rubric (TBA)
    • Presentation Rubric (TBA)
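The weighting above can be sketched as a small Python function. This is only an illustration under assumptions the syllabus does not state: every score is on a 0-100 scale, the homework component is the average of the scores left after dropping the lowest one, and the weekly journal component is worth 10 points, with 2 points deducted per missing journal and the total deduction capped at 10. The function name final_grade is hypothetical. It is not an official grade calculator.

```python
def final_grade(homework, journals_missing, midterm, final, project):
    """Hypothetical sketch of the STATS 102 weighting (assumed 0-100 scales).

    homework: list of homework scores (6 in total), each on a 0-100 scale
    journals_missing: number of weekly journals not submitted
    midterm, final, project: scores on a 0-100 scale
    """
    if len(homework) < 2:
        raise ValueError("need at least two homework scores to drop the lowest")

    # Assumption: drop the single lowest homework score, then average the rest.
    kept = sorted(homework)[1:]
    hw_avg = sum(kept) / len(kept)

    # Assumption: the journal component is 10 points; 2 points off per
    # missing journal, with the deduction capped at 10 (so it never goes below 0).
    journal_points = 10 - min(2 * journals_missing, 10)

    # Weights from the syllabus: 20% homework, 10% journal,
    # 20% midterm, 30% final, 20% project.
    return (
        0.20 * hw_avg
        + journal_points
        + 0.20 * midterm
        + 0.30 * final
        + 0.20 * project
    )


print(round(final_grade([80, 90, 70, 100, 85, 95], 1, 80, 90, 85), 2))  # 86.0
```

In this example, the lowest homework score (70) is dropped, so the homework average is 90. One missing journal leaves 8 of the 10 journal points, which gives 18 + 8 + 16 + 27 + 17 = 86.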


Reference Courses


Machine Learning:
Introductory Computer Science / (Python) Programming:

Other Books / Articles

Quick / Easy Reads:
Data Science with Other Programming Languages:
Machine / Statistical Learning & Data Mining:
Data Visualization:
Data Science / Machine Learning for Specific Application Domains:
Software Engineering / Development / Programming:
Computing / Computers + History:

Other Materials / Resources