Introduction to Data Science

This course gives students a hands-on introduction to data science, with applied examples of data collection, processing, transformation, management, and analysis. Students will explore key data science concepts, including applied statistics, information visualization, text mining, and machine learning.

R, the open-source statistical analysis and visualization system, will be used throughout the course. R is widely regarded as one of the most popular tools among data analysts worldwide, and proficiency in it is considered a valuable, marketable skill for data scientists.

Students will also learn how to apply supervised and unsupervised machine learning techniques in R (e.g., support vector machines, association rule mining). The focus is on structured data, and these techniques are taught alongside the full life cycle of data science.

Learning Objectives

  • Understand essential concepts and characteristics of data.
  • Perform basic computational scripting using R and other optional tools.
  • Apply scripting/code development for data management using R and RStudio.
  • Comprehend principles and practices in data screening, cleaning, and linking.
  • Communicate results to decision makers.

Tools and Concepts

  • Coding with R
  • Applied statistics
  • Data mapping
  • Linear modeling
  • Information visualization
  • Text mining
  • Machine learning