STA 160 Practice in Statistical Data Science

Units: 4

Lecture: 3 hours
Discussion: 1 hour

Catalog Description:
Principles and practice of interdisciplinary, collaborative data analysis; complete case study review and team data analysis project.

Prerequisite: course 106; course 108; course 130B or course 131B; course 141 or course 141A.

This capstone course focuses on the practice of data analysis and on both statistical and computational reasoning. Students work on every step of the data pipeline and workflow to gain authentic experience in analyzing and working with data.

Summary of course contents:
Students will work in groups of 3-4 members on a data analysis project. They will:

a) frame the question and possible approaches,
b) acquire data (if necessary),
c) clean and explore the data,
d) use appropriate statistical and machine learning methods to effectively answer the question(s), and
e) prepare a technical report & presentation (for a non-statistical audience) detailing the conclusions and insights, potential shortcomings/issues, and possible alternative approaches and directions.
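
As a rough illustration of how steps b) through e) fit together, the following minimal Python sketch runs a toy version of the workflow. Everything in it is an assumption made for illustration: the simulated data stands in for an acquired data set, and the function names (make_raw_data, clean, fit_line, report) are hypothetical, not part of any course material. A real project would use a real data source and methods chosen to suit the question.

```python
import random
from statistics import mean


def make_raw_data(n=50, seed=160):
    """Step b (stand-in): simulate 'acquired' data with some missing values."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    rows = []
    for i in range(n):
        x = rng.uniform(0, 10)
        y = 2.0 * x + 1.0 + rng.gauss(0, 1)
        # Every tenth record is missing its response, mimicking messy data.
        rows.append((x, None if i % 10 == 0 else y))
    return rows


def clean(rows):
    """Step c: drop records with a missing response."""
    return [(x, y) for x, y in rows if y is not None]


def fit_line(rows):
    """Step d: ordinary least squares for y = intercept + slope * x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def report(intercept, slope, n_used, n_raw):
    """Step e: a plain-language summary for a non-statistical audience."""
    return (
        f"Used {n_used} of {n_raw} records. "
        f"Each unit increase in x is associated with an average change "
        f"of about {slope:.2f} in y (baseline {intercept:.2f})."
    )


raw = make_raw_data()
tidy = clean(raw)
intercept, slope = fit_line(tidy)
summary = report(intercept, slope, len(tidy), len(raw))
print(summary)
```

The fixed seed means that rerunning the script reproduces the same data and the same fitted values, which is the kind of reproducible computation a project report should support.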

The instructor will provide, select, or approve the projects. Sample problems may be adapted from journal papers, activities in previous offerings of course 260, and the consulting activities of both the department's StatLab and the campus's Data Science Initiative. As in courses 260 and ECS 193A,B, instructors may also solicit problems from researchers on campus. Two or more groups may work independently on the same project. Students will be introduced to the projects at the start of the course.

The first 4-5 weeks of the course will be spent on sample case studies in statistical data science. These will illustrate all of the steps a) through e) above and prepare the students for working on the project. The instructor may also use the lectures to introduce new statistical methods that arise in the case studies or in multiple team projects. The course will also cover technical writing. Students will be encouraged to use best practices such as version control and reproducible computations (e.g., using knitr or IPython notebooks).


Illustrative reading:

  • Statistics: A Guide to the Unknown, edited by Peck, Casella, Cobb, Hoerl, Nolan, 2005
  • Stat Labs: Mathematical Statistics Through Applications, Nolan & Speed, 2001
  • Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving, Nolan & Temple Lang, 2014


Potential Overlap:
This course has some similarity to course 260, but is at the undergraduate level.

First offered Spring 2017.