In this Specialization, you will learn to analyze and visualize data in R and create reproducible data analysis reports; demonstrate a conceptual understanding of the unified nature of statistical inference; perform frequentist and Bayesian statistical inference and modeling to understand natural phenomena and make data-based decisions; communicate statistical results correctly, effectively, and in context without relying on statistical jargon; critique data-based claims and evaluate data-based decisions; and wrangle and visualize data with R packages for data analysis.
You will produce a portfolio of data analysis projects from the Specialization that demonstrates mastery of statistical data analysis from exploratory analysis to inference to modeling, suitable for applying for statistical analysis or data scientist positions.
Introduction to Probability and Data
This course introduces you to sampling and exploring data, as well as basic probability theory and Bayes' rule. You will examine various types of sampling methods, and discuss how such methods can impact the scope of inference. A variety of exploratory data analysis techniques will be covered, including numeric summary statistics and basic data visualization. You will be guided through installing and using R and RStudio (free statistical software), and will use this software for lab exercises and a final project. The concepts and techniques in this course will serve as building blocks for the inference and modeling courses in the Specialization.
Inferential Statistics
This course covers commonly used statistical inference methods for numerical and categorical data. You will learn how to set up and perform hypothesis tests, interpret p-values, and report the results of your analysis in a way that is interpretable for clients or the public. Using numerous data examples, you will learn to report estimates of quantities in a way that expresses the uncertainty of the quantity of interest. You will be guided through installing and using R and RStudio (free statistical software), and will use this software for lab exercises and a final project. The course introduces practical tools for performing data analysis and explores the fundamental concepts necessary to interpret and report results for both categorical and numerical data.
Linear Regression and Modeling
This course introduces simple and multiple linear regression models. These models allow you to assess the relationship between variables in a data set and a continuous response variable. Is there a relationship between the physical attractiveness of a professor and their student evaluation scores? Can we predict the test score for a child based on certain characteristics of his or her mother? In this course, you will learn the fundamental theory behind linear regression and, through data examples, learn to fit, assess, and use regression models to explore relationships between multiple variables, using the free statistical software R and RStudio.
Bayesian Statistics
This course describes Bayesian statistics, in which one's inferences about parameters or hypotheses are updated as evidence accumulates. You will learn to use Bayes' rule to transform prior probabilities into posterior probabilities, and be introduced to the underlying theory and perspective of the Bayesian paradigm. The course will apply Bayesian methods to several practical problems to show end-to-end Bayesian analyses that move from framing the question, to building models, to eliciting prior probabilities, to computing the final posterior distribution in R (free statistical software). Additionally, the course will introduce credible regions, Bayesian comparisons of means and proportions, Bayesian regression and inference using multiple models, and Bayesian prediction. We assume learners in this course have background knowledge equivalent to what is covered in the first three courses in this Specialization: "Introduction to Probability and Data," "Inferential Statistics," and "Linear Regression and Modeling."
Statistics with R Capstone
The capstone project will be an analysis using R that answers a specific scientific or business question provided by the course team. A large and complex dataset will be provided to learners, and the analysis will require the application of a variety of methods and techniques introduced in the previous courses, including exploratory data analysis through data visualization and numerical summaries, statistical inference, and modeling, as well as interpretation of these results in the context of the data and the research question. The analysis will implement both frequentist and Bayesian techniques and discuss, in the context of the data, how these two approaches are similar and different, and what these differences mean for the conclusions that can be drawn from the data. A sampling of the final projects will be featured on the Duke Statistical Science department website. Note: Only learners who have passed the four previous courses in the Specialization are eligible to take the Capstone.
Basic math; no programming experience is required. A genuine interest in data analysis is a plus!
In the later courses in the Specialization, we assume knowledge and skills equivalent to those gained in the prior courses. For example, if you decide to take course four, Bayesian Statistics, without taking the prior three courses, we assume you have knowledge of frequentist statistics and R equivalent to what is taught in the first three courses.