### Course Overview

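This course introduces you to statistics as the science of understanding and analyzing data. You will learn how to use data effectively under uncertainty, including how to collect data, how to analyze data, and how to use data to make inferences and draw conclusions about real-world phenomena.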

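The goals of this course are as follows:

Recognize the importance of data collection, identify the limitations of various data collection methods, and determine how they affect inference.

Use statistical software (R) to summarize data numerically and visually, and to perform data analysis.

Have a conceptual understanding of the unified nature of statistical inference.

Apply estimation and testing methods (confidence intervals and hypothesis tests) to analyze single variables and the relationship between two variables, in order to understand natural phenomena and make data-based decisions.

Model and investigate relationships between two or more variables within a regression framework.

Interpret results correctly and effectively without relying on statistical jargon.

Critique data-based claims and evaluate data-based decisions.

Complete a research project that employs simple statistical inference and modeling techniques.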

In this Specialization, you will learn to analyze and visualize data in R and create reproducible data analysis reports, demonstrate a conceptual understanding of the unified nature of statistical inference, perform frequentist and Bayesian statistical inference and modeling to understand natural phenomena and make data-based decisions, communicate statistical results correctly, effectively, and in context without relying on statistical jargon, critique data-based claims and evaluate data-based decisions, and wrangle and visualize data with R packages for data analysis.

You will produce a portfolio of data analysis projects from the Specialization that demonstrates mastery of statistical data analysis from exploratory analysis to inference to modeling, suitable for applying for statistical analysis or data scientist positions.

### What You Will Learn

Bayesian Statistics

Linear Regression

Statistical Inference

R Programming

### Courses Included

Course 1

Introduction to Probability and Data

This course introduces you to sampling and exploring data, as well as basic probability theory and Bayes' rule. You will examine various types of sampling methods, and discuss how such methods can impact the scope of inference. A variety of exploratory data analysis techniques will be covered, including numeric summary statistics and basic data visualization. You will be guided through installing and using R and RStudio (free statistical software), and will use this software for lab exercises and a final project. The concepts and techniques in this course will serve as building blocks for the inference and modeling courses in the Specialization.

Course 2

Inferential Statistics

This course covers commonly used statistical inference methods for numerical and categorical data. You will learn how to set up and perform hypothesis tests, interpret p-values, and report the results of your analysis in a way that is interpretable for clients or the public. Using numerous data examples, you will learn to report estimates of quantities in a way that expresses the uncertainty of the quantity of interest. You will be guided through installing and using R and RStudio (free statistical software), and will use this software for lab exercises and a final project. The course introduces practical tools for performing data analysis and explores the fundamental concepts necessary to interpret and report results for both categorical and numerical data.

Course 3

Linear Regression and Modeling

This course introduces simple and multiple linear regression models. These models allow you to assess the relationship between variables in a data set and a continuous response variable. Is there a relationship between the physical attractiveness of a professor and their student evaluation scores? Can we predict the test score for a child based on certain characteristics of his or her mother? In this course, you will learn the fundamental theory behind linear regression and, through data examples, learn to fit, examine, and utilize regression models to examine relationships between multiple variables, using the free statistical software R and RStudio.
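
As a rough illustration of what such a model looks like (the notation below is a common textbook convention and an assumption here, not taken from the course materials), a multiple linear regression relates a continuous response $y_i$ to $p$ explanatory variables $x_{i1}, \dots, x_{ip}$ for observation $i$:

$$
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i
$$

Here $\beta_0$ is the intercept, each $\beta_j$ describes how the expected response changes with $x_{ij}$ when the other variables are held fixed, and $\varepsilon_i$ is the error term. Simple linear regression is the special case $p = 1$.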

Course 4

Bayesian Statistics

This course describes Bayesian statistics, in which one's inferences about parameters or hypotheses are updated as evidence accumulates. You will learn to use Bayes’ rule to transform prior probabilities into posterior probabilities, and be introduced to the underlying theory and perspective of the Bayesian paradigm. The course will apply Bayesian methods to several practical problems to show end-to-end Bayesian analyses that move from framing the question, to building models, to eliciting prior probabilities, to implementing the final posterior distribution in R (free statistical software). Additionally, the course will introduce credible regions, Bayesian comparisons of means and proportions, Bayesian regression and inference using multiple models, and discussion of Bayesian prediction. We assume learners in this course have background knowledge equivalent to what is covered in the three earlier courses in this Specialization: "Introduction to Probability and Data," "Inferential Statistics," and "Linear Regression and Modeling."
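
To make the prior-to-posterior update concrete, here is Bayes' rule for a single hypothesis $H$ and observed evidence $E$. The symbols $H$ and $E$ are illustrative choices, not notation from the course:

$$
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}, \qquad P(E) = P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)
$$

$P(H)$ is the prior probability, $P(E \mid H)$ is the likelihood of the evidence under the hypothesis, and $P(H \mid E)$ is the posterior probability after observing $E$.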

Course 5

Statistics with R Capstone

The capstone project will be an analysis using R that answers a specific scientific/business question provided by the course team. A large and complex dataset will be provided to learners, and the analysis will require the application of a variety of methods and techniques introduced in the previous courses, including exploratory data analysis through data visualization and numerical summaries, statistical inference, and modeling, as well as interpretations of these results in the context of the data and the research question. The analysis will implement both frequentist and Bayesian techniques and discuss, in the context of the data, how these two approaches are similar and different, and what these differences mean for the conclusions that can be drawn from the data. A sampling of the final projects will be featured on the Duke Statistical Science department website. Note: Only learners who have passed the four previous courses in the Specialization are eligible to take the Capstone.

### Prerequisites

Basic math; no programming background is required.

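### Reference Materials

The course is self-contained and covers all the material it teaches, but we recommend one reference book to students: OpenIntro Statistics, 2nd edition. The course content follows this book closely, so it works well as a supplement to the videos. The textbook is open source and free to read online at openintro.org.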