Recognise the key components of statistical thinking in order to defend the critical role of statistics in modern public health research and practice
Describe a given data set from scratch using descriptive statistics and graphical methods as a first step for more advanced analysis using R software
Apply appropriate methods in order to formulate and examine statistical associations between variables within a data set in R
Interpret the output from your analysis and appraise the role of chance and bias as explanations for your results
Statistics are everywhere. The probability it will rain today. Trends over time in unemployment rates. The odds that India will win the next cricket world cup. In sports like football, they started out as a bit of fun but have grown into big business. Statistical analysis also has a key role in medicine, not least in the broad and core discipline of public health.
In this specialisation, you’ll take a peek at what medical research is and how – and indeed why – you turn a vague notion into a scientifically testable hypothesis. You’ll learn about key statistical concepts like sampling, uncertainty, variation, missing values and distributions. Then you’ll get your hands dirty with analysing data sets covering some big public health challenges – fruit and vegetable consumption and cancer, risk factors for diabetes, and predictors of death following heart failure hospitalisation – using R, one of the most widely used and versatile free software packages around.
This specialisation consists of four courses – statistical thinking, linear regression, logistic regression and survival analysis – and is part of our upcoming Global Master in Public Health degree, which is due to start in September 2019.
The specialisation can be taken independently of the GMPH and will assume no knowledge of statistics or R software. You just need an interest in medical matters and quantitative data.
Introduction to Statistics & Data Analysis in Public Health
Welcome to Introduction to Statistics & Data Analysis in Public Health! This course will teach you the core building blocks of statistical analysis - types of variables, common distributions, hypothesis testing - but, more than that, it will enable you to take a data set you've never seen before, describe its key features, get to know its strengths and quirks, run some vital basic analyses and then formulate and test hypotheses based on means and proportions. You'll then have a solid grounding to move on to more sophisticated analysis and take the other courses in the series. You'll learn the popular, flexible and completely free software R, used by statistics and machine learning practitioners everywhere. It's hands-on, so you'll first learn how to phrase a testable hypothesis via examples of medical research as reported by the media. Then you'll work through a data set on fruit and vegetable eating habits: data that are realistically messy, because that's what public health data sets are like in reality. There will be mini-quizzes with feedback along the way to check your understanding. The course will sharpen your ability to think critically and not take things for granted: in this age of uncontrolled algorithms and fake news, these skills are more important than ever.
Prerequisites
Some formulae are given to aid understanding, but this is not one of those courses where you need a mathematics degree to follow it. You will need only basic numeracy (for example, we will not use calculus) and familiarity with graphical and tabular ways of presenting results. No knowledge of R or programming is assumed.
Linear Regression in R for Public Health
Welcome to Linear Regression in R for Public Health! Public health has been defined as “the art and science of preventing disease, prolonging life and promoting health through the organized efforts of society”. Knowing what causes disease and what makes it worse are clearly vital parts of this. This requires the development of statistical models that describe how patient and environmental factors affect our chances of getting ill. This course will show you how to create such models from scratch, beginning by introducing you to the concepts of correlation and linear regression before walking you through importing and examining your data, and then showing you how to fit models. Using the example of respiratory disease, these models will describe how patient and other factors affect outcomes such as lung function. Linear regression is one of a family of regression models, and the other courses in this series will cover two further members. Regression models have many things in common with each other, though the mathematical details differ. This course will show you how to prepare the data, assess how well the model fits the data, and test its underlying assumptions – vital tasks with any type of regression. You will use the free and versatile software package R, used by statisticians and data scientists in academia, governments and industry worldwide.
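For orientation, the two ideas this course starts from, correlation and simple linear regression, can be written down in a few lines. This is a minimal sketch of the standard textbook definitions, not material taken from the course itself. The symbols are assumptions introduced here for illustration: y_i is an outcome such as lung function for patient i, x_i is a single patient factor, and n is the number of patients.

```latex
% Simple linear regression: one outcome y, one predictor x (notation assumed here)
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i , \qquad i = 1, \dots, n

% Least-squares estimates of the slope and the intercept
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} ,
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

% Pearson correlation coefficient between x and y
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The slope estimate and the correlation share the same numerator, which is one reason the two concepts are usually introduced together.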
Logistic Regression in R for Public Health
Welcome to Logistic Regression in R for Public Health! Why logistic regression for public health rather than just logistic regression? Well, there are some particular considerations for every data set, and public health data sets have particular features that need special attention. In a word, they're messy. Like the others in the series, this is a hands-on course, giving you plenty of practice with R on real-life, messy data, with predicting who has diabetes from a set of patient characteristics as the worked example for this course. Additionally, the interpretation of the outputs from the regression model can differ depending on the perspective that you take, and public health doesn’t just take the perspective of an individual patient but must also consider the population angle. That said, much of what is covered in this course is true for logistic regression when applied to any data set, so you will be able to apply the principles of this course to logistic regression more broadly too.
By the end of this course, you will be able to:
Explain when it is valid to use logistic regression
Define odds and odds ratios
Run simple and multiple logistic regression analysis in R and interpret the output
Evaluate the model assumptions for multiple logistic regression in R
Describe and compare some common ways to choose a multiple regression model
This course builds on skills such as hypothesis testing, p values, and how to use R, which are covered in the first two courses of the Statistics for Public Health specialisation. If you are unfamiliar with these skills, we suggest you review Statistical Thinking for Public Health and Linear Regression for Public Health before beginning this course. If you are already familiar with these skills, we are confident that you will enjoy furthering your knowledge and skills in Statistics for Public Health: Logistic Regression for Public Health. We hope you enjoy the course!
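For readers meeting the terms for the first time, the odds and odds ratios named in the learning outcomes can be sketched as follows. These are the standard definitions rather than anything specific to this course. The symbols p, p_1, p_0 and x, and the figures in the worked example, are hypothetical and chosen only to make the arithmetic easy to follow.

```latex
% Odds of an event that has probability p
\text{odds} = \frac{p}{1 - p}

% Odds ratio comparing a group with probability p_1 to a group with probability p_0
\text{OR} = \frac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)}

% Hypothetical worked example: p_1 = 0.2 and p_0 = 0.1
\frac{0.2 / 0.8}{0.1 / 0.9} = \frac{0.25}{0.111\ldots} = 2.25

% Simple logistic regression models the log of the odds as a linear function of x
\log\!\left( \frac{p}{1 - p} \right) = \beta_0 + \beta_1 x ,
\qquad \text{so that } e^{\beta_1} \text{ is the odds ratio per one-unit increase in } x
```

In the hypothetical example the risk in the first group is twice that in the second (0.2 against 0.1), yet the odds ratio is 2.25. Odds ratios and risk ratios are different measures.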
Survival Analysis in R for Public Health
Welcome to Survival Analysis in R for Public Health! The three earlier courses in this series covered statistical thinking, correlation, linear regression and logistic regression. This one will show you how to run survival – or “time to event” – analysis, explaining what’s meant by familiar-sounding but deceptive terms like hazard and censoring, which have specific meanings in this context. Using the popular and completely free software R, you’ll learn how to take a data set from scratch, import it into R, run essential descriptive analyses to get to know the data’s features and quirks, and progress from Kaplan-Meier plots through to multiple Cox regression. You’ll use data simulated from real, messy patient-level data for patients admitted to hospital with heart failure and learn how to explore which factors predict their subsequent mortality. You’ll learn how to test model assumptions, assess how well the model fits the data, and use some simple tricks to get round common problems in real public health data. There will be mini-quizzes on the videos and the R exercises with feedback along the way to check your understanding.
Prerequisites
Some formulae are given to aid understanding, but this is not one of those courses where you need a mathematics degree to follow it. You will need basic numeracy (for example, we will not use calculus) and familiarity with graphical and tabular ways of presenting results. The three previous courses in the series explained concepts such as hypothesis testing, p values, confidence intervals, correlation and regression and showed how to install R and run basic commands. In this course, we will recap all these core ideas in brief, but if you are unfamiliar with them, then you may prefer to take the first course in particular, Statistical Thinking in Public Health, and perhaps also the second, on linear regression, before embarking on this one.
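As a rough guide to the terms used above, the hazard and the Kaplan-Meier estimate can be sketched with their standard textbook definitions. The notation is an assumption introduced here, not taken from the course: T is a patient's time to the event, t_i are the distinct times at which events are observed, d_i is the number of events at t_i, and n_i is the number of patients still at risk just before t_i.

```latex
% Survival function: probability of remaining event-free beyond time t
S(t) = P(T > t)

% Hazard: instantaneous event rate at time t among those still event-free at t
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

% Kaplan-Meier estimate of the survival function
\hat{S}(t) = \prod_{i \,:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Censored patients, whose follow-up ends before the event is seen, count towards n_i for as long as they remain under observation but never towards d_i. This is how the estimate uses their partial information without treating them as having had the event.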