Run Kaplan-Meier plots and Cox regression in R and interpret the output
Describe a data set from scratch, using descriptive statistics and simple graphical methods as a necessary first step for more advanced analysis
Describe and compare some common ways to choose a multiple regression model
Welcome to Survival Analysis in R for Public Health!
The three earlier courses in this series covered statistical thinking, correlation, linear regression and logistic regression. This one will show you how to run survival – or “time to event” – analysis, explaining what’s meant by familiar-sounding but deceptive terms like hazard and censoring, which have specific meanings in this context. Using the popular and completely free software R, you’ll learn how to take a data set from scratch, import it into R, run essential descriptive analyses to get to know the data’s features and quirks, and progress from Kaplan-Meier plots through to multiple Cox regression. You’ll use data simulated from real, messy patient-level data for patients admitted to hospital with heart failure and learn how to explore which factors predict their subsequent mortality. You’ll learn how to test model assumptions and fit to the data and some simple tricks to get round common problems that real public health data have. There will be mini-quizzes on the videos and the R exercises with feedback along the way to check your understanding.
Some formulae are given to aid understanding, but this is not one of those courses where you need a mathematics degree to follow it. You will need only basic numeracy (we will not use calculus, for example) and familiarity with graphical and tabular ways of presenting results. The three previous courses in the series explained concepts such as hypothesis testing, p values, confidence intervals, correlation and regression, and showed how to install R and run basic commands. In this course, we will briefly recap all these core ideas, but if you are unfamiliar with them, you may prefer to take the first course in particular, Statistical Thinking in Public Health, and perhaps also the second, on linear regression, before embarking on this one.
The Kaplan-Meier Plot
What is survival analysis? You’ll see what it is, when to use it and how to run and interpret the most common descriptive survival analysis method, the Kaplan-Meier plot and its associated log-rank test for comparing the survival of two or more patient groups, e.g. those on different treatments. You’ll learn about the key concept of censoring.
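As a rough illustration of what a Kaplan-Meier analysis can look like in R, here is a minimal sketch. It assumes the survival package and uses that package's built-in lung data set as a stand-in, not the course's heart failure data; the object names km_fit and logrank are made up for this example.

```r
# Sketch only: the `lung` data set ships with the survival package and
# stands in for the course data, which is not reproduced here.
library(survival)

# Surv() pairs each follow-up time with its event indicator.
# In `lung`, status is coded 1 = censored, 2 = died.
km_fit <- survfit(Surv(time, status) ~ sex, data = lung)

# One Kaplan-Meier curve per group (in `lung`, sex is 1 = male, 2 = female)
plot(km_fit, col = c("blue", "red"),
     xlab = "Days of follow-up", ylab = "Proportion surviving")
legend("topright", legend = c("Male", "Female"),
       col = c("blue", "red"), lty = 1)

# Log-rank test comparing survival between the two groups
logrank <- survdiff(Surv(time, status) ~ sex, data = lung)
print(logrank)
```

Patients whose follow-up ends before the event are the censored ones; Surv() keeps their time in the analysis rather than discarding them.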
The Cox Model
This week you’ll get to know the most commonly used survival analysis method for incorporating not just one but multiple predictors of survival: Cox proportional hazards regression modelling. You’ll learn about the key concepts of hazards and the risk set. From now until the end of the course, there’ll be plenty of chances to run Cox models on data simulated from real patient-level records for people admitted to hospital with heart failure. You’ll see why missing data and categorical variables can cause problems in regression models such as Cox.
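A simple Cox model with a single predictor might be fitted along these lines. This is a sketch under the same assumptions as any generic example: it uses the survival package's built-in lung data rather than the course data, and the names cox_fit, hr and hr_ci are illustrative only.

```r
# Sketch only: `lung` from the survival package stands in for the course data.
library(survival)

# Simple Cox model: age as the only predictor of time to death
cox_fit <- coxph(Surv(time, status) ~ age, data = lung)
summary(cox_fit)

# The coefficient is a log hazard ratio; exponentiate it to get the
# hazard ratio per one-year increase in age, with its 95% confidence interval
hr <- exp(coef(cox_fit))
hr_ci <- exp(confint(cox_fit))
print(hr)
print(hr_ci)
```

A hazard ratio above 1 means the hazard of the event rises as the predictor increases; below 1 means it falls.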
The Multiple Cox Model
You’ll extend the simple Cox model to the multiple Cox model. As preparation, you’ll run the essential descriptive statistics on your main variables. Then you’ll see the problems that can arise with real-life public health data and learn some simple tricks to fix them.
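The workflow of describing the variables first and then fitting a multiple Cox model could be sketched as follows. The choice of predictors here is arbitrary and the data are again the survival package's lung set, not the course data; cox_multi is a made-up name.

```r
# Sketch only: predictors chosen for illustration from the `lung` data set.
library(survival)

# Descriptive statistics first: distribution of a continuous variable,
# and counts per category including any missing values
summary(lung$age)
table(lung$sex, useNA = "ifany")
table(lung$ph.ecog, useNA = "ifany")

# Multiple Cox model; factor() tells R to treat sex as categorical
# rather than as a number
cox_multi <- coxph(Surv(time, status) ~ age + factor(sex) + ph.ecog,
                   data = lung)
summary(cox_multi)

# By default coxph() drops rows with a missing predictor, so compare the
# number of patients actually used with the number in the data set
cox_multi$n
nrow(lung)
```

Checking how many rows the model actually used is a quick way to spot missing data silently shrinking the analysis.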
The Proportionality Assumption
In this final part of the course, you’ll learn how to assess the fit of the model and test the validity of the main assumptions involved in Cox regression such as proportional hazards. This will cover three types of residuals. Lastly, you’ll get to practise fitting a multiple Cox regression model and will have to decide which predictors to include and which to drop, a ubiquitous challenge for people fitting any type of regression model.
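One common way to check the proportional hazards assumption in R is cox.zph() from the survival package, sketched below on the built-in lung data. The text above does not say which three types of residuals are covered; the Schoenfeld, martingale and deviance residuals shown here are an assumption, chosen because they are widely used with Cox models.

```r
# Sketch only: `lung` from the survival package stands in for the course data.
library(survival)

cox_fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Test of proportional hazards based on scaled Schoenfeld residuals:
# one row per predictor plus a global test; a small p value suggests
# the hazard ratio changes over time
ph_test <- cox.zph(cox_fit)
print(ph_test)

# Plot the residuals against time, one panel per predictor;
# a roughly flat line is consistent with proportional hazards
par(mfrow = c(1, 2))
plot(ph_test)

# Other residual types (assumed here, not named in the course text)
mart_res <- residuals(cox_fit, type = "martingale")
dev_res  <- residuals(cox_fit, type = "deviance")
```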