Learn SAS or Python programming, expand your knowledge of analytical methods and applications, and conduct original research to inform complex decisions.
The Data Analysis and Interpretation Specialization takes you from data novice to data expert in just four project-based courses. You will apply basic data science tools, including data management and visualization, modeling, and machine learning using your choice of either SAS or Python, including pandas and Scikit-learn. Throughout the Specialization, you will analyze a research question of your choice and summarize your insights. In the Capstone Project, you will use real data to address an important issue in society, and report your findings in a professional-quality report. You will have the opportunity to work with our industry partners, DRIVENDATA and The Connection. Help DRIVENDATA solve some of the world’s biggest social challenges by joining one of their competitions, or help The Connection better understand recidivism risk for people on parole in substance use treatment. Regular feedback from peers will provide you a chance to reshape your question. This Specialization is designed to help you whether you are considering a career in data, work in a context where supervisors are looking to you for data insights, or you just have some burning questions you want to explore. No prior experience is required. By the end you will have mastered statistical methods to conduct original research to inform complex decisions.
Data Management and Visualization
Whether being used to customize advertising to millions of website visitors or streamline inventory ordering at a small restaurant, data is becoming more integral to success. Too often, we’re not sure how use data to find answers to the questions that will make us more successful in what we do. In this course, you will discover what data is and think about what questions you have that can be answered by the data – even if you’ve never thought about data before. Based on existing data, you will learn to develop a research question, describe the variables and their relationships, calculate basic statistics, and present your results clearly. By the end of the course, you will be able to use powerful data analysis tools – either SAS or Python – to manage and visualize your data, including how to deal with missing data, variable groups, and graphs. Throughout the course, you will share your progress with others to gain valuable feedback, while also learning how your peers use data to answer their own questions.
Data Analysis Tools
In this course, you will develop and test hypotheses about your data. You will learn a variety of statistical tests, as well as strategies to know how to apply the appropriate one to your specific data and question. Using your choice of two powerful statistical software packages (SAS or Python), you will explore ANOVA, Chi-Square, and Pearson correlation analysis. This course will guide you through basic statistical principles to give you the tools to answer questions you have developed. Throughout the course, you will share your progress with others to gain valuable feedback and provide insight to other learners about their work.
Regression Modeling in Practice
This course focuses on one of the most important tools in your data analysis arsenal: regression analysis. Using either SAS or Python, you will begin with linear regression and then learn how to adapt when two variables do not present a clear linear relationship. You will examine multiple predictors of your outcome and be able to identify confounding variables, which can tell a more compelling story about your results. You will learn the assumptions underlying regression analysis, how to interpret regression coefficients, and how to use regression diagnostic plots and other tools to evaluate the quality of your regression model. Throughout the course, you will share with others the regression models you have developed and the stories they tell you.
Machine Learning for Data Analysis
Are you interested in predicting future outcomes using your data? This course helps you do just that! Machine learning is the process of developing, testing, and applying predictive algorithms to achieve this goal. Make sure to familiarize yourself with course 3 of this specialization before diving into these machine learning concepts. Building on Course 3, which introduces students to integral supervised machine learning concepts, this course will provide an overview of many additional concepts, techniques, and algorithms in machine learning, from basic classification to decision trees and clustering. By completing this course, you will learn how to apply, test, and interpret machine learning algorithms as alternative methods for addressing your research questions.
Data Analysis and Interpretation Capstone
The Capstone project will allow you to continue to apply and refine the data analytic techniques learned from the previous courses in the Specialization to address an important issue in society. You will use real world data to complete a project with our industry and academic partners. For example, you can work with our industry partner, DRIVENDATA, to help them solve some of the world's biggest social challenges! DRIVENDATA at www.drivendata.org, is committed to bringing cutting-edge practices in data science and crowdsourcing to some of the world's biggest social challenges and the organizations taking them on. Or, you can work with our other industry partner, The Connection (www.theconnectioninc.org) to help them better understand recidivism risk for people on parole seeking substance use treatment. For more than 40 years, The Connection has been one of Connecticut’s leading private, nonprofit human service and community development agencies. Each month, thousands of people are assisted by The Connection’s diverse behavioral health, family support and community justice programs. The Connection’s Institute for Innovative Practice was created in 2010 to bridge the gap between researchers and practitioners in the behavioral health and criminal justice fields with the goal of developing maximally effective, evidence-based treatment programs. A major component of the Capstone project is for you to be able to choose the information from your analyses that best conveys results and implications, and to tell a compelling story with this information. By the end of the course, you will have a professional quality report of your findings that can be shown to colleagues and potential employers to demonstrate the skills you learned by completing the Specialization.
This Specialization is appropriate for anyone interested in learning more about data analysis, including those new to the field. Some knowledge of basic programming and familiarity with linear algebra concepts may be helpful, but no specific background is required.