This course introduces students to data and statistics. By the end of the course, students should be able to interpret descriptive statistics, causal analyses and visualizations to draw meaningful insights.
The course first introduces a framework for thinking about the various purposes of statistical analysis. We’ll talk about how analysts use data for descriptive, causal and predictive inference. We’ll then cover how to develop a research study for causal analysis, compute and interpret descriptive statistics and design effective visualizations. The course will help you to become a thoughtful and critical consumer of analytics. If you are in a field that increasingly relies on data-driven decision making, but you feel unequipped to interpret and evaluate data, this course will help you develop these fundamental tools of data literacy.
Data and Theories
When most people think about using data, they quickly jump to considering the best way to analyze it with statistical methods. A good analysis, however, begins with a strong theoretical framework. A good theory will guide the collection of data, selection of appropriate statistical methods and interpretation of the results. Further, the theory will determine what kind of research design is needed, such as an observational study or experiment. This module will focus on the development of high-quality theories that can be used to guide descriptive, causal and predictive inference.
The Causality Framework
Establishing causality is frequently the primary motivation for research. Policymakers often want to understand how the implementation of a new program or other policy tool will affect an outcome of interest. Will smaller class sizes increase student learning? Will the implementation of stricter background checks for gun buyers reduce gun violence? Biomedical researchers often want to understand whether a new medicine will improve a disease outcome. Will taking a drug improve life expectancy, or even cure the disease under study? To answer these and similar questions, analysts must develop research designs that are appropriate for causal inference. Estimating a causal effect is challenging, yet it is essential to understand the impacts of a policy, medicine or any other kind of intervention.
Over the next four lessons we’ll begin to make sense of raw data. Staring at raw data, such as a spreadsheet, rarely reveals the key takeaways. Consider a variable such as a survey question that asks about the level of discrimination in the U.S. (where the answer choices are “a lot,” “some,” “only a little,” “none at all,” and “don’t know”). Reading the raw data does not tell you about the average respondent or the distribution of responses among the possible answer choices. To better understand the shape of the distribution, we can calculate measures of central tendency and measures of spread (also called dispersion). These summary statistics allow a researcher to draw some simple yet powerful initial conclusions about what the data tell us in a real-world sense.
Edward Tufte, a world-renowned expert in data visualization, once said, “There is no such thing as information overload. There is only bad design.” When communicating the results of an analysis, and particularly when trying to persuade an audience, a picture is truly worth a thousand words. A well-designed graph can leverage either a small or large amount of data to make a convincing argument. Data visualizations highlight specific points about the underlying information and enable the viewer to draw insights that are nearly invisible when staring at the numbers alone. In short, to be good at communicating with data, you must become skilled at visualizing data.