Michael Knapp

Beyond Data Visualizations: Storytelling and Machine Learning for Data-Informed Social Action

If data sharing is going to be effective for the public — and not only for public health professionals — we have to tell stories.

[Image: screenshot of Delaware Opioid Crisis page with blurred background video of E.M.T.s running to a patient]

On March 9, 2020, Green River gave a presentation about data-driven storytelling to the CDC’s Environmental Health Tracking Network. Since April 2018, Green River has been working with Delaware Health and Social Services (DHSS) on My Healthy Community, a platform for sharing community-level statistics and data that can be used to understand and explore health and the factors that influence it. As part of that work, we recently collaborated with Caroline Judd, an epidemiologist with DHSS, on a data story describing Delaware’s Opioid Crisis: a high-level, one-page information piece that describes the scope and impact of the crisis, as well as what Delaware is doing to help people and stem the crisis.

In recent years, public-serving organizations have been opening their data sources, often as web data platforms with visualizations. This work has done a great deal to improve public data accessibility, but these tools remain challenging to access and understand for the majority of people represented in the data: those without statistical training or familiarity with interpreting data. As a result, these displays and dashboards do not help people connect with and care about the data being shown.

Data Visualizations: Answering Descriptive Questions 

[Image: screenshot of CDC heart disease and stroke map]

CDC interactive tool for exploring U.S. heart disease and stroke data. (Heart Disease Maps and Data Sources, CDC)

Data visualizations shown without interpretation or narration are limited in that they show only descriptive information. Take the “Interactive Atlas of Heart Disease and Stroke” above as an example. These kinds of data visualization platforms and interactive tools are undoubtedly useful for exploring large datasets, drilling into specific data attributes, and revealing patterns and outliers. They help us answer descriptive questions such as: Where are disease rates higher or lower? How far is it to the nearest healthcare facility?

But why are deaths caused by strokes concentrated in the South? How did this come to be? And how might different concerned people act in response to this information?

Data Storytelling and Data-Driven Journalism: Perspective and Context Towards Behavioral Change

Insights from visualizations alone are often not enough to stimulate action and pro-social behavior change. To better understand the why and how, we need data storytelling (Gottschall, The Storytelling Animal, 2012).

Storytelling can make datasets:

  • More accessible, by translating data into something broadly understandable;
  • More meaningful, by contextualizing data points (e.g., time, location, volume) and analyses (e.g., trends, significance, proportions); and
  • More useful, by expanding the ability to make data-driven decisions and a feeling of ownership over public data.

[Image: chart of Death Rates By Drug Type]

Screenshot from our story on the Opioid Crisis in Delaware: the visualization, combined with a description, reinforces that opioids are the leading cause of overdose deaths.

Since 2009, Data-Driven Journalism (DDJ) has emerged as a form of data storytelling that operates at the intersection of investigative research, statistics, design, and programming. The Pudding, a well-known DDJ news source, describes their team as “journalist-engineers” and their process as “[explaining] ideas debated in culture with visual essays. By wielding original datasets, primary research, and interactivity, we try to thoroughly explore complex topics.” In DDJ, journalists use the processing and analysis of large datasets to discover and support news stories. For example:

  • The Guardian, “Bussed Out: How America moves its homeless”: While the narrative shares the moving story of one homeless man, an accompanying data visualization reveals his cross-country journey before contextualizing it with the national status of homelessness.
  • NJ Advance Media, “The Force Report”: Responding to a lack of statewide data collection and analysis on police use of force in New Jersey, journalists created “the most comprehensive statewide database of police use of force in the U.S.” and a website for exploring this dataset (including traditional news stories, major findings, and interactive visualizations).

In How to be a Data Journalist, Paul Bradshaw from The Guardian writes, “[DDJ] represents the convergence of a number of fields which are significant in their own right … The idea of combining those skills to tell important stories is powerful -- but also intimidating. Who can do all that?” Indeed, Data-Driven Journalism remains a highly specialized and expensive set of skills. How might DDJ be made more accessible to a broader audience?

Can meaningful data stories be automated and told at scale?

Automated journalism (also known as algorithmic journalism or robot journalism) is a fairly new approach to journalism where news stories are automatically produced by artificial intelligence (AI) software, rather than by human reporters. Wikipedia explains, “Typically, the [automated journalism] process involves an algorithm that scans large amounts of provided data, selects from an assortment of pre-programmed article structures, orders key points, and inserts details such as names, places, amounts, rankings, statistics, and other figures. … [But] due to the formulaic nature of automation, it is mainly used for stories based on statistics and numerical figures. Common topics include sports recaps, weather, financial reports, real estate analysis, and earnings reviews.”
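The process described in that quote (scan the data, select an article structure, order the key points, insert the details) can be sketched in a few lines of Python. This is only an illustration of the general template-filling idea, not how any particular product works: the `Record` type, the three sentence templates, the 1% "flat" threshold, and the sample numbers are all assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    place: str       # hypothetical location label
    metric: str      # name of the statistic being reported
    current: float   # latest value
    previous: float  # value in the prior period (assumed non-zero)


# "Pre-programmed article structures": one sentence template per kind of change.
TEMPLATES = {
    "rise": "{metric} in {place} rose {pct:.1f}% to {current:g}.",
    "fall": "{metric} in {place} fell {pct:.1f}% to {current:g}.",
    "flat": "{metric} in {place} held steady at {current:g}.",
}


def percent_change(rec: Record) -> float:
    return (rec.current - rec.previous) / rec.previous * 100


def select_template(pct: float, flat_threshold: float = 1.0) -> str:
    """Pick a structure; changes smaller than the threshold count as flat."""
    if abs(pct) < flat_threshold:
        return "flat"
    return "rise" if pct > 0 else "fall"


def write_story(records: list[Record]) -> str:
    # Order key points: the largest change (in either direction) leads.
    ordered = sorted(records, key=lambda r: abs(percent_change(r)), reverse=True)
    sentences = []
    for rec in ordered:
        pct = percent_change(rec)
        template = TEMPLATES[select_template(pct)]
        # Insert the details (names, places, amounts) into the chosen structure.
        sentences.append(
            template.format(
                metric=rec.metric, place=rec.place, pct=abs(pct), current=rec.current
            )
        )
    return " ".join(sentences)


# Invented sample values, for illustration only.
SAMPLE = [
    Record("Area C", "Clinic visits", 100.5, 100),
    Record("Area B", "Clinic visits", 90, 100),
    Record("Area A", "Clinic visits", 120, 100),
]

story = write_story(SAMPLE)
print(story)
```

Even this toy version shows why the approach suits numerical topics: every sentence is fully determined by the figures, so nothing in it can explain why a number changed.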

In 2015, Forbes reported that two well-known AI apps, Quill by Narrative Science and Automated Insights, were already being used by media outlets to generate stories or parts of stories. These apps specialize in Natural Language Generation (NLG) and automated narrative creation, a type of AI that puts together convincing narratives from specific data sources. Although they currently depend on datasets being cleaned and structured, these features are especially powerful for communicating personalized narratives to specific audience segments at scale. Wimbledon is using IBM sensors and camera systems to monitor games and generate millions of real-time data points, which are then turned into automated stories or tweets so that it is the first to break stories about game results. The Associated Press uses NLG to write over 4,000 earnings reports each quarter, or 15 times what it could produce manually.

Though automated stories continue to pale in comparison to the curated and compelling stories created by journalists, the question we want to ask is not whether AI will take over the jobs of human journalists, but rather, how might human journalists, AI, and other stakeholders work well together to tell meaningful stories using large public datasets?

Storytelling with a plurality of intelligences

In our work with Delaware’s My Healthy Community, our next challenge is to help the public understand the data being shared, by highlighting useful findings and offering meaningful interpretations of charts and maps at hundreds of geographic locations.
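To make "plain-language interpretation at hundreds of locations" concrete, here is a minimal sketch that compares a local rate with a statewide benchmark and words the result. It is not the My Healthy Community implementation: the function name `describe_rate`, the 10% band for "about the same," and the example figures are assumptions chosen for illustration.

```python
def describe_rate(place, local_rate, state_rate, unit, similar_band=10.0):
    """Return one plain-language sentence comparing a place to the state.

    similar_band is the percent difference (an assumed cutoff) below which
    the two rates are described as roughly equal.
    """
    diff_pct = (local_rate - state_rate) / state_rate * 100
    if abs(diff_pct) <= similar_band:
        comparison = "about the same as"
    elif diff_pct > 0:
        comparison = f"about {round(diff_pct)}% higher than"
    else:
        comparison = f"about {round(-diff_pct)}% lower than"
    return (
        f"In {place}, the rate is {local_rate:g} {unit}, "
        f"{comparison} the statewide rate of {state_rate:g}."
    )


# Invented figures, for illustration only.
UNIT = "per 100,000 residents"
for place, rate in [("Area A", 30), ("Area B", 21), ("Area C", 15)]:
    print(describe_rate(place, rate, 20, UNIT))
```

The same function can be run over every geography in a dataset, which is where automation helps; deciding whether a 10% band is the right notion of "about the same" for a given health measure is a judgment the code cannot make.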

The question remains—even if we can accurately offer plain language interpretations, is automation sufficient to help people understand why findings are important and to decide where interventions are needed? 

At Green River, we believe a combination of contributors—which can include artificial intelligence—can support the most effective storytelling. The table below describes how different contributors could work together:

Contributor | What they contribute
Artificial Intelligence | Good at discovering potential stories by calling out anomalies and patterns in large datasets; good at generating story visuals. Algorithms must be interrogated to ensure damaging, unintended biases are eliminated.
Community Members with lived experience | Good at verifying which of the discovered stories matter to their people; adding nuance and context from their local politics; and interpreting the story and findings in their own words.
Subject Matter Experts | Good at framing the story in the context of the larger social issue and history; and verifying the importance of different topics.
Critical Data Designer | Good at cleaning up visuals and story narrative; managing this collaborative process; and holding the AI accountable.

The stories that result would be semi-automated stories. Rather than rely fully on new technologies like artificial intelligence, we recognize the many intelligences that our stakeholders bring to the table. We are working to combine the intelligences of AI, community members, and technical and subject-matter experts to create stories that are driven by data from both large public datasets and local narratives.
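One way to picture such a semi-automated workflow is sketched below: software flags candidate stories, and nothing is published until each human contributor has signed off. The z-score rule, the threshold of 2.0, the reviewer role names, and the sample values are all assumptions made for this example; they stand in for whatever detection method and review process a real project would choose.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev

# Hypothetical role names for the human contributors described above.
REQUIRED_REVIEWERS = {"community_member", "subject_matter_expert", "data_designer"}


@dataclass
class Candidate:
    place: str
    value: float
    z_score: float
    approvals: set = field(default_factory=set)  # roles that have signed off


def flag_outliers(values_by_place, threshold=2.0):
    """AI step: flag places whose value sits far from the mean (simple z-score)."""
    values = list(values_by_place.values())
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    candidates = []
    for place, value in values_by_place.items():
        z = (value - mu) / sigma
        if abs(z) >= threshold:
            candidates.append(Candidate(place, value, z))
    return candidates


def approve(candidate, reviewer_role):
    """Human step: record a sign-off from one of the required roles."""
    if reviewer_role not in REQUIRED_REVIEWERS:
        raise ValueError(f"unknown reviewer role: {reviewer_role}")
    candidate.approvals.add(reviewer_role)


def ready_to_publish(candidate):
    # A story is released only when every required role has approved it.
    return REQUIRED_REVIEWERS <= candidate.approvals


# Invented rates, for illustration only.
SAMPLE = {
    "Area A": 10, "Area B": 11, "Area C": 9, "Area D": 10,
    "Area E": 12, "Area F": 10, "Area G": 11, "Area H": 30,
}

candidates = flag_outliers(SAMPLE)
for c in candidates:
    print(f"{c.place}: value {c.value:g}, z = {c.z_score:.2f}, "
          f"ready = {ready_to_publish(c)}")
```

The design choice worth noting is that the algorithm only nominates stories. Whether a flagged anomaly matters, and how it should be framed, remains a decision for the people in the loop.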