The Ethics of Big Data

How should researchers approach an era of unprecedented information?

By Nicole Etter, February 26, 2018

The modern era of connectivity is, in many ways, a researcher’s paradise.

Research is built upon data, and everyday life creates unprecedented cascades of raw information about people. It flows from social media, mobile apps, internet searches and countless other sources. Your Twitter posts, purchases or workout habits can drive corporate strategies, advertising campaigns and, yes, research initiatives.

“In this era of big data, everything is now collectable and sharable and could be aggregated,” says Michael Zimmer, director of UWM’s Center for Information Policy Research. “You just never know what kind of data researchers might use next to do research.”

Zimmer has spent his career studying privacy and internet ethics, and he emphasizes that most research is conducted ethically. Still, the way information is handled today raises questions that deserve answers. “How do users feel,” asks Zimmer, an associate professor in the School of Information Studies, “about the fact that something that they put out there for a certain reason is being used for a different reason?”

That’s just one of the many issues he and others are investigating through a project called PERVADE, short for Pervasive Data Ethics for Computational Research. The new four-year project recently received a $3 million grant from the National Science Foundation.

PERVADE’s goal is to paint a picture of how pervasive data – in-depth information collected over a broad timeframe – is viewed and used in computational research. It will go beyond case studies, anecdotes and critiques to produce empirical research that can guide the field. Zimmer, one of the project’s seven principal investigators, will lead subprojects on ethical training and practices of computational researchers.

Zimmer’s team will survey researchers on their privacy beliefs and practices. It will assess more than 500 data science programs nationwide to see how – or even if – students are trained in data ethics. In addition, Zimmer will analyze the data management plans submitted by researchers who are funded by the National Science Foundation. “Did they talk about privacy?” Zimmer wonders. “Did they address anonymity or try to de-identify data? Then, I’m going to look at whether there are differences across disciplines.”

The PERVADE team will also survey social media users, social media companies and universities’ institutional review boards to get their perspectives on the ethics of social media research. Institutional review boards are crucial to the research process, and any project involving human subjects must gain the board’s approval before proceeding. Board review is meant to protect subjects from any physical or psychological harm a project might cause.

Zimmer worries about protecting subjects from reputational harm as well, which is why he takes a measured approach to online privacy. He knows some researchers believe it’s fine to capture and use any available data in the name of science, but for him, it comes down to informed consent.

“I don’t think I’m being radical in my position, but it’s often not what people want to hear,” he says. “My intention is not to stop research. We just need to make sure we’re doing it the right way, in the ways that respect the dignity and the autonomy of the subjects.”

Zimmer first became intrigued by the ethics of social media research in 2008. He’d heard that sociological researchers at Harvard had tracked an “anonymous” class of freshmen on Facebook to see how their social networks and interests changed over time. At the end of the project, the researchers publicly released the data without the students’ names attached. “I did a little digging,” Zimmer says, “and within about two days, I was able to identify that the data was actually from Harvard students.”

His critique of the researchers’ handling of subjects’ privacy drew the attention of The Chronicle of Higher Education and other national outlets.

“It was one of the first big cases that got people to think in a different way about internet research ethics,” Zimmer says. “It brought to light all kinds of questions: What is privacy? What kinds of rules should researchers follow if we’re going to use data from Facebook or Twitter?”

Zimmer knows how valuable a resource pervasive data can be for such tasks as building algorithms and studying human activity and communication. But he also knows that individual researchers, and even entire disciplines, approach data differently.

As an example, Zimmer points to the high-profile case in which a Danish psychology graduate student collected data from some 70,000 users of the OkCupid dating site. The data was published in 2016 – including usernames, political beliefs and sexual preferences – on the reasoning that identities didn’t need to be protected because the information was already online and visible to other OkCupid users. “We’re constantly running into this view,” Zimmer says, “that the data is already out there, so why do we need to worry about privacy?”

The PERVADE study will explore viewpoints from across the spectrum and offer researchers tools to navigate the information landscape.

“We’re hoping to generate a whole bunch of new knowledge and come up with a set of guidelines for research ethics in the big-data environment,” Zimmer says. “I think we’re going to have a big impact.”