The Lowdown on Big Data

Tuesday, March 28, 2017

Michael Todd, SAGE Publishing

Who’s doing big data?

Based on the buzz that the term has been creating since the turn of the century, perhaps a better question is who isn’t doing big data. Certainly the awareness of giant datasets and their potential to be mined for good, or ill, is well-nigh universal. As political scientist Gary King, who heads Harvard’s Institute for Quantitative Social Science, is fond of saying, “My mom now thinks she understands what I do.”

As with anything buzzy, the truth is that not nearly so many people understand what big data really is, and an even smaller number are actively working with it. Last year, SAGE Publishing took a stab at figuring out who was doing big data work and what sort of support they needed. More than 9,000 people, mostly academics, worldwide answered SAGE’s survey. That survey resulted in a white paper, Who Is Doing Computational Social Science? Trends in Big Data Research, and is the genesis of a panel at the Congress on June 1.

The survey responses were genuinely global, with 35 countries each supplying at least 50 completed surveys; more than 350 Canadians answered.

Disciplines of respondents were also all over the map, with education, psychology and health sciences each providing more than a thousand respondents. Beyond those, fields as diverse as the law, nursing, marketing and history joined more traditional social science disciplines such as political science, demographics, criminology and sociology in supplying respondents.

One third of respondents self-identified as having been involved in big data research of some kind, with one in four of them reporting that all or most of their research involved big data or data science methods. Predictably, who is doing the most big data research is in large part explained by the type of research associated with the respondent’s discipline. And so, the most common disciplines reporting any big data research were social statistics and research methods, where almost three out of five respondents had been involved in big data research at some point; economics (about half); demography, population studies, and human geography (slightly less than half); and health sciences (slightly less than two out of five).

“Overall,” wrote the white paper’s authors, “these percentages seem very high (especially in the case of history and anthropology, which are not typically disciplines associated with big data), and this further suggests that researchers who are very interested in big data and who are already engaged in big data research were more likely to complete the survey. It may also indicate ambiguity about what people understand by the terms big data and data science.”

Of the remaining two thirds of respondents, those who have not yet engaged in big data research, half of them (3,057 respondents) said that they are either “definitely planning on doing so in the future” or “might do so in the future.” That means that a substantial number of respondents don’t expect to do any big data work, period, and while it might seem difficult to escape some brush with big data, some 1,083 respondents said they definitely are not planning on doing it.

The white paper authors asked social and behavioral researchers what data sources they used and what tools they used to tap these sources. Among respondents who are already active in computational social science, by far the most common data source they had most recently used was administrative data – government-generated data whose subjects are as diverse as government departments themselves and can include health, education or income. Some 55 percent of respondents reported having used such data in their most recent research involving big data.

The next largest source, cited by 29 percent, was social media data, such as Facebook or Twitter. (Multiple answers were possible.) The third most commonly cited was commercial or proprietary data, cited by 23 percent of respondents. Giving an idea of the scope of what can constitute ‘big data,’ the fourth most common response included photographs, video or audio sources.

Because ‘big data’ is new, interdisciplinary and, well, big, the authors posited that it would present “unique problems” to researchers. In fact, a lot of the biggest problems faced by researchers will ring true in any academic endeavor – elusive funding, elusive data and that elusive perfect collaborator.

Among big data researchers surveyed about their challenges, 42 percent identified funding as a “big problem,” followed by 32 percent who cited gaining access to commercial or proprietary data and 30 percent who cited “finding collaborators with the right skills and knowledge.” (Multiple answers were allowed.) Of course, the nature of those challenges may have a different complexion for social data researchers, and the respondents also identified challenges that definitely had a big data cast. For example, 30 percent cited learning new software as a major challenge, and 27 percent cited “learning new analytic methods for myself.”

***

For those interested in learning more about using big data in their social science or humanities research, SAGE Publishing is sponsoring a panel at Congress titled “Getting Comfortable with Big Data” from 10:30 a.m. to noon on Thursday, June 1. The session takes place at the Expo Event Space in the MAC-Mattamy – Congress Hub.

For more information on SAGE’s work in this area, please contact Michael Todd at Michael.Todd@sagepub.com or visit MethodSpace.com.