For this blog post, I’d like to share a book I recently read: Bit by Bit: Social Research in the Digital Age, by Matthew Salganik. It discusses the new opportunities, challenges, and common pitfalls of using large-scale digital data for social science research.

One theme of this book is the blending of readymade and custom-made data. Traditional social research relies on purposefully collected, custom-made data, and it is often crucial for researchers to gather information directly from original sources through methods such as surveys, interviews, and experiments. Salganik points out that tremendous opportunities also exist in readymade data sources (e.g., sensor, machine log, and social media data), which were not created for research but can be repurposed for it. He does not advocate one kind of data as better than the other; instead, he argues that, when done well, a mix of readymade and custom-made data can be incredibly powerful. To illustrate, Salganik uses the study of wealth by Blumenstock, Cadamuro, and On (2015). Blumenstock et al. linked mobile phone call records (a readymade data source) with a telephone survey of a random sample of users (a custom-made data source), and were able to predict the socioeconomic status of 1.5 million phone users in Rwanda. Beyond its large scale, their method of linking the two data sources was ten times faster and fifty times cheaper than Rwanda’s Demographic and Health Survey, a gold-standard traditional method.

Another theme of this book is ethics, that is, how to take advantage of the digital age in ways that are responsible and beneficial to society. We discussed some of these ethical issues in our rapid development challenges as well. In Salganik’s opinion, researchers’ increasing ability to observe and run experiments on human subjects without those subjects’ awareness or consent is the fundamental reason for uncertainty about the appropriate conduct of digital-age social research. Because the capabilities of digital systems are evolving more quickly than rules, laws, and norms, researchers face more ethical challenges now than in the analog age, when most social research was conducted within the boundary of clear rules. He suggests that four principles (respect for persons, beneficence, justice, and respect for law and public interest) and two frameworks (consequentialism and deontology) can guide researchers in dealing with ethical uncertainty. In addition to these principles and frameworks, he offers three practical tips: “the IRB is a floor, not a ceiling”; “put yourself in everyone else’s shoes”; and “think of research ethics as continuous, not discrete.” I find his suggestions very helpful for moving beyond a narrow focus on what current regulations permit, and for applying those regulations in a way that is sensitive to the ethical stakes and to how they affect the lives of research subjects. We, as researchers, must adapt to inconsistent and overlapping rules and use our best judgement to deal with ethical uncertainties.

I think this book is a timely piece, given that digital methods and “informatics” are becoming increasingly popular. I thought it might be of interest to some of you =)

Reference: Blumenstock, J., Cadamuro, G., & On, R. (2015). Predicting poverty and wealth from mobile phone metadata. Science, 350(6264), 1073–1076.