Our last rapid development challenge, “mapping memory,” made me think about how different the data sources for mapping and spatial analysis are today compared with John Snow’s time (the cholera person, not the one on a dragon). I have seen more and more space-related studies and projects built on big data, especially social media posts. If Twitter had existed back then, or if John Snow had had access to information such as foot traffic, maybe he could have made the connection and figured everything out more quickly.

So in this post I will talk about social media data for spatial analysis and mapping. On the one hand, the richness of location-referenced information in social media has led to the emergence and evolution of data-driven geography. Researchers find social media to be a valuable data source for understanding spatial patterns and interactions as well as for informing spatial planning. On the other hand, people raise questions about data quality and about whether traditional analytical methods are suitable for gaining insights from this non-traditional data source.

Social media posts’ geo-tags, textual content, photographs, videos, and users’ profiles can all be leveraged for mapping and spatial analysis. These pieces of information have proven useful in many fields. In public health, for instance, traditional epidemic monitoring relies on clinical reports gathered by public health authorities. Health care providers in the US depend on the information provided by the CDC to learn about disease outbreaks. However, the time lag between the onset of a disease and the date that clinical cases are reported to authorities is a major drawback of official surveillance systems. For this reason, many researchers have developed data processing and modeling techniques that use social media for real-time epidemic analysis. In a Japanese nationwide study, Wakamiya et al. (2018) extracted tweets’ GPS information, location names mentioned in tweets, and users’ profile locations to effectively estimate when and where influenza outbreaks are happening. This approach is relatively fast with reasonable accuracy, which could support the early detection of epidemics. Multimedia content in social media provides a means for spatial analysis as well. Based on the metadata of Instagram photos and the hashtags in their captions, Jang and Kim (2019) created a cognitive map of the Seoul metropolitan area to capture residents’ collective perceptions of urban space.
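To make the idea of combining several location signals more concrete, here is a minimal Python sketch of how a post might be assigned to a place by trying its GPS tag first, then place names mentioned in the text, then the user's profile location. This is only an illustration under my own assumptions, not the method of Wakamiya et al. (2018): the `Post` fields, the tiny `GAZETTEER`, the fallback order, and the function names are all hypothetical, and the nearest-place lookup is deliberately crude.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical toy gazetteer: place name -> approximate (lat, lon).
GAZETTEER = {
    "tokyo": (35.68, 139.69),
    "osaka": (34.69, 135.50),
    "kyoto": (35.01, 135.77),
}


@dataclass
class Post:
    """A simplified social media post (field names are assumptions)."""
    text: str
    gps: Optional[Tuple[float, float]] = None
    profile_location: Optional[str] = None


def nearest_place(point: Tuple[float, float]) -> str:
    """Return the gazetteer place closest to a (lat, lon) point.

    Uses squared differences in degrees, which is only a rough
    stand-in for a real geodesic distance.
    """
    lat, lon = point
    return min(
        GAZETTEER,
        key=lambda name: (GAZETTEER[name][0] - lat) ** 2
        + (GAZETTEER[name][1] - lon) ** 2,
    )


def match_place(text: str) -> Optional[str]:
    """Return the first gazetteer name found in the text, if any."""
    lowered = text.lower()
    for name in GAZETTEER:
        if name in lowered:
            return name
    return None


def resolve_location(post: Post) -> Tuple[Optional[str], str]:
    """Try GPS, then place names in the text, then the profile location."""
    if post.gps is not None:
        return nearest_place(post.gps), "gps"
    place = match_place(post.text)
    if place is not None:
        return place, "text"
    if post.profile_location:
        place = match_place(post.profile_location)
        if place is not None:
            return place, "profile"
    return None, "unresolved"


def count_by_place(posts) -> Counter:
    """Count posts per resolved place, skipping unresolved posts."""
    counts = Counter()
    for post in posts:
        place, _source = resolve_location(post)
        if place is not None:
            counts[place] += 1
    return counts


if __name__ == "__main__":
    sample = [
        Post("I have the flu", gps=(34.70, 135.50)),
        Post("Flu is spreading in Kyoto"),
        Post("So sick today", profile_location="Tokyo, Japan"),
        Post("flu"),  # no usable location signal
    ]
    print(count_by_place(sample))
```

Recording which signal produced each location (`"gps"`, `"text"`, or `"profile"`) matters in practice, because the three signals differ a lot in reliability: a profile location says where a user claims to live, not where a given post was written.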

Compared to conventional spatial data sources, such as aerial photographs and field surveys, social media data has the advantages of being massive in volume and scale, timely, unobtrusive, and cost-effective (Campagna, 2016; Martí et al., 2019). However, some of its inherent characteristics, namely incompleteness, non-representativeness, and inaccessibility, are generally problematic for academic research (Goodchild, 2013; Salganik, 2018). No matter how big the data is, social media data very likely lacks information such as demographics, which makes it challenging for researchers to explore the reasons behind spatial patterns. This is because social media is not purposefully designed as a channel for data collection. For similar reasons, social media users are usually not representative of any particular population, which limits the generalizability of findings. In addition, social media data is not always accessible to researchers because of legal or ethical barriers.

Besides the nature of the data, analytical methods further complicate the use of social media data for spatial analysis. While traditional statistical tools were developed for situations where data are relatively scarce and clean, data-driven analytics now need to account for the massiveness and messiness of data. This is expected to give rise to a new generation of techniques that place more value on supporting real-time analysis at a large scale and on handling noise (Arribas-Bel, 2014; Miller & Goodchild, 2015). As these techniques, such as machine learning, are constantly evolving, I think it is important for us not to overlook these limitations and to keep our skills up to date.

References:

Arribas-Bel, D. (2014). Accidental, open and everywhere: Emerging data sources for the understanding of cities. Applied Geography, 49, 45–53. https://doi.org/10.1016/j.apgeog.2013.09.012

Campagna, M. (2016). Social Media Geographic Information: Why social is special when it goes spatial? Ubiquity Press. https://doi.org/10.5334/bax.d

Goodchild, M. F. (2013). The quality of big (geo)data. Dialogues in Human Geography. https://doi.org/10.1177/2043820613513392

Jang, K. M., & Kim, Y. (2019). Crowd-sourced cognitive mapping: A new way of displaying people’s cognitive perception of urban space. PLoS ONE, 14(6). https://doi.org/10.1371/journal.pone.0218590

Martí, P., Serrano-Estrada, L., & Nolasco-Cirugeda, A. (2019). Social Media data: Challenges, opportunities, and limitations in urban studies. Computers, Environment and Urban Systems, 74, 161–174. https://doi.org/10.1016/j.compenvurbsys.2018.11.001

Miller, H. J., & Goodchild, M. F. (2015). Data-driven geography. GeoJournal, 80(4), 449–461. https://doi.org/10.1007/s10708-014-9602-6

Salganik, M. (2018). Bit by Bit: Social Research in the Digital Age (Open review edition). Princeton University Press. https://www.bitbybitbook.com/en/preface/

Wakamiya, S., Kawai, Y., & Aramaki, E. (2018). Twitter-Based Influenza Detection After Flu Peak via Tweets With Indirect Information: Text Mining Study. JMIR Public Health and Surveillance, 4(3), e65. https://doi.org/10.2196/publichealth.8627