Estimating socioeconomic indicators using online data
Lochanachit, Sirasit (2020) Estimating socioeconomic indicators using online data. PhD thesis, University of Warwick.
PDF: WRAP_Theses_Lochanachit_2020.pdf - Submitted Version (16Mb)
Official URL: http://webcat.warwick.ac.uk/record=b3467548~S15
Abstract
Policymakers and businesses need a good understanding of the current state of society to make fully informed decisions. In contrast to traditional approaches to measuring human behaviour, which can be expensive, time-consuming and subject to delay, data on collective online behaviour, such as what people are searching for on Google, is available publicly, rapidly and at low cost. Studies into online behaviour may therefore be able to provide useful insights into collective human behaviour in the real world.
Here, we investigate whether online data from social media platforms, such as Instagram and Twitter, and search engine data, specifically data from Google, can help estimate key characteristics of society. In particular, we seek to infer the number of people speaking various languages across different urban areas based on publicly exchanged messages on the photo-sharing platform Instagram. We find that such data can help estimate the spatial distribution of language usage in Greater London. In a parallel analysis, we investigate whether Twitter data is similarly useful. However, our results suggest that data from Instagram is more valuable, as a higher number of posts to the service contain location data.
We also investigate whether online data can be used to help estimate economic activity. Specifically, we focus on unemployment rates in the United Kingdom and draw on data retrieved from Google Trends. Our findings reveal that Google search data can help generate quicker estimates of the current level of unemployment before official data is released. We also find that, according to some performance metrics, a variable selection technique based on an elastic net can improve model performance.
This thesis highlights the potential for inferences generated from online data to complement official statistics, for example by providing quicker estimates before official figures are released. We suggest that rapid, low-cost measurements of collective human behaviour from publicly available data may provide valuable new insights for policymakers and businesses alike.
Item Type: | Thesis (PhD)
---|---
Subjects: | H Social Sciences > HM Sociology; H Social Sciences > HT Communities. Classes. Races; Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Library of Congress Subject Headings (LCSH): | User-generated content, Internet users -- Psychology, Online social networks -- Psychological aspects, Language and languages -- Variation, Unemployment
Official Date: | June 2020
Institution: | University of Warwick
Theses Department: | Warwick Business School
Thesis Type: | PhD
Publication Status: | Unpublished
Supervisor(s)/Advisor: | Moat, Suzy ; Preis, Toby
Sponsors: | Thailand. Government
Format of File: | PDF
Extent: | xv, 152 leaves : illustrations, charts
Language: | eng