A longitudinal assessment of the persistence of Twitter datasets
Zubiaga, Arkaitz (2018) A longitudinal assessment of the persistence of Twitter datasets. Journal of the Association for Information Science and Technology, 69 (8). pp. 974-984. doi:10.1002/asi.24026 ISSN 2330-1643.
PDF: WRAP-longitudinal-assessment-persistence-Twitter-datasets-Zubiaga-2018.pdf (Accepted Version, 1015Kb)
Official URL: https://doi.org/10.1002/asi.24026
Abstract
Social media datasets are not always completely replicable. To comply with the requirements of platforms such as Twitter, researchers can only release a list of unique identifiers, which others can then use to recollect the data themselves. As a result, subsets of the data are no longer available, because content can be deleted or user accounts deactivated. To quantify the long-term impact of this on the replicability of datasets, we perform a longitudinal analysis of the persistence of 30 Twitter datasets, which include more than 147 million tweets. By recollecting Twitter datasets ranging from 0 to 4 years old using their tweet IDs, we look at four factors quantifying the extent to which recollected datasets resemble the original ones: completeness, representativity, similarity, and changingness. Although the ratio of available tweets keeps decreasing as a dataset gets older, we find that the textual content of the recollected subset is still largely representative of the original dataset. The representativity of the metadata, however, keeps fading over time, both because the dataset shrinks and because certain metadata, such as users' follower counts, keep changing. Our study has important implications for researchers who share publicly available Twitter datasets or use them in their research.
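As a rough illustration of the completeness factor, the ratio of tweets that can still be retrieved from a released ID list can be computed by comparing the original IDs with the IDs returned on recollection. This is a minimal sketch, assuming completeness is the plain fraction of original tweet IDs that are still available; the function name `completeness` is hypothetical, and the paper's exact definition may differ. Tweet IDs are handled as strings because they exceed the precision of some numeric types.

```python
from typing import Iterable


def completeness(original_ids: Iterable[str], recollected_ids: Iterable[str]) -> float:
    """Return the fraction of original tweet IDs that could be recollected.

    Hypothetical helper: assumes completeness is the simple ratio of
    available tweets to tweets in the original dataset.
    """
    original = set(original_ids)
    if not original:
        raise ValueError("original dataset is empty")
    # Only count recollected IDs that were actually in the original list.
    available = set(recollected_ids) & original
    return len(available) / len(original)


if __name__ == "__main__":
    # Four tweets were released; two were deleted before recollection.
    print(completeness(["1", "2", "3", "4"], ["1", "3"]))
```

Intersecting with the original set guards against counting stray IDs in the recollected output, so the ratio stays between 0 and 1.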
| Field | Value |
|---|---|
| Item Type | Journal Article |
| Subjects | H Social Sciences > HM Sociology; Q Science > QA Mathematics |
| Divisions | Faculty of Science, Engineering and Medicine > Science > Computer Science |
| Library of Congress Subject Headings (LCSH) | Quantitative research, Social media, Twitter (Firm) |
| Journal or Publication Title | Journal of the Association for Information Science and Technology |
| Publisher | John Wiley & Sons, Inc. |
| ISSN | 2330-1643 |
| Official Date | August 2018 |
| Volume | 69 |
| Number | 8 |
| Page Range | pp. 974-984 |
| DOI | 10.1002/asi.24026 |
| Status | Peer Reviewed |
| Publication Status | Published |
| Access rights to Published version | Restricted or Subscription Access |
| Date of first compliant deposit | 8 March 2018 |
| Date of first compliant Open Access | 21 May 2018 |