Statistical modelling of citation networks, research influence and journal prestige
Selby, David Antony (2020) Statistical modelling of citation networks, research influence and journal prestige. PhD thesis, University of Warwick.
PDF: WRAP_Theses_Selby_2020.pdf (Submitted Version, 4Mb)
Official URL: http://webcat.warwick.ac.uk/record=b3690782
Abstract
Standard approaches to measuring the ‘impact’ of academic journals, and sometimes of individual researchers or single research outputs, are typically not based on principled statistical analysis of citation data using appropriate statistical models. Recent research has shown the value of such statistical modelling for citations within a research discipline, for example in reproducing more faithfully the quality judgements of human assessors. In this project we study the strengths and weaknesses of statistical modelling approaches to citation-network data and, in so doing, uncover a deep theoretical connection between two otherwise unrelated journal ranking methods: PageRank and the Bradley–Terry model.
We extend the usual journal- or author-based metrics by aggregating all publications in a given field into ‘super-journals’. This permits modelling the exchange of citations between disciplines, raising the question: which scientific fields export the most intellectual influence, through recent research, to other fields? The relative merits of human and algorithmic field classifications are discussed. For this task, we propose a methodology of residual diagnosis for network community structures.
Finally, we investigate the extent to which the 2014 Research Excellence Framework’s assessment of the ‘quality’ of research outputs (rated 4*, 3*, 2* or 1*) was associated with the reputation of the journals in which those outputs were published. Submissions data are available, as are the aggregate scores for each university department, but the individual ratings for each paper are not. The research question is thus an ‘ecological inference’ problem: an attempt to estimate individual-level characteristics from aggregate data. Results are presented for several research fields.
To promote reproducibility and enable future research, the thesis includes a vignette on how to obtain citation network data from various databases, and is accompanied by R packages scrooge and ref2014 to facilitate analysis.
Item Type: | Thesis (PhD) |
---|---|
Subjects: | Q Science > QA Mathematics; Z Bibliography. Library Science. Information Resources > ZA Information resources |
Library of Congress Subject Headings (LCSH): | Bibliographical citations -- Statistical methods, Scholarly periodicals, Research, Statistics -- Periodicals |
Official Date: | June 2020 |
Institution: | University of Warwick |
Theses Department: | Department of Statistics |
Thesis Type: | PhD |
Publication Status: | Unpublished |
Supervisor(s)/Advisor: | Firth, David |
Sponsors: | Engineering and Physical Sciences Research Council |
Extent: | xvii, 185 leaves : illustrations |
Language: | eng |