Tsakalidis, Adam, Bazzi, Marya, Cucuringu, Mihai, Basile, Pierpaolo and McGillivray, Barbara (2019) Mining the UK web archive for semantic change detection. In: Recent Advances in Natural Language Processing (RANLP) 2019, Varna, Bulgaria, 2–4 Sep 2019. Published in: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019) pp. 1212-1221. ISBN 9789544520557. doi:10.26615/978-954-452-056-4_139 ISSN 1313-8502.
WRAP-Mining-UK-web-archive-semantic-change-detection-2019.pdf - Published Version. Available under License Creative Commons Attribution 4.0.
nlp_adam_tsakalidis.pdf - Accepted Version. Embargoed item. Restricted access to Repository staff only.
Abstract
Semantic change detection (i.e., identifying words whose meaning has changed over time) started emerging as a growing area of research over the past decade, with important downstream applications in natural language processing, historical linguistics and computational social science. However, several obstacles make progress in the domain slow and difficult. These pertain primarily to the lack of well-established gold standard datasets, resources to study the problem at a fine-grained temporal resolution, and quantitative evaluation approaches. In this work, we aim to mitigate these issues by (a) releasing a new labelled dataset of more than 47K word vectors trained on the UK Web Archive over a short time-frame (2000-2013); (b) proposing a variant of Procrustes alignment to detect words that have undergone semantic shift; and (c) introducing a rank-based approach for evaluation purposes. Through extensive numerical experiments and validation, we illustrate the effectiveness of our approach against competitive baselines. Finally, we also make our resources publicly available to further enable research in the domain.
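The abstract names Procrustes alignment and a rank-based evaluation without giving details. As a rough illustration of the general idea only, the sketch below uses the classical orthogonal Procrustes solution (via SVD), not the variant proposed in the paper, to align word vectors from two time periods and then rank words by the cosine distance between their aligned vectors. The function names, the toy vocabulary and the random data are all hypothetical.

```python
import numpy as np


def procrustes_rotation(X, Y):
    """Return the orthogonal matrix R minimising ||X @ R - Y||_F.

    Classical orthogonal Procrustes solution: R = U @ Vt, where
    U, S, Vt is the SVD of X.T @ Y. This is the textbook method,
    assumed here for illustration; it is not the paper's variant.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def rank_by_shift(X, Y, words):
    """Rank words by cosine distance between aligned vectors.

    X, Y: (n_words, dim) embeddings for the same vocabulary in two
    time periods. Returns (word, distance) pairs, largest shift first.
    """
    aligned = X @ procrustes_rotation(X, Y)
    num = np.sum(aligned * Y, axis=1)
    den = np.linalg.norm(aligned, axis=1) * np.linalg.norm(Y, axis=1)
    dist = 1.0 - num / den
    order = np.argsort(-dist)  # descending distance
    return [(words[i], float(dist[i])) for i in order]


if __name__ == "__main__":
    # Toy data (hypothetical): the second period is a rotated copy of
    # the first, except for one word whose vector is deliberately flipped.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
    Y = X @ Q
    Y[7] = -Y[7]
    words = [f"w{i}" for i in range(200)]
    print(rank_by_shift(X, Y, words)[:3])
```

In a ranking of this kind, a word whose meaning changed should appear near the top of the list, so a method can be scored by the rank positions it assigns to words known to have shifted.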
Item Type: | Conference Item (Paper) |
---|---|
Subjects: | Q Science > QA Mathematics; T Technology > TK Electrical engineering. Electronics Nuclear engineering; Z Bibliography. Library Science. Information Resources > ZA Information resources |
Divisions: | Faculty of Science, Engineering and Medicine > Science > Mathematics |
Library of Congress Subject Headings (LCSH): | Semantic Web, Semantic computing, Information technology -- Sociological aspects, Data mining -- Great Britain, Web archives -- Great Britain |
Journal or Publication Title: | Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019) |
Publisher: | INCOMA Ltd. |
ISBN: | 9789544520557 |
ISSN: | 1313-8502 |
Official Date: | 22 October 2019 |
Dates: | 22 October 2019 (Published); 6 July 2019 (Accepted) |
Page Range: | pp. 1212-1221 |
DOI: | 10.26615/978-954-452-056-4_139 |
Status: | Peer Reviewed |
Publication Status: | Published |
Access rights to Published version: | Open Access (Creative Commons open licence) |
Date of first compliant deposit: | 30 October 2019 |
Date of first compliant Open Access: | 1 March 2021 |
RIOXX Funder/Project Grant: | Project/Grant ID EP/N510129/1, RIOXX Funder Name [EPSRC] Engineering and Physical Sciences Research Council; Project/Grant ID Seed funding grant : SF099, RIOXX Funder Name [EPSRC] Engineering and Physical Sciences Research Council |
Conference Paper Type: | Paper |
Title of Event: | Recent Advances in Natural Language Processing (RANLP) 2019 |
Type of Event: | Conference |
Location of Event: | Varna, Bulgaria |
Date(s) of Event: | 2–4 Sep 2019 |
URI: | https://wrap.warwick.ac.uk/128469/ |