Mining the UK web archive for semantic change detection

WRAP-Mining-UK-web-archive-semantic-change-detection-2019.pdf - Published Version
Available under License Creative Commons Attribution 4.0.

nlp_adam_tsakalidis.pdf - Accepted Version
Embargoed item. Restricted access to Repository staff only.

Abstract

Semantic change detection (i.e., identifying words whose meaning has changed over time) started emerging as a growing area of research over the past decade, with important downstream applications in natural language processing, historical linguistics and computational social science. However, several obstacles make progress in the domain slow and difficult. These pertain primarily to the lack of well-established gold standard datasets, resources to study the problem at a fine-grained temporal resolution, and quantitative evaluation approaches. In this work, we aim to mitigate these issues by (a) releasing a new labelled dataset of more than 47K word vectors trained on the UK Web Archive over a short time-frame (2000-2013); (b) proposing a variant of Procrustes alignment to detect words that have undergone semantic shift; and (c) introducing a rank-based approach for evaluation purposes. Through extensive numerical experiments and validation, we illustrate the effectiveness of our approach against competitive baselines. Finally, we also make our resources publicly available to further enable research in the domain.
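
As a rough illustration of the kind of pipeline the abstract describes, the sketch below aligns two embedding matrices with standard orthogonal Procrustes and ranks words by cosine distance after alignment. It is not the variant proposed in the paper, whose details are not given in this record. The function names (`procrustes_align`, `rank_by_shift`) and the assumption that both matrices share the same vocabulary in the same row order are hypothetical choices made for this example.

```python
import numpy as np


def procrustes_align(X, Y):
    """Return the orthogonal matrix R that minimises ||X @ R - Y||_F.

    Standard orthogonal Procrustes solution: if X.T @ Y = U S V^T,
    then R = U @ V^T.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def rank_by_shift(X, Y, words):
    """Rank words from most to least shifted between two time periods.

    X, Y: (n_words, dim) embedding matrices for the earlier and later
    period, with row i of each corresponding to words[i] (an assumption
    of this sketch).
    """
    R = procrustes_align(X, Y)
    X_aligned = X @ R
    # Cosine distance between each aligned earlier vector and its later vector.
    dots = np.sum(X_aligned * Y, axis=1)
    norms = np.linalg.norm(X_aligned, axis=1) * np.linalg.norm(Y, axis=1)
    dist = 1.0 - dots / norms
    order = np.argsort(-dist)  # largest distance first
    return [(words[i], float(dist[i])) for i in order]
```

Calling `rank_by_shift(X_2000, X_2013, vocab)` on two matrices of word vectors would return `(word, distance)` pairs sorted by decreasing distance; a rank-based evaluation could then check how high known shifted words appear in that list.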

Item Type: Conference Item (Paper)
Subjects: Q Science > QA Mathematics
T Technology > TK Electrical engineering. Electronics Nuclear engineering
Z Bibliography. Library Science. Information Resources > ZA Information resources
Divisions: Faculty of Science, Engineering and Medicine > Science > Mathematics
Library of Congress Subject Headings (LCSH): Semantic Web, Semantic computing, Information technology -- Sociological aspects, Data mining -- Great Britain, Web archives -- Great Britain
Journal or Publication Title: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Publisher: INCOMA Ltd.
ISBN: 9789544520557
ISSN: 1313-8502
Official Date: 22 October 2019
Dates:
Date | Event
22 October 2019 | Published
6 July 2019 | Accepted
Page Range: pp. 1212-1221
DOI: 10.26615/978-954-452-056-4_139
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Open Access (Creative Commons open licence)
Date of first compliant deposit: 30 October 2019
Date of first compliant Open Access: 1 March 2021
RIOXX Funder/Project Grant:
Project/Grant ID
RIOXX Funder Name
Funder ID
UNSPECIFIED
Alan Turing Institute
EP/N510129/1
[EPSRC] Engineering and Physical Sciences Research Council
Seed funding grant : SF099
[EPSRC] Engineering and Physical Sciences Research Council
Conference Paper Type: Paper
Title of Event: Recent Advances in Natural Language Processing (RANLP) 2019
Type of Event: Conference
Location of Event: Varna, Bulgaria
Date(s) of Event: 2–4 Sep 2019
URI: https://wrap.warwick.ac.uk/128469/
