TweetNorm : a benchmark for lexical normalization of Spanish tweets
Alegria, Iñaki, Aranberri, Nora, Comas, Pere R., Fresno, Víctor, Gamallo, Pablo, Padró, Lluis, San Vicente, Iñaki, Turmo, Jordi and Zubiaga, Arkaitz (2015) TweetNorm : a benchmark for lexical normalization of Spanish tweets. Language Resources and Evaluation, 49 (4). pp. 883-905. doi:10.1007/s10579-015-9315-6 ISSN 1574-020X.
PDF: WRAP_1373353-cs-041115-alegria-2015-tweetnorm.pdf (Accepted Version, 490 KB)
Official URL: http://dx.doi.org/10.1007/s10579-015-9315-6
Abstract
The language used in social media is often characterized by an abundance of informal and non-standard writing. Normalizing this non-standard language can be crucial for facilitating subsequent text processing and, consequently, for boosting the performance of natural language processing tools applied to social media text. In this paper we present a benchmark for lexical normalization of social media posts, specifically of tweets in Spanish. We describe the tweet normalization challenge we recently organized, analyze the performance achieved by the different systems submitted to the challenge, and examine the characteristics of those systems to identify the features that proved useful. The organization of this challenge has led to the production of a benchmark for lexical normalization of social media, including an evaluation framework and an annotated corpus of Spanish tweets (TweetNorm_es), which we make publicly available. The creation of this benchmark and the evaluation have brought to light the types of words that the submitted systems handled best, and they identify the main shortcomings to be addressed in future work.
| Field | Value |
|---|---|
| Item Type | Journal Article |
| Subjects | P Language and Literature > P Philology. Linguistics |
| Divisions | Faculty of Science, Engineering and Medicine > Science > Computer Science |
| Library of Congress Subject Headings (LCSH) | Computational linguistics, Social media, Corpora (Linguistics), Spanish language |
| Journal or Publication Title | Language Resources and Evaluation |
| Publisher | Springer |
| ISSN | 1574-020X |
| Official Date | December 2015 |
| Volume | 49 |
| Number | 4 |
| Page Range | pp. 883-905 |
| DOI | 10.1007/s10579-015-9315-6 |
| Status | Peer Reviewed |
| Publication Status | Published |
| Access rights to Published version | Restricted or Subscription Access |
| Date of first compliant deposit | 28 July 2016 |
| Date of first compliant Open Access | 15 August 2016 |