TweetMT: a parallel microblog corpus
San Vicente, Iñaki, Alegria, Iñaki, Aranberri, Nora, España-Bonet, Cristina, Gamallo, Pablo, Gonçalo Oliveira, Hugo, Martinez Garcia, Eva, Toral, Antonio and Zubiaga, Arkaitz (2016) TweetMT: a parallel microblog corpus. In: Language Resources and Evaluation Conference, Portorož (Slovenia), 23-28 May 2016. Published in: LREC 2016 Proceedings, pp. 2936-2941. ISBN 9782951740891.
PDF: WRAP_Zubiaga_465_Paper.pdf (Published Version, 1703Kb). Available under License Creative Commons: Attribution-Noncommercial 4.0.
Official URL: http://www.lrec-conf.org/proceedings/lrec2016/inde...
Abstract
We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula. The corpus has been created by combining automatic collection and crowdsourcing approaches, and it is publicly available. It is intended for the development and testing of microtext machine translation systems. In this paper we describe the methodology followed to build the corpus, and present the results of the shared task in which it was tested.
| Field | Value |
|---|---|
| Item Type | Conference Item (Paper) |
| Subjects | P Language and Literature > P Philology. Linguistics |
| Divisions | Faculty of Science, Engineering and Medicine > Science > Computer Science |
| Library of Congress Subject Headings (LCSH) | Machine translating -- Microblogs -- Research, Social Media, Spanish language, Basque language, Catalan language, Galician language, Portuguese language |
| Journal or Publication Title | LREC 2016 Proceedings |
| Publisher | European Language Resources Association |
| ISBN | 9782951740891 |
| Official Date | 26 January 2016 |
| Page Range | pp. 2936-2941 |
| Status | Peer Reviewed |
| Publication Status | Published |
| Access rights to Published version | Restricted or Subscription Access |
| Date of first compliant deposit | 11 April 2016 |
| Date of first compliant Open Access | 23 May 2016 |
| Funder | Seventh Framework Programme (European Commission) (FP7), Spain. Ministerio de Economía y Competitividad [Ministry of Economy and Competitiveness] (MINECO) |
| Grant number | FP7 PEOPLE-2012-IAPP P7, FP7 grant No. 611233, MINECO TIN2012-38523-C02-01, MINECO FFI2014-51978-C2-1-R |
| Conference Paper Type | Paper |
| Title of Event | Language Resources and Evaluation Conference |
| Type of Event | Conference |
| Location of Event | Portorož (Slovenia) |
| Date(s) of Event | 23-28 May 2016 |