University of Warwick
Publications service & WRAP

TweetMT : a parallel microblog corpus

San Vicente, Iñaki, Alegria, Iñaki, Aranberri, Nora, España-Bonet, Cristina, Gamallo, Pablo, Gonçalo Oliveira, Hugo, Martinez Garcia, Eva, Toral, Antonio and Zubiaga, Arkaitz (2016) TweetMT: a parallel microblog corpus. In: Language Resources and Evaluation Conference, Portorož (Slovenia), 23-28 May 2016. Published in: LREC 2016 Proceedings, pp. 2936-2941. ISBN 9782951740891.

WRAP_ Zubiaga_465_Paper.pdf - Published Version - Requires a PDF viewer.
Available under License Creative Commons: Attribution-Noncommercial 4.0.

Download (1703Kb)
Official URL: http://www.lrec-conf.org/proceedings/lrec2016/inde...

Abstract

We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula. The corpus has been created by combining automatic collection and crowdsourcing approaches, and it is publicly available. It is intended for the development and testing of microtext machine translation systems. In this paper we describe the methodology followed to build the corpus, and present the results of the shared task in which it was tested.

Item Type: Conference Item (Paper)
Subjects: P Language and Literature > P Philology. Linguistics
Divisions: Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): Machine translating -- Microblogs -- Research, Social Media, Spanish language, Basque language, Catalan language, Galician language, Portuguese language
Journal or Publication Title: LREC 2016 Proceedings
Publisher: European Language Resources Association
ISBN: 9782951740891
Official Date: 26 January 2016
Dates: 26 January 2016 (Accepted)
Page Range: pp. 2936-2941
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Restricted or Subscription Access
Funder: Seventh Framework Programme (European Commission) (FP7), Spain. Ministerio de Economía y Competitividad [Ministry of Economy and Competitiveness] (MINECO)
Grant number: FP7 PEOPLE-2012-IAPP P7, FP7 grant No. 611233, MINECO TIN2012-38523-C02-01, MINECO FFI2014-51978-C2-1-R
Conference Paper Type: Paper
Title of Event: Language Resources and Evaluation Conference
Type of Event: Conference
Location of Event: Portorož (Slovenia)
Date(s) of Event: 23-28 May 2016
