University of Warwick
Publications service & WRAP

Zero-cost labelling with web feeds for weblog data extraction

Gkotsis, George, Stepanyan, Karen, Cristea, Alexandra I. and Joy, Mike (2013) Zero-cost labelling with web feeds for weblog data extraction. In: 22nd International World Wide Web Conference (WWW 2013), Rio de Janeiro, Brazil, 13-17 May 2013. Published in: WWW '13 Companion : Proceedings of the 22nd international conference on World Wide Web companion pp. 73-74. ISBN 9781450320382.

Full text: WRAP_Gkotsis_gkotsis_stepanyan_cristea_joy_www_2013.pdf (457Kb)
Embargoed item. Restricted access to Repository staff only.

Abstract

Data extraction from web pages often involves either human intervention for training a wrapper or a reduced level of granularity in the information acquired. Even though the study of social media has drawn the attention of researchers, weblogs remain a part of the web that cannot be harvested efficiently. In this paper, we propose a fully automated approach to generating a wrapper for weblogs, which exploits web feeds for cheap labelling of weblog properties. Instead of performing a pairwise comparison between posts, the model matches the values of the web feeds against their corresponding HTML elements retrieved from multiple weblog posts. It adopts a probabilistic approach for deriving a set of rules and automating the process of wrapper generation. Our evaluation shows that our approach is robust, accurate and efficient in handling different types of weblogs.
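
The matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `TextByElement`, `candidate_rules` and `derive_rule`, the use of a (tag, class) pair as a "rule", and the scoring by fraction of posts matched are all assumptions chosen for illustration, standing in for the paper's probabilistic model. The feed entries are assumed to be already parsed into dictionaries.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TextByElement(HTMLParser):
    """Collect (tag, class, text) triples for every text node in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements as (tag, class) pairs
        self.found = []   # (tag, class, text) for each non-empty text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            tag, cls = self.stack[-1]
            self.found.append((tag, cls, text))


def candidate_rules(html, value):
    """Return every (tag, class) pair whose text equals the feed value."""
    parser = TextByElement()
    parser.feed(html)
    return {(tag, cls) for tag, cls, text in parser.found if text == value}


def derive_rule(posts, prop):
    """Pick the rule that matches the feed value in the most posts.

    posts: list of (feed_entry, html) pairs, where feed_entry is a dict.
    Returns (rule, score) with score = fraction of posts matched, or None.
    The score is a simple stand-in (an assumption) for a probabilistic model.
    """
    counts = Counter()
    for entry, html in posts:
        counts.update(candidate_rules(html, entry[prop]))
    if not counts:
        return None
    rule, hits = counts.most_common(1)[0]
    return rule, hits / len(posts)


# Hypothetical data: two posts from one weblog with their feed titles.
posts = [
    ({"title": "First post"},
     '<html><head><title>First post</title></head><body>'
     '<h1 class="entry-title">First post</h1><p>Hello</p></body></html>'),
    ({"title": "Second post"},
     '<html><head><title>My blog</title></head><body>'
     '<h1 class="entry-title">Second post</h1><p>Bye</p></body></html>'),
]

print(derive_rule(posts, "title"))
```

Using several posts rather than one is what separates a stable rule from a coincidental match: in the hypothetical data the `<title>` element agrees with the feed in only one post, while the `<h1 class="entry-title">` element agrees in both.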

Item Type: Conference Item (Poster)
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: Faculty of Science > Computer Science
Library of Congress Subject Headings (LCSH): Blogs, Data mining
Journal or Publication Title: WWW '13 Companion : Proceedings of the 22nd international conference on World Wide Web companion
Publisher: International World Wide Web Conferences Steering Committee
ISBN: 9781450320382
Official Date: 13 May 2013
Dates:
Date         Event
13 May 2013  Published
Page Range: pp. 73-74
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Restricted or Subscription Access
Funder: Seventh Framework Programme (European Commission) (FP7)
Grant number: 269963 (FP7)
Conference Paper Type: Poster
Title of Event: 22nd International World Wide Web Conference (WWW 2013)
Type of Event: Conference
Location of Event: Rio de Janeiro, Brazil
Date(s) of Event: 13-17 May 2013
