University of Warwick
Publications service & WRAP

Supervised sampling for clustering large data sets


Kosmidis, Ioannis and Karlis, Dimitris (2010) Supervised sampling for clustering large data sets. Working Paper. Coventry: University of Warwick. Centre for Research in Statistical Methodology. Working papers, Vol.2010 (No.10).

PDF: WRAP_Kosmidis_10-10w.pdf (Published Version, 1226Kb)
Official URL: http://www2.warwick.ac.uk/fac/sci/statistics/crism...


Abstract

The problem of clustering large data sets has attracted considerable recent research.
Existing approaches are mainly based on more efficient implementations or
modifications of existing methods, and/or on constructing clusters from a small
sub-sample of the data and then assigning all observations to those clusters.
This paper focuses on the latter direction. An alternative, supervised procedure
for creating the clusters is proposed. To learn the clusters, the procedure uses
subsets of the data that are still constructed via sub-sampling, but within partitions of
the observation space. The general applicability of the approach is discussed, together
with the tuning of the parameters on which it depends to improve its performance. The
procedure is applied to clustering the navigation patterns in the msnbc.com database.
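
As a rough illustration of the general idea only (learn clusters from a sub-sample drawn within partitions of the observation space, then assign every observation), the Python sketch below may help. It is not the authors' procedure: the abstract does not specify how the partitions are formed or how the supervision works, so the regular grid partition, the plain k-means step, and all names (partition_sample, cluster_large, cell_width, per_cell) are assumptions made purely for this example.

```python
import random
from collections import defaultdict

def partition_sample(points, cell_width, per_cell, rng):
    """Draw up to `per_cell` points at random from each grid cell.
    The regular grid is an assumed stand-in for 'partitions of the
    observation space'; the paper may partition differently."""
    cells = defaultdict(list)
    for p in points:
        cells[tuple(int(x // cell_width) for x in p)].append(p)
    sample = []
    for key in sorted(cells):  # sorted for reproducibility
        members = cells[key]
        sample.extend(rng.sample(members, min(per_cell, len(members))))
    return sample

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centres):
    """Index of the centre closest to point p."""
    return min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))

def kmeans(sample, k, rng, iterations=20):
    """Plain Lloyd iterations on the sub-sample (an assumed learner)."""
    centres = rng.sample(sample, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in sample:
            groups[nearest(p, centres)].append(p)
        # Keep the old centre if a group happens to be empty.
        centres = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centres[j]
            for j, g in enumerate(groups)
        ]
    return centres

def cluster_large(points, k, cell_width=1.0, per_cell=5, seed=0):
    """Learn centres from a partition-wise sub-sample, then label all points."""
    rng = random.Random(seed)
    sample = partition_sample(points, cell_width, per_cell, rng)
    centres = kmeans(sample, k, rng)
    labels = [nearest(p, centres) for p in points]
    return centres, labels, len(sample)

# Synthetic data: two well-separated groups of 500 points each.
gen = random.Random(1)
data = [(gen.gauss(0, 0.5), gen.gauss(0, 0.5)) for _ in range(500)]
data += [(gen.gauss(10, 0.5), gen.gauss(10, 0.5)) for _ in range(500)]
centres, labels, n_used = cluster_large(data, k=2)
print(f"learned from {n_used} of {len(data)} observations")
```

Sampling within cells rather than from the whole data set means sparsely populated regions still contribute points to the learning step, which is one plausible motivation for partition-wise sub-sampling; cell_width and per_cell play the role of tuning parameters in this toy version.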

Item Type: Working or Discussion Paper (Working Paper)
Subjects: Q Science > QA Mathematics
Divisions: Faculty of Science > Statistics
Library of Congress Subject Headings (LCSH): Cluster analysis, Sampling (Statistics)
Series Name: Working papers
Publisher: University of Warwick. Centre for Research in Statistical Methodology
Place of Publication: Coventry
Official Date: June 2010
Dates: June 2010 (Published)
Volume: Vol.2010
Number: No.10
Number of Pages: 17
Institution: University of Warwick
Status: Not Peer Reviewed
Access rights to Published version: Open Access
