Supervised sampling for clustering large data sets
Kosmidis, Ioannis and Karlis, Dimitris (2010) Supervised sampling for clustering large data sets. Working Paper. Coventry: University of Warwick. Centre for Research in Statistical Methodology. Working papers, Vol.2010 (No.10).
WRAP_Kosmidis_10-10w.pdf - Published Version
Official URL: http://www2.warwick.ac.uk/fac/sci/statistics/crism...
The problem of clustering large data sets has attracted a lot of current research.
The approaches taken are mainly based on more efficient implementations or
modifications of existing methods, and/or on constructing clusters from a small
sub-sample of the data and then assigning all observations to those clusters.
The current paper focuses on the latter direction. An alternative supervised procedure
for creating the clusters is proposed. To learn the clusters, the procedure uses
subsets of the data that are still constructed via sub-sampling, but within partitions of
the observation space. The general applicability of the approach is discussed, together
with the tuning of the parameters on which it depends to improve its performance. The procedure
is applied to clustering the navigation patterns in the msnbc.com database.
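The general idea in the abstract (sub-sample within partitions of the observation space, learn clusters from the sub-sample, then assign every observation) can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how the partitions are built or which clustering method is used, so the regular grid partition, the plain k-means step, and all names and parameters (`partitioned_subsample`, `cluster_large`, `cell_width`, `per_cell`) are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centres):
    """Index of the centre closest to the point."""
    return min(range(len(centres)), key=lambda j: sq_dist(point, centres[j]))


def partitioned_subsample(data, cell_width, per_cell, rng):
    """Draw at most `per_cell` points from each grid cell.

    Assumption: the observation space is partitioned by a regular grid.
    The paper may construct its partitions differently.
    """
    cells = defaultdict(list)
    for p in data:
        key = tuple(int(x // cell_width) for x in p)
        cells[key].append(p)
    sample = []
    for key in sorted(cells):
        members = cells[key]
        sample.extend(rng.sample(members, min(per_cell, len(members))))
    return sample


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd-style k-means, used here only as a stand-in clusterer."""
    centres = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centres)].append(p)
        # Keep the old centre if a group happens to be empty.
        centres = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centres[j]
            for j, g in enumerate(groups)
        ]
    return centres


def cluster_large(data, k, cell_width, per_cell, seed=0):
    """Learn centres on the sub-sample, then label every observation."""
    rng = random.Random(seed)
    sample = partitioned_subsample(data, cell_width, per_cell, rng)
    centres = kmeans(sample, k, rng)
    labels = [nearest(p, centres) for p in data]
    return sample, centres, labels
```

In this sketch, `cell_width` and `per_cell` play the role of tuning parameters: finer cells or more points per cell give a larger sub-sample at a higher clustering cost, while the final assignment step always passes over the full data set once.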
|Item Type:||Working or Discussion Paper (Working Paper)|
|Subjects:||Q Science > QA Mathematics|
|Divisions:||Faculty of Science > Statistics|
|Library of Congress Subject Headings (LCSH):||Cluster analysis, Sampling (Statistics)|
|Series Name:||Working papers|
|Publisher:||University of Warwick. Centre for Research in Statistical Methodology|
|Place of Publication:||Coventry|
|Official Date:||June 2010|
|Number of Pages:||17|
|Status:||Not Peer Reviewed|
|Access rights to Published version:||Open Access|