The Library
Histograms and wavelets on probabilistic data
Tools
Cormode, Graham and Garofalakis, Minos (2009) Histograms and wavelets on probabilistic data. In: IEEE 25th International Conference on Data Engineering, 2009. ICDE '09. , Shanghai, 29 Mar - 9-Apr 2009. Published in: IEEE 25th International Conference onData Engineering, 2009. ICDE '09. pp. 293-304. ISBN 9781424434220. doi:10.1109/ICDE.2009.74 ISSN 1084-4627.
Research output not available from this repository.
Request-a-Copy directly from author or use local Library Get it For Me service.
Official URL: http://dx.doi.org/10.1109/ICDE.2009.74
Abstract
There is a growing realization that uncertain information is a first-class citizen in modern database management. As such, we need techniques to correctly and efficiently process uncertain data in database systems. In particular, data reduction techniques that can produce concise, accurate synopses of large probabilistic relations are crucial. Similar to their deterministic relation counterparts, such compact probabilistic data synopses can form the foundation for human understanding and interactive data exploration, probabilistic query planning and optimization, and fast approximate query processing in probabilistic database systems. In this paper, we introduce definitions and algorithms for building histogram- and Haar wavelet-based synopses on probabilistic data. The core problem is to choose a set of histogram bucket boundaries or wavelet coefficients to optimize the accuracy of the approximate representation of a collection of probabilistic tuples under a given error metric. For a variety of different error metrics, we devise efficient algorithms that construct optimal or near optimal size B histogram and wavelet synopses. This requires careful analysis of the structure of the probability distributions, and novel extensions of known dynamic programming-based techniques for the deterministic domain. Our experiments show that this approach clearly outperforms simple ideas, such as building summaries for samples drawn from the data distribution, while taking equal or less time.
Item Type: | Conference Item (Paper) | ||||
---|---|---|---|---|---|
Subjects: | Q Science > QA Mathematics | ||||
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science | ||||
Library of Congress Subject Headings (LCSH): | Wavelets (Mathematics), Probabilistic databases, Graph theory | ||||
Journal or Publication Title: | IEEE 25th International Conference onData Engineering, 2009. ICDE '09. | ||||
Publisher: | IEEE Computer Society | ||||
ISBN: | 9781424434220 | ||||
ISSN: | 1084-4627 | ||||
Book Title: | 2009 IEEE 25th International Conference on Data Engineering | ||||
Official Date: | 2009 | ||||
Dates: |
|
||||
Page Range: | pp. 293-304 | ||||
DOI: | 10.1109/ICDE.2009.74 | ||||
Status: | Peer Reviewed | ||||
Publication Status: | Published | ||||
Conference Paper Type: | Paper | ||||
Title of Event: | IEEE 25th International Conference on Data Engineering, 2009. ICDE '09. | ||||
Type of Event: | Conference | ||||
Location of Event: | Shanghai | ||||
Date(s) of Event: | 29 Mar - 9-Apr 2009 |
Request changes or add full text files to a record
Repository staff actions (login required)
View Item |