Declaratively cleaning your data using AJAX


Author(s) : Dennis Shasha, Daniela Florescu, Helena Galhardas, Eric Simon
Publisher : N/A
Publication Date : 2000
ISSN : N/A
Abstract : Data quality concerns arise when correcting anomalies in a single data source, or when integrating data coming from multiple sources into a single data repository. The information handled may also need to undergo a formatting and normalization process so that the resulting data is structured and presented according to the application requirements. The main causes of data anomalies are: (1) the absence of universal keys across different databases, known as the object identity problem, (2) the use of different data formats, (3) the existence of data entry errors, and (4) the presence of inconsistencies in data. Dealing with these problems is globally called the data cleaning process. In this work, we propose a framework which models a data cleaning process as a graph of data transformations. Moreover, we propose an SQL extension for specifying each of these transformations. One important feature is the ability to explicitly include human interaction in the process. We also permit the following performance optimizations, which are tailored for data cleaning applications: mixed evaluation, neighborhood hash join, and short-circuited computation. Finally, a data lineage mechanism for explanation purposes is supported.