RoadRunner: Towards automatic data extraction from large Web sites
| Author(s) : | Valter Crescenzi (Università Roma Tre), Giansalvatore Mecca (Università della Basilicata), Paolo Merialdo (Università Roma Tre) |
| Publisher : | N/A |
| Publication Date : | 2001 |
| ISSN : | N/A |
| Abstract : | The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extraction process, the paper develops a novel technique to compare HTML pages and generate a wrapper based on their similarities and differences. Experimental results on real-life data-intensive Web sites confirm the feasibility of the approach. |
