Conference item

You Need Only One Clue for Effective Record Segmentation

Abstract:
Record segmentation is a core problem in data extraction. Previous approaches have relied on increasingly sophisticated heuristics that use no knowledge of the concrete domain. In this work, we demonstrate that a single clue about the mandatory attributes of a given domain is enough: with it, straightforward record segmentation rules achieve record extraction with 100% precision on the vast majority of web sites in that domain. These results are the first outcomes of the newly launched ERC project DIADEM on domain-specific intelligent automated data extraction.

Host title:
Proc. of 1st Intl Conf. on Web Intelligence, Mining and Semantics (WIMS)
Publication date:
2011-01-01


UUID:
uuid:6833bb7e-df67-42c1-94de-860e9f08bd68
Local pid:
cs:6424
Deposit date:
2015-03-31
