Conference item
You Need Only One Clue for Effective Record Segmentation
- Abstract:
- Record segmentation is a core problem in data extraction. Previous approaches have focused on increasingly sophisticated heuristics without knowledge of the concrete domain. In this work, we demonstrate that with only a single clue about mandatory attributes in a given domain, straightforward rules for record segmentation suffice to achieve 100% precise record extraction from the vast majority of web sites in that domain. These results are the first outcomes of the just-launched ERC project DIADEM on domain-specific intelligent automated data extraction.
- Host title:
- Proc. of 1st Intl Conf. on Web Intelligence, Mining and Semantics (WIMS)
- Publication date:
- 2011-01-01
- UUID:
- uuid:6833bb7e-df67-42c1-94de-860e9f08bd68
- Local pid:
- cs:6424
- Deposit date:
- 2015-03-31
Terms of use
- Copyright date:
- 2011