Conference item

RED: Redundancy-driven data extraction from result pages?

Abstract:

Data-driven websites are mostly accessed through search interfaces. Such sites follow a common publishing pattern that, surprisingly, has not been fully exploited for unsupervised data extraction yet: the result of a search is presented as a paginated list of result records. Each result record contains the main attributes about one single object, and links to a page dedicated to the details of that object. We present RED, an automatic approach and a prototype system for extracting data recor...

Publication status:
Published
Peer review status:
Peer reviewed
Version:
Publisher's version

Publisher copy:
10.1145/3308558.3313529

Authors


Institution:
University of Oxford
Division:
MPLS Division
Department:
Computer Science
Role:
Author
ORCID:
0000-0002-3918-3807
Institution:
University of Oxford
Division:
MPLS Division
Department:
Computer Science
Role:
Author
Publisher:
Association for Computing Machinery
Pages:
605-615
Publication date:
2019-05-13
Acceptance date:
2019-01-21
DOI:
10.1145/3308558.3313529
Pubs id:
pubs:965550
URN:
uri:081daa4a-01f9-430d-8d04-bf35226d72c2
UUID:
uuid:081daa4a-01f9-430d-8d04-bf35226d72c2
Local pid:
pubs:965550
ISBN:
978-1-4503-6674-8
