Journal article

DIADEM: Thousands of Websites to a Single Database

Abstract:

The web is overflowing with implicitly structured data, spread over hundreds of thousands of sites, hidden deep behind search forms, or siloed in marketplaces, only accessible as HTML. Automatic extraction of structured data at the scale of thousands of websites has long proven elusive, despite its central role in the "web of data". Through an extensive evaluation spanning over 10000 web sites from multiple application domains, we show that automatic, yet accurate full-site extraction is no l...

Authors


Tim Furche
Georg Gottlob
Giovanni Grasso
Xiaonan Guo
Giorgio Orsi
Journal:
Proceedings of the VLDB Endowment (PVLDB)
Publication date:
2014
URN:
uuid:8a9f0097-1085-4880-b87a-a98f732ae3b2
Local pid:
cs:8988
