Conference item

FilteredWeb: A framework for the automated search-based discovery of blocked URLs

Abstract:
Various methods have been proposed for creating and maintaining lists of potentially filtered URLs to allow for measurement of ongoing internet censorship around the world. Whilst testing a known resource for evidence of filtering can be relatively simple, given appropriate vantage points, discovering previously unknown filtered web resources remains an open challenge. We present a new framework for automating the process of discovering filtered resources through the use of adaptive queries to well-known search engines. Our system applies information retrieval algorithms to isolate characteristic linguistic patterns in known filtered web pages; these are then used as the basis for web search queries. The results of these queries are then checked for evidence of filtering, and newly discovered filtered resources are fed back into the system to detect further filtered content. Our implementation of this framework, applied to China as a case study, shows that this approach is demonstrably effective at detecting significant numbers of previously unknown filtered web pages, making a significant contribution to the ongoing detection of internet filtering as it develops. Our tool is currently deployed and has been used to discover 1355 domains that are poisoned within China as of Feb 2017—30 times more than are contained in the most widely-used public filter list. Of these, 759 are outside of the Alexa Top 1000 domains list, demonstrating the capability of this framework to find more obscure filtered content. Further, our initial analysis of filtered URLs, and the search terms that were used to discover them, gives further insight into the nature of the content currently being blocked in China.
Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.23919/TMA.2017.8002914

Authors


Institution:
University of Oxford
Division:
SSD
Department:
Oxford Internet Institute
Role:
Author


Publisher:
IEEE/IFIP
Host title:
2017 Network Traffic Measurement and Analysis Conference (TMA)
Journal:
Network Traffic Measurement and Analysis Conference
Publication date:
2017-08-08
Acceptance date:
2017-04-03
DOI:
10.23919/TMA.2017.8002914
ISBN:
9783901882951


Pubs id:
pubs:698908
UUID:
uuid:6321b48a-9a84-4452-8e2b-9e0ccb59ff67
Local pid:
pubs:698908
Source identifiers:
698908
Deposit date:
2017-06-09
