Journal article

Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics

Abstract:
Comparing strings and assessing their similarity is a basic operation in many application domains of machine learning, such as information retrieval, natural language processing and bioinformatics. The practitioner can choose from a large variety of available similarity measures for this task, each emphasizing different aspects of the string data. In this article, we present Harry, a small tool specifically designed for measuring the similarity of strings. Harry implements over 20 similarity measures, including common string distances and string kernels, such as the Levenshtein distance and the Subsequence kernel. The tool has been designed with efficiency in mind and allows for multi-threaded as well as distributed computing, enabling the analysis of large data sets of strings. Harry supports common data formats and thus can interface with analysis environments, such as Matlab, Pylab and Weka.
Publication status:
Published
Peer review status:
Peer reviewed

Authors


Institution:
University of Oxford
Oxford college:
New College
Role:
Author
Institution:
University of Oxford
Oxford college:
University College
Role:
Author


Publisher:
Journal of Machine Learning Research
Journal:
Journal of Machine Learning Research
Volume:
17
Issue:
9
Pages:
1-33
Publication date:
2016-01-01
Acceptance date:
2015-12-21
ISSN:
1532-4435


Pubs id:
pubs:581063
UUID:
uuid:fcbd2a3c-16ed-4306-a024-e11c41c0da17
Local pid:
pubs:581063
Deposit date:
2016-11-18
