Journal article
Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics
- Abstract:
- Comparing strings and assessing their similarity is a basic operation in many application domains of machine learning, such as in information retrieval, natural language processing and bioinformatics. The practitioner can choose from a large variety of available similarity measures for this task, each emphasizing different aspects of the string data. In this article, we present Harry, a small tool specifically designed for measuring the similarity of strings. Harry implements over 20 similarity measures, including common string distances and string kernels, such as the Levenshtein distance and the Subsequence kernel. The tool has been designed with efficiency in mind and allows for multi-threaded as well as distributed computing, enabling the analysis of large data sets of strings. Harry supports common data formats and thus can interface with analysis environments, such as Matlab, Pylab and Weka.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Publisher:
- Journal of Machine Learning Research
- Journal:
- Journal of Machine Learning Research
- Volume:
- 17
- Issue:
- 9
- Pages:
- 1-33
- Publication date:
- 2016-01-01
- Acceptance date:
- 2015-12-21
- ISSN:
- 1532-4435
- Pubs id:
- pubs:581063
- UUID:
- uuid:fcbd2a3c-16ed-4306-a024-e11c41c0da17
- Local pid:
- pubs:581063
- Deposit date:
- 2016-11-18
Terms of use
- Copyright holder:
- Yee Whye Teh et al.
- Copyright date:
- 2016
- Notes:
- This is the publisher's version of a journal article published by the Journal of Machine Learning Research in 2016, available online: http://jmlr.org/papers/v17/teh16a.html