Hierarchical clustering: Objective functions and algorithms

Cohen-Addad, V; Kanade, V; Mallmann-Trenn, F; Mathieu, C

Conference item

Abstract:: Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly finer granularity. Motivated by the fact that most work on hierarchical clustering was based on providing algorithms, rather than optimizing a specific objective, Dasgupta (2016) framed similarity-based hierarchical clustering as a combinatorial optimization problem, where a 'good' hierarchical clustering is one that minimizes some cost function. He showed that this cost function has certain desirable properties: to achieve optimal cost, disconnected components must be separated first, and in 'structureless' graphs (i.e., cliques), all clusterings achieve the same cost.
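To make the cost function above concrete, the sketch below computes Dasgupta's (2016) cost: for every edge {i, j}, the edge weight times the number of leaves under the lowest common ancestor of i and j, summed over all edges. It is a minimal illustration, not code from the paper; the nested-tuple tree encoding and the helper names `leaves`, `dasgupta_cost` and `clique_weights` are hypothetical choices made here. The usage lines check the two properties stated above on tiny inputs.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels under a nested-tuple binary tree."""
    if isinstance(tree, tuple):
        left, right = tree
        return leaves(left) | leaves(right)
    return {tree}


def dasgupta_cost(tree, weights):
    """Sum over edges {i, j} of w(i, j) * |leaves(T[i v j])|; smaller is better.

    An edge's lowest common ancestor is the unique internal node whose two
    children separate i from j, so each internal node charges its size times
    the total weight crossing between its children.
    """
    if not isinstance(tree, tuple):
        return 0.0
    left, right = tree
    a, b = leaves(left), leaves(right)
    crossing = sum(weights.get(frozenset((i, j)), 0.0) for i in a for j in b)
    return (len(a) + len(b)) * crossing + (
        dasgupta_cost(left, weights) + dasgupta_cost(right, weights)
    )


def clique_weights(n):
    """Unit weight on every pair of n vertices: a 'structureless' input."""
    return {frozenset(pair): 1.0 for pair in combinations(range(n), 2)}


if __name__ == "__main__":
    k4 = clique_weights(4)
    # Every binary tree on a clique has the same cost, (n^3 - n) / 3 = 20 for n = 4.
    print(dasgupta_cost(((0, 1), (2, 3)), k4))  # 20.0
    print(dasgupta_cost((((0, 1), 2), 3), k4))  # 20.0

    # Two disconnected edges {0, 1} and {2, 3}: separating them at the root is cheaper.
    two_edges = {frozenset((0, 1)): 1.0, frozenset((2, 3)): 1.0}
    print(dasgupta_cost(((0, 1), (2, 3)), two_edges))  # 4.0
    print(dasgupta_cost(((0, 2), (1, 3)), two_edges))  # 8.0
```

On the four-vertex clique the balanced tree and the caterpillar tree both cost 20. On the two disjoint edges, the tree that separates the components at the root costs 4, against 8 for a tree that splits both edges at the root.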

We take an axiomatic approach to defining 'good' objective functions for both similarity- and dissimilarity-based hierarchical clustering. We characterize a set of admissible objective functions (which includes the one introduced by Dasgupta) with the property that, when the input admits a 'natural' ground-truth hierarchical clustering, the ground-truth clustering has an optimal value.

Equipped with a suitable objective function, we analyze the performance of practical algorithms, as well as develop better and faster algorithms for hierarchical clustering. For similarity-based hierarchical clustering, Dasgupta (2016) showed that a simple recursive sparsest-cut based approach achieves an O(log^{3/2} n)-approximation on worst-case inputs. We give a more refined analysis of the algorithm and show that it in fact achieves an O(√log n)-approximation. This improves upon the LP-based O(log n)-approximation of Roy and Pokutta (2016). For dissimilarity-based hierarchical clustering, we show that the classic average-linkage algorithm gives a factor-2 approximation, and provide a simple and better algorithm that gives a factor-3/2 approximation. This aims at explaining the success of these heuristics in practice. Finally, we consider a 'beyond-worst-case' scenario through a generalisation of the stochastic block model for hierarchical clustering. We show that Dasgupta's cost function also has desirable properties for these inputs, and we provide a simple algorithm that, for graphs generated according to this model, yields a 1 + o(1) factor approximation.
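For the dissimilarity setting, the following sketch runs textbook average linkage (repeatedly merge the two clusters with the smallest average pairwise dissimilarity) and scores the resulting tree. It assumes, for illustration only, that the dissimilarity objective is Dasgupta's formula with dissimilarities in place of similarities, to be maximised rather than minimised. The function names and the toy one-dimensional data are hypothetical, and this is not the paper's factor-3/2 algorithm, which is not reproduced here.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels under a nested-tuple binary tree."""
    if isinstance(tree, tuple):
        left, right = tree
        return leaves(left) | leaves(right)
    return {tree}


def average_dissimilarity(a, b, dist):
    """Mean dissimilarity over all pairs (i, j) with i in a and j in b."""
    return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))


def average_linkage(points, dist):
    """Naive agglomerative average linkage; returns a nested-tuple binary tree.

    Each cluster is stored as (subtree, members). Ties are broken by taking
    the first minimal pair in enumeration order.
    """
    clusters = [(p, frozenset([p])) for p in points]
    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_dissimilarity(
                clusters[pair[0]][1], clusters[pair[1]][1], dist
            ),
        )
        (tx, mx), (ty, my) = clusters[x], clusters[y]
        # combinations yields x < y, so deleting y first leaves index x valid.
        del clusters[y]
        del clusters[x]
        clusters.append(((tx, ty), mx | my))
    return clusters[0][0]


def dissimilarity_objective(tree, dist):
    """Sum over pairs {i, j} of d(i, j) * |leaves(T[i v j])|; larger is better."""
    if not isinstance(tree, tuple):
        return 0.0
    left, right = tree
    a, b = leaves(left), leaves(right)
    crossing = sum(dist.get(frozenset((i, j)), 0.0) for i in a for j in b)
    return (len(a) + len(b)) * crossing + (
        dissimilarity_objective(left, dist) + dissimilarity_objective(right, dist)
    )


if __name__ == "__main__":
    # Hypothetical 1-D data: two tight pairs far apart; d(i, j) = |x_i - x_j|.
    position = {0: 0.0, 1: 1.0, 2: 10.0, 3: 11.0}
    dist = {
        frozenset((i, j)): abs(position[i] - position[j])
        for i, j in combinations(position, 2)
    }
    tree = average_linkage(range(4), dist)
    print(tree)  # ((0, 1), (2, 3))
    print(dissimilarity_objective(tree, dist))  # 164.0
    print(dissimilarity_objective(((0, 2), (1, 3)), dist))  # 128.0
```

On this toy input average linkage recovers the two tight pairs, and that tree scores 164 under the assumed objective, against 128 for a tree that splits both pairs. The pairwise scan over all current clusters at every merge is deliberately naive and suited only to tiny inputs.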

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Cohen-Addad, V., Kanade, V., Mallmann-Trenn, F., & Mathieu, C. (2018). Hierarchical clustering: Objective functions and algorithms. ACM-SIAM Symposium on Discrete Algorithms 2018.

MLA Style

Cohen-Addad, V., et al. “Hierarchical Clustering: Objective Functions and Algorithms.” ACM-SIAM Symposium on Discrete Algorithms 2018, Society for Industrial and Applied Mathematics, 2018.

Chicago Style

Cohen-Addad, V, V Kanade, F Mallmann-Trenn, and C Mathieu. 2018. “Hierarchical Clustering: Objective Functions and Algorithms.” In ACM-SIAM Symposium on Discrete Algorithms 2018. Society for Industrial and Applied Mathematics.

Access Document

Files:: Hierarchical clustering - Objective functions and algorithms.pdf

(Accepted manuscript, pdf, 517.4KB)

Publisher copy:: 10.1137/1.9781611975031.26

Authors

Cohen-Addad, V

Role:: Author

Kanade, V

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Author

Mallmann-Trenn, F

Role:: Author

Mathieu, C

Role:: Author

Funder:: Engineering and Physical Sciences Research Council

Grant:: EP/N510129/1

Publisher:: Society for Industrial and Applied Mathematics
Host title:: ACM-SIAM Symposium on Discrete Algorithms 2018
Journal:: ACM-SIAM Symposium on Discrete Algorithms 2018
Publication date:: 2018-01-10
Acceptance date:: 2017-09-27
DOI:: 10.1137/1.9781611975031.26

Pubs id:: pubs:735174
UUID:: uuid:b14a2ba4-7ca9-4335-917b-fed539a012b6
Local pid:: pubs:735174
Source identifiers:: 735174
Deposit date:: 2017-10-12

Terms of use

Copyright holder:: Cohen-Addad et al
Notes:: Copyright © 2018. Copyright for this paper is retained by the authors. This is the accepted manuscript version of the article. The final version is available online from the Society for Industrial and Applied Mathematics at: http://dx.doi.org/10.1137/1.9781611975031.26

Licence:: Terms and Conditions of Use for Oxford University Research Archive
