Thesis

Power-law phenomena in Bayesian nonparametrics

Abstract:

Bayesian methods constitute a popular approach to performing statistical inference and predicting phenomena of interest. Part of the popularity of the Bayesian paradigm can surely be linked to its intuitive core idea: to take advantage of the user's prior knowledge and integrate it into the statistical procedure. The result of this synergy is the posterior distribution, the focal point of Bayesian inference and the instrument for quantifying the uncertainty of the estimate. This thesis collects the work done as a research student on statistical models, built with the tools of Bayesian nonparametrics, to describe power-law distributed data. The motivation for the proposed models is to be found in two different fields of application: complex networks and privacy assessment.

After the introduction, the first half of the thesis concentrates on the proposal and analysis of models for networks, the mathematical objects that describe relations among entities by representing them as nodes connected through links. Power laws usually appear in real-world networks in the distribution of the degrees, where the degree of a node is its number of links. The two pieces of work presented belong to the graphex process framework, a flexible generative process that can mimic empirically observed network characteristics. Chapter 2 fits into this framework and extends the original proposal of Caron and Fox (2017) to provide a novel modelling approach for sparse networks with spatial structure or other covariates. An approximate inference strategy is provided and tested on simulated data. The paper presented in Chapter 3 sheds light on the asymptotic properties of networks generated under the graphex process, proving the desirable properties of sparsity, power-law degree distributions and clustering, together with two central limit theorems.

The second half of the thesis is devoted to the development of statistical methods to quantify the risk of disclosure, which arises whenever datasets containing records of individuals are published: an intruder could match the data with prior information and disclose the identity, and therefore the sensitive features, of a person. Chapter 4 develops an estimator to quantify this risk using the Pitman-Yor process, a popular prior on probability distributions that has a distinctive power-law tail. A closed-form posterior distribution of the estimator is provided in a convenient mixture representation, and experiments on both simulated and real data show the effectiveness of the method. Chapter 5 deals with the estimation of the same risk under no distributional assumptions, using a fully nonparametric method. The estimator is easy to understand, fast to compute and has provable optimality guarantees.

Chapter 6 concludes the thesis with a final summary and some proposals to extend the current work towards new research directions.

Authors


Division:
MPLS
Department:
Statistics
Role:
Author

Contributors

Institution:
University of Oxford
Division:
MPLS
Department:
Statistics
Role:
Supervisor
ORCID:
0000-0002-3952-224X
Institution:
University of Oxford
Division:
MPLS
Department:
Statistics
Role:
Supervisor
ORCID:
0000-0002-0998-6174
Role:
Examiner
ORCID:
0000-0002-0583-4595
Role:
Examiner


Grant:
EP/L016710/1
Programme:
EP/L016710/1 - EPSRC and MRC Centre for Doctoral Training in Statistical Science: The Oxford-Warwick Statistics Programme.


Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford
