Thesis
Numerical optimization with applications to machine learning
- Abstract:
The aim of this thesis is to develop scalable numerical optimization methods for large-scale problems of practical interest in the Machine Learning (ML) community.
We begin with a motivating application, Bayesian Optimization (BO), a popular ML method for hyperparameter tuning of expensive, black-box functions. We show that the scaling and computational cost of batch BO can be improved significantly by adopting a distributionally ambiguous framework that replaces expensive integrations with semidefinite programming problems (SDPs). Performing BO typically entails solving thousands of semidefinite problems, which makes their efficient solution essential.
Driven by this application, we adapt the popular Alternating Direction Method of Multipliers (ADMM) to solve general semidefinite problems faster. In semidefinite programming, ADMM spends most of its time projecting matrices onto the positive semidefinite cone. We suggest using state-of-the-art eigensolvers for this task, which yields significant speedups: up to 20x on standard problems and up to 10x on the SDPs arising from our BO algorithm. We show that useful bounds on the projection errors can be computed efficiently. These bounds considerably improve on standard eigenvalue perturbation theory (e.g. the Davis-Kahan bounds), which suggests that the projection accuracy is inversely proportional to the spectral gap. Our results allow us to control the projection errors tightly, guaranteeing that ADMM converges either to a solution or (for the first time in an approximate ADMM framework) to a certificate of infeasibility. In quadratic programming, by contrast, ADMM spends most of its time solving linear systems. We show how these linear systems can be solved efficiently in the quadratic problem of doubly stochastic matrix approximation. Our analysis reveals a surprising structure in this problem that enables the scaling of very large, sparse matrices arising, for example, from the 3D structure of the human genome or from spectral clustering.
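The projection step discussed above can be illustrated in a few lines. The following is a minimal sketch, not the thesis's implementation: it projects a symmetric matrix onto the positive semidefinite cone by computing a full eigendecomposition and clipping negative eigenvalues at zero (the thesis instead exploits approximate, state-of-the-art eigensolvers for this step).

```python
# Minimal sketch of PSD-cone projection, the dominant cost of ADMM for SDPs.
# Assumption: we use a full dense eigendecomposition; the thesis uses
# approximate iterative eigensolvers with controlled projection error.
import numpy as np

def project_psd(A):
    """Project a (nearly) symmetric matrix A onto the PSD cone."""
    S = (A + A.T) / 2              # symmetrize to guard against round-off
    w, V = np.linalg.eigh(S)       # eigendecomposition S = V diag(w) V^T
    w_clipped = np.maximum(w, 0)   # clip negative eigenvalues at zero
    return (V * w_clipped) @ V.T   # reassemble V diag(w_clipped) V^T

A = np.array([[1.0, 2.0],
              [2.0, -3.0]])        # indefinite symmetric matrix
P = project_psd(A)                 # nearest PSD matrix in Frobenius norm
```

The projection is idempotent and returns the nearest PSD matrix in the Frobenius norm, which is why controlling the eigensolver's accuracy translates directly into a bound on the ADMM iterate error.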
Finally, we devise an active-set algorithm for nonconvex quadratic problems with a norm constraint. Our approach is based on repeated solutions of the well-known Trust-Region Subproblem, for which we derive novel results on the existence and computation of its local-non-global minimizer. The resulting algorithm proves useful in sparse PCA and sequential quadratic programming.
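For context, the Trust-Region Subproblem minimizes a (possibly nonconvex) quadratic over a norm ball. The sketch below, which is not the thesis's algorithm, solves the boundary case by bisection on the Lagrange multiplier, assuming the "easy case" (the gradient is not orthogonal to the eigenspace of the smallest eigenvalue) and that the solution lies on the boundary.

```python
# Minimal sketch of the Trust-Region Subproblem:
#   min 0.5 x'Ax + g'x   s.t.  ||x|| = r
# Solved via bisection on lam in x(lam) = -(A + lam*I)^{-1} g, using that
# ||x(lam)|| is monotonically decreasing for lam > -lambda_min(A).
# Assumptions: easy case (no hard case), boundary solution exists.
import numpy as np

def trs_boundary(A, g, r):
    n = len(g)
    lam_min = np.linalg.eigvalsh(A)[0]
    lo = max(0.0, -lam_min) + 1e-12          # make A + lam*I positive definite
    xnorm = lambda lam: np.linalg.norm(
        np.linalg.solve(A + lam * np.eye(n), -g))
    hi = lo + 1.0
    while xnorm(hi) > r:                     # grow hi until ||x(hi)|| <= r
        hi *= 2.0
    for _ in range(200):                     # bisection on the monotone norm
        mid = (lo + hi) / 2.0
        if xnorm(mid) > r:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    return np.linalg.solve(A + lam * np.eye(n), -g)

# Example with an indefinite Hessian: the global minimizer of the
# unconstrained quadratic does not exist, yet the TRS is well defined.
x = trs_boundary(np.diag([-2.0, 1.0]), np.array([1.0, 1.0]), 1.0)
```

The thesis goes further than this sketch, characterizing when a local-non-global minimizer of the subproblem exists and how to compute it, which this simple bisection does not address.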
+ Engineering and Physical Sciences Research Council
- Funder identifier:
- http://dx.doi.org/10.13039/100002322
- Grant:
- EP/L015987/1
- Programme:
- EPSRC AIMS CDT
+ Industry partner of EPSRC AIMS CDT
- Programme:
- Industry partner of EPSRC AIMS CDT
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2020-06-30
Terms of use
- Copyright holder:
- Nikitas, R
- Copyright date:
- 2019
- Rights statement:
- © Nikitas, R