Thesis

Structure-aware machine learning over multi-relational databases

Abstract:

We consider the problem of computing machine learning models over multi-relational databases. The mainstream approach involves a costly repeated loop that data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and train the desired model using this tool.

In this thesis, we advocate for an alternative approach that avoids this loop and instead tightly integrates the query and learning tasks into one unified solution. The primary observation is that the data-intensive computation for a variety of learning tasks can be expressed as group-by aggregates over the join of the database relations.
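As a minimal illustration of this observation (a toy sketch with hypothetical relations, not the system developed in the thesis): for least-squares linear regression of one variable on another, the model parameters depend on the data only through sufficient statistics such as SUM(x*x), SUM(x*y), SUM(x), SUM(y), and COUNT(*), each of which is an aggregate query over the join of the database relations.

```python
import sqlite3

# Hypothetical toy schema: Sales(store, item, units) joins Items(item, price).
# To regress units on price, we only need aggregates over the join; the
# joined tuples themselves never have to be exported to a learning tool.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Sales(store INT, item INT, units REAL);
CREATE TABLE Items(item INT, price REAL);
INSERT INTO Sales VALUES (1,10,3.0),(1,11,1.0),(2,10,4.0);
INSERT INTO Items VALUES (10,2.0),(11,5.0);
""")
sxx, sxy, sx, sy, n = con.execute("""
    SELECT SUM(price*price), SUM(price*units), SUM(price), SUM(units), COUNT(*)
    FROM Sales NATURAL JOIN Items
""").fetchone()

# Closed-form simple linear regression, computed from the aggregates alone.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
```

The same pattern extends to multiple features: the least-squares normal equations need only the matrix of pairwise aggregates SUM(x_i * x_j), a batch of group-by aggregates over the join.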

This observation allows us to employ a combination of established and novel query evaluation techniques that exploit structure in the query and data to optimize the computation of the aggregates. As a result, we show that, for a class of machine learning models, our integrated, structure-aware approach to the end-to-end learning of models over databases can be asymptotically faster than the mainstream solution, which first materializes the result of the feature extraction query. This class of models includes supervised machine learning problems for regression and classification, as well as unsupervised learning problems.
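The source of the asymptotic gap can be sketched as follows (hypothetical toy code, not the thesis's actual algorithms): to compute SUM(a*b) over the join of R(k, a) and S(k, b), the materialized join contains |R_k| * |S_k| tuples per join key k, whereas pushing partial sums below the join touches only |R| + |S| tuples, since the sum over the join equals the sum over k of (SUM of a in R_k) * (SUM of b in S_k).

```python
from collections import defaultdict

# R and S join on their first field.
R = [(1, 2.0), (1, 3.0), (2, 4.0)]
S = [(1, 10.0), (1, 20.0), (2, 5.0)]

def naive(R, S):
    # Enumerates every tuple of the join: quadratic per join key.
    return sum(a * b for (k1, a) in R for (k2, b) in S if k1 == k2)

def factorized(R, S):
    # Pushes partial sums into each relation: linear in |R| + |S|.
    ra, sb = defaultdict(float), defaultdict(float)
    for k, a in R:
        ra[k] += a
    for k, b in S:
        sb[k] += b
    return sum(ra[k] * sb[k] for k in ra.keys() & sb.keys())
```

Both functions return the same aggregate, but the factorized variant never enumerates the join, which is what makes structure-aware evaluation asymptotically faster on skewed or many-to-many joins.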

This theoretical development informed the design and implementation of LMFAO (Layered Multiple Functional Aggregate Optimization), an in-memory optimization and execution engine for batches of aggregates over the input database. LMFAO consists of several layers of logical and code optimizations that systematically exploit factorization, sharing of computation, parallelism, and code specialization.

We conducted two types of performance benchmarks. First, we benchmarked LMFAO against PostgreSQL, MonetDB, and a commercial database management system on the computation of aggregate batches. We then compared LMFAO against several machine learning packages commonly used in data science on the end-to-end learning of a variety of models over databases. In all benchmarks, LMFAO outperforms its competitors, with speedups of up to three orders of magnitude. In many cases, LMFAO completes the end-to-end learning pipeline in less time than the competing machine learning tools need just to construct the input training dataset.

Authors

Division:
MPLS
Department:
Computer Science
Role:
Author

Contributors

Role:
Supervisor


Type of award:
DPhil
Level of award:
Doctoral
Awarding institution:
University of Oxford


Language:
English
Deposit date:
2020-06-10
