Thesis
Empirical performance of simple statistical inference-type models with feature extraction in comparison to modern machine learning methods concerning regression and classification problems
- Abstract:
-
While most statistical methods used in industry have a simple structure, such as decision trees and logistic regression, many machine learning competitions are won by modern, complex methods, which also outperformed the simpler methods in previous benchmark studies. In those studies, however, the results were usually obtained on raw, unprocessed datasets. In practice, feature extraction is performed first when developing a model, to create non-redundant and useful covariates. This is especially important for the simple methods, which often require approximately linear features without severe outliers.
In this thesis, we compare the performance of modern machine learning methods on raw data, without any additionally extracted features, with that of simpler and more interpretable models on data to which feature extraction has been applied. For this purpose, we employ preprocessing and feature extraction algorithms to objectively approximate the feature extraction performed in practice, across 32 different datasets.
Our results show that, for classification problems, the simple methods perform almost as well as the more complex methods. This holds both for binary classification, measured by the area under the ROC curve, and for multiclass classification, measured by the weighted F1 score.
In our regression problems, the modern methods outperform the simpler, more interpretable methods in terms of their respective R-squared or explained variance.
We suspect that the difference between classification and regression problems might be due to the evaluation metrics used, specifically how they react to the monotonicity or linearity of the extracted features. More research is necessary to investigate this difference.
Authors
Contributors
- Role:
- Supervisor
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Mathematical Institute
- Role:
- Supervisor
- DOI:
- Type of award:
- MSc
- Level of award:
- Masters
- Awarding institution:
- University of Oxford
- Keywords:
- Subjects:
- UUID:
-
uuid:470caa81-e106-4c87-af02-1aeaf6380269
- Deposit date:
-
2019-10-03
Terms of use
- Copyright holder:
- Dittgen, M
- Copyright date:
- 2019