Book section: Chapter
Topological analysis of credit data: preliminary findings
- Abstract:
- There is plenty of room for improvement in credit risk prediction. Intuitively, similar customers should have similar credit risk. This similarity is often captured using Euclidean distances between customer features, with credit default then predicted via logistic regression. Here we explore the use of topological data analysis for describing this similarity. In particular, persistent homology algorithms provide summaries of point clouds that reflect their topology. This approach has been shown to be useful in many applications but, to the best of our knowledge, applying topological data analysis to the prediction of credit risk is novel. We develop a pipeline based on the topological analysis of neighbourhoods of customers, with the neighbourhoods given through a geometric network construction. Using two data sets from the Lending Club, we find a modest signal; the results have high variance, but they could be seen as an indication that including such topological features could improve credit risk prediction when used as an additional explanatory variable in a logistic regression.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
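The abstract describes computing topological summaries of customer neighbourhoods and using them as an extra explanatory variable. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes neighbourhoods are balls of a fixed radius around each customer and uses total 0-dimensional persistence (of a Vietoris-Rips filtration) as the summary. The function names, the radius, and the toy data are all hypothetical, and the logistic regression step is not shown.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Death times of 0-dimensional persistence classes (Vietoris-Rips).

    Every point is born at scale 0. A connected component dies at the
    length of the edge that merges it into another component, so the
    death times are the edge lengths of a minimum spanning tree.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest (Kruskal).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


def neighbourhood(points, centre, radius):
    """Points within `radius` of points[centre] (assumed neighbourhood rule)."""
    return [p for p in points if math.dist(p, points[centre]) <= radius]


def topological_feature(points, centre, radius):
    """Total H0 persistence of the neighbourhood around one customer."""
    return sum(h0_death_times(neighbourhood(points, centre, radius)))


# Toy, made-up customer features in two dimensions.
customers = [(0, 0), (1, 0), (0, 1), (5, 5), (5, 6)]
features = [topological_feature(customers, i, 1.5) for i in range(len(customers))]
```

Each entry of `features` could then be appended to the corresponding customer's feature vector before fitting a logistic regression. Higher-dimensional persistence would need a dedicated library and is left out of this sketch.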
- Publisher copy:
- 10.1007/978-3-031-21753-1_42
- Publisher:
- Springer
- Host title:
- Proceedings of the 23rd International Conference on Intelligent Data Engineering and Automated Learning (IDEAL)
- Volume:
- 13756
- Pages:
- 432–442
- Series:
- Lecture Notes in Computer Science
- Publication date:
- 2022-11-21
- DOI:
- 10.1007/978-3-031-21753-1_42
- EISSN:
- 1611-3349
- ISSN:
- 0302-9743
- EISBN:
- 9783031217531
- ISBN:
- 9783031217524
- Language:
- English
- Subtype:
- Chapter
- Pubs id:
- 1308853
- Local pid:
- pubs:1308853
- Deposit date:
- 2022-11-25
Terms of use
- Copyright holder:
- Cooper et al.
- Copyright date:
- 2022
- Rights statement:
- © 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
- Notes:
- This paper was presented at the 23rd International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2022), 24th-26th November 2022, Manchester, UK. This is the accepted manuscript version of the article. The final version is available online from Springer at: https://doi.org/10.1007/978-3-031-21753-1_42