Conference item
On the dangers of bootstrapping generation for continual learning and beyond
- Abstract:
- The use of synthetically generated data for training models is becoming a common practice. While generated data can augment the training data, repeated training on synthetic data raises concerns about distribution drift and degradation of performance due to contamination of the dataset. We investigate the consequences of this bootstrapping process through the lens of continual learning, drawing a connection to Generative Experience Replay (GER) methods. We present a statistical analysis showing that synthetic data introduces significant bias and variance into training objectives, weakening the reliability of maximum likelihood estimation. We provide empirical evidence showing that popular generative models collapse under repeated training with synthetic data. We quantify this degradation and show that state-of-the-art GER methods fail to maintain alignment in the latent space. Our findings raise critical concerns about the use of synthetic data in continual learning.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Accepted manuscript (pdf, 31.5MB)
- Publisher copy:
- 10.1007/978-3-032-12840-9_16
Authors
- Publisher:
- Springer
- Host title:
- Pattern Recognition: 47th DAGM German Conference, DAGM GCPR 2025, Freiburg, Germany, September 23–26, 2025, Proceedings
- Pages:
- 237-250
- Series:
- Lecture Notes in Computer Science
- Series number:
- 16125
- Publication date:
- 2026-01-02
- Acceptance date:
- 2025-07-16
- Event title:
- 47th DAGM German Conference on Pattern Recognition (GCPR 2025)
- Event location:
- Freiburg, Germany
- Event website:
- https://www.dagm-gcpr.de/year/2025
- Event start date:
- 2025-09-23
- Event end date:
- 2025-09-26
- EISSN:
- 1611-3349
- ISSN:
- 0302-9743
- EISBN:
- 9783032128409
- ISBN:
- 9783032128393
- Language:
- English
- Pubs id:
- 2369891
- Local pid:
- pubs:2369891
- Deposit date:
- 2026-03-25
Terms of use
- Copyright holder:
- Zverev et al.
- Copyright date:
- 2026
- Rights statement:
- © 2026 The Author(s), under exclusive license to Springer Nature Switzerland AG.
- Notes:
- The author accepted manuscript (AAM) of this paper has been made available under the University of Oxford's Open Access Publications Policy, and a CC BY public copyright licence has been applied.
- Licence:
- CC Attribution (CC BY)