Enhancing design of experiments through uncertainty estimation and synthetic data generation
| dc.contributor.author | Moles, Luis | |
| dc.contributor.author | Andrés Fernández, Alain | |
| dc.contributor.author | Echegaray López, Goretti | |
| dc.contributor.author | Boto Sánchez, Fernando | |
| dc.date.accessioned | 2026-03-13T11:49:23Z | |
| dc.date.available | 2026-03-13T11:49:23Z | |
| dc.date.issued | 2026-03 | |
| dc.date.updated | 2026-03-13T11:49:23Z | |
| dc.description.abstract | Design of Experiments is a key methodology for optimizing machine learning models, but traditional methods often depend on extensive real data collection, which is costly and time-consuming. Moreover, predefined experimental designs may struggle to adapt to complex or high-dimensional input spaces, sometimes leading to inefficient exploration, especially when data are scarce and uncertainty is high. To address these challenges, we propose a methodology that integrates uncertainty estimation with synthetic data generation. First, we evaluate several uncertainty estimators (Gaussian Process, Monte Carlo Dropout, and tree-based ensembles), which identify the input regions where the current model is most uncertain. Next, we analyze different generative models (Variational Autoencoders, Generative Adversarial Networks, and Large Language Models) trained under varying levels of data availability (from only 10% of the real dataset up to the full data) to test their robustness under extreme scarcity. Finally, we combine the best uncertainty estimator with the most reliable generative model in a hybrid active learning pipeline. Beyond the standard setting, we systematically vary the number and proportion of synthetic versus real samples, showing how the mixture affects predictive accuracy and uncertainty reduction. The experiments show that Gaussian Process uncertainty estimation outperforms the other tested methods under extreme data scarcity, and that Variational Autoencoders produce the most stable synthetic samples with as little as 10% of the real data used for training. The full hybrid loop (Gaussian Process + Variational Autoencoder) achieves an R² similar to the baselines while driving down uncertainty significantly faster, offering a data-efficient strategy for costly experimental contexts. | en |
| dc.description.sponsorship | The authors gratefully acknowledge the financial support provided by the Basque Government (Eusko Jaurlaritza) under the “Programa de apoyo a la investigación colaborativa en áreas estratégicas” program (Project BISUM II: Ref. KK-2024/00048) | en |
| dc.identifier.citation | Moles, L., Andres, A., Echegaray, G., & Boto, F. (2026). Enhancing design of experiments through uncertainty estimation and synthetic data generation. Results in Engineering, 29. https://doi.org/10.1016/J.RINENG.2026.109409 | |
| dc.identifier.doi | 10.1016/J.RINENG.2026.109409 | |
| dc.identifier.eissn | 2590-1230 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14454/5440 | |
| dc.language.iso | eng | |
| dc.publisher | Elsevier B.V. | |
| dc.subject.other | Data augmentation | |
| dc.subject.other | Design of experiments | |
| dc.subject.other | Gaussian process | |
| dc.subject.other | Synthetic data | |
| dc.subject.other | Uncertainty estimation | |
| dc.title | Enhancing design of experiments through uncertainty estimation and synthetic data generation | en |
| dc.type | journal article | |
| dcterms.accessRights | open access | |
| oaire.citation.title | Results in Engineering | |
| oaire.citation.volume | 29 | |
| oaire.licenseCondition | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
| oaire.version | VoR |
Files
Original bundle
- Name: moles_enhancing_2026.pdf
- Size: 7.32 MB
- Format: Adobe Portable Document Format