DeustoTeka: Browsing by Author "Moles, Luis"


Showing 1 - 3 of 3

Enhancing design of experiments through uncertainty estimation and synthetic data generation
(Elsevier B.V., 2026-03) Moles, Luis; Andrés Fernández, Alain; Echegaray López, Goretti; Boto Sánchez, Fernando
Design of Experiments is a key methodology for optimizing machine learning models, but traditional methods often depend on extensive real data collection, which is costly and time-consuming. Moreover, predefined experimental designs may struggle to adapt to complex or high-dimensional input spaces, sometimes leading to inefficient exploration, especially when data are scarce and uncertainty is high. To address these challenges, we propose a methodology that integrates uncertainty estimation with synthetic data generation. First, we evaluate several uncertainty estimators (Gaussian Process, Monte Carlo Dropout, and tree-based ensembles), which identify the input regions where the current model is most uncertain. Next, we analyze different generative models (Variational Autoencoders, Generative Adversarial Networks, and Large Language Models) trained under varying levels of data availability (from only 10% of the real dataset up to full data) to test their robustness under extreme scarcity. Finally, we combine the best uncertainty estimator with the most reliable generative model in a hybrid active learning pipeline. Beyond the standard setting, we systematically vary the number and proportion of synthetic versus real samples, showing how the mixture affects predictive accuracy and uncertainty reduction. The experiments show that Gaussian Process uncertainty estimation outperforms the other tested methods under extreme data scarcity, and that Variational Autoencoders produce the most stable synthetic samples with as little as 10% of the real data used for training. The full hybrid loop (Gaussian Process + Variational Autoencoder) achieves R² similar to the baselines while driving down uncertainty significantly faster, offering a data-efficient strategy for costly experimental contexts.
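The uncertainty-estimation step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional input, a zero-mean Gaussian Process with a unit-variance RBF kernel, and hypothetical helper names (rbf, solve, gp_posterior_variance, most_uncertain). It only shows how posterior variance can rank candidate inputs so that the next experiment targets the region where the model is least certain.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Unit-variance RBF kernel between two scalar inputs (an assumed kernel choice)."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior_variance(x_train, x_query, length_scale=1.0, noise=1e-6):
    """Posterior variance k(x*, x*) - k*^T (K + noise*I)^-1 k* at one query point.

    For a GP with fixed hyperparameters this depends only on where the
    training inputs are, not on the observed targets.
    """
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    k_star = [rbf(a, x_query, length_scale) for a in x_train]
    alpha = solve(K, k_star)
    return 1.0 - sum(k * a for k, a in zip(k_star, alpha))

def most_uncertain(x_train, candidates, **kwargs):
    """Return the candidate input with the largest posterior variance."""
    return max(candidates, key=lambda x: gp_posterior_variance(x_train, x, **kwargs))

# Illustrative inputs only; these values are not taken from the paper.
x_train = [0.0, 1.0, 2.0]
candidates = [0.5, 1.5, 5.0]
next_point = most_uncertain(x_train, candidates)
```

In a hybrid loop of the kind the abstract describes, the selected point would then be either measured or approximated with synthetic samples from the generative model; that step is omitted here.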
Exploring data augmentation and active learning benefits in imbalanced datasets
(Multidisciplinary Digital Publishing Institute (MDPI), 2024-06) Moles, Luis; Andrés Fernández, Alain; Echegaray, Goretti; Boto Sánchez, Fernando
Despite the increasing availability of vast amounts of data, the challenge of acquiring labeled data persists. This issue is particularly serious in supervised learning scenarios, where labeled data are essential for model training. In addition, the rapid growth in data required by cutting-edge technologies such as deep learning makes the task of labeling large datasets impractical. Active learning methods offer a powerful solution by iteratively selecting the most informative unlabeled instances, thereby reducing the amount of labeled data required. However, active learning faces some limitations with imbalanced datasets, where majority class over-representation can bias sample selection. To address this, combining active learning with data augmentation techniques emerges as a promising strategy. Nonetheless, the best way to combine these techniques is not yet clear. Our research addresses this question by analyzing the effectiveness of combining both active learning and data augmentation techniques under different scenarios. Moreover, we focus on improving the generalization capabilities for minority classes, which tend to be overshadowed by the improvement seen in majority classes. For this purpose, we generate synthetic data using multiple data augmentation methods and evaluate the results considering two active learning strategies across three imbalanced datasets. Our study shows that data augmentation enhances prediction accuracy for minority classes, with approaches based on CTGANs obtaining improvements of nearly 50% in some cases. Moreover, we show that combining data augmentation techniques with active learning can reduce the amount of real data required.
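The selection of "the most informative unlabeled instances" mentioned in the abstract above can be sketched as follows. The abstract does not name the two active learning strategies it evaluates, so entropy-based uncertainty sampling is used here purely as an assumed, common example; the function names and the probability values are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_informative(pool_probs, k):
    """Return indices of the k pool instances with the highest predictive entropy.

    pool_probs holds, for each unlabeled instance, the class probabilities
    predicted by the current model.
    """
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]

# Illustrative predictions for four unlabeled instances (not real data).
pool_probs = [
    [0.90, 0.10],
    [0.50, 0.50],
    [0.70, 0.30],
    [0.99, 0.01],
]
to_label = select_most_informative(pool_probs, k=2)
```

On an imbalanced dataset, a ranking like this can favor instances near the majority class, which is the bias the abstract proposes to counter by adding synthetic minority-class samples.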
On the use of machine learning for predicting femtosecond laser grooves in tribological applications
(Elsevier Ltd, 2024-12) Moles, Luis; Llavori, Iñigo; Aginagalde, Andrea; Echegaray, Goretti; Bruneel, David; Boto Sánchez, Fernando; Zabala Eguren, Alaitz
Femtosecond laser surface texturing is gaining increased interest for optimizing tribological behaviour. However, the laser surface texturing parameter selection is often conducted through time-consuming and inefficient trial-and-error processes. Although machine learning emerges as an interesting option, a multitude of models exist, and it remains unclear which is most suitable for predicting femtosecond laser textures. Furthermore, the absence of open-source implementations and the expertise required to use them hinder their adoption within the tribology community. In this study, two novel inverse modelling approaches for the optimal prediction of femtosecond laser parameters are proposed, based on the results of a comparison between six different machine learning models conducted within this research. The entire development relies on open-source tools, and the models employed are shared, with the aim of democratizing these techniques and facilitating their adoption by non-expert users within the tribology community.
