Explainable multimodal foundational models for retinal disease stratification: a robustness study across 15+ heterogeneous datasets

dc.contributor.author: Osa Sánchez, Ainhoa
dc.contributor.author: El-Baz, Ayman
dc.contributor.author: Oleagordia Ruiz, Ibon
dc.contributor.author: García-Zapirain, Begoña
dc.date.accessioned: 2026-04-13T14:59:46Z
dc.date.available: 2026-04-13T14:59:46Z
dc.date.issued: 2026-02-25
dc.date.updated: 2026-04-13T14:59:46Z
dc.description.abstract: The automated stratification of retinal diseases remains a significant challenge due to data heterogeneity and the closed-box nature of deep learning models. Although foundational models have demonstrated remarkable success in general computer vision, their clinical reliability and interpretability in multimodal ophthalmology remain insufficiently explored. In this work, we introduce an Explainable Multimodal Foundational AI framework trained on a large-scale integrated corpus of 760,243 retinal images collected from over 15 heterogeneous repositories, encompassing both fundus photography and optical coherence tomography (OCT). We systematically evaluate the self-supervised learning (SSL) paradigms DINO and iBOT across convolutional (ResNet) and Transformer-based (Vision Transformer, ViT) architectures. Our results show that ResNet-DINO achieves state-of-the-art performance, reaching 93.53% accuracy and a 0.935 F1-score in 6-class multimodal retinal disease classification, while exhibiting superior robustness under data-limited conditions, attributed to its inductive bias. Notably, we observe emergent clinical localization capabilities in Vision Transformer models (ViT-DINOv2 and ViT-iBOT). Using frozen pre-trained weights, and without exposure to expert-labeled data or ground-truth labels, these models autonomously highlight clinically relevant biomarkers, including subretinal fluid and drusen, demonstrating intrinsic pathological awareness. By bridging the semantic gap between unsupervised representation learning and targeted clinical diagnosis, this study establishes a benchmark for robust, explainable, and label-efficient AI in ophthalmology. Our findings indicate that large-scale foundational pre-training not only enhances diagnostic accuracy but also induces meaningful visual priors aligned with established clinical biomarkers, supporting the deployment of trustworthy AI systems in real-world clinical decision support.
dc.description.sponsorship: This work was supported by the Basque Government through the Hazitek 2024 program, Spain, within the framework of the IRUD-IA project, "Medical Image Analysis Technologies with Artificial Intelligence for the Development of Medical Devices," project code ZE-2024/00030.
dc.identifier.citation: Osa-Sanchez, A., El-Baz, A., Oleagordia-Ruiz, I., & Garcia-Zapirain, B. (2026). Explainable multimodal foundational models for retinal disease stratification: a robustness study across 15+ heterogeneous datasets. IEEE Access, 14, 31567-31579. https://doi.org/10.1109/ACCESS.2026.3668034
dc.identifier.doi: 10.1109/ACCESS.2026.3668034
dc.identifier.eissn: 2169-3536
dc.identifier.uri: https://hdl.handle.net/20.500.14454/5627
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.rights: © 2026 The Authors
dc.subject.other: Explainable AI (XAI)
dc.subject.other: Foundation models
dc.subject.other: Large-scale ophthalmic benchmark
dc.subject.other: Multimodal fusion
dc.subject.other: Retinal pathology
dc.subject.other: Self-supervised learning
dc.title: Explainable multimodal foundational models for retinal disease stratification: a robustness study across 15+ heterogeneous datasets
dc.type: journal article
dcterms.accessRights: open access
oaire.citation.endPage: 31579
oaire.citation.startPage: 31567
oaire.citation.title: IEEE Access
oaire.citation.volume: 14
oaire.licenseCondition: https://creativecommons.org/licenses/by/4.0/
oaire.version: VoR
Files
Original bundle
osa_explainable_2026.pdf (1.89 MB, Adobe Portable Document Format)