Identification of mathematical patterns in genomic spectrograms linked to variant classification in complete SARS-CoV-2 sequences

dc.contributor.author: Guerrero Tamayo, Ana
dc.contributor.author: Sanz Urquijo, Borja
dc.contributor.author: Moragues Tosantos, María Dolores
dc.contributor.author: Olivares, Isabel
dc.contributor.author: Casado, Concepción
dc.contributor.author: Pastor López, Iker
dc.date.accessioned: 2026-01-12T10:52:59Z
dc.date.available: 2026-01-12T10:52:59Z
dc.date.issued: 2025-12-05
dc.date.updated: 2026-01-12T10:52:59Z
dc.description.abstract [en]: Building on previous studies, we identified mathematical patterns in HIV-1 and SARS-CoV-2 genomes using transfer learning and explainability with a pre-trained CNN on genomic spectrograms. These patterns seemed to define viral characteristics, leading us to hypothesize that inherent mathematical patterns in a virus’s genome determine its features. To explore this further, we focused on SARS-CoV-2 variant classification, designing a methodology with genomic spectrograms, a two-stage transfer learning approach, and two-step explainability. This approach identified genomic regions and nucleotide frequency patterns that characterize specific variants, revealing clear, distinguishable patterns for each category. The distinct and consistent total regions of high activation for each variant highlight the significance of the genomic region from the beginning of the S gene to the end of the 3’UTR in identifying the variants under study. The frequencies […] and particularly […] within this region appeared to play a key role in their identification. The shared prominence of […] in the final segment of the genome for both pre-VOC and Omicron (despite different pattern shapes) may hint at a phylogenetic connection in SARS-CoV-2 or even suggest that Omicron evolved from a pre-VOC lineage. The confirmation that mathematical patterns are associated with variant classification represents a step forward in demonstrating that these patterns play a role in viral characterization, suggesting the existence of an additional layer of genomic information that may enable virus characterization in a low-computing and efficient manner compared to traditional methodologies.
dc.description.sponsorship [en]: This work was supported by the Research Training Grants Program - University of Deusto. Ref. FPI UD_2021_10
dc.identifier.citation: Guerrero-Tamayo, A., Urquijo, B. S., Tosantos, M.-D. M., Olivares, I., Casado, C., & Pastor-López, I. (2025). Identification of mathematical patterns in genomic spectrograms linked to variant classification in complete SARS-CoV-2 sequences. Scientific Reports, 15(1). https://doi.org/10.1038/S41598-025-27279-0
dc.identifier.doi: 10.1038/S41598-025-27279-0
dc.identifier.eissn: 2045-2322
dc.identifier.uri: https://hdl.handle.net/20.500.14454/4682
dc.language.iso: eng
dc.publisher: Nature Research
dc.rights: © The Author(s) 2025
dc.subject.other: Explainability
dc.subject.other: Genomic spectrogram
dc.subject.other: Mathematical pattern
dc.subject.other: SARS-CoV-2
dc.subject.other: Transfer learning
dc.subject.other: Variant
dc.title [en]: Identification of mathematical patterns in genomic spectrograms linked to variant classification in complete SARS-CoV-2 sequences
dc.type: journal article
dcterms.accessRights: open access
oaire.citation.issue: 1
oaire.citation.title: Scientific Reports
oaire.citation.volume: 15
oaire.licenseCondition: https://creativecommons.org/licenses/by-nc-nd/4.0/
oaire.version: VoR
Files
Original bundle
Name: guerrero_identification_2025.pdf
Size: 4.45 MB
Format: Adobe Portable Document Format