Identification of mathematical patterns in genomic spectrograms linked to variant classification in complete SARS-CoV-2 sequences

Abstract
Building on previous studies, we identified mathematical patterns in HIV-1 and SARS-CoV-2 genomes using transfer learning and explainability with a pre-trained CNN on genomic spectrograms. These patterns appeared to define viral characteristics, leading us to hypothesize that mathematical patterns inherent in a virus’s genome determine its features. To test this hypothesis, we focused on SARS-CoV-2 variant classification and designed a methodology that combines genomic spectrograms, a two-stage transfer learning approach, and two-step explainability. This approach identified the genomic regions and nucleotide frequency patterns that characterize specific variants, revealing clear, distinguishable patterns for each category. The total regions of high activation were distinct and consistent for each variant, which highlights the significance of the genomic region from the beginning of the S gene to the end of the 3’UTR in identifying the variants under study. The frequencies and particularly within this region appeared to play a key role in their identification. The shared prominence of in the final segment of the genome for both pre-VOC and Omicron (despite different pattern shapes) may hint at a phylogenetic connection in SARS-CoV-2 or even suggest that Omicron evolved from a pre-VOC lineage. Confirming that mathematical patterns are associated with variant classification is a step toward demonstrating that these patterns play a role in viral characterization. It suggests an additional layer of genomic information that may enable virus characterization in a computationally inexpensive and efficient manner compared with traditional methodologies.
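For readers unfamiliar with the term, a genomic spectrogram can be illustrated with a short sketch. This record does not describe how the authors built theirs, so the code below is only an assumption: it uses the common Voss binary-indicator encoding with a sliding-window Fourier transform. The function name `genomic_spectrogram` and the `window` and `step` values are hypothetical choices for illustration, not parameters from the paper.

```python
import numpy as np

def genomic_spectrogram(seq, window=64, step=16):
    """Sketch of a genomic spectrogram (assumed construction, not the paper's).

    Returns power spectra with shape (4, n_windows, window // 2 + 1),
    one channel per nucleotide in the order A, C, G, T.
    """
    seq = seq.upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the analysis window")
    # Voss indicator tracks: 1.0 where the base occurs, 0.0 elsewhere.
    # Ambiguous symbols such as N are 0.0 in every track.
    indicators = np.array(
        [[1.0 if ch == base else 0.0 for ch in seq] for base in "ACGT"]
    )
    taper = np.hanning(window)  # reduces spectral leakage at window edges
    frames = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = indicators[:, start:start + window] * taper
        # Power spectrum of each nucleotide track within this window.
        frames.append(np.abs(np.fft.rfft(chunk, axis=1)) ** 2)
    return np.stack(frames, axis=1)

# Toy input with an exact period of three bases (not real viral data).
toy_spectrogram = genomic_spectrogram("ATG" * 100, window=60, step=30)
```

In this toy input each base repeats every three positions, so the non-constant energy of the A track concentrates at the frequency bin `window / 3`. Stacking the four channels, or mapping them to colors, gives an image-like array of the kind a pre-trained CNN can take as input.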
Keywords
Explainability
Genomic spectrogram
Mathematical pattern
SARS-CoV-2
Transfer learning
Variant
Citation
Guerrero-Tamayo, A., Urquijo, B. S., Tosantos, M.-D. M., Olivares, I., Casado, C., & Pastor-López, I. (2025). Identification of mathematical patterns in genomic spectrograms linked to variant classification in complete SARS-CoV-2 sequences. Scientific Reports, 15(1). https://doi.org/10.1038/S41598-025-27279-0