Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability
Cargando...
Fecha
2024-08
Autores
Guerrero Tamayo, Ana
Sanz Urquijo, Borja
Olivares, Isabel
Moragues Tosantos, María Dolores
Casado, Concepción
Pastor López, Iker
Título de la revista
ISSN de la revista
Título del volumen
Editor
Public Library of Science
Resumen
The global impact of the SARS-CoV-2 pandemic has underscored the need for a deeper understanding of viral evolution to anticipate new viruses or variants. Genetic recombination is a fundamental mechanism in viral evolution, yet it remains poorly understood. In this study, we conducted a comprehensive research on the genetic regions associated with genetic recombination features in SARS-CoV-2. With this aim, we implemented a two-phase transfer learning approach using genomic spectrograms of complete SARS-CoV-2 sequences. In the first phase, we utilized a pre-trained VGG-16 model with genomic spectrograms of HIV-1, and in the second phase, we applied HIV-1 VGG-16 model to SARS-CoV-2 spectrograms. The identification of key recombination hot zones was achieved using the Grad-CAM interpretability tool, and the results were analyzed by mathematical and image processing techniques. Our findings unequivocally identify the SARS-CoV-2 Spike protein (S protein) as the pivotal region in the genetic recombination feature. For non-recombinant sequences, the relevant frequencies clustered around 1/6 and 1/12. In recombinant sequences, the sharp prominence of the main hot zone in the Spike protein prominently indicated a frequency of 1/ 6. These findings suggest that in the arithmetic series, every 6 nucleotides (two triplets) in S may encode crucial information, potentially concealing essential details about viral characteristics, in this case, recombinant feature of a SARS-CoV-2 genetic sequence. This insight further underscores the potential presence of multifaceted information within the genome, including mathematical signatures that define an organism’s unique attributes.
Palabras clave
Descripción
Materias
Cita
Guerrero-Tamayo, A., Sanz Urquijo, B., Olivares, I., Moragues Tosantos, M.-D., Casado, C., & Pastor-López, I. (2024). Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability. PLoS ONE, 19(8). https://doi.org/10.1371/JOURNAL.PONE.0309391
