Publication

Third generation sequencing and herpesvirus diversity: Towards complete double-stranded DNA virus genomes

Book - Dissertation

Viral variability can be understood as genetic differences between nucleotide sequences. It can be used to define viral taxonomic units (species, genera, class, or family), viral subgroups, strains or life-cycle stages (lytic and latent phases). Viral diversity has been extensively studied, mostly through second-generation sequencing techniques, such as IonTorrent or Illumina sequencing. In recent years, third-generation technologies, such as Nanopore sequencing, have been used to discover novel viruses, complete genomes and trace outbreaks (as currently with SARS-CoV-2) [1]. Nanopore sequencing relies on an artificial semi-conductive membrane, over which an electrical potential is applied. DNA molecules are present on one side of this membrane and diffuse towards one of the artificial nanopores embedded in it. Each DNA molecule carries at its end a sequencing adapter with a motor protein attached to it. This motor protein can temporarily attach to a pore, unwind the dsDNA molecule and translocate a ssDNA strand 3' -> 5', consuming ATP. As bases pass through the pore, base stretches generate specific and unique disturbance (or squiggle) patterns in the electric current flowing through the membrane. A basecaller mathematically models this disturbance and translates it into specific nucleotides, each with a quality score, in real time [2]. Reads can later be processed for genome assembly, variant calling or modified base discovery. Nanopore sequencing, despite its relatively low per-read accuracy (93-95%) [3], offers several advantages for viral genomics, such as long reads (improved assembly contiguity and study of genome rearrangements), detection of base modifications (viral epigenetics) and rapid field-based sequencing (small device with real-time sequencing). Overall consensus sequence accuracy is on par with other sequencing techniques (99.9%) [4].
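To put the accuracy figures above on the scale that basecallers use for quality scores, the following minimal sketch converts an accuracy into a Phred-scaled value, Q = -10 log10(1 - accuracy). It assumes that the quoted per-read and consensus accuracies can be treated as average per-base accuracies, which is a simplification. The function names are illustrative and not part of any basecaller.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert an accuracy (fraction in [0, 1)) to a Phred-scaled quality.

    Assumption: the accuracy is treated as an average per-base accuracy,
    so the error probability is simply 1 - accuracy.
    """
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the interval [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def phred_to_accuracy(quality: float) -> float:
    """Inverse conversion: Phred-scaled quality back to an accuracy."""
    return 1.0 - 10.0 ** (-quality / 10.0)


if __name__ == "__main__":
    # Accuracy values quoted in the abstract (per-read range and consensus).
    for label, acc in [("per-read, lower bound", 0.93),
                       ("per-read, upper bound", 0.95),
                       ("consensus", 0.999)]:
        print(f"{label}: accuracy={acc:.3f} -> Q{accuracy_to_phred(acc):.1f}")
```

Under this simplification, a per-read accuracy of 93-95% corresponds to roughly Q11.5 to Q13, while a consensus accuracy of 99.9% corresponds to about Q30.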
As detailed elsewhere, longer reads improve the contiguity of assembled genomes, mostly through the resolution of repeats [5]. This is of special relevance for producing high-quality reference genomes. Moreover, longer reads can improve the resolution of structural rearrangements in complex viral genomes [5]. Sequencing native DNA or RNA may allow for direct detection of epigenetic base modifications. For example, nanopore sequencing has successfully been used to detect and quantify 5-methylcytosine (5mC) in hepatitis B virus (HBV) DNA [6]. Few attempts at direct nanopore sequencing of dsDNA viruses with large genomes (> 100 kb) have been published, mostly in the families Herpesviridae, Poxviridae and Asfarviridae [7]. The order Herpesvirales, or the family Herpesviridae (average genome size 120 - 250 kb), offers an interesting working system to study viral diversity due to its wide host range, worldwide endemicity, and capacity to establish lifelong infections [5]. Whole-genome sequencing and comparative evolution of herpesviruses highlight the diversity of their genome structure between families (Herpesviridae, Alloherpesviridae, and Malacoherpesviridae) and genera (5 different genome structures from A to E). Currently, several herpesvirus species still lack a completely sequenced genome. Gene set and repeat content tend to vary within the same herpesvirus species, between strains or subtypes [5]. A good example is human cytomegalovirus (HCMV), which frequently harbors structural rearrangements and nucleotide variants [5]. HCMV has a linear double-stranded DNA genome with an average length of 235 kb ± 1.9 kb. The genome is packaged in an icosahedral capsid (T = 16) surrounded by a matrix of proteins, the tegument, and enveloped by a lipid bilayer. Although the genome is linear inside the nucleocapsid, it is circularized upon entry into the nucleus.
The HCMV genome is composed of two large invertible domains, long (L) and short (S), each flanked by repeated regions, one at the terminal end (TR) and the other at the internal end (IR), resulting in the class E genome structure TRL-UL-IRL-IRS-US-TRS. Recombination between repetitive regions can yield four possible genomic isomers. All four genomic isomers can be found in any infective viral population in equimolar proportions. Recombination of terminal and internal repeats is common, and it has been hypothesized that this recombination is the cause of long-range structural rearrangements.

1. Dellicour, S.; Durkin, K.; Hong, S.L.; Vanmechelen, B.; Martí-Carreras, J.; Gill, M.S.; Meex, C.; Bontems, S.; André, E.; Gilbert, M.; et al. A phylodynamic workflow to rapidly gain insights into the dispersal history and dynamics of SARS-CoV-2 lineages. Mol. Biol. Evol. 2020.
2. de Lannoy, C.; de Ridder, D.; Risse, J. A sequencer coming of age: De novo genome assembly using MinION reads. F1000Research 2017, 6, 1083.
3. R10.3: the newest nanopore for high accuracy nanopore sequencing. Available online: https://nanoporetech.com/about-us/news/r103-newest-nanopore-high-accuracy-nanopore-sequencing-now-available-store
4. Bowden, R.; Davies, R.W.; Heger, A.; Pagnamenta, A.T.; de Cesare, M.; Oikkonen, L.E.; Parkes, D.; Freeman, C.; Dhalla, F.; Patel, S.Y.; et al. Sequencing of human genomes with nanopore technology. Nat. Commun. 2019, 10, 1869.
5. Martí-Carreras, J.; Maes, P. Human cytomegalovirus genomics and transcriptomics through the lens of next-generation sequencing: revision and future challenges. Virus Genes 2019, 55, 138-164.
6. Goldsmith, C.D.; Cohen, D.; Dubois, A.; Martinez, M.-G.; Petitjean, K.; Corlu, A.; Hernandez-Vargas, H.; Chemin, I. Epigenetic heterogeneity after de novo assembly of native full-length Hepatitis B Virus genomes. bioRxiv 2020, 2020.05.29.122259.
7. Karamitros, T.; van Wilgenburg, B.; Wills, M.; Klenerman, P.; Magiorkinis, G. Nanopore sequencing and full genome de novo assembly of human cytomegalovirus TB40/E reveals clonal diversity and structural variations. BMC Genomics 2018, 19, 577.
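
The four genomic isomers of the class E structure described in the abstract (TRL-UL-IRL-IRS-US-TRS) can be enumerated with a short sketch. It assumes a simplified model in which inverting the L or S component flips only the orientation of its unique region (UL or US), because the flanking repeats are inverted copies of each other. The "+" and "-" orientation labels and the function name are illustrative choices, not notation from the dissertation.

```python
from itertools import product


def hcmv_isomers() -> list[str]:
    """Enumerate the four isomers of a class E genome.

    Simplifying assumption: each isomer is fully described by the
    orientation ("+" or "-") of the unique long (UL) and unique short (US)
    regions, with the repeats TRL/IRL and IRS/TRS staying in place.
    """
    isomers = []
    for ul_orientation, us_orientation in product("+-", repeat=2):
        isomers.append(
            f"TRL-UL({ul_orientation})-IRL-IRS-US({us_orientation})-TRS"
        )
    return isomers


if __name__ == "__main__":
    isomers = hcmv_isomers()
    # Equimolar proportions: each isomer makes up the same share.
    share = 1.0 / len(isomers)
    for isomer in isomers:
        print(f"{isomer}  expected share: {share:.2f}")
```

Two independent orientations give 2 x 2 = 4 arrangements, so an equimolar population would contain each isomer at a share of one quarter.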
Publication year: 2021
Accessibility: Open