Genomic epidemiology of the rotavirus G2P[4] strains in coastal Kenya pre- and post-rotavirus vaccine introduction, 2012-2018

The introduction of rotavirus vaccines into the national immunization programmes of many countries has led to a decline in the childhood diarrhoea disease burden. However, it remains unclear whether implementation of the monovalent Rotarix vaccine (G1P[8]) in national immunization programmes drives the temporal shifts of Rotavirus A (RVA) genotypes between the pre- and post-vaccine periods. Here we investigate the evolutionary genomics of rotavirus G2P[4], which has increased in countries that introduced the monovalent Rotarix vaccine. We examined 63 RVA G2P[4] strains sampled from children (aged below 13 years) admitted to Kilifi County Hospital, coastal Kenya, pre- (2012 to June 2014) and post- (July 2014 to 2018) rotavirus vaccine introduction. All 63 genome sequences showed a typical DS-1-like genome constellation, G2-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2. G2 sub-lineage IVa-3 strains predominated in the pre-vaccine era, co-circulating with low numbers of G2 sub-lineage IVa-1 strains, whereas sub-lineage IVa-3 strains dominated the post-vaccine period. In addition, in the pre-vaccine period, P[4] sub-lineage IVa strains co-circulated with low numbers of P[4] lineage II strains, but P[4] sub-lineage IVa strains predominated in the post-vaccine period. On the global phylogeny, the Kenyan pre- and post-vaccine G2P[4] strains clustered separately, suggesting that different virus populations circulated in the two periods. However, the strains from both periods exhibited conserved amino acid changes in the known antigenic epitopes, suggesting that replacement of the predominant G2P[4] cluster was unlikely to be a result of immune escape. Our findings demonstrate that the pre- and post-vaccine G2P[4] strains circulating in Kilifi, coastal Kenya, differed genetically but were likely antigenically similar. This information informs the discussion on the consequences of rotavirus vaccination on rotavirus diversity.

INTRODUCTION

[...] amplification of structural gene segments (VP1, VP2, VP3, VP4, VP6, and VP7) included 40 cycles of thermocycling (90°C for 30 seconds, 61°C for one minute, and 68°C for six minutes) and a final extension at 72°C for four minutes. PCR amplicons were resolved on a 2% agarose gel stained with RedSafe (iNtRON Biotechnology, Inc) for visualization of DNA bands. PCR products were purified using Exonuclease I (#EN0581; Thermo Fisher Scientific, Waltham, USA) as described by the manufacturer and pooled for each sample.

Next generation sequencing

Preparation of standard Illumina libraries for pre-vaccine samples was performed according to the published protocol [32]. Briefly, the double-stranded cDNA for each sample was sheared to obtain 400-500 nucleotide fragments. Each sample was then indexed separately with unique adapters, multiplexed at 95 samples, and sequenced on a HiSeq platform to generate about 1.5 million 250 bp paired-end reads per sample.

This preprint is made available under a CC-BY-NC-ND 4.0 International license. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
For the post-vaccine samples, the pooled amplicons for each sample were purified using the Agencourt AMPure XP Kit (#A63881; Beckman Coulter, USA) as described by the manufacturer. Library preparation was performed using the Illumina DNA Flex (#20025519; Illumina, San Diego, USA) as per the manufacturer's specifications. Briefly, bead-linked transposomes were used to tagment the DNA, followed by addition of adapters to the DNA fragments using a limited PCR program. The adapter-linked DNA was cleaned using the tagment wash buffer. After that, the purified tagmented DNA was amplified via a limited-cycle PCR program that adds the i7 and i5 adapters and the sequences required for cluster generation during sequencing. Next, the amplified libraries were purified using a double-sided bead purification method. Subsequently, each DNA library was quantitated, and the correct insert sizes were confirmed on an Agilent 2100 Bioanalyzer using the Agilent high sensitivity DNA kit (#5067; Agilent, Santa Clara, USA [...]

This version of the preprint was posted October 24, 2022.

[...] and any information, including location and collection year, that could be found in the primary publications was included in the respective sequence data. The sequences were subset to obtain a dataset for each genome segment. The datasets of all the genome segments were filtered to include only samples with all 11 segments. For all 11 segments, only sequences covering more than 80% of the coding sequence (CDS) region were considered for analysis. Overall, 350 global sequences for each segment met the inclusion criteria for phylogenetic analyses (Table S2).
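The inclusion criteria described above (all 11 segments present, each covering more than 80% of its CDS) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the record layout, the function name `complete_strains`, and the `cds_lengths` lookup are assumptions introduced for the example.

```python
from collections import defaultdict

# The 11 RVA genome segments named in the text.
SEGMENTS = ("VP1", "VP2", "VP3", "VP4", "VP6", "VP7",
            "NSP1", "NSP2", "NSP3", "NSP4", "NSP5")
MIN_CDS_FRACTION = 0.80  # "more than 80%" of the CDS, so the cutoff is exclusive


def complete_strains(records, cds_lengths):
    """Return sorted names of strains whose 11 segments all pass the CDS cutoff.

    records: iterable of (strain, segment, covered_cds_length) tuples
             (a hypothetical layout chosen for this sketch).
    cds_lengths: dict mapping segment name -> full CDS length in nucleotides.
    """
    passing = defaultdict(set)
    for strain, segment, covered in records:
        if segment not in cds_lengths:
            continue  # ignore segments without a known CDS length
        if covered / cds_lengths[segment] > MIN_CDS_FRACTION:
            passing[strain].add(segment)
    # Keep only strains for which every one of the 11 segments passed.
    return sorted(s for s, segs in passing.items() if set(SEGMENTS) <= segs)
```

A strain with ten good segments and one segment at 68% coverage would be dropped by this filter, as would a strain missing any segment entirely.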
Phylogenetic analysis

The global dataset was combined with the sequences of this study for each genome segment and aligned using MAFFT (v7.487) with the command "mafft --auto --reorder -- [...]"

Analysis of the VP7 gene

The VP7 gene is highly variable and encodes the glycoprotein that elicits humoral immune responses [50]. The VP7 genetic distance-resolved phylogenetic tree showed that the Kilifi sequences formed three clusters: a major monophyletic cluster, a minor monophyletic cluster, and a singleton (Fig. 2). Within the major cluster, the Kilifi strains separated by vaccination period. One subcluster consisted of strains circulating two years after Rotarix® vaccine introduction, interspersed with three strains isolated from children admitted to Kenyatta National Hospital (KNH), Kenya, in 2017 (Fig. 2). These sequences shared two amino acid substitutions (S72G and S75L) with respect to the pre-vaccine strains (Table S4). The second subcluster mainly consisted of strains circulating in the pre-vaccine period and two strains that circulated in July 2014, i.e., the early post-vaccine period (Fig. 2). The sequences in [...]

With regard to the VP7 lineages, the Kilifi G2 strains were classified into lineage IV and further into sub-lineages IVa-1 and IVa-3 (Fig. 2). In 2012, sub-lineage IVa-1 and IVa-3 sequences co-circulated in Kilifi, while in the global context sub-lineages IVa-1, IVa-3, and IV non-a co-circulated (Fig. 3A). However, IVa-1 strains in Kilifi were replaced in 2013 by sub-lineage IVa-3 strains, which dominated until 2018 (Fig. 3A), unlike in the global [...]

No lineage shift was observed between the pre- and post-vaccine introduction periods.
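A lineage comparison of this kind amounts to tabulating sub-lineage counts on either side of the vaccine introduction date. The sketch below assumes the July 2014 boundary stated in the abstract (taken here as 1 July 2014) and a hypothetical list of (collection date, sub-lineage) pairs; the function names are illustrative and the example data are not the study data.

```python
from collections import Counter
from datetime import date

# Assumed boundary: the post-vaccine period is described as starting in July 2014.
VACCINE_INTRODUCTION = date(2014, 7, 1)


def vaccine_period(collection_date):
    """Label a sample 'pre' or 'post' relative to vaccine introduction."""
    return "post" if collection_date >= VACCINE_INTRODUCTION else "pre"


def sublineage_counts(samples):
    """Count sub-lineages per vaccination period.

    samples: iterable of (collection_date, sublineage) pairs (hypothetical layout).
    Returns {"pre": Counter, "post": Counter}.
    """
    counts = {"pre": Counter(), "post": Counter()}
    for collected, sublineage in samples:
        counts[vaccine_period(collected)][sublineage] += 1
    return counts
```

Comparing the two counters shows at a glance whether the same lineage dominates both periods or whether a new one appears after vaccine introduction.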

Analysis of the VP4 gene

The VP4 gene is highly variable and encodes a highly immunogenic, protease-sensitive protein involved in receptor binding and cell penetration [50]. In the VP4 phylogenetic tree, the P[4] Kilifi sequences formed clusters (n>2), mainly based on the vaccination period, that were separated from global sequences (Fig. 2). However, two Kilifi sequences formed singletons: the KLF1033/2018 strain clustered with a sequence isolated from a child admitted to KNH, while the KLF0616/2012 strain was interspersed with sequences from Mozambique (Fig. 2). A major cluster of Kilifi sequences further sub-clustered based on the vaccination period, with the post-vaccine sequences interspersed with Kenyan sequences isolated from children admitted to KNH (Fig. 2). In addition, Kilifi strains collected in 2014 and some 2012 strains formed two distinct clades, clustering separately from global sequences (Fig. 2).

The backbone genome segments of the Kilifi G2P[4] strains (VP6, VP1-VP3, and NSP1-NSP5) formed up to four clusters on the global phylogenetic trees (Fig. S1). In the VP6, VP1, VP2, VP3, NSP1, and NSP2 genes, the majority of the Kilifi sequences formed one major cluster, which further separated into two sub-clusters containing only pre-vaccine or only post-vaccine sequences (Fig. S1). The post-vaccine strains in these genes clustered closely with 2017 sequences from KNH, Kenya (Fig. S1). In addition, the Kilifi 2014 sequences in the VP6, VP3, NSP1, and NSP2 segments exhibited a different clustering pattern, forming a further minor sub-cluster irrespective of the vaccination period, consistent with the VP4 gene (Fig. S1). Four post-vaccine sequences were interspersed with the pre-vaccine sequences in the NSP4 gene, and a single post-vaccine sequence clustered with pre-vaccine sequences in the NSP3 gene (Fig. S1). The NSP5 post-vaccine sequences formed one cluster, while the pre-vaccine [...]
The VP4 surface protein is cleaved into the VP8* and VP5* domains, which contain the 8-1 to 8-4 and 5-1 to 5-5 antigenic epitopes [52]. Analysis of the VP4 aa mutations revealed that the Kilifi strains differed at only three positions relative to the DS-1 prototype sequence: Q114P or L114P and N133S in the 8-3 epitope, and N89D in the 8-4 epitope (Fig. 4 & Table S5 [...]

doi: https://doi.org/10.1101/2022[...]

[...] Australia and Belgium [55], which were interpreted as reflecting natural genetic fluctuations rather than vaccine-induced evolution. However, the study NSP4 and NSP3 genes exhibited [...]

[...] This supported our hypothesis that local drivers were responsible for the diversity within the Kilifi setting.

The Kilifi strains harbored six conserved amino acid (aa) substitutions in the VP7 antigenic epitopes 7-1a and 7-1b with respect to the ancestral DS-1 G2P[4] strain. Three of these positions (A87T, D96N, and N213D) are critical for antibody binding, and sequence changes here may lead to escape from host neutralizing antibodies [57]. The I44M aa change may affect cellular immunity, as this region harbors a known T lymphocyte epitope (residues 40-52) of VP7. All Kilifi strains had this change, which potentially results in loss of recognition by T cells, leading to escape from host immune responses [58,59]. Three aa changes were observed in the VP4 antigenic epitopes 8-4 (N89D) and 8-3 (Q114P or L114P and N133S) in the Kilifi strains. These changes have been associated with escape from neutralizing monoclonal antibodies [60]. These aa substitutions were present in both pre- and post-vaccine Kilifi strains, suggesting they were not brought about by vaccine use.
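Substitutions written in the form A87T (reference residue, position, observed residue) can be derived by comparing each strain with a reference protein sequence, and the calls shared by all strains then identify conserved changes. The sketch below is illustrative only and is not the authors' method: it assumes the sequences are already aligned to equal length and that the reference contains no gaps, so that the alignment column equals the residue number. The function names are hypothetical.

```python
def call_substitutions(reference, query, positions=None):
    """List substitutions such as 'A87T' between two aligned protein sequences.

    Positions are 1-based alignment columns. Columns containing a gap ('-')
    or an unknown residue ('X') in either sequence are skipped.
    positions: optional collection of columns to restrict the comparison to
               (for example, the residues of one antigenic epitope).
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    for i, (ref_aa, qry_aa) in enumerate(zip(reference, query), start=1):
        if positions is not None and i not in positions:
            continue
        if ref_aa == qry_aa or ref_aa in "-X" or qry_aa in "-X":
            continue
        substitutions.append(f"{ref_aa}{i}{qry_aa}")
    return substitutions


def shared_substitutions(reference, queries, positions=None):
    """Return the set of substitutions present in every query sequence."""
    calls = [set(call_substitutions(reference, q, positions)) for q in queries]
    return set.intersection(*calls) if calls else set()
```

Running `shared_substitutions` separately on pre- and post-vaccine strains and comparing the two sets is one simple way to check whether the same epitope changes are present in both periods.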

This study had limitations. First, only sequences sampled from hospitalized children were analysed, and only a few genomes were analysed across the years; the data may therefore not conclusively reflect the diversity that was in circulation in the entire coastal Kenya population. Second, we recovered only near-complete genomes, with only 68% coverage recovered for the VP4 segment.

In conclusion, our study reinforces the significance of genomic sequencing in monitoring the effect of vaccine pressure on circulating RVA strains in Kenya. The Kilifi strains to a large extent clustered based on the vaccination period and were separate from the global strains. Furthermore, conserved amino acid mutations were observed in the VP7 and VP4 antigenic epitopes of the pre- and post-vaccine strains, suggesting that the Rotarix® vaccine did not have a direct impact on the evolution of the circulating strains.