Spatial‐temporal distribution and sequence diversity of group a human respiratory syncytial viruses in Kenya preceding the emergence of ON1 genotype

Abstract Background Human respiratory syncytial virus (HRSV) is a major cause of severe viral acute respiratory illness and contributes significantly to severe pneumonia cases in Africa. Little is known about its spatial–temporal distribution as defined by its genetic diversity. Methods A retrospective study conducted utilizing archived nasopharyngeal specimens from patients attending outpatient clinics in hospitals located in five demographically and climatically distinct regions of Kenya; Coast, Western, Highlands, Eastern and Nairobi. The viral total RNA was extracted and tested using multiplex real time RT‐PCR (reverse transcriptase polymerase chain reaction). A segment of the G‐gene was amplified using one‐step RT‐PCR and sequenced by Sanger di‐deoxy method. Bayesian analysis of phylogeny was utilized and subsequently median joining methods for haplotype network reconstruction. Results Three genotypes of HRSVA were detected; GA5 (14.0%), GA2 (33.1%), and NA1 (52.9%). HRSVA prevalence varied by location from 33% to 13.2% in the Highlands and the Eastern regions respectively. The mean nucleotide diversity (Pi[π]) varied by genotype: highest of 0.018 for GA5 and lowest of 0.005 for NA1. A total of 58 haplotypes were identified (GA5 10; GA2 20; NA1 28). These haplotypes were introduced into the population locally by single haplotypes and additional subsidiary seeds amongst the GA2 and the NA1 haplotypes. Conclusions HRSVA was found across all the regions throughout the study period and comprised three genotypes; GA5, GA2, and NA1 genotypes. The genotypes were disproportionately distributed across the regions with GA5 gradually increasing toward the Western zones and decreasing toward the Eastern zones of the country.


| INTRODUCTION
Human respiratory syncytial virus (HRSV) is a major cause of bronchiolitis and pneumonia in children and infants. It is estimated to cause over 3 million hospitalizations and over 50,000 deaths in children below 5 years of age each year. 1 Most individuals are infected during the first 2 years of life and re-infections occur throughout lifetime. [2][3][4][5] HRSV is characteristically seasonal with highest incidence seen during the winter months in temperate countries and during the rainy seasons in the tropics. 6 In Kenya, increased infections are seen in the months of March to August and October to May with least infections in September. [7][8][9] HRSV is an enveloped virus with a negative-sense single stranded RNA genome of approximately 15 200 nucleotides. It consists of 10 genes encoding 11 proteins. 2 Two of these are surface proteins: G (attachment) and F (fusion). The G-gene consists of two hypervariable regions separated by a central conserved region, which contains four cysteine residues at positions 173, 176, 182, and 186. 5,10 The second hypervariable region has been utilized in several studies to characterize the HRSV into genotypes. 4,[11][12][13] There are two genetically distinct HRSV groups; A and B. 13 These groups are further categorized into genotypes. Fourteen genotypes of HRSVA, have been reported to date: GA1-GA7, SAA1, NA1-NA4 and ON1-ON2). 7,[14][15][16][17] Information on the distribution of the HRSVA genotypes across Kenya is scarce, and the inter-relationship of these viruses across the regions of the country remains unknown. This study presents data on viruses detected in five regions across Kenya over a 4-year calendar period. The rainy seasons in Kenya's extensive geographical region are from March-May (long rains) and October-December (short rains).
The objectives of this study were to identify the HRSVA genotypes, describe spatial-temporal distribution and their genetic diversity, and hence gain insight into patterns of introduction into and spread within the population. The specimens were collected from consenting adult patients or parents consenting on behalf of their children, with influenza-like illness (ILI) and attending the outpatient clinics in public hospitals across Kenya. Individuals were recruited if they were at least 2 months of age, presented within 72 h of onset of illness showing the following symptoms; a cough or sore throat and a fever of ≥38 C at the time of presentation. 18,19 The public hospitals (numbering 8) were part of the DEID surveillance network that had distinct geographic and demographic representation of Kenya. 19 These sentinel centers were further clustered into five geographically distinct regions, that is, Coast, Eastern, Western, Highlands and Nairobi. Since increased cases of HRSV were observed in the months of March to August and October to May with least infections in September, we considered the HRSV season in Kenya to start in September of each year and end in August of the following year. 15

| Study specimens and laboratory procedures
The NP swabs were collected in viral transport media (VTM) at the sentinel centers and transported to the country's National Influenza Center (NIC) for testing and storage. 20 Total viral RNA was extracted from 2300 specimens using a commercial RNA extraction kit (QIAamp viral RNA mini kit ® , Qiagen Inc., Valencia, CA, USA). The RNA was subsequently tested using modified multiplex real time reverse transcription Polymerase Chain Reaction. 21 A segment of G-gene was amplified using one-step Reverse Transcriptase Polymerase Chain Reaction method (RT-PCR) by utilizing the QIAGEN Ltd kit. Each reaction comprised of two primers; AG20, forward primer (5 0 -GGGGCAAATGCAAACATGTCC-3 0 ) and F164, the reverse primer (5 0 -GTTATGACACTGGTATACCAACC-3 0 ).
These primers targeted the ectodomain region of the RSV G-gene and part of the F gene. 22 Thermocycling was performed at 50 C for 30 min (1 cycle), 95 C for 15 min (1 cycle), [94 C for 30 s, 54 C for 30 s, 72 C for 1 min] 40 cycles and 72 C for 10 min. 22 Thereafter, the One-Step RT-PCR product was utilized in a 50-μl reaction assay of the nested PCR method using the Qiagen Taqman PCR kit. This assay comprised of two primers; BG10 forward primer

| Sequence analysis
Raw sequence data was assembled using DNA baser version 3.2. 23 The sequence contigs were manually edited and trimmed using BioEdit program version 7.2.5. 24 Multiple sequence alignment was conducted using ClustalW algorithm in the BioEdit program 24 and Muscle v3.8.31 software. 25 These sequences were deposited in the GeneBank (accession numbers OK458563-OK458679).

| Phylogenetic analysis
The phylogenetic relationships were examined based on the nucleotide sequences in the full segment and the second hypervariable region of the G-gene and were inferred using MrBayes software version 3.2. 13,26,27 The tree was visualized using Fig tree v1.4.3 software. 28 Since the phylogenetic clustering of the sequences in the two sets of analysis was similar, we are reporting the analysis based on the second hypervariable region (300 nucleotides) corresponding to 592-891 nucleotides of the prototype A2, accession number JX198138. 29 To phylogenetically categorize our local sequences into their genotypes, 57 global sequences of known genotype designation were included.

| Population analysis and network reconstruction
The entire sequence obtained from the Sanger dideoxy termination method (633 nucleotides) was utilized. The DnaSP version 5.10 was utilized to deduce the existing haplotypes based on single nucleotide polymorphism. 30,31 MEGA X was used to compute the genetic diversity of these viruses. The mean nucleotide diversity in each genotype (entire population) [Pi(π) a ] and within each region by genotype [Pi(π) b ] were computed. 32 Median joining algorithm implemented in the Network version 5.0.0.3 software was utilized to calculate the parameters for reconstruction of the haplotype networks. 30 3 | RESULTS

| Phylogenetic analysis
Three of the fourteen known HRSVA genotypes were found in this population. These were GA2, GA5, and NA1 genotypes. Most of the viruses belonged to the NA1 genotype (64 sequences; 52.9%), followed by GA2 genotype (40 sequences; 33.1%) and the rest of GA5 genotype with 17 sequences (14%).
The local sequences were found on two major clades of the phylogenetic tree ( Figure 1A); one clade comprised of the NA1 and GA2 genotypes while the other clade comprised of the GA5 genotype (clade two). The local GA5 sequences clustered away from those from elsewhere. These sequences were characterized by C643A resulting in an amino acid change (P215I) while those from elsewhere were characterized by P215L. Only one sequence from Mexico was similar to the prototype at this position.
The local GA2 sequences formed several clusters. Characteristically, these clusters comprised of sequences from the same region with only four remaining at the root of the clade ( Figure 1B). In contrast, the majority of the NA1 sequences remained at the root of the clade with only fifteen of the local sequences (23%) forming additional clusters ( Figure 1C). An additional cluster of two local sequences with clade credibility of 100% was observed on clade one ( Figure 1B).

| Spatial-temporal distribution of HRSVA genotypes across Kenya
Most of the viruses were from the Highlands region (33%) followed by the Western region (20%) while the least were from the Eastern region (Table 1). Overall, the genotype dominance varied across the five regions. GA5 genotype was predominant in the Western (47%) region, GA2 genotype (31%) in the Eastern region while NA1 genotype in three regions; Coast (24%), Nairobi (25%) and Highlands (34%).
Further variation was observed in the distribution of the genotypes from season to season (Table 2). GA2 genotype was the most common genotype in season 1 while GA5 during season 2, and NA1 genotype in seasons 3 and 4. The GA5 genotype was not detected during season 4.

| The genetic diversity and haplotype analysis of HRSVA
The nucleotide mean distance within the GA2, GA5 and NA1 genotypes was 0.0085, 0.018 and 0.005 respectively. We further investigated the nucleotide mean distances per region and genotype and found that amongst the GA2 viruses, it was lowest in the Eastern region (0.005) and highest in the Western region (0.008). Amongst the NA1 viruses, it was lowest in Nairobi and Western regions (0.003) while Coast and Eastern regions had the highest (0.008). Amongst the GA5 viruses, there was an absence of diversity on the Coast region (nucleotide mean distance of 0.00), and highest nucleotide mean distance in the Western region (0.017) ( Table 3).
Haplotype diversity across the genotypes ranged from 0.919 in the NA1 genotype to 0.952 in the GA2 genotype (Table 4).

| Spatial distribution of local HRSVA haplotypes per genotype
The majority of the NA1 sequences were found in the Highlands region ( Figure 2A); however, the Nairobi region had the highest number of haplotypes (Table 1)  3.6 | Inter-relationship of the local HRSVA haplotypes in the five regions across We further explored the interrelationship and divergence of the local haplotypes as defined in the three genotypes across the five regions F I G U R E 1 (A) Phylogenetic tree based on the 2nd hypervariable region of the G-gene from samples collected from ILI presentations to eight health facilities across Kenya 2007-2010. This tree was constructed using the Bayesian method with a scale of 0.3 and the clade credibility of above 60%. Due to the large data size (178 sequences), we clustered our local sequences into their haplotypes (refer to methods) and obtained a representation of 97 of 121 local sequences for clear visibility on the phylogenetic tree. The tree was rooted on the prototype A2, accession number JX198138. Colored tips represent the global sequences accessed from Genbank. Clade two sequences belonging to genotypes GA3-7 represented in detail while the branches in clade one has been collapsed (NA1-4), GA2 and ON1 genotypes (see Figure 1B,C). (B) Clade one of the phylogenetic tree showing the GA2 genotype sequences. Colored tips belong to the global sequences. Among the GA2 sequences, there is an additional cluster (clade credibility of 100%) comprising of two sequences that were not previously identified. We categorized these sequences as variants of the GA2 genotype. (C) Clade one of the phylogenetic tree showing sequences in the NA1-4 and ON1 genotypes. Colored tips comprise of the global sequences. All our local sequences were within the NA1 genotype ( Figure 3). These haplotypes were widely dispersed across the regions and a few were found in multiple regions.
All the GA5 haplotypes were unique to single regions while 15% of GA2 and 25% of NA1 haplotypes were found in multiple regions.
The network tree further displayed divergence of haplotypes across the regions.

| Seeding of the Kenyan HRSVA viruses
Spatial-temporal clustering of haplotypes belonging to one hundred and thirty sequences (including thirteen global sequences) was utilized to investigate the seeding patterns of the local viruses. The network trees were reconstructed based on the circulating genotypes in the population as previously identified, that is, GA5, GA2, and NA1 ( Figure 4).
The clustering of seventeen haplotypes found in the GA5 genotype illustrated a single seed amongst the local haplotypes ( Figure 4A   the Eastern region. These findings were similar to another local study conducted during the period coinciding with our study period which reported the GA5 (12.5%) to be the least dominant while GA2 genotype and its variants the most abundant (75%). 9 However, when we examined the distribution of the genotypes by region we observed increased dominance of the GA5 genotype toward the Western region of the country. While this is true, the GA5 genotype was not detected during the fourth season of the study and the period coinciding with the season in other studies conducted locally. 9 The phylogenetic tree displayed divergent clustering amongst the Kenyan genotypes. Amongst the GA5 viruses, our local sequences clustered away from the global sequences hence forming a distinctive clade. The leaves of this clade comprised of sub-clusters with sequences mainly from the same regions thereby showing a tendency toward regional clustering. This is further supported by the low mean nucleotide diversity within regions and high mean nucleotide diversity for the entire GA5 population.
The clustering amongst the GA2 sequences was diverse hence forming three distinct clades; (i) comprised of global sequences F I G U R E 3 Spatial interaction of the Kenyan haplotypes identified through sentinel hospital outpatient surveillance from 2007-2010: Median-joining network tree displaying the interrelationship of haplotypes per genotype across five regions. Each tree node (circle) represents a haplotype, color coded as per the region where it was detected. A multi-colored node represents a haplotype found in multiple regions. Node size is directly proportional to the number of viruses in each haplotype. The tree displays haplotype interlinkage within and across regional boundaries. The branch length between haplotypes is not proportional to the number of mutations detected between the haplotypes. The fourcolor coded circles represent the genotypes. The two haplotypes in the smaller circle belong to the GA2 variant viruses The NA1 cluster was the largest, comprising of sixty-four of our local sequences and additional global sequences. Some of these sequences had previously been identified as GA2 variants prior of the variant to a genotype. 16 Unlike in the GA2 and the GA5 clusters, three quarters of sequences of this genotype remained at the root of the clade. This phenomenon is suggestive low diversity amongst these viruses. Only a quarter of these sequences formed additional clusters.
An additional cluster comprising of global sequences of the ON1 genotype was also observed. This divergent clustering in the NA1 cluster supports the theory that the ON1 genotype was derived from NA1 genotype through a single duplication event. 34