Science -- Gaschen et al. 296 (5577): 2354


Diversity Considerations in HIV-1 Vaccine Selection

Brian Gaschen,¹ Jesse Taylor,¹ Karina Yusim,¹ Brian Foley,¹ Feng Gao,² Dorothy Lang,¹ Vladimir Novitsky,³ Barton Haynes,² Beatrice H. Hahn,⁴ Tanmoy Bhattacharya,¹ Bette Korber¹,⁵*

Globally, human immunodeficiency virus-type 1 (HIV-1) is extraordinarily variable, and this diversity poses a major obstacle to AIDS vaccine development. Currently, candidate vaccines are derived from isolates, with the hope that they will be sufficiently cross-reactive to protect against circulating viruses. This may be overly optimistic, however, given that HIV-1 envelope proteins can differ in more than 30% of their amino acids. To contend with the diversity, country-specific vaccines are being considered, but evolutionary relationships may be more useful than regional considerations. Consensus or ancestor sequences could be used in vaccine design to minimize the genetic differences between vaccine strains and contemporary isolates, effectively reducing the extent of diversity by half.

¹ Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
² Duke University AIDS Center, Durham, NC 27710, USA.
³ Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115, USA.
⁴ University of Alabama at Birmingham, Birmingham, AL 35294, USA.
⁵ Santa Fe Institute, Santa Fe, NM 87501, USA.
* To whom correspondence should be addressed. E-mail: btk@t10.lanl.gov

Since the HIV-1 M group began its expansion in humans roughly 70 years ago (1, 2), it has diversified rapidly (3), now comprising a number of different subtypes and circulating recombinant forms (CRFs). The HIV-1 M group is the set of diverse viruses that dominates the global AIDS epidemic. Subtypes are genetically defined lineages that can be resolved through phylogenetic analysis of the HIV-1 M group as well-defined clades, or branches, in a tree. Recombination occurs frequently, and a CRF carries sections of two or more subtypes in a mosaic genome; a recombinant lineage is designated a CRF when related forms are found in multiple epidemiologically unlinked individuals. Currently, strains belonging to the same subtype can differ by up to 20% in their envelope proteins, and between-subtype distances can soar to 35%. Moreover, this diversity is continually growing. The need for frequent changes in the annual influenza vaccine puts into perspective the implications of such diversity: less than 2% amino acid change can cause a failure in the cross-reactivity of the polyclonal response to the influenza vaccine and necessitate changing the vaccine strain (4).

Although the scale of the HIV-1 pandemic makes action imperative, there is still much to learn about the extent and immunological implications of HIV-1 sequence diversity. We do, however, have in hand the fruits of an extensive global HIV sequencing effort [currently there are 72,221 HIV sequences in the database (3)] that can provide a framework for reasoned vaccine strain selection. Optimizing selection is of the utmost urgency, as a number of human vaccine trials are being planned and initiated (5, 6), and it is difficult to change strains during the long course of vaccine development, from initial concept to human trial. Subtype C is the most prevalent HIV-1 subtype globally, and it predominates in several geographic regions where vaccines might be evaluated. In these regions, the epidemiologically unlinked prevalence of subtype C infections can exceed 30% of the adult population (7). Therefore, this exploration of the implications of HIV variation for vaccine strain selection focuses on subtype C; we believe, however, that our reasoning and findings can be extrapolated to other intra- and intersubtype scenarios.

Although there is hope that a single vaccine strain may elicit a sufficiently cross-reactive response to confer a benefit, there is great interest in attempting to optimize vaccines through considerations of diversity. There are currently two general approaches to selecting vaccine strains that attempt to contend with the high levels of HIV sequence variation (Table 1). The first is based on using isolates of a particular subtype, sometimes selected from a geographic region where the vaccine is intended for use. Examples of this approach that are under way include the development of several A- and C-subtype vaccines, as well as CRF01 vaccine reagents (5, 6). This kind of approach can be integrated with biological considerations, such as coreceptor usage, neutralization susceptibility, neutralization potency of the serum from the individual from whom the isolate was obtained, or the preferential use of isolates from recent seroconvertors (8). The second approach, rather than using actual viruses from within the population, is to construct either a consensus sequence or an ancestral sequence reconstructed on the basis of an evolutionary model. Such sequences have the advantage of being central and most similar to currently circulating strains of interest and may have enhanced potential for eliciting cross-reactive responses. They may also have economic and political advantages that merit consideration. Economic, because it is not feasible to duplicate vaccine design efforts using country-specific strains for every nation and region that needs a vaccine, and this is a way to limit the number of constructs that must be produced and tested in a way that is logical and scientifically defensible. Political, because such artificial sequences are not associated with any specific country of origin, so nations hosting vaccine trials would not need to contend with the natural concerns that arise when asked to host a vaccine trial using HIV-1 antigens with distant geographic origins. Some subtype C sequences of current interest are indicated in Fig. 1, a phylogenetic tree that includes geographic information to provide a basis for considering candidate vaccine strains. (Phylogenetic terms and concepts used in this article are defined in the legend to Fig. 1.) These two basic approaches, in different ways, directly confront the problem of viral diversity; we will explore their advantages and disadvantages in the sections that follow.

Table 1. A summary of classes of potential vaccine strains for use in subtype C epidemic regions. Differences show typical values of the percentage of amino acid changes observed when comparing the potential vaccine strain sequences to the sets of available C clade protein sequences. The lower bound represents conserved proteins, the upper bound variable proteins.

Vaccine source | Differences (%) | Advantages and characteristics
Isolates, subtype B | 10-30 | Furthest along in vaccine testing. Based on an actual virus, and strains can be selected on the basis of advantageous biological characteristics.
Isolates, subtype C | 5-15 | The closest natural form to C-subtype circulating strains. Like subtype B isolates, based on actual viruses, and thus can be selected on the basis of biological features.
Consensus, subtype C | 3-8 | Central to C-subtype circulating strains. Each amino acid is most commonly found at that position.
Ancestral, subtype C | 3-8 | Representative of the C subtype. Maximum likelihood model of ancestor sequence.
M-group consensus | 5-15 | Representative of the HIV-1 epidemic. Most likely to cross-react with all clades. Consensus of the subtype consensus sequences.

Fig. 1. Maximum likelihood phylogenetic tree showing the genetic distances and relationships of potential vaccine strains to subtype C gag sequences, and to representatives from other subtypes. The external nodes, the branch tips on the right of the tree, each represent an actual sequence. The interior nodes, or branch points, are ancestral to the "clade" of sequences that branch to their right. This tree uses the M-group consensus as the outgroup, a sequence brought into the analysis to help determine the ancestral states and root of the "ingroup," in this case the HIV-1 M group. Constructing the tree with the M-group consensus sequence as an outgroup forces the ancestral node of the M group to be central to all of the subtypes; using a more conventional strategy of selecting another primate lentiviral sequence for an outgroup can lead to statistically unsupported and unrealistic locations for ancestral nodes (1). The horizontal branch lengths in the trees represent evolutionary distances and indicate how many nucleotide substitutions have occurred; these are estimated from the evolutionary model. The two-letter code for the country of origin of subtype C sequences is indicated: India, IN; South Africa, ZA; Botswana, BW; Tanzania, TZ; Israel, IL; Ethiopia, ET; Zambia, ZM; and Brazil, BR. The locations of the C-subtype and M-group consensus and ancestor are indicated. Potential C-subtype vaccine strains are marked by a bold branch and their isolate name; these include Brazilian BR025, one of the first available subtype C isolates; ZM651 (AF286244), a Zambian strain; IN101 (AB023804), an Indian sequence discussed in the text, and sequences derived from recent South African isolates, including Du422 (AY043175), and three available for reagent development through the UNAIDS network, derived from viable isolates with full-length sequenced clones, ZA009 (AY118166), ZA003 (AY118165), and ZA012 (AF286227) (24, 53). 
[Note: There are two South African samples designated ZA009, one from a B clade infection (AF095828), and one recently obtained UNAIDS Network C clade isolate; in this paper ZA009 refers to the C clade isolate.] The scale bar indicates the genetic distance along the branches. gag and env maximum likelihood trees with all sequences labeled are also available (32).

Other concepts being explored to address diversity issues can ultimately be considered in the framework of the two basic approaches to strain selection described above. For example, multivalent cocktails of proteins that include a spectrum of regional variants are being evaluated. Most of these strategies assume that the immune responses elicited by any one circulating strain will be of sufficient cross-reactivity to protect against other strains from the same subtype. Given that intrasubtype diversity in variable proteins can reach 20%, even this assumption may be too optimistic. Consensus sequences and strains from viable isolates could be combined in a polyvalent approach. A second strategy is to design modified envelopes to enhance exposure of epitopes known to be capable of inducing broadly neutralizing antibodies (9-11). There are a limited number of monoclonal antibodies that have broad, cross-clade neutralization capabilities (11-13), and these antibodies can act synergistically (13-15). Vaccines specifically designed to target these conserved epitopes, if successful, may ultimately be optimized by fine-tuning as subtype-specific vaccines, as appropriate; although there is little evidence that subtypes correspond to neutralization phenotypes, in some cases particular clades can be refractive or less susceptible to particular antibodies. For example, the three broadly neutralizing monoclonal antibodies 2F5, 2G12, and IgG1b12, raised against B clade strains, were not individually able to neutralize a C clade primary isolate, although they could in combination (14). Even if subtypes are not relevant to a particular neutralizing epitope, lineage-specific variation within the relevant antigenic domain may still be worth considering. A third strategy for contending with diversity is to use polyvalent peptides spanning a region like the V3 loop that induces strain-specific neutralizing antibodies (16), to attempt to elicit a set of responses that together confer cross-reactive protection (17, 18).

Isolate-Based Vaccines

Geographic considerations. For historical reasons, AIDS vaccine reagents were first developed from subtype B viruses, the dominant subtype in the United States and Europe. It has been proposed that such strains be included in vaccine trials conducted in populations infected with subtypes other than B. Cytotoxic T lymphocyte (CTL) studies provide evidence for cross-subtype T cell responses, and B viruses have been studied for a longer time, so researchers can more rapidly move forward in safety and immunogenicity (phase I) studies (5, 6). However, T cell immune responses in general are more intense and have greater breadth within a subtype (19-23). Thus, although there is potential for cross-reactivity and even for synergistic interactions between antibodies (15), it is more than likely that both the breadth and intensity of polyclonal T and B cell immune responses to cross-clade immunogens will be suboptimal and that important epitopes will be missed. Subtype-specific, single-strain, or combination vaccines have been strongly advocated in recent years (24-26), and approximately 10 subtype C vaccines are poised to enter phase I trials in India, South Africa, and China (27).

There has been some discussion of choosing a regional strain for a vaccine, for example, an Indian strain to be used in India, and a South African strain in South Africa, and so on (8). There is little support for this in terms of sequence analysis. Subtype C sequences from Botswana and South Africa intermingle (28), and there is no obvious choice of a single sequence most representative of the diversity in these regional samples (28); however, selecting a sequence with a short branch length relative to the common ancestor (29) in the C clade might be advantageous, as it would tend to be most similar to the majority of contemporary sequences represented in the tree (30). Conversely, it would be sensible to avoid selecting outliers. Indian sequences tend to form a distinct subclade (31) within the C clade, indicating that most sampled Indian viruses are descended from a single founder strain. A small number of sequences from African nations are associated with sequences from India, however, and the sampling is extremely limited relative to the scale of the epidemic in both regions. Thus, there may be continuing movement of the virus between Africa and India. The Indian clade sequences tend to have short branch lengths relative to the root of the C clade. As a consequence, a strain like IN101 (accession number AB023804) from India is closer to most African subtype C strains than African strains are to each other (32), an interesting quirk that emphasizes that it does not necessarily confer an advantage to select a strain from the country where a vaccine trial will be held.

There are other subclades within the C subtype, besides the Indian subclade (33), that could be considered as vaccine candidates (Fig. 1), but such subclades tend to have much shorter defining branch lengths than subtypes and, consequently, fewer distinguishing amino acids, so the benefit of considering them each separately for vaccines diminishes. On rare occasions, geographically localized epidemics have been identified soon after the introduction of a founder virus, and prevalent viruses were highly related, as in Thailand (34) and Kaliningrad (35). It may ultimately be advantageous to develop vectors and strategies for rapid-response vaccine programs in such circumstances, when a highly similar virus is spreading explosively through a vulnerable population, but first, a working vaccine concept must be in place.

Evolutionary evidence for subtype-specific antigenicity in the envelope protein. Clearly, subtype-specific vaccines would increase the overall sequence similarity of the vaccine antigen relative to circulating viruses (Fig. 2), but this is only part of the story for antibody binding, because protein folding and exposure of antigenic domains are of great importance. To explore the hypothesis that there may be subtype-specific patterns in the exposure of antigenic domains that are able to elicit antibody responses strong enough to drive escape mutations, we compared estimates of codon-specific ratios of nonsynonymous to synonymous substitution rates (dN/dS) (36) in B clade and C clade envelope genes. (We selected B and C subtypes, as B clade vaccines are being considered for use in populations where the C subtype dominates.) High rates of diversifying selection were identified in different regions of the envelope protein (Env) in the two lineages, most strikingly in the Env V3 to C4 region. The V3 loop is less variable in the C subtype than in other subtypes (37), and as expected, the density of sites in the V3 loop with dN/dS > 1 was higher in the B clade than in the C clade. This pattern was reversed, however, in the region just proximal to the V3 loop, where multiple sites show an excess of nonsynonymous substitutions in the C clade but not in the B clade (Fig. 3). To explore the consistency of these patterns within the C clade, three subclades, or phylogenetically associated groups of sequences within the C subtype, were examined independently. Two sets had 21 subtype C sequences, and the third had 18 sequences. The results suggested substantial intraclade coherence in how selection acts on individual codons within the C subtype; the 12 strongly selected sites in the region downstream of the V3 loop (Fig. 3) had a dN/dS ratio > 1 in each of the three independent C-subtype data sets, and the tip of the V3 loop was relatively constrained with low dN/dS values. Given that immune escape is likely to be a driving force of positive selection, immune pressure may be focused on different regions of Env in the B subtype (the V3 loop) and C subtype (the COOH-terminal region beyond the V3 loop). If this interpretation of the observed differences in selection pressure in B and C subtypes is correct, there may be advantages in using a clade-appropriate vaccine strain, as the immune response to the vaccine and the circulating virus would share antigenic domains.
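The consistency check described above, whether a codon has dN/dS > 1 in each of several independent data sets, can be sketched in a few lines of Python. This is only an illustration: the per-codon ratios below are invented toy values, the function name `consistently_selected_sites` is hypothetical, and the estimation of dN/dS itself (36) is not reproduced here.

```python
def consistently_selected_sites(ratios_by_dataset, threshold=1.0):
    """Return codon indices whose dN/dS exceeds threshold in every data set.

    ratios_by_dataset maps a data set label to a list of per-codon dN/dS
    estimates; all lists must cover the same aligned codon positions.
    """
    datasets = list(ratios_by_dataset.values())
    if not datasets:
        return []
    n_codons = len(datasets[0])
    if any(len(d) != n_codons for d in datasets):
        raise ValueError("all data sets must cover the same codon positions")
    return [i for i in range(n_codons)
            if all(d[i] > threshold for d in datasets)]


# Toy per-codon ratios for three hypothetical subclade data sets.
toy_ratios = {
    "subclade_1": [0.2, 1.5, 3.0, 0.9],
    "subclade_2": [0.1, 2.0, 1.2, 1.4],
    "subclade_3": [0.3, 4.1, 0.8, 1.1],
}
selected = consistently_selected_sites(toy_ratios)
print(selected)
```

In the toy data only the second codon (index 1) exceeds the threshold in all three sets; requiring agreement across independent sets is what guards against a single noisy estimate.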

Fig. 2. Scanning the HIV-1 genome and proteins to illustrate similarities between potential vaccine candidates and sequences from isolates. (A) and (B) compare 23 full-length subtype C sequences from South Africa, Botswana, and India with potential vaccine sequences. Green lines represent the comparison with the subtype C consensus sequence. The purple and blue lines show the comparison of the sequences of vaccine candidates BR025 and ZA003 (described in Fig. 1), respectively. The red lines show an interclade comparison of subtype C sequences with the B clade sequence JRCSF. (A) shows a nucleotide similarity plot, and (B) shows the corresponding amino acid similarity plot.

Fig. 3. The dN/dS ratio at each position in the V3 region, comparing a B-subtype and a C-subtype alignment. The dN/dS ratio was determined for each codon in an env alignment of C- and B-subtype sequences. The V3 region gave a particularly striking distinction between the two subtypes, illustrated here. The blue lines indicate the dN/dS ratio in the V3 loop of the B subtype, a region known to be a target of type-specific neutralizing antibodies. Four codons on either side of the tip of the V3 loop have dN/dS ratios over 5, indicative of very strong positive selection. The red lines indicate the dN/dS ratio of the C subtype, and there is no strong pressure for change near the tip of the V3 loop. In contrast, downstream of the V3 loop there are 12 codons that exhibit high dN/dS ratios (>4) in the C subtype and only three in the B subtype. This suggests different regional evolutionary pressures in the two subtypes, and possibly distinct regions of antigenic exposure in these regions in the B and C lineages.

Artificial Sequences for Minimizing Diversity

An effective way to minimize the degree of sequence dissimilarity between a vaccine strain and contemporary circulating viruses is to create artificial sequences that are "central" to these viruses. The simplest way to design such a sequence is to use a consensus sequence based on the most common amino acid in each position in an alignment (33, 38). Alternatively, a model of the most recent common ancestral sequence of an appropriate lineage can be reconstructed from a phylogenetic tree, for example, by means of maximum likelihood. The most likely sequence at any interior node in a tree can be derived from the sequences used to construct the tree, the evolutionary model used (how often one base is mutated to another, and the relative mutation rate at each site), and the branching pattern of the tree. Figure 1 illustrates where the C consensus and ancestral branch points are located in the tree. Both of these sequences are more "central," i.e., they are closer to modern C-subtype sequences than modern sequences are to each other. As artificial sequences, their construction depends on the sequences included in the analysis and so will change as the database expands.
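As a minimal sketch of the consensus idea, assuming a pre-aligned set of equal-length sequences, the most common residue can be taken column by column. The toy alignment and the alphabetical tie-breaking rule are assumptions made for illustration only; they are not the procedure used to build the published consensus sequences.

```python
from collections import Counter


def consensus(alignment):
    """Return the most common residue at each column of an alignment."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    residues = []
    for column in zip(*alignment):
        counts = Counter(column)
        # Assumed tie-break: highest count first, then alphabetical order,
        # so that the result is deterministic.
        residues.append(min(counts, key=lambda r: (-counts[r], r)))
    return "".join(residues)


# Invented toy alignment of four short protein fragments.
toy_alignment = ["MGARASVLSG", "MGARASILRG", "MGARASVLRG", "MGSRASVLRG"]
toy_consensus = consensus(toy_alignment)
print(toy_consensus)
```

Each toy sequence differs from the consensus at no more than one position, while some pairs of toy sequences differ from each other at two, which is the sense in which a consensus is "central."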

Envelope proteins are the most difficult HIV proteins to construct artificially, as both ancestral and consensus sequences contain hypervariable domains with multiple insertions and deletions (indels). Alignments are subjective in such regions, and indels do not evolve according to the base substitution models currently assumed in deriving a maximum likelihood tree. For constructing our consensus and ancestor sequences (3), hypervariable regions are aligned by anchoring on glycosylation sites, and only minimal common elements spanning the region are retained. As both consensus and ancestral sequences are derived and not actual sequences, expression, antigenicity, and biological activity require careful characterization before use in a vaccine (39).

Although artificial sequences may not have a proper protein conformation, and this may be critical for antibody responses, it is less important for designing T cell epitopes or peptide reagents for testing T cell responses. Consensus sequences may be ideal for peptides used to explore the T cell immune response, as they would probably improve recognition compared with any single reference strain, and using sets of autologous strain peptides can be prohibitively expensive. A consensus may even be preferable to autologous peptides, as CTL escape mutations can rapidly predominate in the viral quasispecies, and important early responses (40) may go undetected through the use of peptides based on isolates from later time points that have escaped the early responses. Consensus peptides for several subtypes are available (41).

A similarity plot maps the percent similarity of a query sequence relative to a test set in a window spanning a region of a specified size that is moved progressively along an alignment. In Fig. 2, prototype vaccine reagents (a C consensus, two subtype C vaccine strains, and a subtype B isolate) are used as query sequences and compared with 23 subtype C sequences from South Africa, India, and Botswana. In every gene region, the same relative pattern holds. The spectrum of similarity scores for the C consensus sequence compared with the set of 23 C sequences is 5 to 15% greater than when any one C isolate is compared with others in the set (28). In turn, subtype C proteins are 5 to 15% more similar to the subtype C sequences than are subtype B sequences. This implies that using a B clade virus as the basis of a vaccine in a C clade-dominated epidemic may be less effective than using a C clade virus, and a C clade virus may not be as effective as a C consensus. Conserved proteins from different subtypes can be more closely related than variable proteins from the same subtype, and this fact might be exploited by using a single vaccine strain for conserved proteins and multiple clade-specific strains for variable proteins.
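The sliding-window calculation behind a similarity plot can be sketched as follows. This is an illustrative simplification: similarity is taken as plain percent identity averaged over the test set, and the window size, step, and toy sequences are all assumptions rather than the settings used for Fig. 2.

```python
def percent_identity(a, b):
    """Percent of aligned positions at which two equal-length strings match."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def similarity_plot(query, test_set, window, step=1):
    """Mean percent identity of the query against the test set per window.

    Returns a list of (window_start, mean_identity) points along the
    alignment; all sequences are assumed to be pre-aligned to equal length.
    """
    points = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        scores = [percent_identity(query[start:end], seq[start:end])
                  for seq in test_set]
        points.append((start, sum(scores) / len(scores)))
    return points


# Toy example: the second test sequence diverges in its second half.
toy_query = "AAAAAAAA"
toy_test_set = ["AAAAAAAA", "AAAACCCC"]
toy_points = similarity_plot(toy_query, toy_test_set, window=4, step=4)
print(toy_points)
```

Plotting the second element of each point against the first gives one curve per query sequence, so several candidate vaccine sequences can be overlaid against the same test set.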

We have been discussing pooling sequences within a subtype to generate artificial central sequences, but it is also possible to pool the subtypes themselves. To maximize potential cross-reactivity, we have created sequences central to the M group, the diverse viruses that have contributed most to the global epidemic. The set of subtype consensus sequences was used to build an M-group consensus, thus weighting the subtypes equally. The M-group consensus and the most recent common ancestor can be very nearly identical (1, 28). Because of the nature of the HIV-1 M-group phylogeny, the average distance from HIV-1 sequences to the M-group consensus is similar to intrasubtype sequence distances between contemporary isolates (32), roughly half that of intersubtype distances (Table 2). In the Democratic Republic of the Congo (DRC) (42, 43), so many subtypes and recombinants circulate together that the extent of the regional diversity resembles the global diversity. In this setting, an M-group consensus may be helpful, or a polyvalent approach including representative strains from common subtypes along with the M-group consensus. Even in a design focusing on epitopes that are conserved across clades, an M-group consensus might be the optimal baseline sequence. Consensus and ancestral sequences for the major HIV-1 subtypes, CRFs, and the M group are available (3) and will be updated as sequences accrue. Intersubtype similarity comparisons with an M-group consensus are included in the supplementary material (31).
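The effect of building the M-group consensus from subtype consensus sequences, rather than from all sequences pooled, can be illustrated with a small sketch. The subtype labels are reused only as dictionary keys; the two-residue "sequences" and the sampling imbalance are invented for the example, and the tie-breaking rule is an assumption.

```python
from collections import Counter


def consensus(alignment):
    """Most common residue per column (ties broken alphabetically)."""
    columns = zip(*alignment)
    return "".join(
        min(Counter(col), key=lambda r, c=Counter(col): (-c[r], r))
        for col in columns
    )


def group_consensus(alignments_by_subtype):
    """Consensus of the per-subtype consensus sequences.

    Each subtype contributes exactly one sequence, so heavily sampled
    subtypes do not dominate the result.
    """
    per_subtype = [consensus(seqs) for seqs in alignments_by_subtype.values()]
    return consensus(per_subtype)


# Toy data: subtype B is heavily oversampled relative to A and C.
toy_subtypes = {
    "A": ["KT"],
    "B": ["RT", "RT", "RT", "RT", "RT"],
    "C": ["KT", "KT"],
}
weighted_equally = group_consensus(toy_subtypes)
pooled = consensus([s for seqs in toy_subtypes.values() for s in seqs])
print(weighted_equally, pooled)
```

With equal weighting, two of the three subtypes carry K at the first position and the result is "KT"; pooling all eight sequences lets the oversampled subtype win and gives "RT".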

Table 2. Median and range of percent similarity scores between potential vaccine candidates and an alignment of C clade sequences. The similarities are shown for the p24 and gp160 proteins, representative of highly conserved and highly variable proteins. The C-subtype ancestral and consensus (con) sequences, when compared with the set of protein sequences from contemporary subtype C isolates, have comparable distributions of similarity scores. The M-group consensus is comparable to C clade isolate sequences. Two representative C clade isolates are shown, DU422 from South Africa and BR025 from Brazil. The last column shows the results of a subtype B strain compared with C sequences (intersubtype) for contrast. Regions with gaps were included, and either a relative insertion or deletion or an amino acid change at any position is considered one difference.

Protein | Consensus sequences | | | Isolate sequences | |
 | C-subtype con | C ancestral | M-group con | C_ZA.DU422 | C_BR.92BR025 | B_FR.HXB2R
p24 | 95.4 (97.4-92.8) | 94.8 (97.7-92.8) | 93.1 (95.1-91.1) | 94.8 (97.4-92.1) | 92.8 (96.4-88.9) | 88.9 (91.8-86.6)
gp160 | 87.5 (90.6-83.5) | 86.1 (88.3-81.7) | 81.4 (84.0-77.1) | 81.3 (89.3-77.4) | 83.1 (85.9-79.6) | 72.1 (73.8-68.0)
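The scoring rule stated in the Table 2 legend, where an indel or an amino acid change at any position counts as one difference, can be sketched as below, together with the median and range summary. Skipping columns in which both sequences carry a gap is an assumption of this sketch (the legend does not say how such columns were handled), and the sequences are invented.

```python
from statistics import median


def percent_similarity(query, target, gap="-"):
    """Percent of compared columns at which query and target agree.

    A gap opposite a residue counts as one difference. Columns where both
    sequences have a gap are skipped (an assumption of this sketch).
    """
    if len(query) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(q, t) for q, t in zip(query, target)
             if not (q == gap and t == gap)]
    same = sum(1 for q, t in pairs if q == t)
    return 100.0 * same / len(pairs)


def summarize(query, targets):
    """Return (median, highest, lowest) similarity of query to the targets."""
    scores = [percent_similarity(query, t) for t in targets]
    return median(scores), max(scores), min(scores)


# Invented aligned fragments; "-" marks an alignment gap.
toy_query = "ACDE-FGHIK"
toy_targets = ["ACDE-FGHIK", "ACDEWFGHIK", "ACDEWFGHIL"]
toy_summary = summarize(toy_query, toy_targets)
print(toy_summary)
```

The three toy targets score 100, 90, and 80 percent, so the summary would be reported in the table's style as 90.0 (100.0-80.0).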

Consensus and ancestral sequences conserve CTL epitopes. Experimentally defined CTL epitopes in the HIV Immunology Database (3) cluster more densely in conserved regions of HIV proteins (44). The peptides spanning variable regions used to detect CTL responses can be quite different from the infecting strain that elicited the response, no doubt contributing to the paucity of defined epitopes in variable domains, but, in addition, an enrichment of features that could contribute to CTL escape can be discerned in variable domains (45). Either way, regions where defined epitopes are concentrated are likely to be key for cross-reactive CTL responses (44). The epitopes in the database have primarily been defined for B clade responses; however, the C clade peptides that trigger immunodominant responses tend to be localized in these same regions (44, 46). Thus, in contrast to Fig. 2 and Table 2, where whole proteins were analyzed, we focused on protein regions where CTL epitopes have been found in order to create Fig. 4, which shows the average sequence distances from potential vaccine strains to immunogenic regions in subtype C proteins, by country of origin. Three proteins were selected, representing the spectrum of variability: highly conserved p24, variable p17, and highly variable envelope (subunit gp160). In the immunogenic regions analyzed, C-subtype consensus and ancestral sequences had the fewest amino acid changes relative to contemporary C-subtype protein sequences. Within-subtype comparisons of single C-subtype viral strains and the M-group consensus sequences gave comparable numbers of amino acid changes, roughly half the number of changes relative to B subtype interclade comparisons.

Fig. 4. Amino acid percent differences between vaccine strain sequences and C-subtype sequences in CTL epitopes. This analysis was limited to protein subregions known to be immunogenic by requiring overlap with at least one well-characterized CTL epitope from the HIV immunology database (3). The consensus, ancestral, and vaccine sequences were compared with all subtype C sequences, and subtype C sequences broken down by country of origin for Botswana, India, and South Africa. The median difference between the query and the C-subtype sequence set is shown. Seventy-nine subtype C sequences were used for the p24 comparison, 79 for p17, and 97 for gp160. The range of differences is indicated only for the South African set, which tends to be typical, to simplify the figure. The comparisons are numbered along the x axis: 1, Botswanan C consensus; 2, Indian C consensus; 3, South African C consensus; 4, C clade consensus; 5, C ancestral sequence; 6, M-group consensus; 7, M-group ancestral sequence; 8, C.ZA.DU422; 9, C.ZA.ZA003; 10, C.ZA.ZA009; 11, C.ZA.ZA012; 12, C.ZM.ZM651; 13, C.BR.BR025; and 14, B.FR.HXB2R.

Consensus and ancestral sequences conserve predicted immunoproteasome cleavage sites. For viral proteins to be recognized by CTL they must be processed, and each step of epitope processing has potential constraints imposed by sequence specificity. Immune escape due to mutations in epitope flanking regions demonstrates escape from immune suppression through cleavage abrogation (47) and shows that epitope processing is sensitive to the surrounding sequence, although a simple cleavage signal is not readily discernible. If the tendency to be cleaved at a relevant site is markedly different in a vaccine strain and a challenge strain, the immunological priming induced by the vaccine will be ineffective. This problem is difficult to resolve experimentally, so we addressed it computationally by means of NetChop (48), a neural net prediction program for immunoproteasome cleavage (32, 45).

The median cleavage prediction scores for subtypes B and C were correlated, but although many sites preserved their relative tendency to be cleaved, there were many exceptions, positions with high cleavage prediction scores in subtype B but not in C, or vice versa (32). This suggests that the predilection for cleavage of many sites would be altered in the two subtypes, which could result in diminished breadth of cross-reactive responses. C clade sequences and the M-group consensus gave cleavage prediction patterns that were similar when compared with the median scores for the C clade alignment, and they performed better than sequences from the B clade (32). Scores predicted for the C-subtype consensus cleavage correlated most strongly with the median scores for the subtype C population (32), suggesting that it would be processed at any given position similarly to most of the subtype C strains, and so it may have the greatest potential for eliciting cross-reactive immune responses at the population level. The complete analysis is provided in the supplemental information (32), but in summary, the linear correlation coefficients (r² values) for comparisons of the median C clade cleavage scores to vaccine candidate strains are as follows for positions in the envelope protein: B clade, 0.65; the M-group consensus, 0.79; the M-group ancestor, 0.80; specific sequences from subtype C isolates, 0.79 to 0.81; the subtype C ancestor, 0.88; and the subtype C consensus, 0.92.
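For readers who want to see how an r² value of this kind is obtained, the following sketch computes the squared Pearson correlation between two per-position score vectors. The score values are invented for illustration and do not come from NetChop; producing real cleavage predictions is outside the scope of this sketch.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return (cov * cov) / (var_x * var_y)


# Hypothetical per-position scores: population median vs. one candidate.
toy_median_scores = [0.10, 0.80, 0.35, 0.90, 0.20]
toy_candidate_scores = [0.15, 0.75, 0.30, 0.95, 0.40]
print(round(r_squared(toy_median_scores, toy_candidate_scores), 2))
```

A candidate whose scores track the population median position by position gives a value near 1, while one whose high and low scoring positions are unrelated to the median gives a value near 0.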

Consensus or reconstructed ancestor? One might assume that an ancestral sequence would resemble more closely a real viral protein than a consensus. It is statistically extremely unlikely, however, that a reconstructed ancestor corresponds exactly to the true ancestral sequence of a clade as complex and diverse as an HIV-1 subtype. Furthermore, reconstruction greatly depends on assumptions inherent in building maximum likelihood trees; for example, if positions are not evolving independently or there are undetected recombination events, the ancestral reconstruction would be influenced and incorrect. Thus, it is highly improbable that an ancestor of a subtype ever existed precisely as reconstructed. Ancestor and consensus sequences are subject to different sampling biases and will change from year to year as sequences accrue. An ancestor is influenced by sequences external to the subtype of interest and will tend to be slightly more distant from available sequences within a subtype than a consensus sequence (Table 2), as well as slightly closer to sequences of other subtypes. The inclusion of a new outlier that branches near the basal node of a subtype could have a strong influence on the ancestral node (see, for example, Fig. 1, where the ancestral and consensus sequences are separated mainly because of one single outlier virus from South Africa), but as a single sequence it would have little bearing on the consensus. In contrast, a consensus will be influenced by the sampling of sequences from within subclades. For example, if many sequences were obtained from the Indian subclade during the next year, the next C consensus would be more like the Indian subset, but the shift in sampling would have less impact on the subtype C ancestor, unless the new sequences substantially altered the evolutionary model.

It is possible that a consensus sequence based on contemporary isolates may be more likely to reflect escape variants relevant to the host population than an ancestral sequence. For example, a CTL escape mutant in an epitope presented by a human leukocyte antigen molecule common in a certain population may be selected for and may be more likely to be represented in the consensus sequence than a reconstructed ancestor sequence. If most viruses in the circulating population had already lost the original epitope because of immune escape, and if the epitope elicited a dominant response upon vaccination with a strain that carried it, then the consensus sequence would have an advantage. On the other hand, if the wild-type form of the epitope was still circulating, even infrequently, and if the epitope was particularly potent, there might be an advantage in using the ancestor. In the end, both concepts need to be tested experimentally, both in terms of B cell and T cell responses.

Applying Evolutionary Principles to Vaccine Strain Selection

How can HIV's evolutionary trajectory be incorporated into a sensible vaccine approach? Although subtypes of HIV-1 are phylogenetically defined on the basis of genetic and evolutionary distances, the practical consequence of phylogenetic clustering of viruses is patterns of shared amino acids that can influence the immunological cross-reactivity of vaccine-stimulated immune responses. Env proteins from different clades can differ in more than 30% of their amino acids, and HIV-1 continues to diversify. Neutralizing antibody as well as CTL escape occurs in vivo (49, 50), escape mutations can be transmitted and stable (49), and there are protein regions under clear positive selection pressure (36) (Fig. 3). These observations indicate that HIV-1 amino acid variation is immunologically relevant. The impact of that variation on vaccine-conferred immune protection will ultimately have to be assessed through vaccine trials, but the differences between potential vaccines and circulating strains can be minimized when designing trial reagents to attempt to enhance cross-reactive responses.

Most vaccines are intended to elicit polyclonal responses to multiple epitopes, so even if they differ in some antigenic domains from a given virus, in others they may be cross-reactive. Selecting a clade-appropriate vaccine for a regional trial would tend to increase the number of potentially cross-reactive epitopes by increasing the level of similarity between the vaccine and the population, and the use of consensus and ancestors would enhance the cross-reactive potential. The difference in selection pressure on B and C clade envelopes is indicative of lineage-specific antigenicity, further supporting the use of subtype-appropriate vaccines to maximize the probability that the vaccine elicits immune responses to domains that are antigenic in the circulating viruses.

We could see no compelling advantage in further subdividing the C clade by country of origin, although this is often a consideration for vaccine design (8). Our analysis supports the recommendations of the international meeting on candidate vaccines for the developing world sponsored by the Vaccine Research Center of the United States, National Institute of Allergy and Infectious Diseases (51), indicating that, although there may be advantages to a subtype-specific vaccine, a promising subtype-specific vaccine candidate could be used in many different geographic locations without compromising the potential for success. This does not mean that there would never be an advantage in tailoring a vaccine further by selecting a sequence from an interior subclade within a subtype. For example, there might be an advantage in using an Asian, not African, CRF01 in Thailand, or an Indian C clade sequence in India. But within-clade differences tend to be subtle and represent far fewer amino acid changes than between-subtype differences.
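One simple way to make the within-clade versus between-subtype comparison concrete is to average pairwise amino acid differences inside a group and across two groups. The Python sketch below does this for invented four-residue sequences; the group labels, sequences, and function names are hypothetical, and uncorrected pairwise differences are an assumption of this sketch rather than the distance measure used in the analysis.

```python
from itertools import combinations, product


def fraction_different(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same aligned length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_distance_within(seqs):
    """Mean pairwise difference over all pairs inside one group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(fraction_different(a, b) for a, b in pairs) / len(pairs)


def mean_distance_between(group_a, group_b):
    """Mean pairwise difference over all cross-group pairs."""
    pairs = list(product(group_a, group_b))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    return sum(fraction_different(a, b) for a, b in pairs) / len(pairs)


# Invented toy sequences standing in for two subclades of one subtype
# and for a second subtype.
subtype_c = ["MRIT", "MRIS", "MKVT", "MRVT"]
subtype_b = ["LQAN", "LQAS"]

print("within subtype :", round(mean_distance_within(subtype_c), 3))
print("between subtype:", round(mean_distance_between(subtype_c, subtype_b), 3))
```

With real alignments, the same two averages give a quick check on whether subdividing a subtype further would remove much of the distance between a vaccine strain and circulating viruses.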

In regions where an epidemic is dominated either by a particular subtype or CRF, it makes sense to use that dominant lineage for a vaccine and to consider the use of a consensus or ancestor. Although we cannot know if even the use of central sequences will be enough to contend with HIV diversity, this kind of strategy can potentially enhance the cross-reactivity and breadth of a vaccine response relative to any single strain. In regions where two or three subtypes and multiple recombinants are cocirculating, to include each of the prevalent subtypes could improve the potential coverage not only of those subtypes, but of the variety of recombinant forms that stem from them (52). Finally, nations with very diverse viral populations, like the DRC, might be best served by developing polyvalent vaccines including a spectrum of natural forms combined with an M-group consensus. An M-group consensus or ancestor is central not only to the major subtypes, but to recombinant forms involving the subtypes. Even if a single subtype predominates in a country, combining an M-group consensus with a regionally dominant subtype might be advantageous in an urban context where people of many nationalities mingle.

REFERENCES AND NOTES

1.	B. Korber et al., Science 288, 1789 (2000).
2.	P. M. Sharp, E. Bailes, D. L. Robertson, F. Gao, B. H. Hahn, Biol. Bull. 196, 338 (1999).
3.	HIV Immunology and Sequence Databases, B. Korber et al. Eds. (Los Alamos National Laboratory, Los Alamos, NM, 2000); available at www.hiv.lanl.gov.
4.	B. T. Korber, B. Foley, B. Gaschen, C. Kuiken, in Retroviral Immune Response and Restoration, G. Pantaleo and B. D. Walker, Eds. (Humana Press, Totowa, NJ, 2001), pp. 1-32.
5.	A. M. Schultz, J. A. Bradac, AIDS (London) 15 (suppl. 5), S147 (2001).
6.	B. Graham, in HIV Molecular Immunology, B. T. Korber et al., Eds. (Los Alamos National Laboratory, Theoretical Biology, Los Alamos, NM, 2000), Part I, pp. 20-38.
7.	UNAIDS, www.unaids.org/
8.	J. Goudsmit, in IAVI Rep. Dec 2000/Jan 2001 (International AIDS Vaccine Initiative, New York, 2001).
9.	S. W. Barnett et al., J. Virol. 75, 5526 (2001).
10.	J. M. Binley et al., J. Virol. 74, 627 (2000).
11.	E. O. Saphire et al., Science 293, 1155 (2001).
12.	W. Xu et al., J. Hum. Virol. 4, 55 (2001).
13.	M. B. Zwick et al., J. Virol. 75, 12198 (2001).
14.	J. R. Mascola et al., J. Virol. 73, 4009 (1999).
15.	A. Li et al., J. Virol. 72, 3235 (1998).
16.	H.-X. Liao et al., J. Virol. 74, 254 (2002).
17.	B. F. Haynes et al., AIDS Res. Hum. Retrovir. 11, 211 (1995).
18.	For example, there were 34 unique forms of the immunogenic tip of the V3 loop (corresponding to the IHIGPGRA of MN) among C clade sequences from 436 infected Africans in the HIV Sequence Database 2001, and these may fall into a smaller number of workable serotypes [S. Zolla-Pazner, M. K. Gorny, P. N. Nyambi, T. C. VanCott, A. Nadas, J. Virol. 73, 4042 (1999)] that could serve as a basis for a polyvalent peptide, but the complexity of this problem rapidly increases when multiple subtypes are considered.
19.	H. Cao et al., J. Infect. Dis. 182, 1350 (2000).
20.	L. Dorrell et al., Eur. J. Immunol. 31, 1747 (2001).
21.	G. Ferrari et al., Immunol. Lett. 79, 37 (2001).
22.	V. Novitsky et al., J. Virol. 75, 9210 (2001).
23.	S. L. Rowland-Jones et al., J. Clin. Invest. 102, 1758 (1998).
24.	J. van Harmelen et al., AIDS Res. Hum. Retrovir. 17, 1527 (2001).
25.	S. A. Lee et al., Vaccine 20, 563 (2001).
26.	D. P. Francis et al., AIDS Res. Hum. Retrovir. 14 (suppl. 3), S325 (1998).
27.	K. Gupta, personal communication, International AIDS Vaccine Initiative (IAVI).
28.	M. Groenink et al., Science 260, 1513 (1993).
29.	E. B. Stephens et al., J. Med. Primatol. 25, 175 (1996).
30.	B. Foley, H. Pan, S. Buchbinder, E. L. Delwart, AIDS Res. Hum. Retrovir. 16, 1463 (2000).
31.	R. Shankarappa et al., J. Virol. 75, 10479 (2001).
32.	Supplemental materials available on Science Online concerning methods, detailed figures, and additional discussion include gag and env trees used in this paper; interclade similarity plots; immunoproteasome cleavage prediction comparisons; and consensus and ancestral sequences for major subtypes, CRFs, and the M group.
33.	V. Novitsky et al., J. Virol. 76, 5435 (2002).
34.	F. E. McCutchan et al., J. Virol. 70, 3331 (1996).
35.	K. Liitsola et al., AIDS (London) 12, 1907 (1998).
36.	Z. Yang, R. Nielsen, N. Goldman, A. M. Pedersen, Genetics 155, 431 (2000).
37.	C. L. Kuiken, B. Foley, E. Guzman, B. T. Korber, in Molecular Evolution of HIV, K. Crandall, Ed. (Johns Hopkins Univ. Press, Baltimore, MD, 1999).
38.	B. Korber et al., Br. Med. Bull. 58, 19 (2001).
39.	An M-group consensus envelope protein reacted equally well with sera from B and subtype C infections in Western blots, and BiaCore assays revealed that it bound to CD4 and numerous monoclonal antibodies, equivalently to a normal Env; further studies are under way. F. Gao and B. Hahn, unpublished data (2002).
40.	T. M. Allen et al., Nature 407, 386 (2000).
41.	The NIH AIDS Reagent Program www.aidsreagent.org/.
42.	J. L. Mokili et al., AIDS Res. Hum. Retrovir. 15, 655 (1999).
43.	N. Vidal et al., J. Virol. 74, 10498 (2000).
44.	B. D. Walker and B. T. Korber, Nature Immunol. 2, 473 (2001).
45.	K. Yusim et al., J. Virol., in press.
46.	P. J. Goulder et al., J. Virol. 74, 5679 (2000).
47.	N. J. Beekman et al., J. Immunol. 164, 1898 (2000).
48.	C. Kesimir, A. K. Nussbaum, H. Schild, V. Detours, S. Brunak, Protein Eng. 15, 287 (2002).
49.	P. J. Goulder et al., Nature 412, 334 (2001).
50.	J. P. Langedijk, G. Zwart, J. Goudsmit, R. H. Meloen, AIDS Res. Hum. Retrovir. 11, 1153 (1995).
51.	NIAID/NIH report: Development of URC HIV Candidate Vaccines for the Developing World, www.vrc.nih.gov.
52.	S. M. Agwale et al., Vaccine 20, 2131 (2002).
53.	C. M. Rodenburg et al., AIDS Res. Hum. Retrovir. 17, 161 (2001).
54.	We thank Y. Li, Y. Chen, and C. M. Rodenburg for excellent technical assistance and C. Brander, J. Bradac, C. Kuiken, U. Smith, and J. Mullins for ideas and suggestions. We would also like to thank our reviewers and editor, B. Jasny, for their exceptionally detailed and thoughtful comments. This work was supported by grants from the National Institutes of Health (N01 AI 85338, P20 AI 27767, R01 AI 40951, R01 AI 35351, R01 AI 05397) and NIH-Department of Energy interagency agreement YI AI 1500-01 and internal Laboratory Directed Research and Development funding at Los Alamos National Laboratory.

Supporting Online Material
www.sciencemag.org/cgi/content/full/296/5577/2354/DC1
Materials and Methods
Figs. S1 to S3

10.1126/science.1070441
Include this information when citing this paper.
