Bioinformatic Pattern Analysis
  Project: HIV evolution and cost of immune escapes
 
Goal: Learn to combine tools encountered in the previous Computer exercises to answer specific biological questions.

Remarks: Although a couple of hints are given at the bottom of this page, the questions are less straightforward than those in the previous Computer exercises, and answering them will require some trial and error. Do not hesitate to ask your assistant if you feel you are not making any progress (but please read the hints first!). Note that you can answer question 2 without fully answering question 1, so if you get stuck on question 1 while your assistant is temporarily unavailable, you can move on to question 2.

Format: The report should be at least 800 words long and contain two figures that you have generated yourself. It should take the form of a short article, with introduction, results and discussion sections. Please do not copy-paste complete pages of BLAST output; a summary like "a blastp search of protein x against the nr database gives hits in species y and z" will do. If you use web servers (e.g. for a BLAST search or a clustalw alignment), write down any settings that differ from the page's defaults. Make the report a story, not just a list of answers. You can find a more elaborate guide here. If you work in a group of 2-3, it is enough to hand in a single report, but please state all names clearly. The deadline is on the Calendar page of this course.
 

 
HIV
HIV is the principal cause of acquired immunodeficiency syndrome (AIDS). HIV/AIDS remains one of the most severe health concerns, affecting the lives of more than 33 million people worldwide. The background article by Ho & Bieniasz gives a good overview of HIV/AIDS.

1. HIV entered the human population from non-human primates. However, when, and from which species, HIV was transmitted to humans is still under investigation. Other immunodeficiency viruses related to HIV have been found in many species. In this first assignment you will try to confirm the origins of HIV. Use the protein with accession number CBI61196.1 as the starting point of your analysis. Did HIV cross the species boundary only once, or can you find evidence for multiple events? If you have time, repeat the analysis starting from a different protein and see whether you get the same results. The origins of HIV are discussed in Ho & Bieniasz and in Keele et al.
2. Remember that mutations in HIV-1 proteins allow the virus to escape the human immune response. An efficient HIV-1 vaccine, i.e. one that generates immunity that is robust to immune escape, should therefore consist of HIV-1 proteins, or protein fragments, in which the virus cannot escape, presumably because the fitness cost of the required mutations is too high. HIV has 9 genes (explained in more detail here). The pol gene is one of the most immunogenic genes of HIV, i.e. fragments of Pol are often recognized in T-cell responses. In this exercise, you should identify the regions of Pol where immune escape is less likely to occur, and which would therefore be good candidates for inclusion in an HIV-1 vaccine. Can you find supporting evidence for your results in the function and structure of the regions you identified? Use the protein with accession number CAB86375 as the starting point of your analysis. If clustalw is taking a very long time, try using fewer sequences or a different server.
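One way to make question 2 computable: columns that are conserved across many HIV-1 Pol sequences are plausible candidates for positions where escape mutations are costly. The following is a minimal sketch, not the method the course prescribes. It assumes you have saved your clustalw alignment as equal-length strings with '-' for gaps, and it uses per-column Shannon entropy as the conservation measure (Jalview computes its own, different conservation score). The function names, the 9-column window (chosen only because it is roughly the length of a typical CD8+ T-cell epitope) and the 0.2-bit threshold are illustrative assumptions to tune, not recommended values.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column.

    Gaps ('-') are counted as an extra symbol, so gappy columns score as
    variable. 0.0 means every sequence has the same residue."""
    counts = Counter(column)
    n = len(column)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def alignment_entropies(aligned: List[str]) -> List[float]:
    """Entropy of every column of an alignment given as equal-length strings."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy("".join(seq[i] for seq in aligned)) for i in range(length)]


def conserved_windows(entropies: List[float], window: int = 9,
                      max_mean: float = 0.2) -> List[Tuple[int, int]]:
    """Merge all sliding windows whose mean entropy is <= max_mean.

    Returns 0-based, half-open (start, end) alignment column ranges."""
    regions: List[Tuple[int, int]] = []
    for start in range(len(entropies) - window + 1):
        if sum(entropies[start:start + window]) / window <= max_mean:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the overlapping region
            else:
                regions.append((start, end))
    return regions


def column_to_residue(aligned_ref: str, column: int) -> Optional[int]:
    """1-based residue number in the ungapped reference sequence for an
    alignment column, or None if the reference has a gap at that column."""
    if aligned_ref[column] == "-":
        return None
    return len(aligned_ref[: column + 1].replace("-", ""))


# Toy alignment with made-up sequences; in practice, load your clustalw output.
aligned = [
    "MKTAYIAKQR",
    "MKTAYIAKQR",
    "MKTGFIAEQR",
    "MKTAYLAEHR",
]
regions = conserved_windows(alignment_entropies(aligned), window=3, max_mean=0.2)
print(regions)  # → [(0, 3)]
for start, end in regions:
    # Report each region in the reference's own residue numbering.
    print(column_to_residue(aligned[0], start), column_to_residue(aligned[0], end - 1))
```

If the reference (for example CAB86375) is the first row of the alignment, `column_to_residue` translates conserved regions into its residue numbering, which makes it easier to compare them with the functional parts of Pol (protease, reverse transcriptase with RNase H, integrase). With only around 15 sequences the entropy values are coarse, so treat them as a ranking rather than a significance test.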
 

 
Links

BLAST servers:

Sequence analysis webservers:

Tree of Life

NCBI Entrez

WebLogo

Google

Wikipedia

Scanning electron micrograph of HIV-1 budding from a cultured lymphocyte (source: Wikipedia)

 

 
Hints
Many of the hints from the previous Computer exercises may also prove useful. Here are some additional hints:
  • Remember that there are several HIVs: HIV-1 (groups M, N, and O) and HIV-2. Try to identify the origins of at least two different HIVs.
  • You can get a fasta file with the amino acid sequence from a selection of BLAST hits by selecting the checkbox in front of your hits of interest in the alignment part of NCBI BLAST output, followed by clicking the "get selected sequences" link at the bottom of the page. On the next page, change the Display pull-down menu to "fasta" and select "Text" or "File" from the Send-to menu.
  • If you want to make a phylogenetic tree of a protein with a lot of blast hits, don't simply take the top hits, but consider including hits with varying degrees of similarity.
  • If you find too many BLAST hits for HIV-1 proteins in the nr database, try changing the database to RefSeq. To find out the host of a virus, click on the identifier of the hit and read the NCBI entry. The FEATURES section often contains a "source" item stating which strain of the virus you found and, if known, what its host is.
  • You can choose how BLAST output is sorted by clicking on the different headers of the summary table. Sorting by "Query_coverage" can be helpful for getting rid of partial hits and synthetic constructs (which are often very short).
  • On the query page of NCBI BLAST, you can use Entrez queries to limit the results: the query "feline"[descr] will mostly give you hits from feline immunodeficiency virus, while all[filter] NOT "Human immunodeficiency virus 1"[descr] will, for example, exclude BLAST hits from HIV-1. Similarly, the query gorilla[all] will return (almost exclusively) virus sequences whose NCBI entry mentions the word gorilla anywhere. You can also change which hits are shown in the BLAST output through the "formatting options" link at the top of the results page.
  • When you are selecting BLAST hits to generate a multiple sequence alignment, try to select proteins which are similar to the BLAST query over their entire length.
  • Making multiple sequence alignments of long proteins like Pol can take quite a while, so it is probably a good idea to limit the number of sequences to align (at most 15).
  • Jalview is a nice tool for identifying specific positions and conserved regions in multiple sequence alignments; you can find a link to Jalview near the top of the EBI clustalw output.
  • An easy way to include pictures in your report is to press the Print Screen key and then paste into a program like MS Word or OpenOffice Writer.
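Several of the hints above (downloading hits as FASTA, discarding partial hits by query coverage, choosing hits of varying similarity, and keeping at most about 15 sequences for an alignment) can also be combined offline once the BLAST results are saved. This is a minimal sketch under stated assumptions: the hit list is something you transcribe or export yourself from the BLAST summary table (percent identity and query coverage per accession), FASTA headers start with the accession as their first word, and the names `Hit`, `read_fasta`, `pick_diverse_hits` and `to_fasta`, the 90% coverage cutoff and the evenly spaced selection rule are hypothetical illustrative choices, not part of NCBI's tools.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    """One row of a BLAST summary table (hypothetical container)."""
    accession: str
    identity: float  # percent identity to the query
    coverage: float  # percent query coverage


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {accession: sequence}, keyed by the header's first word."""
    records: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def pick_diverse_hits(hits: List[Hit], min_coverage: float = 90.0,
                      max_hits: int = 15) -> List[Hit]:
    """Drop partial hits, then pick up to max_hits spread over the identity range."""
    full_length = sorted((h for h in hits if h.coverage >= min_coverage),
                         key=lambda h: h.identity, reverse=True)
    if len(full_length) <= max_hits or max_hits < 2:
        return full_length[:max_hits]
    step = (len(full_length) - 1) / (max_hits - 1)
    # Evenly spaced ranks: keeps both the most similar and the most distant hit.
    return [full_length[round(i * step)] for i in range(max_hits)]


def to_fasta(records: Dict[str, str], picked: List[Hit]) -> str:
    """FASTA text for the picked hits that have a downloaded sequence."""
    return "".join(f">{h.accession}\n{records[h.accession]}\n"
                   for h in picked if h.accession in records)


# Made-up example: 20 full-length hits plus one short partial hit.
hits = [Hit(f"seq{i}", 100.0 - 2 * i, 100.0) for i in range(20)]
hits.append(Hit("fragment", 99.0, 25.0))  # removed by the coverage filter
picked = pick_diverse_hits(hits, max_hits=5)
print([h.accession for h in picked])  # → ['seq0', 'seq5', 'seq10', 'seq14', 'seq19']
```

Evenly spaced identity ranks are only one way to follow the hint about varying degrees of similarity; for question 1, choosing hits by host species (from the "source" feature described above) may be more informative.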