Immunobiology, Computer Exercise Binf I

Learning Objectives:

To demonstrate how one can study the evolution of the immune system using standard sequence analysis tools such as CLUSTALW, BLAST and the phylogeny tools available on the internet.
As a specific case: to learn more about the evolution of NK cells, a subject that we discussed in depth during the lectures.

Background: Gascoyne et al.: The basic leucine zipper transcription factor E4BP4 is essential for natural killer cell development. Make sure you have read and understood the abstract of the paper.

Evolution of NK cells

Our big research question is "how old are NK cells?". In other words, what is the evolutionarily earliest organism that has NK cells? We will assume that the existence of a homolog of the E4BP4 protein in a genome is enough evidence to suggest that an organism can have NK cells similar to vertebrate NK cells. Therefore, we can use the protein sequence of human E4BP4 to study the evolution of NK cells. The following questions and hints will help you investigate the answer:

Which domains does the E4BP4 protein have? Are these domains all specific to E4BP4-like proteins, or do they also occur in other proteins with a different function? Determine which domain(s) you can use to study the evolution of E4BP4. To answer this question, you can use the PFAM web page.

Find homologs of the E4BP4 protein, align these sequences and make a phylogenetic tree. Do this first with protein sequences, but you will also need to check nucleotide sequences to reach a conclusion on the age of NK cells.

Are NK cells part of the adaptive or innate immune system? How does your conclusion on the evolution of NK cells fit into this? Was our initial assumption a good one?

Suggestions for extra analysis

If you decide to write your research project report on this computer exercise, we require that you perform the analysis suggested above and, in addition, think of and perform an extra analysis. One possibility is to repeat your analysis using a different protein; some candidates are mentioned in the article by Gascoyne et al.

Links

General:

PFAM (Domains)

BLAST servers:

NCBI BLAST

NCBI BLAST help

Alignment and phylogeny webservers:

Hints

Sequence retrieval

The most commonly used sequence file format on bioinformatics servers is the FASTA format. This format is described in the NCBI help pages.
You can limit your search results at NCBI by specifying the database field you want to search. For example, cytochrome B AND arabidopsis[orgn] only returns hits in the organism Arabidopsis. You will find more on the syntax of NCBI Entrez queries in the NCBI Entrez help.
You can do this exercise using protein and/or nucleotide sequences. If you want to use nucleotide sequences, extract the sequences using the "Nucleotide" pull-down menu in NCBI instead of the Gene menu. With the Gene option you risk retrieving a large portion of the chromosome on which the gene is located.
Once you have your sequences in a file in FASTA format, edit the header line (the first line of each sequence) so that the first word is no longer a sequence ID but something you can easily understand, such as the organism name. This is crucial for obtaining a readable phylogenetic tree later.
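The header editing described in the last hint can also be scripted instead of done by hand. The following is a minimal sketch in Python using only the standard library. The function name relabel_fasta, the sequence IDs (seq1, seq2) and the short sequences are made up for illustration; replace them with the IDs and organism names from your own file. Labels should not contain spaces, because most tools only use the first word of the header.

```python
def relabel_fasta(fasta_text, labels):
    """Put a readable label in front of each FASTA header.

    fasta_text: contents of a FASTA file as one string.
    labels: dict mapping the original sequence ID (first word of the
            header) to a label without spaces, e.g. an organism name.
    Headers whose ID is not in `labels` are left unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            seq_id = parts[0] if parts else ""
            rest = parts[1] if len(parts) > 1 else ""
            label = labels.get(seq_id)
            if label is not None:
                # The label becomes the first word; the original ID and
                # description are kept after it for reference.
                line = f">{label} {seq_id} {rest}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up example records (hypothetical IDs and sequences).
example = ">seq1 example protein\nMKVLA\n>seq2\nMRVLS\n"
relabeled = relabel_fasta(
    example, {"seq1": "Homo_sapiens", "seq2": "Danio_rerio"}
)
print(relabeled)
```

Keeping the original ID after the label makes it possible to trace a leaf in the tree back to its database entry.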

BLAST

At the NCBI Blast start-page, look carefully at all the links before you click on something.
Choose the proper database for your blast search.
Make sure you use the correct BLAST flavor; NCBI defaults to Megablast for nucleotide sequences, not blastn! Since we are looking for distant homologs, Megablast is not a good choice.

Click on the Taxonomy reports link on top of the BLAST result page to see the taxonomy of your hits.
By default NCBI BLAST only shows the 100 best hits. You can change this behaviour by clicking "Algorithm Parameters" on the BLAST query page and selecting a higher number under "Max target sequences". Note that selecting numbers above 1000 tends to make BLAST rather slow.
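The choices from the hints above (database, BLAST flavor, number of hits) can also be written down explicitly if you prefer to run the search from a script rather than the web page. This is only a sketch under several assumptions: it uses the Biopython package (not part of this exercise) and its NCBIWWW.qblast function, the database names "nr" and "nt", and the qblast argument hitlist_size, which we take to play the role of "Max target sequences". The function names blast_settings and run_blast are hypothetical. The megablast option of qblast is deliberately left unset, which to our understanding requests plain blastn; check the Biopython documentation before relying on this.

```python
def blast_settings(molecule, max_hits=500):
    """Return keyword arguments for a remote BLAST search for distant homologs.

    molecule: "protein" or "nucleotide".
    max_hits: how many hits to keep (the web default is 100).
    """
    if molecule == "protein":
        return {"program": "blastp", "database": "nr",
                "hitlist_size": max_hits}
    if molecule == "nucleotide":
        # Plain blastn; the megablast option is not set here because
        # Megablast is not a good choice for distant homologs.
        return {"program": "blastn", "database": "nt",
                "hitlist_size": max_hits}
    raise ValueError("molecule must be 'protein' or 'nucleotide'")


def run_blast(sequence, molecule, max_hits=500):
    """Submit `sequence` to NCBI BLAST; needs Biopython and a network connection."""
    from Bio.Blast import NCBIWWW  # imported here so the rest works without Biopython
    settings = blast_settings(molecule, max_hits)
    # Returns a handle to the result; the search can take several minutes.
    return NCBIWWW.qblast(sequence=sequence, **settings)


print(blast_settings("protein"))
print(blast_settings("nucleotide", max_hits=1000))
```

Calling run_blast with your query sequence performs the actual search; as with the web page, large values of max_hits tend to make the search slow.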

Alignment and Phylogeny using EBI servers

To align sequences using Clustal Omega, open the FASTA file in your browser or in Notepad, and paste the sequences into the sequence box.
Jalview is a nice tool for identifying specific positions and conserved regions in multiple sequence alignments; you can find a link to Jalview under the Results Summary link in the EBI ClustalW output. You can compare the alignments using Jalview. Click the Show Colors button to color the amino acids according to their properties. This will also make it easier to compare the sequences.
To open more than one alignment at a time in Jalview, you need to save the result and then open it again.
Remember that the differences between alignments are in the gaps! So focus on the gaps, heads and tails.
If a position is fully conserved, it is indicated with "*". Substitutions can fall into three categories: between very similar amino acids (":"), between relatively similar amino acids (".") and between non-similar ones (indicated without any symbol).
Whether or not Clustal Omega decides that it is 'good' to have a gap in an alignment depends of course on the gap penalty. For very low gap penalties, Clustal Omega may easily insert a gap to get a 'better' alignment, even if a deletion or insertion would be unlikely (when could this be the case? Think of the relation between nucleotides and proteins). For extremely high gap penalties almost all gaps will go away (except at the beginning and end of the sequence), even if a deletion or insertion could easily have taken place.
Note that the sequences must be aligned before you can make a phylogenetic tree! To do this, first paste the sequences into the EBI Clustal Omega page and align them. To make a phylogenetic tree, paste the aligned sequences, including the first line, into the submission form on the EBI Phylogeny page, or use the link on the Clustal Omega results page to send the alignment directly to the EBI Phylogeny page. The aligned sequences can also be downloaded by clicking on the links at the top of the Clustal Omega results page. The phylogeny software will also generate a tree from unaligned sequences, so it is up to you to remember to do the alignment first.
The default clustering method is NJ (which stands for Neighbour Joining).
Scroll to the bottom of the results page to see the tree. The default is a cladogram; to get a phylogram, click on the "real" button.
Remember that the trees you obtain are UNROOTED trees. Where do you think the root of the tree should be in this exercise? A more advanced tool for drawing phylogenetic trees is iTOL. There you can change the root of your tree. You have to upload your tree file.
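The conservation symbols mentioned in the alignment hints above ("*", ":" and ".") can be reproduced for a small alignment with a few lines of Python. This is a sketch, not the Clustal implementation: the "strong" and "weak" amino acid groups are the ones listed in the Clustal documentation as far as we recall them, and here any column containing a gap gets no symbol. The function names and the three short aligned sequences are made up for illustration.

```python
# Amino acid groups assumed from the Clustal documentation (verify before relying on them).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "          # columns with a gap get no symbol in this sketch
    if len(residues) == 1:
        return "*"          # fully conserved position
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"          # all residues fall within one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."          # all residues fall within one weak group
    return " "              # non-similar residues


def conservation_line(aligned):
    """Build the symbol line for a list of equally long aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(column_symbol("".join(col)) for col in zip(*aligned))


# Made-up aligned sequences, one per row.
alignment = ["MKSA-",
             "MRTCW",
             "MKASW"]
for seq in alignment:
    print(seq)
print(conservation_line(alignment))
```

For this toy alignment the symbol line is "*::. ": the first column is identical, columns two and three contain only residues from one strong group, column four only residues from one weak group, and the last column contains a gap.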