Sequence Analysis in Immunology
  Evolution of NK cells
 
Goal: to become familiar with classical sequence analysis tools: CLUSTALW, BLAST and phylogeny tools available on the internet. Using these tools we will study the origin of the NK cells.
Background: Gascoyne et al.: the recently identified transcription factor E4BP4, which seems to be specific for NK cells. Make sure you have read and understood the abstract of the paper.
The Hints section gives a lot of practical help.

Evolution of NK cells
Our big question is "how old are NK cells?". In other words, what is the evolutionarily earliest organism that had NK cells? We will assume that the existence of a homolog of the E4BP4 protein in a genome is enough evidence to suggest that an organism can have NK cells similar to vertebrate NK cells. Therefore, we can use the protein sequence of human E4BP4 to study the evolution of NK cells. The following questions/hints will help you to investigate the answer:
1. Which domains does the E4BP4 protein have? Are these domains all specific for E4BP4-like proteins, or do they also occur in other proteins with a different function than E4BP4? Determine which domain(s) you can use to study the evolution of E4BP4.
2. Find homologs of the E4BP4 protein, align these sequences and make a phylogenetic tree. Check both nucleotide and protein sequences to reach a conclusion on the age of NK cells.
3. Are NK cells part of the adaptive or innate immune system? How does your conclusion on the evolution of NK cells fit into this? Was our initial assumption a good one? Or can you think of another protein to study the evolution of NK cells?
Links

General:

NCBI Entrez

PFAM (Domains)

EBI tools overview

WebLogo

Tree of life

UniProt

BLAST servers:

NCBI BLAST help

Alignment and phylogeny webservers:



Hints
Sequence retrieval


  • The most commonly used format for sequence files on bioinformatics servers is the FASTA format. This format is described in the NCBI help pages.
  • You can limit your search results at NCBI by specifying the database field you want to search. For example, cytochrome B AND arabidopsis[orgn] only returns hits in the organism Arabidopsis. You will find more on the syntax of NCBI Entrez queries here.
  • Once you have your sequences in a file in FASTA format, edit the header line (the first line of each sequence, starting with ">") so that the first word is no longer a sequence ID but something you can easily understand, such as the organism name. This is crucial for getting a readable phylogenetic tree later.
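
Editing the header lines by hand becomes tedious with many sequences. The following is a minimal Python sketch of the same step, not part of the original exercise. The function name relabel_fasta, the sequence IDs (ID0001, ID0002) and the short sequences are made up for illustration; replace them with the IDs from your own FASTA file.

```python
def relabel_fasta(fasta_text, labels):
    """Replace the first word of each FASTA header with a readable label.

    `labels` maps the original sequence ID (first word after '>') to the
    new label; headers whose ID is not in `labels` are left unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split(None, 1)  # [ID, rest of the description]
            seq_id = parts[0] if parts else ""
            rest = parts[1] if len(parts) > 1 else ""
            new_id = labels.get(seq_id, seq_id)
            line = ">" + (new_id + " " + rest).strip()
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical IDs and toy sequences, for illustration only.
example = """>ID0001 hypothetical protein [Homo sapiens]
MSTAVLENPG
>ID0002 hypothetical protein [Danio rerio]
MSTAILENPG
"""
labels = {"ID0001": "Human", "ID0002": "Zebrafish"}
relabeled = relabel_fasta(example, labels)
print(relabeled)
```

Labels without spaces are the safer choice, because most alignment and tree programs keep only the first word of the header.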
BLAST


  • At the NCBI Blast start-page, look carefully at all the links before you click on something.
  • Choose the proper database for your blast search. 
  • Make sure you use the correct BLAST flavor; NCBI defaults to Megablast for nucleotide sequences, not blastn! Since we are looking for distant homologs, Megablast is not a good choice.
  • Click on the Taxonomy reports link on top of the BLAST result page to see the taxonomy of your hits.
  • By default NCBI BLAST only shows the 100 best hits. You can change this behaviour by clicking "Algorithm Parameters" on the BLAST query page and selecting a higher number under "Max target sequences". Note that selecting numbers above 1000 tends to make BLAST rather slow.
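
Why Megablast misses distant homologs can be illustrated with a small Python sketch. It assumes the usual default word sizes (28 for Megablast, 11 for blastn; check "Algorithm Parameters" on the query page for the values actually in use) and relies on the fact that a nucleotide BLAST hit has to start from an exact word match of at least that length. The sequences are toy data, and the function names are made up for this example.

```python
MEGABLAST_WORD_SIZE = 28  # assumed default, see lead-in
BLASTN_WORD_SIZE = 11     # assumed default, see lead-in


def longest_identical_run(aligned_a, aligned_b):
    """Longest stretch of consecutive identical, non-gap alignment columns."""
    best = run = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == y and x != "-":
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def can_seed(aligned_a, aligned_b, word_size):
    """True if the pair contains an exact match long enough to seed a hit."""
    return longest_identical_run(aligned_a, aligned_b) >= word_size


# Toy query of 48 nucleotides and a "homolog" with every 12th base changed,
# so the two sequences are still about 92% identical.
query = "ATGGCTAGCTTA" * 4
swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
hit = "".join(swap[b] if i % 12 == 11 else b for i, b in enumerate(query))

print("longest identical run:", longest_identical_run(query, hit))
print("seed with blastn word size:", can_seed(query, hit, BLASTN_WORD_SIZE))
print("seed with Megablast word size:", can_seed(query, hit, MEGABLAST_WORD_SIZE))
```

In this toy pair the longest identical run is 11 positions, which is just enough for a blastn-sized word but far too short for a Megablast-sized one, even though the sequences are still very similar.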

Alignment and Phylogeny using Jalview


  • Install Jalview. How to do this is explained extensively here. Start the program locally. When it starts, you will first get a number of examples explaining how the program works.
  • Upload your sequences (using File menu/Input alignments). Remember that the sequences you retrieved via BLAST searches are unaligned, so you must align them before the phylogenetic analysis. You can align your sequences using the MUSCLE or CLUSTALW programs (available under the Webserver menu in Jalview).
  • To make a phylogenetic tree you can use the Calculate tree tool in the Calculate menu of Jalview. Choose the Blosum62 matrix and the Neighbour Joining (NJ) method if you are working with protein sequences. For nucleotide sequences the only option is to use percentage identity and the NJ method.
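
To see what a percentage identity tree is built from, here is a minimal Python sketch that turns an alignment into pairwise distances (100 minus percentage identity). It is an illustration under stated assumptions, not Jalview's own implementation: columns where both sequences have a gap are skipped and a gap against a residue counts as a mismatch, and other programs may treat gaps differently. The aligned sequences are toy data.

```python
def percent_identity(a, b):
    """Percentage of identical columns between two aligned sequences.

    Assumption: columns where both sequences have a gap are ignored,
    and a gap opposite a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def distance_matrix(alignment):
    """Pairwise distances (100 - percentage identity) for a dict of sequences."""
    names = list(alignment)
    return {
        (p, q): 100.0 - percent_identity(alignment[p], alignment[q])
        for p in names
        for q in names
    }


# Toy alignment, for illustration only.
alignment = {
    "Human": "MKT-LLA",
    "Mouse": "MKTALLA",
    "Fish":  "MRT-LIA",
}
distances = distance_matrix(alignment)
for (p, q), d in sorted(distances.items()):
    print(p, q, round(d, 1))
```

A clustering method such as NJ then joins the pairs with the smallest distances first, which is why a poor alignment directly leads to a poor tree.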

Alignment and Phylogeny using EBI servers


  • To align sequences using a webserver, open the FASTA file in your browser or in Notepad, and paste the sequences into the sequence box. You can select the scoring matrix (BLOSUM, identity) using the MATRIX dropdown-menu, above the sequence box.
  • The labels of most of the ClustalW options at the EBI website are links to the relevant bits of the ClustalW help pages.
  • Always use ClustalW's slow or full alignment algorithm, and not the fast one. Some servers default to the fast algorithm, so do not forget to change this.
  • Jalview is a nice tool for identifying specific positions and conserved regions in multiple sequence alignments; you can find a link to Jalview under the Results Summary link in the EBI ClustalW output. You can compare the alignments using Jalview. Click the Show Colors button to color the amino acids according to their properties. This will also make it easier to compare the sequences.
  • To open more than one alignment at a time in Jalview, you need to save the result and then open it again.
  • Remember that the differences between alignments are in the gaps! So focus on the gaps, heads and tails.
  • If a position is fully conserved, it is indicated with "*". Substitutions can fall into three categories: between very similar amino acids (":"), between relatively similar amino acids (".") and between non-similar ones (indicated without any symbol).
  • Whether or not ClustalW decides that it is 'good' to have a gap in an alignment depends of course on the gap penalty. For very low gap penalties, ClustalW may easily insert a gap to get a 'better' alignment, even if a deletion or insertion would be unlikely (when could this be the case? Think of the relation between nucleotides and proteins). For extremely high gap penalties almost all gaps will go away (except at the beginning and end of the sequence), even if a deletion or insertion could easily have taken place.
  • Note that the sequences must be aligned before you can make a phylogenetic tree! To do this, first paste the sequences into the EBI ClustalW page and align them. To make a phylogenetic tree, paste the aligned sequences into the submission form on the EBI Phylogeny page, including the first line (the line that says CLUSTAL 2.0.10 multiple sequence alignment). The aligned sequences can also be downloaded by clicking on the file links (top of the ClustalW result page). The phylogeny software will also generate a tree from unaligned sequences, so it is up to you to remember to do the alignment first. Remember that the guide tree on the alignment page is not a phylogenetic tree, but a tree built from pairwise distances with the UPGMA method. The guide tree is only a "guide" for generating the multiple alignment.
  • The default clustering method is NJ (which stands for Neighbour Joining).
  • Scroll to the bottom of the results page to see the tree. This is a cladogram, to get a phylogram press the button Show as Phylogram Tree.
  • Remember that the trees you obtain are UNROOTED trees. Where do you think the root of the tree should be in this exercise? A more advanced tool for drawing phylogenetic trees is the iTOL package. Here you can change the root of your tree. You have to upload your tree file.
  • In the parameters there is also a setting called CORRECT DIST. This is the same as the Kimura 2-parameter correction, which corrects the evolutionary distances for multiple substitutions. See whether setting this has an effect on the phylograms you produce.
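
To get a feeling for what a correction for multiple substitutions does, the following Python sketch computes the Kimura 2-parameter distance as it is commonly stated, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P is the fraction of transitions and Q the fraction of transversions between two aligned nucleotide sequences. This is a sketch for illustration; whether the server applies exactly this formula is an assumption, and the sequences are toy data.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Columns with gaps or ambiguous bases are skipped.
    """
    n = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        n += 1
        if x == y:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable positions")
    p = transitions / n
    q = transversions / n
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy pair: one transition (A->G) and one transversion (C->A) in 20 positions.
seq1 = "AAAAAAAAAACCCCCCCCCC"
seq2 = "GAAAAAAAAACCCCCCCCCA"
uncorrected = 2 / 20
corrected = k2p_distance(seq1, seq2)
print("uncorrected:", uncorrected)
print("corrected:", round(corrected, 4))
```

The corrected distance is always at least as large as the uncorrected fraction of differences, and the gap between the two grows as the sequences become more divergent. That is why the correction mainly stretches the long branches of a phylogram.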