In this practical you will become familiar with classical sequence analysis tools (ClustalW, BLAST)
and with phylogeny tools available on the internet. Using these tools we
will study the origin of NK cells.
… et al.: The recently identified transcription factor E4BP4 seems to
be specific for NK cells. Make sure you have read and understood the
abstract of the paper.
The Hints section gives a lot
of practical help.
Our big question is
"how old are NK cells?". In other
words, what is the evolutionarily earliest organism that had NK cells? We will assume that
the existence of a homolog of the E4BP4 protein in a genome is enough
evidence to suggest that an organism can have NK cells
similar to vertebrate NK cells. Therefore, we can use the protein
sequence of human E4BP4 to study the evolution of NK cells. The following questions and hints will help you
investigate the answer:
- Which domains does the E4BP4 protein
have? Are these domains all specific for E4BP4-like proteins, or do
they also occur in other
proteins with functions different from that of E4BP4?
Determine which domain(s) you can use to study the evolution of E4BP4.
- Find homologs of the
E4BP4 protein, align these sequences and
make a phylogenetic tree. Check both nucleotide and protein
sequences to reach a conclusion on the age of NK cells.
- Are NK cells part of the adaptive
or the innate immune system? How does your conclusion
on the evolution of NK cells fit into this? Was our initial
assumption a good one, or can you think of another protein to study the
evolution of NK cells?
Tree of life
- The most commonly used sequence file format for
many bioinformatics servers is
the FASTA format. This format is described in the
NCBI help pages.
- You can limit your search results at NCBI by
specifying the database field you want to search. For example, the query cytochrome
arabidopsis[orgn] only returns hits in the organism
Arabidopsis. More on the syntax of NCBI Entrez queries can be found in the Entrez help pages.
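The same field syntax also works outside the web form. As a minimal sketch, the example query above can be turned into a request for NCBI's E-utilities `esearch` service. The helper names `entrez_query` and `esearch_url` are hypothetical, and the code only builds the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; this sketch only builds the URL.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def entrez_query(terms, organism=None):
    """Join free-text terms and optionally restrict to one organism with [orgn]."""
    parts = list(terms)
    if organism:
        parts.append(f"{organism}[orgn]")
    return " ".join(parts)


def esearch_url(db, query, retmax=20):
    """Build an esearch URL; urlencode escapes the spaces and square brackets."""
    params = {"db": db, "term": query, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


query = entrez_query(["cytochrome"], organism="arabidopsis")
url = esearch_url("protein", query)
print(query)
print(url)
```

Opening the printed URL in a browser should return the matching record IDs as XML; `retmax` limits how many IDs come back.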
Once you have your sequences in a FASTA file, edit the header lines (the first line of each record)
so that the first word is no longer a sequence ID but
something you can easily understand, such as the organism name. This is crucial
for getting a readable phylogenetic tree later on.
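Renaming headers by hand is error-prone once you have more than a few sequences. The sketch below assumes you keep a small mapping from sequence ID to a readable label (the function name `rename_fasta_headers`, the IDs, labels and toy sequences are all made up). It replaces only the first word of each header and copies the rest of the description and the sequence lines unchanged. Labels without spaces (underscores instead) are safest, because the first word of the header is what ends up as the name in the alignment and tree.

```python
def rename_fasta_headers(fasta_text, new_names):
    """Replace the first word (sequence ID) of each FASTA header with a readable label.

    IDs missing from new_names are left unchanged; sequence lines are copied as-is.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split(maxsplit=1)
            seq_id = words[0] if words else ""
            label = new_names.get(seq_id, seq_id)
            rest = f" {words[1]}" if len(words) > 1 else ""
            out.append(f">{label}{rest}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Toy input: IDs, descriptions and sequences are invented for illustration.
fasta = """>seq1 example protein A
MKTAYIAK
>seq2 example protein B
MKTAYLAK
"""
renamed = rename_fasta_headers(fasta, {"seq1": "Species_A", "seq2": "Species_B"})
print(renamed)
```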
- On the NCBI BLAST start page, look carefully at
the links before you click on anything.
- Choose the proper database for your BLAST search.
- Make sure you use the correct BLAST flavor: the nucleotide BLAST page
defaults to Megablast, not blastn! Megablast is designed for highly similar
sequences, so it is not a good choice when we are looking for distant homologs.
- Click the Taxonomy reports link at the top of the
results page to see the taxonomy of your hits.
By default, NCBI BLAST only shows the 100 best hits. You can change this
by clicking "Algorithm parameters" on the BLAST query page
and selecting a higher number under "Max target sequences". Note that
numbers above 1000 tend to make BLAST rather slow.
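When you raise the number of hits, it can help to filter them outside the browser. This sketch assumes you downloaded the results as a tab-separated hit table in the standard 12-column BLAST tabular layout (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). Check the comment lines of your own download, since the column set can differ. The function names are hypothetical, and the e-value cutoff of 1e-5 is an arbitrary example, not a value prescribed by this exercise.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    identity: float  # percent identity
    evalue: float
    bitscore: float


def parse_hit_table(text):
    """Parse a 12-column tab-separated BLAST hit table, skipping '#' comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]),
                        float(fields[10]), float(fields[11])))
    return hits


def significant(hits, max_evalue=1e-5):
    """Keep hits at or below the e-value cutoff, best (lowest e-value) first."""
    return sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue)


# Invented example rows in the 12-column layout.
table = (
    "# made-up comment line\n"
    "query1\tsubjA\t85.0\t120\t18\t0\t1\t120\t5\t124\t3e-40\t150.0\n"
    "query1\tsubjB\t30.0\t60\t40\t2\t10\t70\t3\t62\t0.5\t25.0\n"
)
for hit in significant(parse_hit_table(table)):
    print(hit.subject, hit.identity, hit.evalue)
```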
- Install Jalview.
Installation is explained extensively on the Jalview website. Start the program
locally; on startup it shows a number of examples explaining how the program works.
- Upload your sequences (File menu > Input Alignment). Remember
that the sequences you retrieved via BLAST searches are unaligned,
so you must align them before the phylogenetic analysis. You can align
your sequences with MUSCLE or ClustalW (available under the Web Service menu in Jalview).
- To make a phylogenetic tree, use the Calculate Tree tool in the Calculate
menu of Jalview. Choose the BLOSUM62 matrix and the Neighbour Joining (NJ) method if you are working
with protein sequences. For nucleotide sequences, use percentage identity
together with the NJ method.
- To align sequences using a web server, open the FASTA
file in your browser or in a text editor such as Notepad, and paste the sequences into the
sequence box. You can select the scoring matrix (BLOSUM, identity)
using the MATRIX drop-down menu above the sequence box.
- The labels of most of the ClustalW options on the EBI
website are links to the relevant parts of the ClustalW help pages.
- Always use ClustalW's slow (full) alignment
algorithm, not the fast one. Some servers default to the fast
algorithm, so do not forget to change this.
- Jalview is a nice tool for identifying specific positions and conserved regions in multiple sequence alignments; you can find a link to Jalview
under the Results Summary link in the EBI ClustalW output.
You can also compare alignments in Jalview. Click the Show Colors button to color the amino
acids according to their properties; this also makes it easier to
compare the sequences.
- To open more than one alignment at a time in Jalview,
you need to save the result and then open it again.
- Remember that the differences between alignments lie mostly
in the gaps! So focus on the gaps, heads and tails.
- If a position is fully conserved, it is marked
with "*". Substitutions fall into three categories: between very
similar amino acids (":"), between relatively similar amino acids ("."),
and between non-similar ones (no symbol).
- Whether or not ClustalW decides that it is worth
having a gap in an alignment depends, of course, on the gap penalty. With
very low gap penalties, ClustalW may easily insert a gap to get a
'better' alignment, even if a deletion or insertion would be unlikely
(when could this be the case? Think of the relation between nucleotides
and proteins). With extremely high gap penalties almost all gaps will
disappear (except at the beginning and end of the sequences), even if a
deletion or insertion could easily have taken place.
- Note that the sequences must be aligned before
you can make a phylogenetic tree! To do this, first paste the sequences
into the EBI
ClustalW page and align them. To make a phylogenetic tree, paste
the aligned sequences into the submission form on the EBI Phylogeny
page, including the first line (the line that says CLUSTAL 2.0.10
multiple sequence alignment). The aligned sequences can also be
downloaded by clicking the file links at the top of the ClustalW result page. The phylogeny software will also generate a tree from unaligned sequences,
so you have to remember to do the alignment first yourself. Remember that the guide tree on the
alignment page is not a
phylogenetic tree, but a tree built from pairwise distances with the UPGMA method. The guide tree
is a "guide" for
generating the multiple alignment.
- The default clustering method is NJ (which stands
for Neighbour Joining). Scroll to the bottom of the results page to see the tree. This is a cladogram;
to get a phylogram, press the Show as Phylogram Tree button.
- Remember that the trees you obtain are unrooted
trees. Where do you think the root of the tree should be in this
exercise? A more advanced tool for drawing phylogenetic trees is iTOL
(Interactive Tree Of Life), in which you can change the root of your tree. You have to upload
your tree file.
- The parameters also include a setting called CORRECT DIST. This is the
same as the Kimura 2-parameter correction, which corrects the evolutionary distances for multiple substitutions at the same site. Check whether
this setting has an effect on the phylograms you produce.
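The Jalview and EBI hints above both rely on Neighbour Joining (NJ). To show what NJ does with a distance matrix, here is a minimal textbook sketch of the algorithm. It is not Jalview's or ClustalW's code: the distance used is a simple percentage-identity distance (1 minus the fraction of identical residues, ignoring columns where either sequence has a gap), and how those programs treat gaps may differ. The function names are hypothetical, and the four-taxon distances in the usage example are invented so that they fit a known tree exactly, which lets you check that NJ recovers its branch lengths.

```python
from itertools import combinations


def p_distance(a, b):
    """1 - fraction of identical residues, over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return 1.0 - sum(x == y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Pairwise p-distances for a dict {name: aligned sequence}."""
    return {frozenset((i, j)): p_distance(aligned[i], aligned[j])
            for i, j in combinations(aligned, 2)}


def neighbour_joining(names, dist):
    """Neighbour joining; returns an unrooted tree as a Newick string.

    dist maps frozenset({name1, name2}) -> distance. Needs at least three taxa.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all other nodes.
        r = {i: sum(d[frozenset((i, j))] for j in nodes if j != i) for i in nodes}
        # Join the pair that minimises the NJ Q-criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch length to i
        lj = dij - li                                   # branch length to j
        new = f"({i}:{li:.4f},{j}:{lj:.4f})"
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Join the last three nodes at one central node (the tree stays unrooted).
    a, b, c = nodes
    dab, dac, dbc = d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))]
    return (f"({a}:{(dab + dac - dbc) / 2:.4f},{b}:{(dab + dbc - dac) / 2:.4f},"
            f"{c}:{(dac + dbc - dab) / 2:.4f});")


# Invented distances that fit the tree ((A:0.1,B:0.2):0.5,(C:0.3,D:0.4)) exactly.
pairs = {("A", "B"): 0.3, ("A", "C"): 0.9, ("A", "D"): 1.0,
         ("B", "C"): 1.0, ("B", "D"): 1.1, ("C", "D"): 0.7}
tree = neighbour_joining(["A", "B", "C", "D"], {frozenset(k): v for k, v in pairs.items()})
print(tree)
```

Note that the output has no root: NJ only reconstructs branch lengths and topology, which is why you still have to decide where the root should go.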
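The conservation symbols described in the ClustalW hints ("*", ":", ".") can be reproduced with a few set operations. This sketch assumes the "strong" and "weak" residue groups that Clustal documentation lists for ":" and "." (verify them against the version you use), and it treats any column containing a gap as unconserved, which is an assumption rather than documented ClustalW behaviour. The function names and the toy alignment are hypothetical.

```python
# Residue groups used for ":" (strong) and "." (weak) conservation; check your Clustal version.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Clustal-style symbol for one alignment column (an iterable of residues)."""
    residues = {c.upper() for c in column}
    if "-" in residues:
        return " "  # assumption: any gap makes the column unconserved
    if len(residues) == 1:
        return "*"  # fully conserved
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall in one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall in one weak group
    return " "


def conservation_line(aligned):
    """Conservation symbols for a list of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned (equal length)")
    return "".join(column_symbol(col) for col in zip(*aligned))


toy = ["MKV-LS", "MRI-LG", "MKA-LN"]  # made-up toy alignment
for seq in toy:
    print(seq)
print(conservation_line(toy))
```

In the toy alignment, K/R falls in a strong group (":"), S/G/N only in a weak group ("."), and V/I/A in neither.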
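To see how the gap penalty changes an alignment, as discussed in the gap-penalty hint, here is a bare-bones Needleman-Wunsch global alignment. It is a deliberate simplification: it uses one linear gap penalty and match/mismatch scores of +1/-1, whereas ClustalW uses separate gap-opening and gap-extension penalties together with substitution matrices. The toy sequences are arbitrary. With a gap penalty of 0, gaps are free and the aligner opens them to avoid every mismatch; with a penalty of 10, the two equal-length sequences end up aligned without any gap.

```python
def needleman_wunsch(a, b, gap_penalty, match=1, mismatch=-1):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -i * gap_penalty
    for j in range(1, m + 1):
        score[0][j] = -j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] - gap_penalty,  # gap in b
                              score[i][j - 1] - gap_penalty)  # gap in a

    # Trace back one optimal path, preferring diagonal steps.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] - gap_penalty:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


for penalty in (0, 10):
    s, x, y = needleman_wunsch("GATTACA", "GCATGCA", penalty)
    print(f"gap penalty {penalty}: score {s}")
    print(x)
    print(y)
```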
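Because the phylogeny server does not refuse unaligned input, a quick sanity check on your FASTA file before submitting can save time. The helpers below are hypothetical: one reads FASTA text into a dictionary keyed by the first word of each header, the other reports whether all sequences have the same length. Equal length is necessary for an alignment, but it does not prove that the sequences were actually aligned.

```python
def read_fasta(text):
    """Parse FASTA text into {first word of header: sequence}; later duplicates overwrite earlier ones."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else f"unnamed_{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def all_same_length(records):
    """True if every sequence has the same length (necessary, not sufficient, for an alignment)."""
    return len({len(seq) for seq in records.values()}) == 1


unaligned = ">a\nMKVL\n>b\nMKV\n"
aligned = ">a\nMKVL\n>b\nMK-L\n"
print(all_same_length(read_fasta(unaligned)))
print(all_same_length(read_fasta(aligned)))
```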
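For reference, the Kimura 2-parameter distance for a nucleotide alignment is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the fractions of compared sites that differ by a transition (A<->G, C<->T) and by a transversion, respectively. The sketch below compares it with the uncorrected p-distance. It skips gapped and ambiguous sites (an assumption; the exact site handling in ClustalW may differ), applies only to nucleotide sequences, and the function names are hypothetical.

```python
import math

# Transitions: purine<->purine (A, G) and pyrimidine<->pyrimidine (C, T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def compared_sites(a, b):
    """Aligned position pairs where both sequences have an unambiguous base (A, C, G, T)."""
    return [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]


def nucleotide_p_distance(a, b):
    """Uncorrected fraction of differing sites."""
    sites = compared_sites(a, b)
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def k2p_distance(a, b):
    """Kimura 2-parameter distance, correcting for multiple substitutions per site."""
    sites = compared_sites(a, b)
    if not sites:
        raise ValueError("no comparable sites")
    n = len(sites)
    transitions = sum(1 for x, y in sites if x != y and frozenset((x, y)) in TRANSITIONS)
    transversions = sum(1 for x, y in sites if x != y) - transitions
    p, q = transitions / n, transversions / n
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


x = "AAAAAAAAAA"
y = "AAAAAAAAAG"  # one transition (A->G) in 10 sites
print(nucleotide_p_distance(x, y), k2p_distance(x, y))
```

Here the uncorrected distance is 0.1 while the K2P distance is about 0.1116; the gap between the two grows as sequences diverge, which is why the correction can change the branch lengths in your phylograms.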