Learning Objectives:
- To demonstrate how one can study the evolution of the immune system using standard sequence analysis tools such as CLUSTALW and BLAST, and the phylogeny tools available on the internet.
- As a specific case: to learn more about the evolution of NK cells, a subject that we discussed in depth during the lectures.
Background: Gascoyne et al.: The basic leucine zipper transcription factor E4BP4 is essential for natural killer cell development.
Make sure you have read and understood the abstract of the paper.
Our big research question is "how old are NK cells?". In other words, what is the evolutionarily earliest organism that has NK cells? We will assume that the existence of a homolog of the E4BP4 protein in a genome is enough evidence to suggest that an organism can have NK cells similar to vertebrate NK cells. Therefore, we can use the protein sequence of human E4BP4 to study the evolution of NK cells. The following questions and hints will help you investigate the answer:
1. Which domains does the E4BP4 protein have? Are these domains all specific to E4BP4-like proteins, or do they also occur in other proteins with a different function than E4BP4? Determine which domain(s) you can use to study the evolution of E4BP4. To answer this question, you can use the PFAM web page.
2. Find homologs of the E4BP4 protein, align these sequences and make a phylogenetic tree. Do this first with protein sequences, but you will need to check nucleotide sequences as well to reach a conclusion on the age of NK cells.
3. Are NK cells part of the adaptive or the innate immune system? How does your conclusion on the evolution of NK cells fit into this? Was our initial assumption a good one?
If you decide to write your research project report on this computer exercise, you must perform the analysis suggested above and, in addition, think of and perform an extra analysis. One possibility is to repeat your analysis using a different protein; some candidates are mentioned in the article of Gascoyne et al.
General:
NCBI Entrez
PFAM (Domains)
WebLogo
Tree of life
UniProt
- The most commonly used format for sequence files on many bioinformatics servers is the FASTA format. This format is described in the NCBI help pages.
- You can limit your search results at NCBI by specifying the database field you want to search. For example, cytochrome B AND arabidopsis[orgn] only returns hits in the organism Arabidopsis. More on the syntax of NCBI Entrez queries can be found in the NCBI help pages.
- You can do this exercise using protein and/or nucleotide sequences. If you want to use nucleotide sequences, extract the sequences using the "Nucleotide" pull-down menu in NCBI instead of the "Gene" menu. With the Gene option you risk getting a large portion of the chromosome where the gene is located.
- Once you have your sequences in a file in FASTA format, edit the header lines (the first line of each sequence) so that the first word is no longer a sequence ID but something you can easily understand, such as the organism name. This is crucial for getting a readable phylogenetic tree later.
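The header editing described in the last tip can also be scripted instead of done by hand. The following is a minimal Python sketch, assuming that the first word of each header line is the sequence ID. The function name `relabel_fasta`, the IDs `id1` and `id2`, and the short sequences are hypothetical placeholders, not real database entries.

```python
def relabel_fasta(text, labels):
    """Replace the first word of every FASTA header using the `labels` mapping.

    `text` is the content of a FASTA file; `labels` maps a sequence ID
    (first word after '>') to a readable label such as an organism name.
    Headers whose ID is not in `labels` are left unchanged.
    """
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split(None, 1)          # [ID, rest of header]
            seq_id = parts[0] if parts else ""
            rest = parts[1] if len(parts) > 1 else ""
            new_id = labels.get(seq_id, seq_id)
            line = ">" + new_id + (" " + rest if rest else "")
        out.append(line)
    return "\n".join(out) + "\n"


# Placeholder input: IDs and sequences are made up for illustration only.
example = ">id1 some description\nMKV\n>id2\nMRT\n"
labels = {"id1": "Homo_sapiens", "id2": "Mus_musculus"}
print(relabel_fasta(example, labels))
```

Writing the labels with underscores instead of spaces keeps each label a single word, which is typically the part of the header that alignment and tree tools display.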
- At the NCBI BLAST start page, look carefully at all the links before you click on something.
- Choose the proper database for your BLAST search.
- Make sure you use the correct BLAST flavor: NCBI defaults to Megablast for nucleotide sequences, not blastn! Since we are looking for distant homologs, Megablast is not a good choice.
- Click on the Taxonomy reports link at the top of the BLAST result page to see the taxonomy of your hits.
- By default NCBI BLAST only shows the 100 best hits. You can change this behaviour by clicking "Algorithm Parameters" on the BLAST query page and selecting a higher number under "Max target sequences". Note that selecting numbers above 1000 tends to make BLAST rather slow.
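The choices listed above (BLAST flavor, database, number of hits) can be made explicit when a search is prepared from a script rather than from the web form. The sketch below only builds the request parameters and sends nothing to NCBI. It assumes the parameter names of NCBI's BLAST URL API (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `HITLIST_SIZE`, `MEGABLAST`); please check them against the current NCBI documentation before relying on them. The function name, the default values and the query sequence are illustrative assumptions.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_params(query, program="blastn", database="nt",
                     max_hits=500, megablast=False):
    """Collect the parameters for submitting a BLAST search.

    Assumption: names follow NCBI's BLAST URL API. MEGABLAST is only
    added when explicitly requested, so the default is plain blastn,
    which is the better choice for distant homologs.
    """
    params = {
        "CMD": "Put",                    # submit a new search
        "PROGRAM": program,              # e.g. blastn or blastp
        "DATABASE": database,            # e.g. nt (nucleotide) or nr (protein)
        "QUERY": query,                  # sequence or identifier
        "HITLIST_SIZE": str(max_hits),   # counterpart of "Max target sequences"
    }
    if megablast:
        params["MEGABLAST"] = "on"
    return params


# Placeholder query sequence, for illustration only.
params = blast_put_params("ACGTACGTACGT", max_hits=500)
print(BLAST_URL + "?" + urlencode(params))
```

Keeping `megablast=False` as the default mirrors the advice above: the flavor has to be chosen deliberately rather than inherited from a default.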
- To align sequences using Clustal Omega, open the FASTA file in your browser or in Notepad, and paste the sequences into the sequence box.
- Jalview is a nice tool for identifying specific positions and conserved regions in multiple sequence alignments; you can find a link to Jalview under the Results Summary link in the EBI Clustal Omega output. You can compare the alignments using Jalview. Click the Show Colors button to color the amino acids according to their properties. This also makes it easier to compare the sequences.
- To open more than one alignment at a time in Jalview, you need to save the result and then open it again.
- Remember that the differences between alignments are in the gaps! So focus on the gaps, heads and tails.
- If a position is fully conserved, it is indicated with "*". Substitutions can fall into three categories: between very similar amino acids (":"), between relatively similar amino acids (".") and between non-similar ones (indicated without any symbol).
- Whether or not Clustal Omega decides that it is 'good' to have a gap in an alignment depends of course on the gap penalty. For very low gap penalties, Clustal Omega may easily insert a gap to get a 'better' alignment, even if a deletion or insertion would be unlikely (when could this be the case? Think of the relation between nucleotides and proteins). For extremely high gap penalties almost all gaps will go away (except at the beginning and end of the sequence), even if a deletion or insertion could easily have taken place.
- Note that the sequences should be aligned before you can make a phylogenetic tree! To do this, first paste the sequences into the EBI Clustal Omega page and align them. To make a phylogenetic tree, paste the aligned sequences, including the first line, into the submission form on the EBI Phylogeny page, or use the link on the Clustal Omega results page to send the alignment directly to the EBI Phylogeny page. The aligned sequences can also be downloaded by clicking on the links at the top of the Clustal Omega result page. The phylogeny software will also generate a tree from unaligned sequences, so you have to remember to do the alignment first yourself.
- The default clustering method is NJ (which stands for Neighbour Joining).
- Scroll to the bottom of the results page to see the tree. The default is a cladogram; to get a phylogram, click on the "real" button.
- Remember that the trees you obtain are UNROOTED trees. Where do you think the root of the tree should be in this exercise? A more advanced tool for drawing phylogenetic trees is iTOL. There you can change the root of your tree. You have to upload your tree file.
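The effect of the gap penalty mentioned in the tips above can be explored with a toy example. The sketch below is a plain pairwise global alignment (Needleman-Wunsch with a linear gap penalty), not the algorithm Clustal Omega actually uses. The scoring values and the two short sequences are arbitrary assumptions chosen for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # (mis)match
                              score[i - 1][j] - gap,       # gap in b
                              score[i][j - 1] - gap)       # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] - gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Same two sequences, aligned with a very low and a very high gap penalty.
for penalty in (0, 10):
    top, bottom = global_align("ACGT", "AGCT", gap=penalty)
    print("gap penalty", penalty)
    print(top)
    print(bottom)
```

With a gap penalty of 0 the alignment introduces gaps to avoid mismatches, while with a penalty of 10 the same sequences are aligned without any gap. Comparing the two outputs shows why the gaps are the place to look when alignments differ.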