Errata Reader
 Page 2: For all R modules, the reader tells you to look in the back of the reader, but the best place to look is Blackboard, under "Course Content". Blackboard also contains the R self-tests and the deadlines (see the calendar, also in Blackboard).
 Page 2: On Tuesday February 4th, 11:00-12:45, you can work on the exercises of Chapters 1 & 2 with the teaching assistants (TAs); for the location, see MyTimeTable. After 15:00 you can do R modules 4-6 in self-study (see Blackboard).
 Page 35: In q5, add: Hint: remember, you can always use '?' in R to find help on any function (see Section 2.5.1).
 Page 46: In q3d, "Does this tell you something about the alpaca" should be "What does this tell you about the alpaca". (This change makes you think a bit harder and removes the 50% chance of guessing the answer.)
 Page 48: q5 is an extra exercise to practice with R; you may skip it until later.
 Page 48: q6 d-h: you may skip these until later.
 Page 62: q9b: "(i.e. star representation)" should be "(i.e. radial representation)".
 Page 72: q4: ignore the comment about Jukes-Cantor correction; this is only explained in Chapter 7.
 Page 73: q6: In the next version of the reader, I will break down this exercise into steps. First, identify the phylogenetically informative positions. Second, create a distance matrix listing the number of sites that differ between each pair of sequences. Third, consider all possible trees that could be made from these six sequences. In each tree, indicate which mutations occurred on the branches. Fourth, the maximum parsimony tree is the one that requires the lowest total number of mutations. Hint: use the heuristic tree-searching approach. Once you have found a tree that is pretty good (requires a low number of mutations), see if you can improve it by swapping branches around. Once you are confident that no branch swaps lead to a lower number of assumed mutations, you might just have found the maximum parsimony tree!
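The steps for q6 can be sketched in code. The following is a minimal Python sketch, not the reader's own method: the six sequences, the two candidate trees, and all function names are hypothetical, and the mutation count uses Fitch's small-parsimony method as a stand-in for marking mutations on the branches by hand. Column indices are 0-based.

```python
from collections import Counter
from itertools import combinations

# Hypothetical aligned sequences (NOT the ones from the reader).
seqs = {
    "s1": "AAGTCA",
    "s2": "AAGTCG",
    "s3": "ACGTTA",
    "s4": "GCGATA",
    "s5": "GCTATA",
    "s6": "GCGATA",
}

def informative_sites(seqs):
    """Columns in which at least two different states each occur at least twice."""
    rows = list(seqs.values())
    sites = []
    for col in range(len(rows[0])):
        counts = Counter(row[col] for row in rows)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(col)
    return sites

def pairwise_differences(seqs):
    """Number of differing sites for every pair of sequences."""
    return {
        (a, b): sum(x != y for x, y in zip(seqs[a], seqs[b]))
        for a, b in combinations(seqs, 2)
    }

def fitch_column(node, seqs, col):
    """Return (possible ancestral states, mutations needed) for one column."""
    if isinstance(node, str):  # a leaf: its state is known, no mutations yet
        return {seqs[node][col]}, 0
    left, left_cost = fitch_column(node[0], seqs, col)
    right, right_cost = fitch_column(node[1], seqs, col)
    common = left & right
    if common:  # children agree on at least one state: no extra mutation
        return common, left_cost + right_cost
    return left | right, left_cost + right_cost + 1

def fitch_score(tree, seqs):
    """Total number of mutations a tree (nested 2-tuples of names) requires."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_column(tree, seqs, col)[1] for col in range(length))

# Two hypothetical candidate trees to compare.
candidate = (("s1", "s2"), ("s3", (("s4", "s6"), "s5")))
alternative = (("s1", "s4"), ("s2", ("s3", ("s5", "s6"))))

print(informative_sites(seqs))
print(fitch_score(candidate, seqs), fitch_score(alternative, seqs))
```

Swapping branches in a tree and re-running `fitch_score` mimics the heuristic search from the hint: keep a swap only if the score goes down.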
 Page 79: Change sentence to "In such cases, you will find more synonymous mutations and fewer nonsynonymous mutations in the DNA, so dN/dS < 1."
 Page 85: q10c: Note that "T" is used as a variable name here, not as the R shorthand for "TRUE".
 Page 86: q12a: Do not take the length of the sequence into account for now.
 Page 93: q4b: Hint: we use the seq command to get evenly spaced values along the x-axis. This allows us to nicely plot the graph in R.
 Page 104: Section 8.7: Check out the very helpful Wikipedia page on the Needleman-Wunsch algorithm.
 Page 106: q6k: Hint: click the link to the InterPro domain database.
 Page 118: q11a: "Error! Reference source not found." should be "5".
 Page 133: In the answer to q1b, you could refer to Figure 12 to see that 10 different phyla were measured. If we assume that this is the same data that the PCA in Figure 13 was based on (it is from the same publication after all), then the original data had 10 dimensions, one for every phylum. Every metagenome is a point in this 10-dimensional space, positioned according to how much of each phylum was measured, and Figure 13 is a two-dimensional projection of this space.
 Page 134: Extra explanation to q3d. We know:
 (1) 0 < D < 1
 (2) -1 < r < 1
 (3) Correlation (r) can be viewed as a similarity (S) measure (r ~ S).
 (4) D = 1 - S
Thus we could say D = 1 - r. However, that would violate the restriction in (1) that 0 < D < 1, e.g. if r = -0.6 (then D = 1.6).
Therefore we first need to scale r, so that the values it takes lead to meaningful distance calculations.
Starting with (2):
-1 < r < 1
0 < r + 1 < 2 (add 1 to all members of the inequality)
0 < (r + 1) / 2 < 1 (divide all by 2)
Thus, if r_scaled = (r + 1) / 2:
D = 1 - r_scaled =
1 - ((r + 1) / 2) =
(2 - r - 1) / 2 =
(1 - r) / 2 => D = 0.5 - r/2
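As a quick numerical check of this derivation, here is a minimal Python sketch. The function names are hypothetical and chosen only for illustration. It compares the unscaled distance 1 - r with the scaled distance (1 - r) / 2.

```python
def naive_distance(r):
    """Unscaled attempt: D = 1 - r (can exceed 1 for negative r)."""
    return 1 - r

def scaled_distance(r):
    """Scaled version from the derivation: D = (1 - r) / 2."""
    return (1 - r) / 2

for r in (-0.9, -0.6, 0.0, 0.6, 0.9):
    print(r, naive_distance(r), scaled_distance(r))
```

For r = -0.6 the unscaled distance is 1.6, outside the allowed range, while the scaled distance is 0.8.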
 Page 137: In the answer to q5b, note that this function is basically identical to Equation 1 on page 41.
 Page 140: In the answer to q6g, "88% of the variation" should be "89% of the variation".
 Page 144: In the answer to q2c, "(((A:0.1,B:0.1):0.075,C:0.175):0.04,D:215);" should be "(((A:0.1,B:0.1):0.075,C:0.175):0.04,D:0.215);".
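One way to see why the branch length of D must be 0.215 is to add up the branch lengths from the root to each tip. The following is a minimal Python sketch with a hypothetical helper, `root_to_tip`, that handles only simple Newick strings like the one above (leaf names and branch lengths, no internal node labels). In the corrected tree every tip ends up at the same distance from the root.

```python
def root_to_tip(newick):
    """Return {leaf name: summed branch length from the root} for a simple Newick string."""
    text = newick.strip().rstrip(";")
    pos = 0

    def parse_length():
        # Read an optional ":<number>" after a leaf or a closing parenthesis.
        nonlocal pos
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            return float(text[start:pos])
        return 0.0

    def parse_node():
        # Return a list of (leaf, distance up to and including this node's branch).
        nonlocal pos
        leaves = []
        if text[pos] == "(":
            pos += 1  # skip "("
            while True:
                leaves.extend(parse_node())
                if text[pos] == ",":
                    pos += 1
                else:
                    break
            pos += 1  # skip ")"
        else:
            start = pos
            while pos < len(text) and text[pos] not in ":,()":
                pos += 1
            leaves.append((text[start:pos], 0.0))
        length = parse_length()
        return [(name, dist + length) for name, dist in leaves]

    return dict(parse_node())

corrected = root_to_tip("(((A:0.1,B:0.1):0.075,C:0.175):0.04,D:0.215);")
misprinted = root_to_tip("(((A:0.1,B:0.1):0.075,C:0.175):0.04,D:215);")
print({name: round(dist, 3) for name, dist in corrected.items()})
print(misprinted["D"])
```

With the misprinted value, D sits 215 units from the root while A, B, and C sit at 0.215.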
 Page 144: In the answer to q3, "P (Phenylalanine)" should be "P (Proline)". (Phenylalanine is F.)
 Page 146: In the answer to q10d, the ("Human COX1","Sheep COX1") branch in the tree on the left has a bootstrap value of 75. Note that this number could be lower if any of the other bootstrap trees had a ("Sheep COX1","Sheep COX1") branch.
 Page 148: The answer to q5 could be A, B, or E, depending on the specific mutation. If the 3-nucleotide insertion (A) or deletion (E) occurred in frame, this would lead to a single change in the protein sequence, i.e. the insertion or deletion of a single amino acid, respectively. If these mutations occurred out of frame, the mutation could lead to two amino acid changes in the protein sequence. The 4-nucleotide substitution (answer E) could lead to 0, 1, or 2 amino acid changes in the protein sequence, depending on which codons are affected.
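The difference between an in-frame and an out-of-frame 3-nucleotide insertion can be illustrated with a minimal Python sketch. The DNA sequence, the inserted fragment, and the helper names are hypothetical and not taken from the exercise; the codon table is the standard genetic code.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate complete codons, starting at the first nucleotide."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def insert(dna, pos, fragment):
    """Insert fragment before the nucleotide at 0-based position pos."""
    return dna[:pos] + fragment + dna[pos:]

original = "ATGGCTAAAGGT"                  # hypothetical coding sequence
in_frame = insert(original, 6, "TTT")      # between two codons
out_of_frame = insert(original, 4, "TTT")  # inside the second codon

print(translate(original))      # MAKG
print(translate(in_frame))      # MAFKG: one amino acid (F) inserted
print(translate(out_of_frame))  # MVSKG: A is replaced by V and S
```

In the out-of-frame case the insertion splits one codon, so one amino acid changes and one is added, while the reading frame downstream is unaffected.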
 Page 150: In the answer to q1a, S is the least conserved in the PAM250 matrix.
 Page 151: In the answer to q3, the observed/expected ratio is 2^{(7/2)} = 11.3. Thus, these two sequences are 11.3 times more likely to be well-aligned homologs than unaligned sequences.
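The arithmetic behind this ratio can be checked with a minimal Python sketch. It assumes, based on the exponent 7/2 above, that the alignment score of 7 is expressed in half-bit units; the function name and its parameter are hypothetical.

```python
def odds_ratio(score, units_per_bit=2):
    """Convert a log-odds score to an observed/expected ratio.

    Assumption: the score is a base-2 log-odds value scaled by units_per_bit
    (2 for half-bit units).
    """
    return 2 ** (score / units_per_bit)

ratio = odds_ratio(7)
print(round(ratio, 1))
```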
 Page 151: In the answer to q4d, note that D is always smaller than d because back mutations could make two sequences look more alike.
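To get a feel for how much larger d can be than D, here is a minimal Python sketch. It assumes that D is the observed fraction of differing sites and that d is obtained with the Jukes-Cantor correction, d = -(3/4) ln(1 - (4/3) D); this correspondence is an assumption, so check it against the formula in Chapter 7. The function name is hypothetical.

```python
import math

def jukes_cantor(D):
    """Corrected distance d for an observed fraction of differing sites D.

    Assumption: Jukes-Cantor model, valid only for 0 <= D < 0.75.
    """
    return -0.75 * math.log(1 - 4 * D / 3)

for D in (0.05, 0.1, 0.3, 0.5):
    print(D, round(jukes_cantor(D), 3))
```

The gap between d and D grows as the sequences diverge, because more substitutions are hidden by multiple changes at the same site.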
 Page 152: In the answer to q1, the alignment score of all three alignments is 2. They are all optimal and you should report all three of them.
 Page 156: In the answer to q2b, "specific to your novel fungus" should be "specific to your novel fungus and not found in other genomes".
 Page 157 at the top: In the answer to q2c, "in a blastx search" should be "in a blastx or tblastn search".
 Page 158: In the answer to q8b, "If you use a blastx" should be "If you use a blastx against a protein database".
 Page 159: In the answer to q8d, "we can find homologs in Xenopus frogs with blastn if the number of target sequences is set sufficiently high, but not with megablast".
