R Markdown

This file is prepared to help you with the analysis of the NetMHC output. In the following, we will explain how to perform the analysis for B1801 molecule. You can use exactly the same procedure for the other HLA molecules. We used HIV.fasta file to make the predictions for HLA-B1801 molecule with the following parameters: peptide length =9 save output in XLS format = checked all other parameters are left in default values. At the end of the output page, there is a link where you can download the output file. We saved the file with the name: B1801_NetMHCpan.xls. Let us first read this file:

B1801 <- read.csv("B1801_NetMHCpan.xls", header=TRUE, stringsAsFactors = FALSE, sep="\t", skip = 1)

Let us make a vector containing all proteins names. Protein names are stored in the ID column of my B1801 data frame.

proteins <- c(unique(B1801$ID))

Now let us count number of binders from all proteins for HLA-B1801 molecule. We are using the protein names from the proteins vector, and the column EL_Rank which indicates the ranking of a peptide based on its binding in percentages. We will here use a threshold of 1% to define the binders.

Threshold <- 1
binders_B1801 <- c() 
for (i in proteins){
bind <- sum(B1801$ID == i & B1801$EL_Rank<=Threshold)
binders_B1801 <- c(binders_B1801, bind)
}

To find the expected value of the binders, let us first find the number of peptides in each protein. For all HLA alleles, the expected value of the binders would be the same, i.e. with a threshold of 1%, the expected value of the binders is 1% of
the total number of peptides per protein.

peptides <- c()
for (i in proteins){
  total <- sum(B1801$ID == i)
  peptides <- c(peptides, total)
}
expected <- as.integer(Threshold*peptides/100)

Now let us make a nice barplot with the number of binders we found for B1801 molecule and the number of expected binders.

barplot(matrix(c(binders_B1801, expected), ncol = length(proteins), nrow = 2, byrow =TRUE), beside = TRUE, names.arg = proteins, col=c("red", "black"), args.legend = list(x = "topright", bty = "n", inset=c(-0.08, 0)), legend = c("B1801", "Expected") )

Now you can repeat it for B*5703 and finally make a graph that has both B57 and for B18 the predicted and expected epitopes per protein. This graph will help you to answer/conclude the research question of this exercise.