Online Supporting Information S1

The benchmark dataset S includes 9,552 locative protein sequences (5,048 different proteins), classified into 20 animal subcellular locations. A protein that occurs in k locations is counted once per location, so it contributes k locative sequences. Among the 5,048 different proteins, 2,284 belong to one location, 1,740 to two locations, 510 to three locations, 368 to four locations, 111 to five locations, 20 to six locations, 9 to seven locations, and 6 to eight locations. Both the accession numbers and the sequences are given. Except for the subsets "acrosome", "centriole", "cell cortex" and "melanosome", no protein has more than 25% sequence identity to any other protein in the same subset (subcellular location). See Table 1 and the relevant text of the paper for further explanation.
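The relationship between the two totals (5,048 different proteins versus 9,552 locative sequences) can be checked with the short Python sketch below. It uses only the per-location counts listed above. The dictionary name and the helper functions are hypothetical illustrations and are not part of the dataset files.

```python
# Number of different proteins in benchmark dataset S, keyed by how many
# subcellular locations each protein belongs to (counts taken from the text).
# The name "proteins_by_location_count" is a hypothetical illustration.
proteins_by_location_count = {
    1: 2284,
    2: 1740,
    3: 510,
    4: 368,
    5: 111,
    6: 20,
    7: 9,
    8: 6,
}


def count_distinct(counts):
    """Total number of different proteins: each protein is counted once."""
    return sum(counts.values())


def count_locative(counts):
    """Total number of locative sequences: a protein in k locations is counted k times."""
    return sum(k * n for k, n in counts.items())


if __name__ == "__main__":
    print(count_distinct(proteins_by_location_count))  # 5048
    print(count_locative(proteins_by_location_count))  # 9552
```

Both sums agree with the totals stated for dataset S, which confirms that the 9,552 figure counts each multi-location protein once per location.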
