Protein Prospector FAQ

Here is a list of frequently asked questions about Protein Prospector. If you have any submissions or corrections, please send them to Prospector email.

Licensing Issues

Can I get a licensed copy of Protein Prospector?
Which computer platforms can I run a licensed copy of Protein Prospector on?

Questions About Using Protein Prospector

What database should I use?
I know my protein is from mouse. Should I do a species restricted search of only mouse proteins?
What is a good discriminant score?
My searches keep 'timing out'. How can I prevent this?
Can I search MALDI-TOFTOF data where there is a different sample on each spot without combining the results?
If there are any phosphorylated peptides in my samples I would like to know. Why shouldn't I always allow for phosphorylation in my searches?
Can I search more than one LC-MS/MS run at once?
How do I get MS-Fit to run with no Possible modifications?
Could you define the MS-Digest Index Number?
Is there a reference for a paper that describes MS-Fit?
What should I be aware of when using the dbEST database?
What is the amino acid U?
Do you have a reference for peeling (b+H₂O) ions?
What does the -28 refer to in the MS-Product output?
On the MS-Product form what is an m-ion?
Is the enzyme Tyr-C available for purchase?
The Biemann nomenclature gives a mass difference of 17 Da between y and z ions. Why is Protein Prospector giving me a 16 Da difference?
What is a good MOWSE score?
What does the Search Compare message "A problem has been encountered setting a coverage map." mean?

Can I get a licensed copy of Protein Prospector?

Licensed versions of Protein Prospector binaries and source code are freely available. Please contact Prospector email for more information.

Which computer platforms can I run a licensed copy of Protein Prospector on?

The package has been tested on most versions of Windows up to Windows 7. It may also run on Windows 8 but this has not yet been tested. The package runs on most versions of Linux, although installation is manual. It can also be run on a compute cluster (even a mixed Windows and Linux system), where it can utilize multiprocessor searching to accelerate analysis.

What database should I use?

Database search engines assume that the database you search against contains all the possible sequences that can be found in your sample. Hence, if you use a smaller database you will get more confident results. On the other hand, if this database does not contain all the sequences that can be found in your sample then you will miss results and can get some false positive results that would not be obtained if you searched the larger database. UniProt is a safe choice of database, but if you are confident your sample proteins are in SwissProt, then this search will be quicker and can be equally reliable.

I know my protein is from mouse. Should I do a species restricted search of only mouse proteins?

You should not restrict your database search such that some correct answers can be excluded. Thus, even though your sample may be from mouse, it is advisable to allow for the fact that 'our old friend' keratin may have been introduced during sample handling. Thus 'Human Mouse' or 'Mammals' would be sensible options.

What is a good discriminant score?

A good value for a discriminant score depends on the search. It can be determined by looking at the histogram of results at the top of the Results report. Determine the value at which you think the distribution of incorrect answers finishes and use this as a minimum threshold. If you go back to Search Display you can filter your report to show only peptides with a discriminant score above this threshold.

My searches keep 'timing out'. How can I prevent this?

We have set a timeout on individual searches to regulate the load on the server, so that users do not have to wait too long for their results when other people are using it. The two major parameters that affect search duration are the number of spectra being searched and the size of the database being searched against. If you are searching against the whole UniProt database, try restricting the search to only 'mammalian' entries (if appropriate) or search SwissProt. Searching for many modifications (especially phosphorylation) can also make a search slow. If this is the problem, try searching initially without the modifications, then click on 'Batch Tag of Listed Accession Numbers' at the top of a Protein report. This restricts a second search to only those proteins identified in the first search. The second search should be dramatically quicker, even if you are looking for many modifications.

Can I search MALDI-TOFTOF data where there is a different sample on each spot without combining the results?

Yes. On the 4700 create the peak list using 'Peaks to Mascot' using the 'Protein-Peptide Spotset' option. This will create one file of MS/MS spectra for all spots. Search the data in Batch-Tag, then in Search Display tick the 'Multi Sample' option and it will separate the results by spot.

If there are any phosphorylated peptides in my samples I would like to know. Why shouldn't I always allow for phosphorylation in my searches?

Allowing for variable modifications increases the effective size of the database in silico, making it more difficult to get confident answers. Modifications that occur only on the protein N-terminus, such as protein N-terminal acetylation, or modifications of fairly rare amino acids, such as methionine oxidation, do not have a big impact. Searching for phosphorylation of serines and threonines, two common amino acids, drastically increases the number of sequences being searched, slows down your search, and gives you less confident results.
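To give a rough sense of scale, the sketch below counts how many distinct forms a single peptide can take when each serine and threonine may or may not be phosphorylated. The function name, the example peptide and the optional cap on modifications per peptide are illustrative assumptions, not Protein Prospector parameters.

```python
from math import comb

def count_variable_forms(peptide, residues="ST", max_mods=None):
    """Count forms of a peptide when each listed residue is optionally modified.

    max_mods caps the number of simultaneous modifications (None = no cap).
    """
    sites = sum(peptide.count(r) for r in residues)
    limit = sites if max_mods is None else min(sites, max_mods)
    # Choose k of the available sites to modify, for k = 0..limit.
    return sum(comb(sites, k) for k in range(limit + 1))

peptide = "SLTSPESTK"  # hypothetical peptide with five S/T residues
print("no cap on phosphates:", count_variable_forms(peptide))
print("at most two phosphates:", count_variable_forms(peptide, max_mods=2))
print("methionine oxidation only:", count_variable_forms(peptide, residues="M"))
```

Every peptide in the database is expanded in this way, which is why a modification on a common residue costs far more than one on a rare residue.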

Can I search more than one LC-MS/MS run at once?

Yes. See the user manual for details on how to do this.

How do I get MS-Fit to run with no Possible modifications?

It depends on your system (Mac, PC, etc.). On my PC, if I hold down the Control key and click, I can de-select any individual choice. Some variation of this, such as the Shift key or the Apple key, should work on a Mac.

Could you define the MS-Digest Index Number?

The index number represents the numerical location of the protein entry in the FASTA formatted sequence database file. For example, index number 82367 is the 82367th protein in the file, counting from the beginning of the file. We call it the MS-Digest index number because throughout the Protein Prospector package you can click on the number and be linked to the MS-Digest program for that protein. Note that this number can change with every revision of the database, whereas the accession numbers are constant. Hence, the index number itself is only meaningful for operations within the Protein Prospector programs.
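The idea of an index as a position in the file can be sketched as follows. This is not Protein Prospector code; the function name and the three example entries are hypothetical, and the only assumption is that each entry starts with a '>' header line as in standard FASTA.

```python
import io

def fasta_entry_at(handle, index):
    """Return the header of the index-th entry (counting from 1) in a FASTA file."""
    count = 0
    for line in handle:
        if line.startswith(">"):
            count += 1
            if count == index:
                return line[1:].strip()
    raise IndexError(f"file has only {count} entries")

# A tiny in-memory stand-in for a sequence database file.
example = io.StringIO(
    ">P1 first protein\nMKT\n>P2 second protein\nGAVL\n>P3 third protein\nSTY\n"
)
print(fasta_entry_at(example, 2))
```

If an entry is added or removed ahead of a protein in a later database release, the same index returns a different protein, which is why accession numbers should be used outside the package.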

Is there a reference for a paper that describes MS-Fit?

We don't yet have a paper out which deals with MS-Fit. If the reason for your request is that you are writing something and want to cite MS-Fit, the preference is that you do something like:

Baker, P.R. and Clauser, K.R. http://prospector.ucsf.edu.

Some people place it in the text, others in the reference list. It doesn't matter much to me; at a minimum, giving the web URL in a methods section helps others the most. Please don't list anything in the URL after .ucsf.edu as that will change over time.

What should I be aware of when using the dbEST database?

1) Peptide mass fingerprinting in principle shouldn't be very effective with ESTs because you have digested a whole protein and ESTs are only a few hundred base pairs, often on the order of 100-200. So you can only match a portion of your protein.

2) For the practical reason above, the dbEST search feature was put into Protein Prospector mainly for searches using MS/MS spectra and sequence data, i.e. MS-Tag and MS-Pattern.

3) When using the web server rather than Protein Prospector on the web, you are competing for CPU time against people all over the world. So for long dbEST searches it sometimes helps to break the search into two: one with 3 frame translation and one with -3.

What is the amino acid U?

Selenocysteine is the only rare amino acid that is genetically coded and it has been referred to as the 21st amino acid. It corresponds to UGA, which usually encodes a stop codon. In prokaryotes the sel operon together with a stem-loop feature in the mRNA provide a signal that makes UGA encode SeCys rather than stopping translation. Eukaryotes don't have the sel operon but they do have the stem-loop feature known as the SEC insertion sequence (SECIS). Coding for SeCys is rare as it is toxic to cells in high doses, but there are specific selenoproteins, including thioredoxin reductase and glutathione peroxidase. HIV patients often have low selenium levels and need Se supplements to maintain the levels of these redox proteins.

It is anticipated that users will find peptides that contain SeCys. We are not certain what happens when these are alkylated but alkylation occurs in vivo and is responsible for inactivating thioredoxin reductase. We suspect SeCys would undergo the same reactions as the thiol variety.

The elemental composition of selenocysteine is C3 H5 N O Se. Unfortunately, selenium has a rather complicated isotopic composition: about 50% is 80 Da, about 25% is 78 Da, and the isotopes at 76, 77 and 82 Da are each around 8% abundant. A peptide peak containing selenocysteine could easily be interpreted as overlapping peaks.
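As a rough illustration, the sketch below computes the monoisotopic residue mass for the composition above and lists the selenium isotope peaks relative to the most abundant one. The atomic masses and abundances are rounded standard values supplied here as assumptions, and they include the minor 74 Da isotope not mentioned above. None of this is taken from Protein Prospector itself.

```python
# Masses (Da) of the most abundant isotope of each element (rounded).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
        "O": 15.99491462, "Se": 79.9165218}

# Approximate natural abundances of the selenium isotopes (percent).
SE_ISOTOPES = {74: 0.9, 76: 9.4, 77: 7.6, 78: 23.8, 80: 49.6, 82: 8.7}

def residue_mass(composition):
    """Sum the element masses for a composition given as {element: count}."""
    return sum(MONO[element] * n for element, n in composition.items())

sec = {"C": 3, "H": 5, "N": 1, "O": 1, "Se": 1}
print(f"SeCys residue mass (80Se): {residue_mass(sec):.4f} Da")

# Show each isotope peak relative to the tallest (80Se) peak.
top = max(SE_ISOTOPES.values())
for mass_number, abundance in sorted(SE_ISOTOPES.items()):
    print(f"{mass_number}Se  {100 * abundance / top:5.1f}% of the 80Se peak")
```

The spread of sizeable peaks from 76 to 82 Da is what makes the isotope envelope of a SeCys peptide look like several overlapping peptides.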

Do you have a reference for peeling (b+H₂O) ions?

The first reference for this ion type is: Thorne, G.C. and Gaskell, S.J., Rapid Communications in Mass Spectrometry (1989) 3(7), 217-221

What does the -28 refer to in the MS-Product output?

The -28 refers to a mass difference of CO (approximately 28 Da). The most commonly observed internal ions are of the 'b' ion type, but it is also possible to form an 'a' ion type internal ion through cleavage N-terminal to the carboxyl group.
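The arithmetic can be sketched as below, assuming singly charged ions and standard monoisotopic residue masses. The function names and the example fragment are hypothetical and are not part of MS-Product.

```python
# Monoisotopic residue masses (Da) for a few amino acids.
RESIDUE = {"G": 57.02146, "A": 71.03711, "P": 97.05276,
           "V": 99.06841, "L": 113.08406}
PROTON = 1.007276
CO = 12.0 + 15.994915  # carbon monoxide, the "-28" in the output

def internal_b(fragment):
    """m/z of a singly charged b-type internal ion."""
    return sum(RESIDUE[aa] for aa in fragment) + PROTON

def internal_a(fragment):
    """m/z of the corresponding a-type internal ion (b-type minus CO)."""
    return internal_b(fragment) - CO

fragment = "PGA"  # hypothetical internal fragment
print(f"{fragment}     {internal_b(fragment):.4f}")
print(f"{fragment}-28  {internal_a(fragment):.4f}")
```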

On the MS-Product form what is an m-ion?

An m-ion is an amino acid side chain loss from the molecular ion as observed in high energy CID spectra. These are listed in Table 2 in:

Medzihradszky K. F. and Burlingame A. L., The Advantages and Versatility of a High Energy Collision-Induced Dissociation-Based Strategy for the Sequence and Structural Determination of Proteins, A Companion to Methods in Enzymology, Vol. 6, pp. 284-303 (1994)

Is the enzyme Tyr-C available for purchase?

No, this is just a "theoretical" enzyme that somebody asked for in the past.

The Biemann nomenclature gives a mass difference of 17 Da between y and z ions. Why is Protein Prospector giving me a 16 Da difference?

You are correct that the mass difference between a y and z ion according to Biemann nomenclature should be 17 Da. The confusion is caused by the fact that people describe ions as z ions when they are not. Fragmentation mechanisms such as ECD and ETD do not form z ions, contrary to what you might think upon reading the literature. The predominant C-terminal fragment ion type formed is actually a z+1 ion (which differs from a y ion by 16 Da). Some people refer to this as a z^• ion to indicate it is a free radical ion (which the z+1 ion is, but the z ion is not). To add to the confusion, it is also possible to form an ion where the z^• ion abstracts a hydrogen to form what is generally referred to as a z+1, but according to Biemann nomenclature it would be a z+2 ion.

So, Protein Prospector is reporting the mass of the ion you expect when doing ECD or ETD, which is commonly referred to as a z ion, although it is technically a z+1 ion.
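The mass differences can be checked with a short calculation. This is a sketch assuming singly charged ions and standard monoisotopic masses, with the Biemann z ion taken as the y ion minus NH3. The function names and the example sequence are hypothetical.

```python
# Monoisotopic residue masses (Da) for a few amino acids.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "K": 128.09496}
H = 1.007825              # hydrogen atom
PROTON = 1.007276
WATER = 18.010565
NH3 = 14.003074 + 3 * H   # ammonia, about 17.0265 Da

def y_ion(seq):
    """m/z of a singly charged y ion for a C-terminal sequence."""
    return sum(RESIDUE[aa] for aa in seq) + WATER + PROTON

def z_ion(seq):
    """Biemann z ion: the y ion minus NH3."""
    return y_ion(seq) - NH3

def z_plus_1(seq):
    """Radical z+1 ion, the main C-terminal fragment in ECD/ETD."""
    return z_ion(seq) + H

def z_plus_2(seq):
    """z+1 ion after abstracting a further hydrogen."""
    return z_ion(seq) + 2 * H

seq = "AGK"  # hypothetical C-terminal fragment
print(f"y - z     = {y_ion(seq) - z_ion(seq):.4f} Da")
print(f"y - (z+1) = {y_ion(seq) - z_plus_1(seq):.4f} Da")
print(f"y - (z+2) = {y_ion(seq) - z_plus_2(seq):.4f} Da")
```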

What is a good MOWSE score?

There is no threshold for a reliable MOWSE score.

The MOWSE score is described in the paper:

Pappin et al, Current Biology, 1993, Vol 3, No 6, pp 327-332

It is not a statistical score in the sense that a particular value means the answer is very likely to be correct. It is mainly useful for ranking proteins by how likely they are to be correct. Thus what you are trying to do is separate real matches from random matches.

Here's some general advice:

1). Try to figure out what the accuracy of the results is and use an appropriate mass tolerance. Generally you should be looking to internally calibrate your data and use a mass tolerance of 10 ppm if your instrument allows this. Prospector also has a systematic error parameter that can be used to correct for calibration problems.

2). Use a taxonomy filter if possible, but try to search at least, say, 10,000 proteins.

3). Don't set too many potential modifications unless you know what you are doing.

4). Try to get MS/MS data to check your results.

5). Watch out for hits from very large proteins. These should at least have a large number of peptides.

6). Check Protein MW/pI if you know what these are but bear in mind you may have a piece of a protein (in which case the coverage map should reflect this).

7). If the answer is keratin it is almost certainly correct.

8). You could try searching against a random (not reverse) database. Any random hits will have negative accession numbers and are definitely incorrect. However if you do this you will be doubling the number of proteins you are searching against.

9). Make sure as much as possible that the peaks you are submitting in the peak list are real peptide peaks.
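For point 1, the sketch below shows what a mass tolerance in ppm means and how a constant calibration offset can push a correct match outside it. The function names and the example masses are hypothetical, and subtracting a fixed ppm offset is an assumption about how a systematic error correction might work, not a description of Prospector's implementation.

```python
def ppm_error(observed, theoretical, systematic_ppm=0.0):
    """Mass error in ppm after subtracting a constant systematic offset."""
    return (observed - theoretical) / theoretical * 1e6 - systematic_ppm

def within_tolerance(observed, theoretical, tol_ppm=10.0, systematic_ppm=0.0):
    """True if the corrected error is within +/- tol_ppm."""
    return abs(ppm_error(observed, theoretical, systematic_ppm)) <= tol_ppm

theoretical = 1296.6848  # hypothetical peptide m/z
observed = 1296.7030     # measured on a slightly miscalibrated instrument

print(f"raw error: {ppm_error(observed, theoretical):.1f} ppm")
print("matches at 10 ppm:", within_tolerance(observed, theoretical))
print("matches at 10 ppm after a 12 ppm correction:",
      within_tolerance(observed, theoretical, systematic_ppm=12.0))
```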

What does the Search Compare message "A problem has been encountered setting a coverage map." mean?

When saving the results of the database search Protein Prospector stores the start amino acid number of the peptide within the protein. One thing these are used for is to display protein coverage maps. There is a problem with this approach if the database protein sequence has changed between doing the search and looking at the results. This particular error is encountered when the start amino acid number is greater than the protein length. You can generally look at the history of the protein sequences associated with a given accession number on the database web site. One reason that the database entry could be shorter would be if the signal peptide had been removed.

If you are worried about these errors it is best to repeat the search. If the signal peptide has been removed from the database sequence you may now detect the protein N-terminus peptide if it previously wouldn't have been generated by the enzymatic cleavage rules.
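The condition that triggers the message can be sketched as follows. This is an illustration only: the function name, the stored start positions and the protein lengths are hypothetical, and the check simply mirrors the description above (a stored start amino acid number greater than the current protein length).

```python
def stale_start_positions(protein_length, peptide_starts):
    """Return stored start positions that lie beyond the end of the protein."""
    return [start for start in peptide_starts if start > protein_length]

# Start positions saved when the search was run.
stored_starts = [15, 230, 412]

old_length = 430  # entry length at search time
new_length = 405  # same accession after a 25-residue signal peptide was removed

print(stale_start_positions(old_length, stored_starts))
print(stale_start_positions(new_length, stored_starts))
```

Note that positions which still fit within the shorter sequence may also be shifted, which is another reason to repeat the search rather than rely on the old coverage map.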

Please give feedback by sending e-mail to Prospector email.