Description, Instructions, and Tips for MS-Tag


Purpose
This document provides instructions for MS-Tag.

Instructions for ProteinProspector Programs



Introduction

MS-Tag was originally developed for use with MALDI-PSD spectra which do not contain enough peaks to enable de novo sequence interpretation, but which were obtained from peptides whose sequence might be present in a database. The program can also be used with data from MS/MS techniques other than MALDI-PSD simply by selecting fragment-ion types consistent with the technique. MS-Tag can also be used with spectra which contain sufficient information to enable de novo interpretation; however, a homolog of the peptide must be in the database for a search to be successful.


Fragment-Ion types

Checking the box next to an ion type allows that ion type to be considered when matching the data to a sequence. Users can choose either the default ion types for a particular instrument or a user-selected set. Server administrators can modify the default instrument ion types or add new instrument selections. The instrument definition can also include a specification of which amino acids lose water (default S,T,E and D), which lose ammonia (default R,K,N and Q), which are positive charge bearing (default R,H,K and N) and what the maximum internal ion mass is (default 700 Da). Selecting fewer ion types will generally lead to fewer false positives; however, ion masses that correspond to ion types which are not selected, or which are unknown to the program, will NOT be matched. The -SOCH4 and -H3PO4 ion types should only be used in homology mode, and should be selected when the spectrum also contains MH+ - 64 and MH+ - 98 ions respectively. The currently supported ion types are:
Ion type                     Restrictions
a, b, y                      no restrictions
c                            not if C term residue of the fragment is P
a-NH3, b-NH3, y-NH3          ion contains an amino acid which loses ammonia
b-H2O                        ion contains an amino acid which loses water
b+H2O                        ion contains a positive charge bearing amino acid; only bn-1, bn-2 (n = peptide length)
a-H3PO4, b-H3PO4, y-H3PO4    ion contains phosphorylated S,T
b-SOCH4, y-SOCH4             ion contains oxidized M
internal b                   < maximum internal mass
internal a                   < maximum internal mass, internal b present
internal b-H2O               < maximum internal mass, internal b present, ion contains an amino acid which loses water
internal b-NH3               < maximum internal mass, ion contains an amino acid which loses ammonia
N-term ladder                removal of N term residues (y equiv.)
C-term ladder                removal of C term residues (b+H2O equiv.)
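
The residue-based restrictions in the table can be sketched as simple set tests. This is only an illustration under stated assumptions: the function names are hypothetical, the residue sets are the defaults quoted above (a server administrator may have changed them), and the mass-based and positional restrictions are not covered.

```python
# Default residue sets quoted in the text; a server may be configured differently.
LOSES_WATER = set("STED")
LOSES_AMMONIA = set("RKNQ")


def neutral_losses_allowed(fragment: str) -> dict:
    """Report which neutral-loss variants the table would permit for a fragment."""
    residues = set(fragment)
    return {
        "-NH3": bool(residues & LOSES_AMMONIA),  # needs an ammonia-losing residue
        "-H2O": bool(residues & LOSES_WATER),    # needs a water-losing residue
    }


def c_ion_allowed(fragment: str) -> bool:
    """Per the table, a c ion is excluded when the fragment's C-terminal residue is P."""
    return not fragment.endswith("P")


print(neutral_losses_allowed("GAS"))  # S loses water; no ammonia-losing residue
print(c_ion_allowed("GAP"))
```

Keeping the residue sets as module-level constants mirrors the idea that they belong to the instrument definition rather than to the matching logic.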


Mass Tolerances

While the intact MW of the protein can be specified, MS/MS data is generally of such high discriminating power that no restriction is necessary. Furthermore, database entries frequently contain pre and pre-pro proteins, as well as gene fragments.
The tolerances on both the parent ion and fragment ions should be set to be consistent with the mass accuracy of the instrument used to generate the data. It is generally better to use units of ppm or % rather than Da, because fragment mass accuracy is often better at lower mass than at higher mass. For PSD spectra this is particularly important. With a +/- 1 Da tolerance you are more likely to match low mass internal sequences of the wrong nominal mass, thus increasing the likelihood of false-positive hits. Setting the fragment mass tolerance to +/- 1000 ppm prevents this, with the "acceptable" trade-off of allowing a looser than appropriate tolerance on fragment ions > 1000 Da.
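
The effect of a relative tolerance can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration, not part of MS-Tag; it converts a tolerance given in ppm, % or Da into an absolute window in Da at a given mass.

```python
def tolerance_in_da(mass_da: float, tolerance: float, unit: str = "ppm") -> float:
    """Convert a tolerance in ppm, % or Da to an absolute window in Da at mass_da."""
    if unit == "ppm":
        return mass_da * tolerance / 1_000_000
    if unit == "%":
        return mass_da * tolerance / 100
    if unit == "Da":
        return tolerance
    raise ValueError(f"unknown unit: {unit}")


# Compare +/- 1000 ppm with a fixed +/- 1 Da window at three fragment masses.
for mass in (200.0, 1000.0, 2000.0):
    print(mass, tolerance_in_da(mass, 1000, "ppm"), tolerance_in_da(mass, 1, "Da"))
```

At 200 Da the 1000 ppm window is 0.2 Da, at 1000 Da it equals 1 Da, and at 2000 Da it widens to 2 Da, which is the trade-off described above.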


What is a Fragment-ion Tag?

A Fragment-Ion Tag can be obtained from an MS/MS spectrum and consists of 3 attributes:
[Figure: Fragment-ion Tag]
  1. A peptide parent-ion mass: Pm.
  2. Masses of all sequence related fragment-ions from the peptide: Fi, Fj, ... The fragment ions need not all be of the same ion type. Currently supported fragment-ion types are described above.
  3. Masses of composition ions which indicate the presence of particular amino acids in the peptide: Ci, Cj, ... This information can be from immonium and related low mass ions or high mass ions representing side-chain losses from the parent ion.
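
One way to picture the three attributes is as a small record. The class and field names below are hypothetical, and the masses are placeholder numbers rather than real data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FragmentIonTag:
    """The three attributes of a Fragment-ion Tag as described above."""
    parent_mass: float                                             # Pm
    fragment_masses: List[float] = field(default_factory=list)     # Fi, Fj, ...
    composition_masses: List[float] = field(default_factory=list)  # Ci, Cj, ...


# Placeholder values for illustration only.
tag = FragmentIonTag(
    parent_mass=1000.0,
    fragment_masses=[200.0, 350.0, 620.0],
    composition_masses=[70.0, 86.0],
)
print(tag)
```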


Search Mode

The search mode option can either be set to identity or to one of the homology/modification tolerant modes. The identity mode is used to look for sequences in the database which are identical to the peptide used to generate the MS/MS spectrum. The homology/modification tolerant modes are described below.


Homology-Tolerant Modes

In order to match one's data to a sequence in the database which is not identical to the peptide used to generate the MS/MS spectrum, MS-Tag must be used in one of the homology/modification tolerant modes. This enables matching for peptides with a mutation, cross-species substitution, sequence polymorphism, modified amino acid, or error in the database. Homology mode works based on 3 concepts:
  1. Allow parent mass to be shifted from the parent mass of sequences in the database.
  2. Consider each ion independently rather than examining relationships between ions.
  3. When a parent ion undergoes fragmentation, at least 2 pieces are formed (a fragment-ion and a neutral). While only the mass of the ionized fragment is measured, the mass of the neutral piece is easily calculated as parent mass - fragment ion mass. If the peptide matches a database sequence exactly, then the masses of BOTH the fragment-ion and the neutral will match. If there is a single sequence difference, then the mass of either the fragment-ion or the neutral will match, but NOT BOTH. If there are two sequence differences, fragmentation at any site located BETWEEN the two mismatched sites will result in NEITHER the fragment-ion NOR the neutral matching.
[Figure: Fragment-ion Tag and Sequence Mismatching]
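
The third concept can be illustrated with a short sketch. It is not MS-Tag's implementation: it assumes the experimental and database peptides have the same length, compares plain sums of residue masses (the constant water and proton offsets cancel in the comparison), and uses made-up sequences. The function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues used in this example.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "V": 99.06841, "L": 113.08406, "K": 128.09496,
}


def prefix_masses(seq):
    """Summed residue mass of each N-terminal piece (one per cleavage site)."""
    total, out = 0.0, []
    for aa in seq[:-1]:
        total += RESIDUE_MASS[aa]
        out.append(total)
    return out


def classify_sites(experimental, database, tol=0.01):
    """For each cleavage site, say which piece agrees with the database sequence."""
    exp_total = sum(RESIDUE_MASS[aa] for aa in experimental)
    db_total = sum(RESIDUE_MASS[aa] for aa in database)
    labels = []
    for exp_frag, db_frag in zip(prefix_masses(experimental), prefix_masses(database)):
        fragment_ok = abs(exp_frag - db_frag) <= tol
        # Neutral piece = parent mass - fragment mass, on both sides.
        neutral_ok = abs((exp_total - exp_frag) - (db_total - db_frag)) <= tol
        if fragment_ok and neutral_ok:
            labels.append("both")
        elif fragment_ok:
            labels.append("fragment")
        elif neutral_ok:
            labels.append("neutral")
        else:
            labels.append("neither")
    return labels


print(classify_sites("GASVLK", "GASVLK"))  # identical sequences
print(classify_sites("GAAVLK", "GASVLK"))  # one difference, at position 3
print(classify_sites("AASVLA", "GASVLK"))  # differences at positions 1 and 6
```

With one difference, every site matches on exactly one side; with differences at both ends, every site lies between them and matches on neither side.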

Matching sequences are filtered through a mutation/modification matrix to try to find a single AA substitution which would transform the calculated mass of the database sequence to the experimentally determined mass. The output displays the necessary substitution and the corresponding sequence consistent with the experimental peptide mass data (not the sequence present in the database). The matrix which is actually used depends on the option selected from the Search Mode menu. Users who require changed or additional mutation/modification matrices should direct their local ProteinProspector Server administrator to the instructions To Change the Homology/Modified Amino Acid Matrix Definitions. For the World Wide Web version of ProteinProspector please send email to: .

To see matches to sequences that require a parent mass shift that is not consistent with a single substitution among the standard 20 amino acids or modified amino acids that MS-Tag "knows" about, the Hide Star-ions option must be de-selected.


Parent Mass Shift

MS-Tag only considers database sequences whose calculated parent masses pass a parent mass filter. In identity mode the filter is the specified parent mass +/- the parent mass tolerance. In the homology modes the filter is determined by the specified parent mass and the parent mass shift. You should NOT attempt to widen the filter by using a wider parent mass tolerance; use a parent mass tolerance consistent with the accuracy to which the parent mass is measured. With the default parent mass shift of +/- 45 Da, all database sequences with a calculated parent mass within 45 Da of the specified parent mass are considered. Note that this default does not cover the largest possible shift for a single substitution among the 20 standard amino acids (G -> W or W -> G would mean a change of 129 Da), which needs a correspondingly larger shift. A +/- 45 Da shift means an ~90 fold increase in the number of sequences considered, and hence increases the potential for false positives. The +/= and -/= features allow a user to specify an anticipated parent mass shift value and so reduce the number of sequences considered in a search. For example, suppose data from a peptide expected to be phosphorylated is being used; specifying a parent mass shift of +/= 80 would allow matches to database sequences which exactly match the parent mass, or to database sequences which would match the specified parent mass if 80 Da were added.
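
The filter logic can be sketched as follows. This is a hedged illustration, not MS-Tag's code: the function name and mode strings are hypothetical, the -/= behaviour is assumed to mirror the +/= behaviour described above, and adding the tolerance to the shift window in "+/-" mode is also an assumption.

```python
def passes_parent_filter(db_mass, parent_mass, tol, mode="identity", shift=0.0):
    """Decide whether a database sequence's calculated mass passes the filter."""
    # Mass that would have to be added to the database sequence to reach the parent mass.
    delta = parent_mass - db_mass
    if mode == "identity":
        return abs(delta) <= tol
    if mode == "+/-":   # any shift of up to +/- shift (assumed to include the tolerance)
        return abs(delta) <= shift + tol
    if mode == "+/=":   # exact match, or match after adding exactly `shift`
        return abs(delta) <= tol or abs(delta - shift) <= tol
    if mode == "-/=":   # assumed mirror image: exact match, or after subtracting `shift`
        return abs(delta) <= tol or abs(delta + shift) <= tol
    raise ValueError(f"unknown mode: {mode}")


# Phosphopeptide example: measured parent mass is 80 Da above the database sequence.
print(passes_parent_filter(1000.0, 1080.0, 0.5, mode="+/=", shift=80.0))
# A 40 Da difference is rejected by +/= 80 but accepted by a +/- 45 Da window.
print(passes_parent_filter(1000.0, 1040.0, 0.5, mode="+/=", shift=80.0))
print(passes_parent_filter(1000.0, 1040.0, 0.5, mode="+/-", shift=45.0))
```

The +/= form tests only two narrow windows instead of one wide one, which is why it considers far fewer sequences.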


Mutation Matrix OFF

In the Homology Modes, to see matches to sequences that can be formed by altering a database sequence with a single substitution among the standard 20 amino acids or modified amino acids that MS-Tag "knows" about, the Mutation Matrix OFF option must be de-selected. To see matches to sequences as they originally appear in the database, the Mutation Matrix OFF option must be selected and the Hide Star-ions option de-selected.


Hide Star-ions

In Homology Mode, to see matches to sequences that require a parent mass shift that is not consistent with a single substitution among the standard 20 amino acids or modified amino acids that MS-Tag "knows" about, the Hide Star-ions option must be de-selected. For unknown parent mass shift matches, the output will contain some ion types labeled with a star (*). The star means the match for that ion is based on the neutral piece formed during fragmentation rather than the protonated piece. The mass of the neutral piece is easily calculated as parent mass - fragment ion mass. Example: suppose an ion is matched and designated as b*3. This means there is some difference between the experimental peptide and the database sequence at residue number 1, 2, or 3, and the parent mass shift between the two is not consistent with a substitution to another amino acid that MS-Tag "knows" about, but all residues after position 3 match as is.
Note about Potential Confusion: Selecting Mutation Matrix OFF together with Hide Star-ions effectively puts MS-Tag in identity mode.


Modified amino acids

MS-Tag currently accounts for the modified amino acids listed below. With the exception of modified cysteines, modified amino acids are only accounted for in the Homology Modes. Modified cysteines are also accounted for in identity mode.
Designation   Modified Amino Acid
C             modified or unmodified Cys (whichever is selected)
h             homoserine lactone (only for CNBr digests)
m             Met sulfoxide
q             Pyro-glutamic acid (only at N-term of peptide)
s             Phospho-Ser
t             Phospho-Thr
y             Phospho-Tyr


Ranking / Scoring of Results

MS-Tag currently does not have a scoring system. Our general philosophy is to try to match every ion present in an MS/MS spectrum to an ion type consistent with a sequence in a database. Given sufficient data, a single sequence will generally be matched. If multiple sequences are matched, however, the results are sorted so that more likely sequences appear higher in the list. All sequences matching the input data / parameters are sorted on the following basis:
  1. In Homology Mode, matches to database sequences identical to their appearance in the database are listed higher.
  2. In Homology Mode, matches with known modifications are listed higher.
  3. Sequences matching with the fewest unmatched ions are listed higher.
  4. Among equivalent matches the results are sorted in order of increasing parent mass.
  5. Among equivalent matches the results are sorted in order of increasing index number.
Note that the last two sorts do NOT imply a BETTER ranking, even though one match will be listed higher than another, but are merely intended to provide some organization to the listing and aid the user in viewing the results.
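
The five criteria can be read as a multi-key sort. The sketch below takes that reading as an assumption (the text does not say how MS-Tag implements the ordering); the record fields are hypothetical and the sequences, masses and index numbers are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Match:
    sequence: str
    as_in_database: bool      # criterion 1: identical to the database entry
    known_modification: bool  # criterion 2: shift explained by a known modification
    unmatched_ions: int       # criterion 3
    parent_mass: float        # criterion 4
    index: int                # criterion 5


def sort_key(m: Match):
    # False sorts before True, so negate the flags that should rank higher.
    return (not m.as_in_database, not m.known_modification,
            m.unmatched_ions, m.parent_mass, m.index)


# Placeholder matches for illustration only.
matches = [
    Match("PEPTIDEK", True, False, 1, 900.0, 12),
    Match("PEPTIDER", False, True, 0, 928.0, 7),
    Match("PEPSIDEK", False, False, 0, 886.0, 3),
    Match("AEPTIDEK", True, False, 1, 874.0, 20),
]
ranked = sorted(matches, key=sort_key)
print([m.sequence for m in ranked])
```

Because the key is a tuple, the parent mass and index number only break ties among otherwise equivalent matches, in line with the note above that these last two sorts do not imply a better ranking.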


De novo Sequence Interpretation

MS-Tag has a limited capacity (evolving in future revisions) for de novo MS/MS spectral interpretation. This is done by a brute-force mechanism. The parent ion and composition ion information are used to: 1) calculate all possible AA compositions (combinations); 2) generate on-the-fly all sequences (permutations) for each combination. Each permutation is then examined by MS-Tag, much like searching a database of all combinatorial possibilities; we call the technique searching the Unknome. To invoke this mode, simply select Unknome on the database menu.

Note: This method relies on very good parent mass accuracy and immonium ion representation. Furthermore, the exponentially increasing number of permutations limits the usefulness of the approach to peptides < 1300 Da. If the input parent mass and AA composition data results in parameters which are consistent with more than 10,000 AA compositions (combinations) and/or more than 500,000,000 sequences (permutations), MS-Tag will not perform an Unknome search. Instead it will generate an error message indicating the unsuitability of the data.
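
The combination and permutation counts can be illustrated with a toy enumeration. This is a sketch under stated assumptions, not the Unknome search itself: the function names are hypothetical, the target is a plain sum of residue masses (water and proton offsets are ignored), and the alphabet is restricted to five residues, as composition ions might allow.

```python
from collections import Counter
from math import factorial

# Monoisotopic residue masses (Da) for a restricted, illustrative alphabet.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "V": 99.06841, "L": 113.08406}


def compositions(target, tol, alphabet, start=0, current=()):
    """Yield every multiset of residues whose summed mass is within tol of target."""
    if current and abs(target) <= tol:
        yield current
    for i in range(start, len(alphabet)):
        mass = RESIDUE_MASS[alphabet[i]]
        if mass <= target + tol:  # prune branches that would overshoot
            yield from compositions(target - mass, tol, alphabet, i,
                                    current + (alphabet[i],))


def permutation_count(composition):
    """Number of distinct sequences for one composition: n! / (k1! * k2! * ...)."""
    count = factorial(len(composition))
    for repeats in Counter(composition).values():
        count //= factorial(repeats)
    return count


target = sum(RESIDUE_MASS[aa] for aa in "GASV")
found = list(compositions(target, 0.01, "GASVL"))
for comp in found:
    print("".join(comp), permutation_count(comp))
print("total sequences:", sum(permutation_count(c) for c in found))
```

Even this four-residue target has two compositions, G,A,S,V and the isomeric G,G,S,L, giving 24 + 12 = 36 candidate sequences; the count grows quickly with peptide length, which is why the limits above exist.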

The figure below illustrates the method and shows how the approach could be followed by homology type search strategies. We have modified MS-Pattern to accept the list of sequences output from MS-Tag and perform a text-based search with a user-specified number of mismatched amino acids. In general, we expect such a linked search technique could match more weakly homologous sequences than the strongly homologous sequences which can be matched by MS-Tag alone in homology mode.
[Figure: Unknome Strategy Illustrated]


Multiply-charged ions

Multiply charged ions are handled in a similar way in all ProteinProspector programs.


Maximum Number of Unmatched Fragment Ions

Composition ions are not considered when calculating the number of unmatched fragment ions.