Description, Instructions, and Tips for MS-Tag
Purpose
This document provides instructions for MS-Tag.
Instructions for ProteinProspector Programs
Introduction
MS-Tag was originally developed for use with those MALDI-PSD spectra that do
not contain enough peaks to enable de novo sequence interpretation but were
obtained from peptides whose sequence might be present in a database.
The program can also be used with data from MS/MS techniques other than
MALDI-PSD simply by selecting fragment-ion types consistent with the technique.
MS-Tag can also be used with spectra that contain enough information
to enable de novo interpretation; however, a homolog of the
peptide must be in the database for a search to be successful.
Fragment-Ion types
Checking the box next to an ion type allows that ion type to be considered
when matching the data to a sequence. Users can choose either the default
ion types for a particular instrument or a user-selected set. Server administrators
can modify the default instrument ion types or add new instrument selections.
The instrument definition can also specify which amino acids lose water
(default S, T, E and D), which lose ammonia (default R, K, N and Q), which are
positive charge bearing (default R, H, K and N) and what the maximum internal
ion mass is (default 700 Da). Selecting fewer ion types will generally
lead to fewer false positives; however, ion masses that correspond to ion types
not selected, or unknown to the program, will NOT be matched. The -SOCH4 and
-H3PO4 ion types should only be used in homology mode, and should be selected
when the spectrum also contains MH+ - 64 and MH+ - 98 ions respectively.
The currently supported ion types are:
| Ion type | Restrictions |
| a, b, y | no restrictions |
| c | not if C term residue of the fragment is P |
| a-NH3, b-NH3, y-NH3 | ion contains an amino acid which loses ammonia |
| b-H2O | ion contains an amino acid which loses water |
| b+H2O | ion contains a positive charge bearing amino acid; only b(n-1), b(n-2) (peptide length n) |
| a-H3PO4, b-H3PO4, y-H3PO4 | ion contains phosphorylated S,T |
| b-SOCH4, y-SOCH4 | ion contains oxidized M |
| internal b | < maximum internal mass |
| internal a | < maximum internal mass, internal b present |
| internal b-H2O | < maximum internal mass, internal b present, ion contains an amino acid which loses water |
| internal b-NH3 | < maximum internal mass, ion contains an amino acid which loses ammonia |
| N-term ladder | removal of N term residues (y equiv.) |
| C-term ladder | removal of C term residues (b+H2O equiv.) |
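As a rough illustration of how the restrictions in the table act as filters, the sketch below checks which b-series variants a fragment sequence qualifies for, using the default water-loss and ammonia-loss residues quoted above. The function name and return format are hypothetical and are not taken from MS-Tag itself.

```python
# Default residue sets quoted in the text; an instrument definition may override them.
LOSES_WATER = set("STED")
LOSES_AMMONIA = set("RKNQ")

def allowed_b_variants(fragment):
    """Return the b-series ion types a fragment qualifies for (hypothetical helper)."""
    allowed = ["b"]  # plain b ions carry no restriction
    if LOSES_WATER & set(fragment):
        allowed.append("b-H2O")  # needs a residue that loses water
    if LOSES_AMMONIA & set(fragment):
        allowed.append("b-NH3")  # needs a residue that loses ammonia
    return allowed
```

For example, a fragment such as GAV contains neither kind of residue, so only the plain b ion would be considered for it.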
Mass Tolerances
While the intact MW of the protein can be specified, MS/MS data generally has
such high discriminating power that no restriction is necessary. Furthermore, database
entries frequently contain pre and pre-pro proteins, as well as gene fragments,
so the MW of a database entry may not match that of the mature protein.
The tolerances on both the parent ion and fragment ions should be set to be
consistent with the mass accuracy of the instrument used to generate
the data. It is generally better to use units of ppm or % rather than Da,
because fragment mass accuracy is often better at lower mass than at higher mass.
For PSD spectra this is particularly important. With a tolerance of +/- 1 Da
you are more likely to match low-mass internal sequences of the wrong nominal mass,
which increases the likelihood of false-positive hits. Setting the
fragment mass tolerance to +/- 1000 ppm prevents this, with the "acceptable" trade-off
of allowing a looser than appropriate tolerance on fragment ions > 1000 Da.
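The trade-off can be seen by converting a relative tolerance into an absolute window at a few masses. The helper below is only an arithmetic sketch; its name is made up for this example.

```python
def tolerance_da(mass, ppm):
    """Convert a relative tolerance in ppm into an absolute window in Da."""
    return mass * ppm / 1_000_000

# Width of a +/- 1000 ppm window at several fragment masses.
for mass in (100.0, 500.0, 1000.0, 2000.0):
    print(f"{mass:7.1f} Da  +/- {tolerance_da(mass, 1000):.2f} Da")
```

At 1000 ppm the window equals +/- 1 Da only at 1000 Da; it is tighter below that mass and looser above it.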
What is a Fragment-ion Tag?
A Fragment-Ion Tag can be obtained from an
MS/MS spectrum and consists of 3 attributes:
- A peptide parent-ion mass: Pm.
- Masses of all sequence related fragment-ions from the peptide:
Fi, Fj, ...
The fragment ions need not all be of the same ion type.
Currently supported fragment-ion types
are described above.
- Masses of composition ions which indicate the presence of particular amino
acids in the peptide: Ci, Cj, ...
This information can be from immonium and related low mass
ions or high mass ions representing side-chain losses from the parent ion.
Search Mode
The search mode option can either be set to identity or to one of the
homology/modification tolerant modes. The identity mode is used to look for
sequences in the database which are identical to the peptide used to generate the
MS/MS spectrum. The homology/modification tolerant modes are
described below.
Homology-Tolerant Modes
In order to match one's data to a sequence in the database which is not identical
to the peptide used to generate the MS/MS spectrum, MS-Tag must be used in one of
the homology/modification tolerant modes. This enables matching for peptides with a
mutation, cross-species substitution, sequence polymorphism, modified amino acid,
or error in the database. Homology mode works based on 3 concepts:
- Allow parent mass to be shifted from the parent mass of sequences in the database.
- Consider each ion independently rather than examining relationships between ions.
- When a parent ion undergoes fragmentation, at least 2 pieces are formed (a
fragment-ion and a neutral). While only the mass of the ionized fragment is
measured, the mass of the neutral piece is easily calculated as parent mass -
fragment ion mass. If the peptide matches a database sequence exactly, then the
masses of BOTH the fragment-ion and the neutral will match. If there is a
single sequence difference, then the mass of either the fragment-ion or the
neutral will match, but NOT BOTH. If there are two sequence differences,
fragmentation at any sites located BETWEEN the two mismatched sites will
result in NEITHER the fragment-ion nor the neutral matching.
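The third concept can be checked numerically. The sketch below is a simplified illustration, not MS-Tag's implementation: it considers only b ions, assumes the two sequences have equal length (substitutions only), and uses standard monoisotopic residue masses for a handful of amino acids. All function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the few amino acids used in this example.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
           "P": 97.05276, "V": 99.06841, "T": 101.04768}
PROTON = 1.00728
WATER = 18.01056

def mh_plus(seq):
    """Singly protonated parent mass of a peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER + PROTON

def b_ions(seq):
    """b-ion masses for every cleavage site, N-terminus first."""
    masses, total = [], 0.0
    for aa in seq[:-1]:
        total += RESIDUE[aa]
        masses.append(total + PROTON)
    return masses

def compare_sites(db_seq, exp_seq, tol=0.01):
    """For each cleavage site, report (ion matches, neutral matches)."""
    db_parent, exp_parent = mh_plus(db_seq), mh_plus(exp_seq)
    results = []
    for db_b, exp_b in zip(b_ions(db_seq), b_ions(exp_seq)):
        ion_ok = abs(db_b - exp_b) <= tol
        # Neutral piece = parent mass - fragment ion mass.
        neutral_ok = abs((db_parent - db_b) - (exp_parent - exp_b)) <= tol
        results.append((ion_ok, neutral_ok))
    return results

# Single difference (S -> T at position 3): each site matches on one side only.
print(compare_sites("GASPV", "GATPV"))
```

With a single substitution, sites before the changed residue match on the fragment ion and sites at or after it match on the neutral. With differences at both ends of the peptide, for instance GASPV against TASPA, every site lies between the two mismatches and neither piece matches.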
Matching sequences are filtered through a mutation/modification matrix
to try to find a single AA substitution that would transform the calculated mass of the
database sequence into the experimentally determined mass. The output
displays the necessary substitution and the corresponding sequence consistent with the experimental
peptide mass data (not the sequence present in the database). The matrix that is used depends on
which option you have selected from the Search Mode menu. Users who
require changed or additional mutation/modification matrices should direct
their local ProteinProspector Server administrator to the instructions
To Change the Homology/Modified Amino Acid Matrix Definitions.
For the World Wide Web version of ProteinProspector please send email to:
.
To see matches to sequences that require a parent mass shift that is not consistent
with a single substitution among the standard 20 amino acids or modified amino acids
that MS-Tag "knows" about, the Hide Star-ions option must be
de-selected.
Parent Mass Shift
MS-Tag only considers database sequences whose calculated parent masses pass
a parent mass filter. In Identity Mode the filter is determined by the specified
parent mass +/- the parent mass tolerance. In the Homology Modes
it is determined by the specified parent mass and the parent mass shift.
You should NOT attempt to allow for a mass shift by using a wider parent
mass tolerance. Use a parent mass tolerance consistent with the accuracy to which
the parent mass is measured. The default value of +/- 45 Da is intended to cover
the parent mass shifts associated with homologous mutations among the 20
standard amino acids; it does not cover every possible substitution (G -> W or
W -> G, for example, would mean a change of 129 Da). All
database sequences with a calculated parent mass within +/- 45 Da of the specified
parent mass would thus be considered. This means an ~90 fold increase in the
number of sequences considered, and hence an increased potential for false positives.
The +/= and -/= features allow a user to specify an anticipated parent mass shift
value and so reduce the number of sequences considered in a search. For example, suppose
data from a peptide expected to be phosphorylated is being used; specifying a
parent mass shift of +/= 80 would allow matches to database sequences
which exactly match the parent mass or database sequences which would match
the specified parent mass if 80 Da were added.
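One way to picture the filter is as a predicate applied to each database peptide. The sketch below is an interpretation of the description above rather than MS-Tag's actual logic: the function name and mode strings are hypothetical, the -/= behaviour is assumed by analogy with +/=, and adding the tolerance to the shift window is also an assumption.

```python
def passes_parent_filter(calc_mass, parent_mass, tol, shift=0.0, mode="identity"):
    """Decide whether a database peptide passes the parent mass filter (sketch)."""
    diff = parent_mass - calc_mass  # mass the database peptide would have to gain
    if mode == "identity":
        return abs(diff) <= tol
    if mode == "+/-":  # any shift up to the given size, in either direction
        return abs(diff) <= shift + tol
    if mode == "+/=":  # exact match, or match once `shift` Da are added
        return abs(diff) <= tol or abs(diff - shift) <= tol
    if mode == "-/=":  # assumed: exact match, or match once `shift` Da are removed
        return abs(diff) <= tol or abs(diff + shift) <= tol
    raise ValueError(f"unknown mode: {mode}")

# Phosphopeptide example: measured 1080.0, database peptide calculated at 1000.0.
print(passes_parent_filter(1000.0, 1080.0, tol=0.5, shift=80.0, mode="+/="))
```

Under these assumptions a peptide 40 Da away from the measured mass passes a +/- 45 filter but not a +/= 80 filter, which is how the +/= option narrows the search.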
Mutation Matrix OFF
In one of the Homology Modes, to see matches to sequences that can be
formed by altering a sequence in a database with a single substitution among the standard
20 amino acids or modified amino acids that MS-Tag "knows" about, the Mutation Matrix OFF
option must be de-selected. To see matches to sequences as they originally appear in
the database, the Mutation Matrix OFF option must be selected and the Hide Star-ions
option de-selected.
Hide Star-ions
In Homology Mode, to see matches to sequences
that require a parent mass shift that is not
consistent with a single substitution among the standard 20 amino acids or modified amino
acids that MS-Tag "knows" about, the Hide Star-ions option must be de-selected.
For unknown parent mass shift matches, the output will contain some ion types labeled
with a star (*). The star means the match for that ion is based on the neutral piece
formed during fragmentation rather than the protonated piece. The mass of the neutral
piece is easily calculated as parent mass - fragment ion mass. Example: suppose an ion
is matched and designated as b*3. This means there is some difference between the
experimental peptide and the database sequence at residue number 1, 2, or 3, the
parent mass shift between the two is not consistent with a substitution to another
amino acid that MS-Tag "knows" about, and all residues after position 3 match
as is.
Note about Potential Confusion: Selecting Mutation Matrix OFF together with
Hide Star-ions effectively puts MS-Tag in Identity Mode.
Modified amino acids
MS-Tag currently accounts for the modified amino acids listed below. With the exception of
modified cysteines, modified amino acids are only accounted
for in the Homology Modes. Modified cysteines
are also accounted for in identity mode.
| Designation | Modified Amino Acid |
| C | modified or unmodified Cys (whichever is selected) |
| h | homoserine lactone (only for CNBr digests) |
| m | Met sulfoxide |
| q | Pyro-glutamic acid (only at N-term of peptide) |
| s | Phospho-Ser |
| t | Phospho-Thr |
| y | Phospho-Tyr |
Ranking / Scoring of Results
MS-Tag currently does not have a scoring system. Our general philosophy is to try
and match every ion present in an MS/MS spectrum to an ion type consistent with
a sequence in a database. Given sufficient data a single sequence will generally
be matched. However, the results are sorted so that if multiple sequences are matched,
more likely sequences are listed higher in the list. All sequences matching the
input data / parameters are sorted on the following basis:
- In Homology Mode, matches to sequences exactly as they appear in the
database are listed higher.
- In Homology Mode, matches with known
modifications are listed higher.
- Sequences matching with the fewest unmatched ions are listed
higher.
- Among equivalent matches the results are sorted in order of increasing
parent mass.
- Among equivalent matches the results are sorted in order of increasing index
number.
Note that the last two sorts do NOT imply a BETTER ranking,
even though one match will be listed higher than another, but are merely intended to
provide some organization to the listing and aid the user in viewing the results.
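The sort order can be pictured as a multi-level sort key. The sketch below assumes the criteria are applied in the order listed; the Match record and its field names are invented for the example and do not reflect MS-Tag's internal data structures.

```python
from dataclasses import dataclass

@dataclass
class Match:
    """Hypothetical record for one matched sequence."""
    sequence: str
    unmodified: bool          # matched exactly as it appears in the database
    known_modification: bool  # shift explained by a known substitution/modification
    unmatched_ions: int
    parent_mass: float
    index: int                # database index number

def rank(matches):
    """Sort matches so that more likely sequences come first."""
    return sorted(
        matches,
        key=lambda m: (
            not m.unmodified,          # unmodified database sequences first
            not m.known_modification,  # then known modifications
            m.unmatched_ions,          # then fewest unmatched ions
            m.parent_mass,             # organizational only
            m.index,                   # organizational only
        ),
    )
```

Because Python sorts tuples element by element, the later keys only break ties left by the earlier ones, which mirrors the note that the last two sorts do not imply a better ranking.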
De novo Sequence Interpretation
MS-Tag has a limited capacity for de novo MS/MS spectral interpretation, which
is expected to evolve in future revisions. This is done by a brute-force mechanism.
The parent ion and composition ion information are used to: 1) calculate all possible
AA compositions (combinations); 2) generate on-the-fly all sequences
(permutations) for each combination. Each permutation is then examined by MS-Tag,
much like searching a database of all combinatorial possibilities. We call
the technique searching the Unknome. To invoke this mode, simply
select Unknome on the database menu.
Note: This method relies on very good parent mass accuracy and immonium
ion representation. Furthermore, the exponentially increasing number of permutations
limits the usefulness of the approach to peptides < 1300 Da. If the input parent mass
and AA composition data are consistent with more than 10,000 AA compositions (combinations)
and/or more than 500,000,000 sequences (permutations), MS-Tag will not perform an Unknome
search. Instead it will generate an error message indicating the unsuitability
of the data.
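The two counting steps and the size limits can be sketched as follows. This is a minimal illustration under stated assumptions, not MS-Tag's algorithm: it works from the sum of residue masses (the parent mass with water and the proton already subtracted), ignores composition ions, and uses hypothetical function names. Only the two limits are taken from the text above.

```python
from collections import Counter
from math import factorial

def compositions(residue_sum, masses, tol):
    """Enumerate AA compositions (multisets) whose residue masses sum to the target."""
    items = sorted(masses.items(), key=lambda kv: kv[1])
    found = []

    def extend(start, remaining, chosen):
        if chosen and abs(remaining) <= tol:
            found.append(tuple(chosen))
        for i in range(start, len(items)):
            aa, mass = items[i]
            if mass > remaining + tol:
                break  # masses are sorted, so heavier residues cannot fit either
            chosen.append(aa)
            extend(i, remaining - mass, chosen)  # reuse index i to allow repeats
            chosen.pop()

    extend(0, residue_sum, [])
    return found

def permutation_count(composition):
    """Number of distinct sequences for one composition (multinomial coefficient)."""
    count = factorial(len(composition))
    for n in Counter(composition).values():
        count //= factorial(n)
    return count

def unknome_feasible(comps, max_comps=10_000, max_seqs=500_000_000):
    """Apply the limits quoted in the text to a list of compositions."""
    return len(comps) <= max_comps and sum(map(permutation_count, comps)) <= max_seqs
```

With a three-letter alphabet of G, A and S and a target equal to the sum of those three residues, the search finds a single composition with 6 permutations. With all 20 amino acids the counts grow very quickly with mass, which is why the limits exist.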
The figure below illustrates the method and shows how the approach could be followed
by homology type search strategies. We have modified
MS-Pattern to accept the
list of sequences output from MS-Tag and perform a text-based search with a
user-specified number of mismatched amino acids.
In general, we expect such a linked search technique could match more weakly
homologous sequences than the strongly homologous sequences which can be matched
by MS-Tag alone in homology mode.
Multiply-charged ions
Multiply charged ions are handled in a similar way
in all ProteinProspector programs.
Maximum Number of Unmatched Fragment Ions
Composition ions are not considered when calculating the number of unmatched
fragment ions.