Description, Instructions, and Tips for MS-Fit
Purpose
This document provides instructions for MS-Fit.
Introduction and Background
MS-Fit was the first program Karl Clauser and Peter Baker
developed together. The name stems from the program's expected usage: correlating
Mass Spectrometry data (parent masses only, not fragment masses) with
a protein in a sequence database which best Fits the data. Note that the word
fit was chosen and NOT the word identify. In the spring of 1995, when the name was
selected, the typical peptide mass fingerprinting experiment preceding use of MS-Fit
was to digest a protein with an enzyme,
then perform MALDI mass spectrometry on the resulting mixture of peptides to
determine the masses of each peptide. At that time the state of the art mass
accuracy using MALDI on a continuous-extraction, reflector time-of-flight instrument
was +/- 0.5 Da. This mass accuracy level was poor in comparison to the standard
of +/- 10 ppm established with magnetic sector instruments several decades earlier.
Thus in our opinion MS-Fit could in favorable cases (where both
species and approximate intact protein molecular weight were known) merely
suggest protein identity. To establish protein identity one
needed, in our opinion, some sequence support. This support could be obtained
from the combined use of MS/MS and our subsequently developed program.
The development of delayed extraction MALDI in 1996 has tremendously improved the accuracy
of mass measurement on reflector MALDI-TOF instruments. Mass accuracy in the range of
5-100 ppm is now possible. The low end of this range (best mass accuracy) is accessible
with internal calibration and long flight tubes, while the high end of the range
is accessible with external calibration and short flight tubes.
Consequently, proteins can now be confidently identified by peptide mass fingerprinting
using masses alone with MS-Fit. Identification certainty is primarily a function of the
level of mass accuracy.
The mass tolerances should be set to be consistent with the mass accuracy of the
instrument used to generate the data. It is generally a better idea to use units
of ppm or % rather than Da, as mass spectrometers typically have an error associated
with mass measurement that is mass dependent and thus cannot be uniformly
expressed in Da.
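To illustrate why a relative tolerance scales with mass while a fixed Da tolerance does not, here is a minimal sketch. The function names are hypothetical and are not part of MS-Fit; the only thing assumed is the usual definition of ppm as parts per million of the mass.

```python
def ppm_to_da(mass_da: float, tol_ppm: float) -> float:
    """Absolute tolerance in Da equivalent to tol_ppm at the given mass."""
    return mass_da * tol_ppm / 1e6


def within_tolerance(observed: float, calculated: float, tol_ppm: float) -> bool:
    """True if the observed mass lies within tol_ppm of the calculated mass."""
    return abs(observed - calculated) <= ppm_to_da(calculated, tol_ppm)


# The same 50 ppm tolerance is a different window in Da at each mass.
for mass in (800.0, 1500.0, 3000.0):
    print(f"{mass:7.1f} Da  +/- 50 ppm  =  +/- {ppm_to_da(mass, 50.0):.3f} Da")
```

A fixed window of +/- 0.1 Da would be looser than 50 ppm at 800 Da and tighter than 50 ppm at 3000 Da, which is the mass dependence the paragraph above describes.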
Measuring masses as accurately as possible is the single most important thing one
can do to achieve the highest certainty of protein identification in a peptide mass
fingerprinting experiment.
Selecting any search mode except identity puts MS-Fit into homology
mode by invoking the MS-Tag mutation matrix routine.
In this mode the MS-Fit routines for possible modifications are bypassed. Instead
the set of modified AA's allowed in MS-Tag Homology mode
is used.
In practice, homology mode should only be used when one or more of the
following conditions applies:
peptide mass data has excellent mass accuracy (+/- 10 ppm or better)
a narrow intact protein MW filter is used
the Hits will be saved and searched via MS-Tag
MS-Fit matches a database sequence when one of its calculated peptide masses passes through
one of the peptide mass filters. Normally the filters are determined by the
user-supplied peptide masses +/- the peptide mass tolerance (standard filter). In
Homology Mode these filters are re-configured to the user-supplied peptide masses +/- the
peptide mass shift (see section of MS-Tag manual on
parent mass shift for full details).
However, a particular protein entry in the database is not subjected to these
widened homology filters unless a preliminary cut-off number of user-supplied
peptide masses first match in a standard-filter search. This preliminary cut-off
is controlled by the parameter: Min. # matches with NO AA substitutions.
Database sequences passing the homology widened peptide mass filter are then passed
through a mutation matrix to try to find a single AA
substitution which would transform the calculated mass of the database sequence to the
experimentally determined mass. The output displays the necessary substitution and the
corresponding sequence consistent with the experimental peptide mass data
(not the sequence present in the database).
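The single-substitution step can be sketched as follows. This is not the MS-Tag mutation matrix itself: as an assumption, the sketch simply tries all 20 standard residues at every position, whereas the real matrix may restrict which substitutions are allowed. The function names are hypothetical, masses are neutral monoisotopic values, and the residue masses are the standard ones.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def single_substitutions(seq: str, observed_mass: float, tol_ppm: float):
    """List (position, old, new, new_sequence) for every single residue
    substitution that brings the calculated mass within tol_ppm of the
    observed mass. Positions are 1-based."""
    base = peptide_mass(seq)
    tol = observed_mass * tol_ppm / 1e6
    hits = []
    for i, old in enumerate(seq):
        for new, new_mass in MONO.items():
            if new == old:
                continue
            candidate = base - MONO[old] + new_mass
            if abs(candidate - observed_mass) <= tol:
                hits.append((i + 1, old, new, seq[:i] + new + seq[i + 1:]))
    return hits


# Database sequence SAMPLER, experimental mass consistent with SAMPLEK.
print(single_substitutions("SAMPLER", peptide_mass("SAMPLEK"), 5.0))
```

As in the MS-Fit output, the sequence returned is the one consistent with the experimental mass, not the one stored in the database.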
Minimum Number of Peptides Required to Match
In order for a particular protein in the database to generate a hit it must match at least
Minimum Number of Peptides Required to Match masses from the input data.
Ranking / Scoring of Results
The MOWSE score reported by MS-Fit is based on the scoring system described in
Pappin et al., Current Biology, 1993, Vol 3, No 6, pp 327-. Because MS-Fit offers
several options not available in the initial version of MOWSE, several modifications
have had to be made.
After the species and molecular weight pre-searches the remaining proteins undergo
theoretical digestion. The resulting peptides are then placed in bins based on their
molecular weight and the intact molecular weight of undigested protein they originated
from. There are eleven intact molecular weight bins. Under 100000 Da there are 10
bins of width 10000 Da. The other bin contains all the proteins over 100000 Da.
There are thirty peptide molecular weight bins of width 100 Da between 0 and 3000 Da.
Peptides above 3000 Da are not binned. Peptides with no missed cleavages contribute
1.0 to the bin total whereas peptides containing missed cleavages contribute pfactor
(a user supplied parameter).
Bin frequency values are then calculated by dividing the bin totals by the sum of the
bin totals for each 10000 Da protein interval. The bin frequency values are then
normalised to the largest bin frequency value to yield frequency values between 0 and 1.
Masses in the theoretical digestion which match masses in the data set are divided into
scoring matches and non-scoring matches. Scoring matches include unmodified peptides,
acrylamide-modified Cys and, when the unmodified peptide is also present, N-terminal Gln to
pyroGlu and oxidation of Met. Non-scoring matches include pyroGlu and oxidation of Met in
the absence of the unmodified peptide, acetylated N-termini, phosphorylation of S, T
and Y, and single amino acid substitutions. Unmatched masses are ignored. The score for
each matching mass is assigned as the appropriate normalised distribution frequency value.
In the case of multiple matching masses the scores are multiplied together. The final
product score is inverted and normalised to an average protein molecular weight
of 50 kD.
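The binning and scoring steps above can be sketched roughly as follows. This is not the MS-Fit source. Two points are assumptions: the frequencies are normalised to the largest value within each protein interval, and the final inversion and normalisation is taken to be 50000 / (product x protein MW), following the original MOWSE formulation. All function names and the input layout are hypothetical.

```python
from collections import defaultdict

N_PEPTIDE_BINS = 30  # 100 Da wide, covering 0-3000 Da


def protein_bin(protein_mw: float) -> int:
    """Ten 10000 Da bins below 100000 Da, plus one bin for everything above."""
    return min(int(protein_mw // 10000), 10)


def peptide_bin(peptide_mw: float):
    """Index of the 100 Da peptide bin, or None if the peptide is not binned."""
    b = int(peptide_mw // 100)
    return b if 0 <= b < N_PEPTIDE_BINS else None


def build_frequency_table(digests, pfactor: float):
    """digests: iterable of (protein_mw, [(peptide_mw, missed_cleavages), ...])."""
    totals = defaultdict(lambda: [0.0] * N_PEPTIDE_BINS)
    for protein_mw, peptides in digests:
        row = totals[protein_bin(protein_mw)]
        for peptide_mw, missed in peptides:
            b = peptide_bin(peptide_mw)
            if b is not None:
                row[b] += 1.0 if missed == 0 else pfactor
    table = {}
    for pbin, row in totals.items():
        total = sum(row)
        if total == 0:
            continue
        freqs = [value / total for value in row]
        largest = max(freqs)  # assumption: normalise per protein interval
        table[pbin] = [f / largest for f in freqs]
    return table


def mowse_score(table, protein_mw: float, scoring_match_mws) -> float:
    """Multiply the frequencies of the scoring matches, then invert and
    normalise to a 50 kDa protein (assumed form of the final step)."""
    row = table[protein_bin(protein_mw)]
    product = 1.0
    for mw in scoring_match_mws:
        b = peptide_bin(mw)
        if b is not None and row[b] > 0:  # skip unbinned or empty bins
            product *= row[b]
    return 50000.0 / (product * protein_mw)


# Toy database of one 25 kDa protein with four theoretical peptides.
digests = [(25000.0, [(850.0, 0), (850.5, 0), (1250.0, 0), (2100.0, 1)])]
table = build_frequency_table(digests, pfactor=0.4)
print(mowse_score(table, 25000.0, [850.0, 1250.0]))
```

Because rare peptide masses have low frequency values, matching them makes the product smaller and the inverted score larger.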
If scoring is not selected MS-Fit uses a simple ranking system. The results are sorted so that if
multiple database entries are matched, more likely sequences are listed higher in the
list. All database entries matching the input data and parameters are ranked on the
following basis:
- Database entries with the fewest unmatched masses are ranked
higher.
- Among equivalent matches (those with the same rank) the results are sorted
in order of increasing index number.
Note that the last sort does NOT imply a BETTER ranking,
even though one match will be listed higher than another, but is merely intended to
provide some organization to the listing.
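The two-level sort described above amounts to ordering by a compound key. A minimal sketch, with a hypothetical `Hit` record and made-up accession strings:

```python
from typing import NamedTuple


class Hit(NamedTuple):
    index: int       # database index number of the entry
    accession: str   # illustrative label only
    unmatched: int   # number of input masses the entry did not match


def rank_hits(hits):
    """Fewest unmatched masses first; ties broken by increasing index number."""
    return sorted(hits, key=lambda h: (h.unmatched, h.index))


hits = [Hit(412, "ENTRY_C", 3), Hit(97, "ENTRY_A", 5), Hit(15, "ENTRY_B", 3)]
for hit in rank_hits(hits):
    print(hit)
```

Here `ENTRY_B` and `ENTRY_C` share the same rank; `ENTRY_B` is listed first only because its index number is lower.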
Multiply-charged ions
Multiply charged ions are handled in a similar way
in all ProteinProspector programs.
Monoisotopic/Average Flags
Monoisotopic/Average Flags can now be set in a column to the right of the mass/charge
column. You must first set Peptide masses are: to monoisotopic
and then enter a column of 0's and 1's here to state whether an m/z value is
monoisotopic (0) or average (1). There must be the same number of 0/1's as there
are m/z values. If the column is left blank then all the values are assumed
to be monoisotopic as before.
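The rules for the flag column can be sketched as a small parser. The input layout (one m/z value per line, optionally followed by a 0 or 1) and the function name are assumptions made for illustration; the actual MS-Fit form may arrange the columns differently.

```python
def parse_masses_with_flags(text: str):
    """Return a list of (mz, is_average) pairs.

    Each non-empty line holds an m/z value, optionally followed by a flag:
    0 = monoisotopic, 1 = average. If no flags are given at all, every value
    is treated as monoisotopic. Otherwise every value must carry a flag.
    """
    mzs, flags = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        mzs.append(float(fields[0]))
        if len(fields) > 1:
            if fields[1] not in ("0", "1"):
                raise ValueError(f"flag must be 0 or 1, got {fields[1]!r}")
            flags.append(fields[1] == "1")
    if not flags:
        return [(mz, False) for mz in mzs]
    if len(flags) != len(mzs):
        raise ValueError("there must be as many 0/1 flags as m/z values")
    return list(zip(mzs, flags))


print(parse_masses_with_flags("1000.5 0\n2000.7 1"))
```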
This option is currently only available to licensees. The appropriate items on the MS-Fit
HTML input page are normally commented out.
Searching for Mixtures
At the end of each hit in the MS-Fit detailed report there is a link which allows
you to do a subsequent search just using the unmatched masses. Subsequent
searches use the same ratio of masses submitted to masses required to match
as was used in the original search.
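As a worked example of keeping the ratio constant: if the original search submitted 20 masses and required 5 to match (a ratio of 4 to 1), a follow-up search on 12 unmatched masses would require 3. The sketch below assumes the result is rounded up and never drops below 1; the rounding rule MS-Fit actually applies is not stated here, and the function name is hypothetical.

```python
def followup_min_matches(n_submitted: int, min_required: int, n_unmatched: int) -> int:
    """Minimum matches for a follow-up search on the unmatched masses,
    preserving the submitted-to-required ratio of the original search.
    Uses integer ceiling division (rounding up is an assumption)."""
    scaled = (n_unmatched * min_required + n_submitted - 1) // n_submitted
    return max(1, scaled)


print(followup_min_matches(20, 5, 12))
```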
Looking for Peptides with Non-Specific Cleavages
At the end of each hit in the MS-Fit detailed report there is a list of
unmatched masses. If you click on one of these masses you can see if the mass
matches any peptides in the protein that was hit. The usual enzyme cleavage rules
are not considered.
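Ignoring the cleavage rules means that every contiguous stretch of the protein is a candidate. A minimal sketch of such a search, assuming neutral monoisotopic masses, unmodified residues and a ppm tolerance (the function names are hypothetical, and the protein string is only an example):

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def nonspecific_matches(protein: str, target_mass: float, tol_ppm: float):
    """Return (start, end, sequence) for every substring of the protein whose
    mass is within tol_ppm of target_mass. Positions are 1-based, inclusive."""
    prefix = [0.0]
    for aa in protein:
        prefix.append(prefix[-1] + MONO[aa])
    tol = target_mass * tol_ppm / 1e6
    hits = []
    for start in range(len(protein)):
        for end in range(start + 1, len(protein) + 1):
            mass = prefix[end] - prefix[start] + WATER
            if mass > target_mass + tol:
                break  # extending the peptide only makes it heavier
            if abs(mass - target_mass) <= tol:
                hits.append((start + 1, end, protein[start:end]))
    return hits


protein = "MKWVTFISLLLLFSSAYSR"
print(nonspecific_matches(protein, peptide_mass("VTFISL"), 10.0))
```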
Hit Statistics
The percent TIC, mean error, data tolerance and mean number of missed cleavages are
printed after an MS-Fit hit. If intensities aren't specified then the percent TIC
value will be the same as the percent masses matched. The mean error is useful for
diagnosing systematic errors in the results: a mean error far from zero indicates a
calibration problem. The data tolerance is twice the standard deviation of the match errors
and is the number that should be used as a tolerance parameter in the absence of
systematic errors. This number is more reliable if there are a reasonable number of
matching peaks (say 10).
Also the number is only valid if all the matched peptides are real hits.
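These statistics are simple to reproduce for a list of matched peaks. The sketch below is an illustration, not MS-Fit code: it expresses errors in ppm and uses the sample standard deviation, both of which are assumptions, and the function name and tuple layout are hypothetical.

```python
from statistics import mean, stdev


def hit_statistics(matched, total_peaks: int, total_intensity=None):
    """matched: list of (observed_mz, calculated_mz, missed_cleavages, intensity).

    intensity may be None when no intensities were supplied; in that case
    percent TIC falls back to the percentage of masses matched.
    Needs at least two matched peaks to compute a standard deviation.
    """
    errors = [(obs - calc) / calc * 1e6 for obs, calc, _, _ in matched]
    if total_intensity is None:
        percent_tic = 100.0 * len(matched) / total_peaks
    else:
        percent_tic = 100.0 * sum(i for _, _, _, i in matched) / total_intensity
    return {
        "mean_error_ppm": mean(errors),
        "data_tolerance_ppm": 2.0 * stdev(errors),
        "percent_tic": percent_tic,
        "mean_missed_cleavages": mean(m for _, _, m, _ in matched),
    }


matched = [
    (1000.01, 1000.0, 0, None),
    (2000.04, 2000.0, 1, None),
    (1500.03, 1500.0, 0, None),
]
print(hit_statistics(matched, total_peaks=6))
```

In this toy data set every error is positive, so the mean error is well away from zero, which is the pattern that would suggest recalibrating before trusting the data tolerance.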
Contaminant Masses
A list of singly charged contaminant masses can be entered. Data peaks which are within
the tolerance of the contaminant masses will be deleted from the data set before the search
takes place. All charge states are considered.
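One way to picture the filtering step: each singly charged contaminant value is converted to the m/z it would have at higher charge states, and any data peak within tolerance of one of those values is dropped. In this sketch the maximum charge of 3, the ppm tolerance, the function names and the numeric values are all assumptions chosen for illustration.

```python
PROTON = 1.007276  # mass of a proton in Da


def contaminant_mzs(singly_charged_mz: float, max_charge: int = 3):
    """m/z values of a contaminant at charges 1..max_charge, given its
    singly charged (M+H)+ value."""
    neutral = singly_charged_mz - PROTON
    return [(neutral + z * PROTON) / z for z in range(1, max_charge + 1)]


def remove_contaminants(peaks, contaminants, tol_ppm: float, max_charge: int = 3):
    """Return the data peaks that do not fall within tol_ppm of any
    contaminant m/z at any considered charge state."""
    targets = [mz for c in contaminants for mz in contaminant_mzs(c, max_charge)]
    return [
        p for p in peaks
        if not any(abs(p - t) <= t * tol_ppm / 1e6 for t in targets)
    ]


peaks = [842.51, 1106.056, 1500.0, 2211.105]
print(remove_contaminants(peaks, contaminants=[2211.1046], tol_ppm=50.0))
```

The peak at 1106.056 is removed because it matches the doubly charged form of the 2211.1046 contaminant, even though only the singly charged value was entered.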