Instructions for ProteinProspector Programs
Sequence and Mass
MS-Pattern first finds amino acid sequences in the selected database that match the
regular expression entered, then filters out those that do not contain one of the
specified peptide masses WITHIN the matched sequence. Hence, not all of the specified
sequence need be contained in the region defined by the mass, so residues outside of
the peptide in question can be specified (although, unless No enzyme is selected, the
cleavage rule may prevent matching in such cases).
In this mode the sequence should be in CAPITAL LETTERS.
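The two-step filter described above can be illustrated with a minimal Python sketch. It is not MS-Pattern's implementation: the function names, the tolerance parameter, and the exhaustive sub-peptide scan (roughly what a No enzyme search implies) are assumptions made for illustration, and Python's `re` module stands in for the program's own pattern matcher. The residue masses are standard monoisotopic values, not figures taken from this document.

```python
import re

# Standard monoisotopic residue masses in Da (assumed reference values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def sequence_and_mass(protein, pattern, target_mass, tol=0.01):
    """Hypothetical helper: regex match first, then a mass filter.

    Step 1 finds every (non-overlapping) region matching the pattern.
    Step 2 keeps a region only if some contiguous peptide WITHIN it
    has the target mass, with no cleavage rule applied.
    """
    hits = []
    for match in re.finditer(pattern, protein):
        region = match.group()
        for i in range(len(region)):
            for j in range(i + 1, len(region) + 1):
                peptide = region[i:j]
                if abs(peptide_mass(peptide) - target_mass) <= tol:
                    hits.append((region, peptide))
    return hits


# The pattern specifies F, but the peptide carrying the mass is MQGK:
# a residue outside the peptide was part of the matched sequence.
target = peptide_mass("MQGK")
print(sequence_and_mass("MKAFMQGKLLKPE", "FMQ.K", target))
```

In this sketch the leading F is required by the pattern but lies outside the peptide that accounts for the mass, which is the situation the paragraph above describes.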
List of Sequences
This list is expected to contain multiple similar sequences, as would result
from de novo interpretation of an MS/MS spectrum (all sequences would thus have
the same mass). Furthermore, sequences that are homologous to one of the sequences
in the list can be matched if the number of mismatched AA's is set to a value > 0
(2 is a good first choice).
This mode does NOT allow the use of non-alphabetic characters from regular
expressions ( [, ], ^, ., .* ). In this mode the sequences should be in CAPITAL LETTERS.
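As a rough illustration of matching a list of candidate sequences with a mismatch allowance, the sketch below slides each candidate along a protein and counts differing positions. The function names and the return format are hypothetical, and the position-by-position comparison without gaps is an assumption about how mismatches are counted, not a description of MS-Pattern's internals.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def match_sequence_list(protein, candidates, max_mismatches=0):
    """Hypothetical helper: report (candidate, start, window) for every
    window of the protein that differs from a candidate in at most
    max_mismatches positions."""
    hits = []
    for candidate in candidates:
        n = len(candidate)
        for start in range(len(protein) - n + 1):
            window = protein[start:start + n]
            if count_mismatches(candidate, window) <= max_mismatches:
                hits.append((candidate, start, window))
    return hits


protein = "MKAFMQGKLLKPE"
# With no mismatches allowed, the Q -> K variant is not found ...
print(match_sequence_list(protein, ["FMKGK"], max_mismatches=0))
# ... but allowing one mismatch recovers the homologous stretch.
print(match_sequence_list(protein, ["FMKGK"], max_mismatches=1))
```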
| Expression | Meaning |
| --- | --- |
| [EF] | The amino acid is either E or F. |
| [^EF] | The amino acid is anything but E or F. |
| . | Any single amino acid is possible. |
| .* | Used to represent a sequence of one or more unknown amino acids. Note that this is "dot-star", not just "star". This wildcard has some not entirely obvious consequences. A match is made to the longest sequence fitting the condition (e.g., FMQ.*K will find the last K in the sequence following FMQ). In Sequence and Mass mode the sequence is matched first, then a mass WITHIN the matched sequence is found. Hence, not all of the specified sequence need be contained in the region defined by the mass, so residues outside of the peptide in question can be specified (although, unless No enzyme is selected, the cleavage rule may prevent matching in such cases). |
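The expressions in the table behave much like their counterparts in common regular-expression engines, so their effect can be previewed with Python's `re` module. This is only a stand-in for MS-Pattern's own matcher, which may differ in details; in particular, Python's `.*` also matches zero characters. The example sequence and the `demo_patterns` helper are made up for illustration.

```python
import re

PROTEIN = "AFMQGKLLKPE"  # made-up example sequence


def demo_patterns(protein):
    """Apply one pattern per table row and collect what each matches."""
    return {
        "[EF]M": re.findall(r"[EF]M", protein),     # E or F, then M
        "K[^EF]": re.findall(r"K[^EF]", protein),   # K, then anything but E or F
        "Q.K": re.findall(r"Q.K", protein),         # any single residue between Q and K
        "FMQ.*K": re.findall(r"FMQ.*K", protein),   # longest match: runs to the LAST K
    }


for pattern, found in demo_patterns(PROTEIN).items():
    print(pattern, found)
```

The last pattern shows the longest-match behavior noted in the table: the match extends to the final K rather than stopping at the first one after FMQ.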
By setting the Max. # of Mismatched AA's parameter to a value other than 0, homologous sequences can be matched. Up to that number of positions in the entered sequence are then allowed to differ from the protein sequences in the database. In future revisions of MS-Pattern this parameter may be replaced by PAM matrices of the kind used in sequence homology programs like BLAST. This parameter is active in the following search modes: