Instructions for General Features
Common to Multiple Protein Prospector Programs

Purpose

This document provides instructions for features found across more than one program in the Protein Prospector package.


Search Times

Search times vary from a few seconds to a few minutes depending on the computer hardware Protein Prospector is running on, the size of the database being searched, the restrictiveness of the search parameters and the number of searches being simultaneously performed.

When two or more searches are being performed simultaneously, each search slows noticeably. In general, more discriminating search parameters give faster searches: a single species, a narrow intact protein MW range, and 0 missed cleavages. For DNA database searches set the intact protein MW filter to All.


Stopping / Cancelling a Search

Most of the Protein Prospector searches can be stopped by pressing an abort search button.


Output Type

This can currently be set to either HTML or XML output. If you want to save the results to file set this to HTML.


Saving Hits from one Protein Prospector program, searching them with another

One Protein Prospector search program can serve as a pre-filter for another search program. To accomplish this the Hits (index numbers for matching database entries) from the first program are saved to a user specified file. This file is then retrieved by the second program, and only those matching database entries are searched by the second program.

The following programs can both save hits and search saved hits:

  • MS-Fit
  • MS-Tag
  • MS-Seq
  • MS-Pattern
  • MS-Homology
  • DB-Stat

You can also use the save hits to file option to create disk files of the HTML and XML outputs.


Protein Prospector programs search sequence databases which are located locally on the server running the programs. The actual files searched are FASTA formatted copies of the source database which contain minimal annotation. Search output typically contains a web-link to a fully annotated version of the source database for each entry matched.

Protein Prospector programs currently allow searching of the publicly available Genome and Proteome databases listed below. However, nearly any sequence database in a suitable FASTA format can be set up for use by contacting the administrator of a Protein Prospector server.

Protein Databases

  • NCBInr: Current README file
    A non-redundant database compiled by NCBI by combining most of the public domain databases (EST's not included).
  • Genpept: Current Release Notes
    Protein translation of Genbank (EST's not included).
  • Swiss Prot
    A curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc), a minimal level of redundancy and a high level of integration with other databases.
  • UniProtKB
    Merger of the information contained in Swiss-Prot, TrEMBL and PIR to produce a comprehensive database. All entries are highly annotated, some manually (Swiss-Prot and PIR) whilst others are annotated in an automated fashion using sequence similarity to previously annotated proteins (TrEMBL).
  • IPI
    Single species databases from species whose genome has been sequenced. Content includes protein entries in the UniProtKB database, plus predicted protein sequences from Ensembl and RefSeq.
  • Owl
    OWL is a non-redundant composite of 4 publicly available primary sources: SWISS-PROT, PIR (1-3), GenBank (translation) and NRL-3D. SWISS-PROT is the highest priority source, all others being compared against it to eliminate identical and trivially different sequences. OWL has not been maintained since 1999. However it is still available and Protein Prospector can search it.
  • Ludwignr
    The Ludwignr database is a non-redundant database made up from a number of other databases. The component databases can be downloaded individually or combined together by concatenation. The component databases are searched in the following order and duplicates are eliminated: Swiss-Prot, Trembl, Trembl-New, Genpept-updates, Genpept, Yeastpep, Wormpep. It is updated weekly.
    Also included are varsplic databases, Swiss-Prot Varsplic and Trembl Varsplic, which are a collection of isomeric proteins not really recorded in any of the other databases except for the SwissProt database in the (FT) features section. This new protein set, even though it is constructed from the original entry, can end up as a vastly different sequence with changes to amino acids, subtractions of segments and even addition of large segments of the sequence anywhere in the chain.

DNA Databases

Reasons to search particular databases:

  • UniProtKB
    Almost as comprehensive as NCBInr, it is well annotated and has significantly less protein redundancy than NCBInr.
  • NCBInr
    Largest protein database and updated most frequently.
  • Swiss Prot
    Smallest and best annotated.
  • dbEST
    Worth searching when nothing matches in the protein databases, in which case the gene for your protein may not yet have been cloned but an EST containing part of your protein may be known. Search times will typically be longer because of multi-frame translation combined with the fact that the dbEST file is > 3x larger than the NCBInr file.

Reasons NOT to search particular databases:

  • Owl
    No longer maintained.

The local copy of the database being searched with the programs is subject to updating by the administrator of a Protein Prospector server.


If you don't know the Latin taxonomic name for the species you're interested in try: NCBI Taxonomy Browser

Species limited searches in Protein Prospector programs are performed by means of preliminary filtering of a database according to the user designated species or collection of species. This species pre-filter is bypassed when the species is designated as All.

This species pre-filtering is imperfect because of the poor usage of taxonomy (standard species naming conventions) in the databases, AND the poorly standardized location of this information in the FASTA database formats used by Protein Prospector programs.

Users who desire additional/changed species filtering capability should direct their local Protein Prospector Server administrator to the instructions To Add/Change Species Filter. For the World Wide Web version of Protein Prospector please send email to: .

Species pre-filtering is implemented in Protein Prospector programs by correlating the user selected species name in the HTML form with the variety of pseudonyms for a particular species in the databases through behind the scenes access to a species alias list for all the databases used. This alias list is located on each Protein Prospector server in the directory.

Below is a list of the variety of pseudonyms for Mouse.

Database    Pseudonyms
NCBInr      MOUSE; MUS MUSCULUS; MUS SP.
dbEST       M. MUSCULUS; M.MUSCULUS; MOUSE; MUS DOMESTICUS; MUS MUSCULUS
Genpept     MUS MUSCULUS
Owl         MOUSE; MUS MUSCULUS; MUS MUSCULUS (MOUSE)
SwissProt   MOUSE

Server Administrators can edit this alias list without requiring access to Protein Prospector source code. Note that while this mechanism of pseudonym correlation is a hassle it also allows for significant flexibility. For example an alias can be created that includes a collection of species, e.g. mammals, eukaryotes, prokaryotes etc. Server administrators who create such alias collections are encouraged to send the modified parameter files to for inclusion in subsequent Protein Prospector releases.
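
The alias correlation described above can be pictured with a short Python sketch. It is only an illustration: the table name, the function name and the idea of comparing an already extracted species field are assumptions, and the real alias list is a server parameter file whose format is not described here. The Mouse entries are the pseudonyms listed above.

```python
# Hypothetical in-memory alias table (NOT the real Protein Prospector file format).
# The Mouse pseudonyms are the ones listed in this document.
SPECIES_ALIASES = {
    "MOUSE": {
        "MOUSE", "MUS MUSCULUS", "MUS SP.", "M. MUSCULUS", "M.MUSCULUS",
        "MUS DOMESTICUS", "MUS MUSCULUS (MOUSE)",
    },
}

# An alias may also name a collection of species. The rat names below are
# illustrative only and are not taken from any real parameter file.
SPECIES_ALIASES["RODENTS"] = SPECIES_ALIASES["MOUSE"] | {"RAT", "RATTUS NORVEGICUS"}


def passes_species_filter(species_field: str, selected: str) -> bool:
    """Return True if a database entry survives the species pre-filter.

    species_field is the species text already extracted from the FASTA
    comment line (how it is extracted differs per database and is not
    modelled here). Selecting "All" bypasses the filter.
    """
    if selected.upper() == "ALL":
        return True
    aliases = SPECIES_ALIASES.get(selected.upper(), set())
    return species_field.strip().upper() in aliases
```

With this table, an entry labelled "M.musculus" passes a Mouse search while one labelled "Homo sapiens" does not.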


Checking the Species Remove option removes the selected species from the search.


FASTA format databases generally contain a field in the comment line for the species which the sequence was obtained from. These vary from database to database and often entries in the same database use different text strings to denote the same species.

This option allows you to enter a list of these text strings as a pre-search filter.


Intact protein MW limited searches in Protein Prospector programs are performed by means of preliminary filtering of a database according to the user designated intact protein MW. This pre-filter is bypassed when the MW range checkbox All is checked.

The intact protein MW pre-filtering is imperfect because sequences in protein databases often exist in pre, pro, and fragment forms. Sequences in DNA databases often exist as fragments (EST's) or as cDNA's.

Protein Prospector programs ALWAYS calculate the intact protein MW, according to the following constraints.

  1. Treat protein as uncharged.
  2. Use average mass scale.
  3. Treat amino acid C as unmodified.
  4. Treat amino acid X as leucine.
  5. Treat amino acid B as glutamic acid.
  6. Treat amino acid Z as glutamine.
  7. Treat amino acid U as Selenocysteine.
  8. Ignore amino acids J, O.

The molecular weight pre-search option is not available for DNA databases.
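
The intact protein MW rules listed above can be sketched in Python as follows. This is an illustration rather than the Protein Prospector code: the function name is hypothetical, and the average residue masses are commonly tabulated approximate values, not values read from a Protein Prospector parameter file.

```python
# Approximate average residue masses in Da (commonly tabulated values; an
# assumption, not taken from Protein Prospector). C is unmodified cysteine.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
    "U": 150.04,  # selenocysteine, approximate
}
WATER = 18.01528  # average mass of H2O, added once for the free termini

# Substitutions and omissions taken from the numbered rules above.
SUBSTITUTE = {"X": "L", "B": "E", "Z": "Q"}
IGNORED = {"J", "O"}


def intact_protein_mw(sequence: str) -> float:
    """Average mass of the uncharged protein, following the listed rules."""
    total = WATER
    for aa in sequence.upper():
        if aa in IGNORED:
            continue
        total += AVERAGE_RESIDUE_MASS[SUBSTITUTE.get(aa, aa)]
    return total
```

Because X is counted as leucine and J is skipped, the sequences GXJ and GL give the same value under this sketch.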


Intact protein pI limited searches in Protein Prospector programs are performed by means of preliminary filtering of a database according to the user designated intact protein pI. This pre-filter is bypassed when the pI range checkbox All is checked.

The intact protein pI pre-filtering is imperfect because sequences in protein databases often exist in pre, pro, and fragment forms. Sequences in DNA databases often exist as fragments (EST's) or as cDNA's.

Protein Prospector programs ALWAYS calculate the intact protein pI, according to the following constraints.

  1. Treat amino acid C as unmodified.
  2. Treat amino acid X as leucine.
  3. Treat amino acid B as glutamic acid.
  4. Treat amino acid Z as glutamine.
  5. Treat amino acid U as selenocysteine. However the correct coefficients aren't currently known.
  6. Ignore amino acids J, O, U.

The pI pre-search option is not available for DNA databases.

The pK values used to calculate the pI values can be modified by Protein Prospector server administrators. You must remake the database index files using FA-Index if you change the pK values.


The name pre-filter examines the name field of each database entry's comment line for one or more case-insensitive regular expressions.

Name pre-filtering is not performed if the field is left blank.


The accession number pre-filter can be used to specify a list of database entries on which to perform the search.

If accession number pre-filtering is in use then other pre-filters, such as name or intact molecular weight filtering, are disabled.


Sometimes you may know that your sample contains proteins from a particular species but that other contaminant proteins such as keratin are also present.

This option allows you to enter a list of the accession numbers of the contaminant proteins and so include them in your search.


For a protein database you should enter a list of accession numbers or index numbers (one per line).

For example:

P38706
P05756
P48589
P26782
P04456
P02303

For a DNA database you should enter a list of:

(Accession Number/Index Number) (DNA Reading Frame) (Open Reading Frame)

For example:

113988 6 3
113989 4 1

DNA reading frames and open reading frames are described in the Frame Translation in DNA databases section.
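
As an illustration of the DNA entry format above, the following Python sketch splits each line into its three fields. The function name is hypothetical, and whitespace-separated fields are an assumption based on the example.

```python
def parse_dna_entries(text: str) -> list[tuple[str, int, int]]:
    """Parse lines of '<accession or index> <reading frame> <open reading frame>'."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry_id, frame, orf = line.split()
        frame, orf = int(frame), int(orf)
        if not 1 <= frame <= 6:
            raise ValueError(f"reading frame must be 1-6, got {frame}")
        entries.append((entry_id, frame, orf))
    return entries
```

Applied to the two example lines above, this yields the entries 113988 (frame 6, open reading frame 3) and 113989 (frame 4, open reading frame 1).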


DNA databases can NOT be searched with mass spectrometry data from DNA samples. Protein Prospector programs perform translation of DNA sequences to protein sequences.

Frames 1, 2, and 3, represent translation of the database sequence from left to right beginning in positions 1, 2, or 3 respectively. Frames 4, 5, 6 represent translation of the complement of the database sequence from right to left beginning in positions 1, 2, or 3 respectively.

Frame translation in Protein Prospector programs can be designated in 1, -1, 3, -3 or 6 frame translation modes. Frame mode 1 considers only frame 1 described above whereas frame mode -1 considers only frame 4. Frame mode 3 considers only frames 1, 2 and 3 whereas frame mode -3 considers only frames 4, 5 and 6. Frame mode 6 considers all 6 frames. A user should select frame mode 6 unless he/she knows that the database being searched contains sequences exclusively cloned in one direction or contains known genes with sequences already in frame.

Since the capability of searching DNA databases was intended to use EST databases, translation initiation does not require a start codon. If a stop codon is encountered the polypeptide is terminated. Translation is then reinitialized and continued with the following codon, thus beginning a new open-reading frame. MS-Fit requires all matches to a particular database entry to belong not only to the same translational frame, but also to the same open reading frame. Users who feel any of these procedures are inappropriate or inadequate, are urged to contact . Implementation of these procedures was done with significant uncertainty as to optimal strategy.
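
The frame numbering, frame modes and stop-codon handling described above can be sketched in Python as below. This is an illustration under stated assumptions, not the Protein Prospector implementation: the function names are hypothetical, the standard genetic code is assumed, codons containing ambiguous bases are shown as X, and empty stretches between adjacent stop codons are dropped.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Frame modes as described above, mapped to the frames they consider.
FRAME_MODES = {1: [1], -1: [4], 3: [1, 2, 3], -3: [4, 5, 6], 6: [1, 2, 3, 4, 5, 6]}


def translate_frame(dna: str, frame: int) -> list[str]:
    """Translate one frame (1-6) and return its open reading frames in order.

    No start codon is required. A stop codon ends the current polypeptide
    and translation restarts at the following codon.
    """
    dna = dna.upper()
    if frame > 3:
        # Frames 4-6: complement of the sequence, read from right to left.
        dna = dna.translate(COMPLEMENT)[::-1]
        frame -= 3
    codons = (dna[i:i + 3] for i in range(frame - 1, len(dna) - 2, 3))
    protein = "".join(CODON_TABLE.get(codon, "X") for codon in codons)
    return [orf for orf in protein.split("*") if orf]


def translate(dna: str, mode: int = 6) -> dict[int, list[str]]:
    """Translate every frame selected by the frame mode (1, -1, 3, -3 or 6)."""
    return {frame: translate_frame(dna, frame) for frame in FRAME_MODES[mode]}
```

For the short sequence ATGGCCTAAGGG, frame 1 contains a stop codon and so yields two open reading frames, MA and G.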


Enzyme specificity / Missed cleavages

The termini of the matched peptides can be set to be consistent with the cleavage specificity of the enzyme used to generate the peptide. By selecting No enzyme (not available in MS-Fit, MS-Digest or MS-Bridge) the matched peptides have no constraint on their termini. Increasing the maximum number of missed cleavages allowed enables matching to sequences with uncleaved sites internal to the peptide.

The option for the non-existent enzyme Slymotrypsin was created as a means for allowing Chymotryptic cleavages in Trypsin digests. When using this choice it is important to increase the missed cleavages allowed. Increasing to 9 will result in only a marginal increase in the search time.

Protein Prospector server administrators can edit the existing enzyme cleavage rules or add new ones.

It is possible to combine the rules for two or more enzymes by adding options to the Enzyme item on the HTML form. N-terminal cleavage rules can thus be mixed with C-terminal ones.
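
The effect of the missed cleavages setting described above can be seen in a small Python sketch. It assumes a trypsin-like rule (cleave on the C-terminal side of K or R unless the next residue is P); the actual rules are whatever the server's enzyme definitions specify, and the function names here are hypothetical.

```python
def cleavage_sites(protein: str) -> list[int]:
    """Positions after which the assumed trypsin-like rule cuts."""
    return [
        i + 1
        for i, aa in enumerate(protein[:-1])
        if aa in "KR" and protein[i + 1] != "P"
    ]


def digest(protein: str, max_missed_cleavages: int = 0) -> list[str]:
    """List peptides containing at most max_missed_cleavages uncleaved sites."""
    bounds = [0] + cleavage_sites(protein) + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Extending the end point past j cleavage sites leaves j-i-1 of them uncut.
        last = min(i + 2 + max_missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides
```

For the sequence AKRPGKC, allowing 0 missed cleavages gives AK, RPGK and C; allowing 1 adds AKRPGK and RPGKC.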


The Non-Specific options can be used to relax the enzyme cleavage rules at one or more of the peptide termini:

at 0 termini - The normal cleavage rules are followed;

at 1 termini - The cleavage rules are relaxed at either the N or C terminus but not both at the same time;

at 2 termini - This is similar to No enzyme except that the missed cleavage option is considered;

at N termini - The cleavage rules are relaxed at the N terminus;

at C termini - The cleavage rules are relaxed at the C terminus.

N termini-1=D - Unlike the other options above, the selected enzyme specificity is ignored at the N-terminus and instead is fixed to be a cleavage after D; this is used for work with caspase or similar substrates. This was necessary in order to implement an enzyme with different specificities at the N- and C-termini.


The End Terminus parameter selects end terminal processing of the digest fragments. The stripping terminus parameter is used to specify which terminus the amino acids are stripped from. The stripping range specifies the range of the number of amino acids which are cleaved off.


This option allows you to specify which amino acids are in all the peptides which are reported. If you select AND then all the specified amino acids have to be present. If you select OR then just one of the selected amino acids has to be present.


To use MS-Digest/MS-Bridge/MS-NonSpecific to operate on a user supplied sequence:

1). select User Protein as the Database option;

2). paste or type the sequence in the User Protein Sequence box;

    Tabs, returns, and spaces are ignored.

    USE CAPITAL LETTERS for the amino acids. The following lower case letters can be used:
    s,t,y - Phosphorylated S,T,Y
    u,v,w,x - user specified amino acids

    Use U for selenocysteine.

    Do NOT use the letters B, J, O, X, or Z.

    Do NOT use 3-letter code amino acid symbols.

    Do NOT use a "*" character anywhere in the sequence.

3). set the other parameters as appropriate.

4). press the Perform Digest button.

You can specify more than 1 user protein by separating the sequences by a > character. The > character must be on a separate line.

For example:

MPPKRAALIQNLRDSYTETSSFAVIEEWAAGTLQEIEGIAKAAAEAHGTIRNSTYGRAQAEKSPEQLL
GVLQRYQDLCHNVYCQAETIRTVIAIRIPEHKEEDNLGVAVQHAVLKIIDELEIKTLGSGEKSGSGGA
PTPIGMYALREYLSARSTVEDKLLGSVDAESGKTKGGSQSPSLLLELRQIDADFMLKVELATTHLSTM
VRAVINAYLLNWKKLIQPRTGTDHMVS
>
RVCMGKSQHHSFPCISDRLCSNECVKEEGGWTAGYCHLRYCRCQKAC

If multiple proteins are entered the Separate Proteins option can be used to indicate whether or not you want a separate section in the report for each protein.
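
The input rules for user supplied sequences can be summarised with the following Python sketch. It is an illustration only: the function name is hypothetical and the programs themselves may report problems differently.

```python
# Capital letters accepted: the 20 standard amino acids plus U (selenocysteine).
ALLOWED_UPPER = set("ACDEFGHIKLMNPQRSTUVWY")
# Lower case letters accepted: phosphorylated s, t, y and user specified u, v, w, x.
ALLOWED_LOWER = set("styuvwx")


def parse_user_proteins(text: str) -> list[str]:
    """Split pasted text into sequences at lines holding only '>' and validate them."""
    proteins, current = [], []
    for line in text.splitlines():
        if line.strip() == ">":
            proteins.append("".join(current))
            current = []
        else:
            # Tabs, returns and spaces are ignored.
            current.append("".join(line.split()))
    proteins.append("".join(current))
    proteins = [p for p in proteins if p]
    for protein in proteins:
        bad = sorted(set(protein) - ALLOWED_UPPER - ALLOWED_LOWER)
        if bad:
            raise ValueError("unsupported characters: " + "".join(bad))
    return proteins
```

Letters such as B, J, O, X and Z, and the "*" character, are rejected by this sketch with a ValueError.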


The links in program output are intended to easily facilitate user access to obvious sources of additional information about proteins or peptides matched or under study. Some of the default parameters of these links can be changed by Protein Prospector server administrators.

  • Change the default parameters in the HTML links from the accession number
  • Change the default parameters in the HTML links from the MS-Digest index number


The output from Protein Prospector programs usually contains links to other Protein Prospector programs and Internet pages. You can disable these links by checking the Hide HTML Links option. This considerably reduces the size of the output report and hence the network traffic.


The database accession number in the search results has an HTML link to retrieve the complete entry including comments from a remote database. In order for this link to be created the programs need to know the URL for the remote database. Users who desire links to different fully annotated databases, or who find links to a particular database to be defective should contact their local Protein Prospector server administrator. For the World Wide Web version of Protein Prospector please send email to: .

Server Administrators can change the default address of links from accession numbers in program output without requiring access to Protein Prospector source code. Those administrators who find improved options for links to publicly available databases are encouraged to send the modified parameter files to for inclusion in subsequent Protein Prospector releases.


The MS-Digest index number in the search results has an HTML link to retrieve a listing of all the masses and sequences of peptides that can be produced by digesting the matched protein with the designated enzyme. If No enzyme was designated in the search parameters, then Trypsin is supplied in this HTML link. The number of missed cleavages is set to 2 unless a higher number was designated in the search parameters.

Server administrators can change the HTML link from the MS-Digest index number in the search results.

If the MS-Digest number link marked Coverage Map in the MS-Fit detailed results is pressed then the protein display at the top of the MS-Digest report has the matching peptides highlighted.


The peptide sequence in the search results has an HTML link to MS-Product for retrieving a listing of the theoretical fragment-ions that may be formed in an MS/MS experiment. The default set of ion types supplied in this link corresponds to those expected to be formed in post-source decay (PSD) experiments.


Some Protein Prospector programs allow the peptide terminal groups to be modified from the defaults of hydrogen at the N terminus and free acid at the C terminus.

Users who desire additional options for terminal groups should contact their local Protein Prospector server administrator. For the World Wide Web version of Protein Prospector please send email to: .

Server Administrators can add terminal groups without requiring access to Protein Prospector source code. Those administrators who add terminal groups are encouraged to send the modified parameter files to for inclusion in subsequent Protein Prospector releases.


Any of the 20 standard amino acids can be modified in a user designated way although this option will generally be used to modify cysteine residues.

It is an error to specify more than one constant modification for a single amino acid.

Users who want additional options for constant amino acid modifications should contact their local Protein Prospector server administrator. For the World Wide Web version of Protein Prospector please send email to: .

Server Administrators can add constant modification options without requiring access to Protein Prospector source code. Those administrators who add constant modification options are encouraged to send the modified parameter files to for inclusion in subsequent Protein Prospector releases.

Notes on Cysteine Modification

Carboxymethylation is the product of a reaction with iodoacetic acid, carbamidomethylation is the product of a reaction with iodoacetamide and pyridylethylation is the product of a reaction with vinylpyridine.

In every case we would assume that all cysteines are modified by the addition of the appropriate group, e.g. for carboxymethylation an H is replaced by CH2COOH for every cysteine, i.e. a nominal mass increase of 58 Da per cysteine.
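
The 58 Da figure can be checked from the elemental change. The Python sketch below does the arithmetic for the three modifications named above; the net elemental gains and the function name are stated here as working assumptions for the illustration and are not read from a Protein Prospector parameter file.

```python
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16}
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Net elemental gain per cysteine (group added minus the H it replaces).
MODIFICATIONS = {
    "carboxymethylation": {"C": 2, "H": 2, "O": 2},            # -H +CH2COOH
    "carbamidomethylation": {"C": 2, "H": 3, "N": 1, "O": 1},  # -H +CH2CONH2
    "pyridylethylation": {"C": 7, "H": 7, "N": 1},             # +vinylpyridine
}


def mass_increase(modification: str, sequence: str, masses=NOMINAL) -> float:
    """Total mass added to a sequence when every cysteine carries the modification."""
    gain = sum(masses[el] * n for el, n in MODIFICATIONS[modification].items())
    return gain * sequence.count("C")
```

For a peptide with two cysteines, such as ACDC, carboxymethylation adds a nominal 116 Da under these assumptions.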

There are miscellaneous reasons for people choosing particular alkylating agents, including the ease of carrying out the reaction, the efficiency and yield of the reaction, the desire to add relatively small mass increments per cysteine, changes in the properties of the protein and its peptides, etc. Acrylamide modification usually means that there was no deliberate attempt at alkylation before running a gel. Iodoacetic acid and iodoacetamide are both convenient and easy reagents to work with, that react with high yield to add a well-defined mass increment. The acid makes a protein more hydrophilic and tends to open up its structure to more efficient digestion. Other reagents may be more problematical but may offer particular advantages, e.g. vinylpyridine is not water soluble so the reaction is carried out in an organic solvent. This may be more effective for hydrophobic proteins, e.g. membrane proteins. Cyanylation and subsequent cleavage has been developed for identification of multiple bridged cysteines. Then there are the various reagents that add a tag such as biotin to assist with separation, and ICAT that combines this with isotopic labelling.

More than one method of modification (mixing) can NOT generally be designated at the same time for a single search. There is one exception to this rule in the MS-Fit and MS-Digest programs where it is possible to consider Acrylamide Modified Cys in addition to the selected cysteine modification (Modifying Amino Acids).


See also: Modified Cysteine residues.

Both MS-Fit and MS-Digest allow for a specialized set of modified amino acids:

Peptide N-terminal Gln to pyroGlu

Any instance of Glutamine at the N-terminus of a peptide (following digestion) is considered as either normal Gln or as pyro-glutamic acid.
Designation: Q -> q

Oxidation of M

Any instance of Methionine is considered as either normal Met or Met + oxygen.
Designation: M -> m

Protein N-terminus Acetylated

For any database entry with a Met at the N-terminus the N-terminal peptide is considered as either in its original form or in a form where the Met is removed and the next amino acid is acetylated. While this post-translational modification does not occur in bacteria, MS-Fit and MS-Digest don't know any better. Furthermore, if the database curators have removed the N-terminal Met from the sequence, then MS-Fit and MS-Digest will not apply the acetylation modification.

Acrylamide Modified Cys

Any instance of Cysteine is considered as either the Cysteine modification chosen on the Cys modified by: option or acrylamide modified Cys. This option would normally be used to consider each Cysteine as either unmodified or acrylamide modified.

User Defined 1/User Defined 2/User Defined 3/User Defined 4

Up to four of the considered modifications can be selected from a list of user defined modifications which a server administrator can add to. For example if Phosphorylation of S, T and Y is chosen from the list then any instance of Serine, Threonine, or Tyrosine is considered as either normal Ser, Thr, Tyr or phosphorylated Ser, Thr, Tyr.
Designation: S -> s, T -> t, Y -> y


Some Protein Prospector programs allow the use of user specified amino acids for which you must supply the elemental compositions. To specify the user defined amino acid in a peptide or protein sequence use the appropriate letter (lower case u, v, w or x). The default elemental composition for all the user defined amino acids is that of glycine.


Protein Prospector programs expect the mass input values to represent the actual m/z values measured on a mass spectrometer. Thus the masses of the charging protons (H+; other charging agents are not currently allowed) need not be subtracted. However, input data that has had the mass of the protons subtracted can be used; simply designate the charge as 0.


Monoisotopic: only the most abundant isotope of each element (which for these elements is also the lightest) is used in the mass calculations: 12C, 1H, 14N, 16O, 32S, 31P.

Average: All isotopes of each element are used, weighted by abundances that reflect their "normal" proportions in the biosphere. The isotope abundances can be changed by editing the elements.txt file.

Par(mi)Frag(av): Parent masses are calculated as monoisotopic and fragment masses are calculated as average. The Par(mi)Frag(av) option should be chosen when the mass accuracy on fragment mass measurements is modest ( +/- 1000 ppm ).

Par(av)Frag(mi): Parent masses are calculated as average and fragment masses are calculated as monoisotopic. The Par(av)Frag(mi) option should be chosen when the mass accuracy of the parent mass measurement is modest ( +/- 1000 ppm ).
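
The difference between the two mass scales can be illustrated with a short Python sketch. The element masses are standard approximate values used here as assumptions; the values actually used by a given server come from its elements.txt file. The function name is hypothetical.

```python
# Approximate element masses in Da (assumed values, not read from elements.txt).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}
AVERAGE = {"C": 12.011, "H": 1.00794, "N": 14.00674, "O": 15.9994, "S": 32.06}


def composition_mass(composition: dict[str, int], scale: dict[str, float]) -> float:
    """Mass of an elemental composition on the chosen mass scale."""
    return sum(scale[element] * count for element, count in composition.items())


# Glycylglycine, C4H8N2O3, as a small worked case.
GLY_GLY = {"C": 4, "H": 8, "N": 2, "O": 3}
mono = composition_mass(GLY_GLY, MONOISOTOPIC)
avg = composition_mass(GLY_GLY, AVERAGE)
difference_ppm = (avg - mono) / mono * 1e6
```

For this dipeptide the two scales differ by roughly 500 ppm, which is why the scale chosen for parent and fragment masses should match how each was measured.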


Protein Prospector programs can handle multiply charged data from both positive and negative ion experiments. Simply specify the integer charge state corresponding to the m/z value. Absence of charge specification in the input defaults to a charge state of +1. Input data that has had the mass of the protons subtracted can be used; simply designate the charge as 0. The charge is used to convert the m/z value to an MH+ value for search purposes. Output will show the m/z value with the charge as a superscript.
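
The conversion from m/z and charge to MH+ can be sketched in Python as below. The arithmetic is the usual relationship between m/z and neutral mass and is an assumption about what the programs do, in particular for negative ions, where loss of protons is assumed. The function name is hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da (approximate)


def mz_to_mh_plus(mz: float, charge: int = 1) -> float:
    """Convert a measured m/z value to the singly protonated mass MH+."""
    if charge == 0:
        # Proton masses were already subtracted: the input is the neutral mass.
        neutral = mz
    else:
        # Positive ions: M = z * (m/z) - z * proton.
        # Negative ions (assumed deprotonated): M = |z| * (m/z) + |z| * proton.
        neutral = abs(charge) * mz - charge * PROTON
    return neutral + PROTON
```

A doubly charged ion measured at m/z 500.0 corresponds under this sketch to an MH+ of about 998.9927, while a value of 1000.0 entered with charge 0 corresponds to about 1001.0073.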


The mass tolerances should be set to be consistent with the mass accuracy of the instrument used to generate the data. It is generally a better idea to use units of ppm or % rather than Da, as mass spectrometers typically have an error associated with mass measurement that is mass dependent and thus cannot be uniformly expressed in Da.

Measuring masses as accurately as possible is the single most important thing one can do to achieve the highest certainty of protein identification in a peptide mass fingerprinting experiment.
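
The point about mass-dependent error can be made concrete with a small Python sketch that converts a tolerance to Da at a given mass. The function names are hypothetical and the programs may apply the tolerance differently in detail.

```python
def tolerance_in_da(mass: float, tolerance: float, unit: str = "ppm") -> float:
    """Width of the tolerance window in Da at the given mass."""
    if unit == "ppm":
        return mass * tolerance / 1e6
    if unit == "%":
        return mass * tolerance / 100.0
    if unit == "Da":
        return tolerance
    raise ValueError(f"unknown tolerance unit: {unit}")


def within_tolerance(measured: float, theoretical: float,
                     tolerance: float, unit: str = "ppm") -> bool:
    """True if the measured mass lies within the tolerance of the theoretical mass."""
    return abs(measured - theoretical) <= tolerance_in_da(theoretical, tolerance, unit)
```

A 50 ppm tolerance corresponds to 0.05 Da at mass 1000 but 0.15 Da at mass 3000, whereas a tolerance given in Da is the same at every mass.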


Two types of data set are used in Protein Prospector. The programs MS-Fit and MS-Bridge use MS data sets and the programs MS-Product, MS-Seq and MS-Tag use MS/MS data sets. In addition MS-Fit and MS-Tag can now search multiple data sets at one time.

An MS data set consists of a list of parent ion measurements - one per line. An MS/MS data set consists of a parent ion measurement on one line followed by a list of fragment ion measurements each on separate lines.

The Data Format parameter is used to specify the format of a single data point on a single line of an MS or MS/MS file. It can be set to either M/Z Charge or M/Z Intensity Charge. If the charge is not specified for a data point then it defaults to 1. Note that if you specify an M/Z Intensity Charge format for an M/Z Charge data set then the charges will be incorrectly read as intensities.

Intensities should not be specified for the parent ion measurement in an MS/MS file.
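
The two data formats can be illustrated with the following Python sketch of a reader. It assumes whitespace-separated fields and a > character on its own line between data sets; the function name is hypothetical and this is not the programs' own parser.

```python
def parse_data_sets(text: str, data_format: str = "M/Z Charge"):
    """Return a list of data sets, each a list of (m/z, intensity, charge) tuples."""
    data_sets, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == ">":
            # A '>' line closes the current data set.
            if current:
                data_sets.append(current)
            current = []
            continue
        fields = line.split()
        mz = float(fields[0])
        intensity, charge = None, 1  # charge defaults to 1 when not given
        if data_format == "M/Z Intensity Charge":
            if len(fields) > 1:
                intensity = float(fields[1])
            if len(fields) > 2:
                charge = int(fields[2])
        elif len(fields) > 1:
            charge = int(fields[1])
        current.append((mz, intensity, charge))
    if current:
        data_sets.append(current)
    return data_sets
```

Reading the line "842.51 2" with the M/Z Intensity Charge format gives an intensity of 2 and the default charge of 1, which is the misreading the note above warns about.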


Protein Prospector currently has three options for data sources:

1). Enter a list of files each containing a single data set. These files must be on the same computer file system as the Prospector software so this option is not available for people using one of our web sites. This option works differently depending on whether you are using a UNIX or Windows version of the software. This data source is used on the Batch MS-Fit and Batch MS-Tag forms.

2). Select a file containing the data from the local disk using the Browse button. The file must be a text file created using a program such as Windows Notepad. It can contain multiple datasets which should be separated by a > character. This data source is used on the MS-Fit Web Batch and MS-Tag Web Batch forms.

3). Paste the data into the data paste area. Multiple data sets separated by > characters can be entered into the paste area, as in the data set example further below. This data source is used on the standard MS-Fit and MS-Tag forms.


Windows

If the files are all in the same directory then this directory should be specified in the Data File Directory item. The file names themselves should be specified one per line in the Data Files item. If the files are in different directories then the Data File Directory item should be left blank and the full path of each file should be specified in the Data Files item.

On Microsoft Windows systems you can use wildcard characters to specify more than one file at a time. The following rules thus apply when specifying filenames:

a). A filename can contain up to 255 characters, including spaces. But, it cannot contain any of the following characters:

\ / : * ? " < > |

b). Wildcard characters can represent one or more characters. The question mark (?) wildcard can be used to represent any single character, and the asterisk (*) wildcard can be used to represent any character or group of characters that might match that position in other filenames.

c). Capitalization doesn't matter.

The Data Files option can contain more than one wild carded filename.

Specifying a list of files in this way is only possible for MS-Fit and MS-Tag.
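
The wildcard rules above can be tried out with Python's fnmatch module, as in the sketch below. It is an illustration only: the function name is hypothetical, lower-casing both sides stands in for the case-insensitive matching on Windows, and fnmatch also accepts [ ] character sets, which the rules above do not mention.

```python
from fnmatch import fnmatchcase


def match_windows_wildcard(pattern: str, filenames: list[str]) -> list[str]:
    """Return the filenames matching a ?/* wildcard pattern, ignoring case."""
    return [
        name for name in filenames
        if fnmatchcase(name.lower(), pattern.lower())
    ]


files = ["spot1.txt", "spot2.txt", "SPOT10.TXT", "gel.dat"]
single = match_windows_wildcard("spot?.txt", files)  # ? stands for exactly one character
several = match_windows_wildcard("spot*.txt", files)  # * stands for a group of characters
```

Here spot?.txt selects spot1.txt and spot2.txt only, while spot*.txt also selects SPOT10.TXT.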

UNIX

The files must all be in the same directory and this directory should be specified in the Data File Directory item. The file names or regular expressions representing groups of files should be specified one per line in the Data Files item.

The regular expressions which can be used to represent a group of file names are described in the regular expressions section.

Capitalization matters.

The Data Files option can contain more than one filename regular expression.

Specifying a list of files in this way is only possible for MS-Fit and MS-Tag.


data set 1
>
data set 2
>
data set 3

Note that for security reasons the Upload Data From File item is cleared once you submit the form so you have to reselect the file before you can do a subsequent search. This is a function of the web browser rather than Protein Prospector.

Only a single data set can be specified for MS-Seq, MS-Bridge and MS-Product.

Paste the data into the data paste area. Multiple data sets separated by > characters can be entered into the paste area as shown above.


This option is used to limit the maximum number of hits displayed. For example if the maximum number of reported hits is set to 50 and there are 100 hits then only the first 50 hits are displayed.


The search is aborted if this parameter is exceeded.


This option allows a user defined comment or sample identifier to be added to the output.


Searches can be restricted to matching sequences containing particular amino acid(s) by checking the appropriate boxes. This information can be derived from the masses of immonium and related low-mass ions or high-mass ions indicating side-chain losses from the parent ion. The programs do not actually use the mass values but instead filter the matched sequence for the presence of the designated amino acid(s).

In MS-Tag the masses of immonium and related low-mass ions can also be placed directly in the fragment-ion mass window. MS-Tag invokes the same rules as conveyed in the check box chart, and converts the masses to AA characters and filters matched sequences as above for presence of the described amino acid(s). Protein Prospector server administrators can control these immonium ion rules by editing the immonium.txt file.


MS-Comp considers the 20 naturally occurring amino acids as a default. If you know that your unknown peptide doesn't contain particular amino acids you can narrow the range of the search by excluding them. You might also wish to exclude either Leucine or Isoleucine.


MS-Comp considers the 20 naturally occurring amino acids as a default. It can also optionally include the following:

  • m - Oxidized Methionine
  • q - Pyroglutamic Acid
  • h - Homoserine Lactone
  • s - Phosphorylated Serine
  • t - Phosphorylated Threonine
  • y - Phosphorylated Tyrosine
  • u - A User Specified Amino Acid
  • v - A User Specified Amino Acid
  • w - A User Specified Amino Acid
  • x - A User Specified Amino Acid

Some Protein Prospector parameters are specific to an instrument type. Server administrators can modify these parameters or add new instrument types by editing the instrument.txt file.


The regular expressions used are of the form used by the UNIX grep facility. Examples (type man grep on a UNIX system for full details):

[AB] The character is either A or B.
[A-IK-Y] The character is alphabetically between A and I or between K and Y.
[^AB] The character is anything but A or B.
. Any single character is possible.
.* Used to represent a sequence of zero or more unknown characters.

Where stated the regular expressions are case insensitive.
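
The constructs listed above behave the same way in Python's re module, which makes it a convenient place to try a pattern before using it. The sketch below is an illustration: the name_matches function is hypothetical, and treating a list of expressions as "match any" is an assumption about how a filter such as the name pre-filter combines them.

```python
import re


def name_matches(name_field: str, patterns: list[str]) -> bool:
    """True if any pattern is found in the name field, ignoring case."""
    return any(re.search(p, name_field, re.IGNORECASE) for p in patterns)


# Each construct from the list above, checked against a short string.
assert re.fullmatch("[AB]", "B")                 # either A or B
assert re.fullmatch("[A-IK-Y]", "J") is None     # J falls between the two ranges
assert re.fullmatch("[^AB]", "C")                # anything but A or B
assert re.fullmatch("K.N", "KAN")                # . is any single character
assert re.fullmatch("K.*E", "KE")                # .* may also match nothing

found = name_matches("Keratin, type I cytoskeletal 10", ["keratin", "trypsin"])
```

In this sketch found is True because the first expression matches regardless of case.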


Separate Proteins

If this option is selected there is a separate section in the MS-Digest/MS-Bridge report for each protein.


Hide Protein Sequence

The complete protein sequence is normally displayed in the MS-Digest/MS-Bridge/MS-NonSpecific output. You can disable this display using the Hide Protein Sequence option.

This option is also available on the FA-Index Database Summary Report form. However the protein sequence is off by default for these reports.


It is possible to retrieve entries from the database by specifying either the Accession Number or the Index Number. The accession number is a unique identifier for a protein within the database. It will not change between subsequent revisions of the database and is external to the Protein Prospector package. The index number for a particular protein is internal to the Protein Prospector package and is likely to change when you update the database. Both the index number and the accession number are reported in Protein Prospector search results. Entries are generally more efficiently retrieved using index numbers.


Ticking the Chem Score box reports the Chem Score for each peptide as described in the paper:

Parker, Kenneth C. (2002) "Scoring Methods in MALDI Peptide Mass Fingerprinting: Chem Score and the ChemApplex Program", J. Am. Soc. Mass Spectrom., 13, 22-39

Terminal peptide adjustments are not included in the score. This option should only be used if you are doing an experiment similar to that described in the paper.