Instructions for ProteinProspector Programs
The Ludwignr database is a non-redundant database made up from several smaller databases contained in the directory ftp://ftp.ch.embnet.org/pub/databases/nr_prot. You need to download the ones you are interested in individually and then concatenate them into one file. On the UNIX operating system you can do the concatenation with the cat command from the command line. For Windows NT, one option is to download a set of Windows Explorer extensions that includes a concatenation feature: http://www.funduc.com/explorer_extensions.htm
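As a platform-independent alternative, the concatenation step can be sketched in Python. This is only an illustration and not part of ProteinProspector; the function name and any file names you pass to it are hypothetical.

```python
import shutil
from pathlib import Path


def concatenate_databases(parts, destination):
    """Append each file in `parts`, in order, to a single `destination` file."""
    with open(destination, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so large databases need not fit in memory.
                shutil.copyfileobj(src, out)
    return Path(destination)
```

The files are copied byte for byte, which is the same result the UNIX cat command would give when its output is redirected to a new file.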
If you really want to know what FA-Index does and why, please read the manual. Don't even think about trying to use proprietary databases or update databases daily unless you have read the FA-Index manual, particularly the generic database filenaming sections.
FA-Index will create a file with a .usp suffix (e.g. Genpept.r95.usp) in which it writes the comment line of each FASTA entry whose species the FA-Index program cannot parse. Viewing this file can help troubleshoot FASTA format problems for anyone using proprietary databases.
/home/httpd//acclinks.txt  
on SunOS UNIX systems.
C:\http\acclinks.txt  
on Windows NT systems.
The database accession number in the search results has an HTML link to retrieve the complete entry, including comments, from a remote database. In order for this link to be created the programs need to know the URL for the remote database. This is accomplished through parameters contained in the acclinks.txt file. Occasionally the URLs for the remote databases may need to be updated, or new ones added for a new database. This requires editing the acclinks.txt file.
Within the acclinks.txt file an entry for an HTML link from the accession number MUST contain 1 line:
The line must contain the following information:
Note that this link need not be to a sequence database. The link could be to whatever a ProteinProspector server administrator specifies.
Example:
Below is an example of the entries for Genpept in
acclinks.txt:
Genpept http://www3.ncbi.nlm.nih.gov/htbin-post/Entrez/query?db=p&form=6&dopt=ng&uid=
gen http://www3.ncbi.nlm.nih.gov/htbin-post/Entrez/query?db=p&form=6&dopt=ng&uid=
The lowercase prefixes gen, owl, swp, or nr are intended to be used for a second database that is of the same format as the uppercase one. See Linking for creating links into NCBI databases. As mentioned above the prefix name can refer to a single database or a set of databases. For example if you have two user created databases called PA3_mouse and PA33_mouse, an entry in the acclinks.txt file of the form:
PA3 some_url_prefix
would give the databases the same accession number link. On the other hand entries of the following form:
PA3 some_url_prefix
PA33 another_url_prefix
would give the databases different accession number links.
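The prefix behaviour described above can be sketched as follows. This is a minimal illustration, not the ProteinProspector implementation. It assumes that each line holds a prefix and a URL separated by white space, that the accession number is appended to the URL, and that the longest matching prefix wins when several match (which is one way to obtain the PA3/PA33 behaviour). The function names are hypothetical.

```python
def parse_acclinks(text):
    """Map each database prefix to its URL (one 'prefix url' pair per line)."""
    links = {}
    for line in text.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2:
            links[fields[0]] = fields[1].strip()
    return links


def accession_url(links, database, accession):
    """Return the link for `accession`, or None if no prefix matches `database`.

    Assumption: when several prefixes match, the longest one is used.
    """
    matches = [prefix for prefix in links if database.startswith(prefix)]
    if not matches:
        return None
    return links[max(matches, key=len)] + accession
```

With only the PA3 entry present, both PA3_mouse and PA33_mouse resolve to the same URL; adding the PA33 entry gives PA33_mouse its own link.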
ProteinProspector server administrators who find improved options for links to publicly available databases are encouraged to submit the modified parameter files for inclusion in subsequent ProteinProspector releases.
/home/httpd//idxlinks.txt  
on SunOS UNIX systems.
C:\http\\idxlinks.txt   on
Windows NT systems.
The MS-Digest index number in the search results has an HTML link to retrieve an MS-Digest listing for the matched database entry. In order for this link to be created the programs need to know the URL to MS-Digest and some default parameters. This is accomplished through information contained in the idxlinks.txt file. A server administrator can customize these parameters by editing the idxlinks.txt file.
Within the idxlinks.txt file an entry for an HTML link from the MS-Digest index number MUST contain 2 lines:
The lines must contain the following information:
Note that this link need not be the same for each ProteinProspector program creating the link, and that the MS-Digest parameters can be customized. Furthermore, this link need not be to MS-Digest at all; the link could be to whatever a ProteinProspector server administrator specifies.
Example:
Below is an example of the entries for msfit and mstag in
idxlinks.txt:
msfit
MSDIGEST?
mstag
MSDIGEST?mod_AA=Peptide+N-terminal+Gln+to+pyroGlu&mod_AA=Oxidation+of+M&mod_AA=Protein+N-terminus+Acetylated
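The two-line layout shown in the example can be read with a few lines of Python. This is a sketch under the assumption that the first line of each entry is the program name and the second is the link with its default parameters, as in the example; skipping blank lines is also an assumption, and the function name is hypothetical.

```python
def parse_link_file(text):
    """Pair each program-name line with the link line that follows it."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("each entry needs a program line and a link line")
    # Even-numbered lines are program names, odd-numbered lines are links.
    return dict(zip(lines[0::2], lines[1::2]))
```

Raising an error on an odd number of lines catches the most likely editing mistake, a program name without its link line.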
/home/httpd//seqlinks.txt  
on SunOS UNIX systems.
C:\http\\seqlinks.txt   on
Windows NT systems.
The peptide sequence in the search results has an HTML link to retrieve an MS-Product listing for the peptide sequence. In order for this link to be created the programs need to know the URL to MS-Product and some default parameters. This is accomplished through information contained in the seqlinks.txt file. A server administrator can customize these parameters by editing the seqlinks.txt file.
Within the seqlinks.txt file an entry for an HTML link from the peptide sequence MUST contain 2 lines:
The lines must contain the following information:
Note that the parameters need not be for PSD and the link need not be to MS-Product. The link could be to whatever a ProteinProspector server administrator specifies, for example BLAST or another sequence homology search program.
Example:
Below is an example of the entry for mstag in seqlinks.txt:
mstag
MSPROD?parent_mass_convert=average&it=i&it=m&it=a&it=b&it=y&it=I&it=h&it=n&it=B
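To see which default parameters a link line carries, the part after the "?" can be decoded with Python's standard urllib.parse module. This is only an inspection aid; the function name is hypothetical, and nothing here is taken from the ProteinProspector code.

```python
from urllib.parse import parse_qs


def default_parameters(link):
    """Split a link line into its target and a dict of parameter lists."""
    target, _, query = link.partition("?")
    # Repeated names such as it=... are collected into one list, in order.
    return target, parse_qs(query)
```

For the mstag entry above this yields the target MSPROD, a single parent_mass_convert value and nine it values, which makes it easy to check that an edited line still contains the ion types you intended.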
/home/httpd//elelinks.txt  
on SunOS UNIX systems.
C:\http\\elelinks.txt   on
Windows NT systems.
The elemental composition in the search results has an HTML link to retrieve an MS-Isotope listing for the elemental composition. In order for this link to be created the programs need to know the URL to MS-Isotope and some default parameters. This is accomplished through information contained in the elelinks.txt file. A server administrator can customize these parameters by editing the elelinks.txt file.
Within the elelinks.txt file an entry for an HTML link from the elemental composition MUST contain 2 lines:
The lines must contain the following information:
Note that the link need not be to MS-Isotope and that further MS-Isotope parameters could be specified. The link could be to whatever a ProteinProspector server administrator specifies.
Example:
Below is an example of the entry for msprod in
elelinks.txt:
msprod
MSISO?distribution_type=Elemental+Composition
/home/httpd//species.txt       on SunOS UNIX systems.
C:\http\species.txt      on Windows NT systems.
In order to limit searches to a particular species, or a collection of species, the programs have to
correlate the species name selected in the HTML form with the species names
in the database entries. This is accomplished through the species alias file
species.txt.
To add or change a particular species filter you must edit
the species.txt
file and the Javascript which controls the appearance
of the species in the program HTML input files:
     
Note that the Javascript file can also be updated automatically from the FA-Index page. See Updating the Database and Species Lists in the HTML Forms for details.
There are three types of entry in the
species.txt file:
     
Within the species.txt file a single species entry must contain at least ONE line; species are separated by a line containing only the ">" symbol.
Line 1 contains the species name as it appears in the HTML input files. Line 1 is only used to relate the HTML input entry to possible aliases in the databases. Every species in the HTML input pages should have an entry in this file. If one of those species is NOT present in any database leave the entry with only one line. No error message will be generated that way.
All other lines should contain names (aliases) by which the species may be found in the databases. The aliases can be in any order.
Examples:
>
HELICOBACTER PYLORI
HELPY
HELICOBACTER PYLORI
>
HOMO SAPIENS
HUMAN
H. SAPIENS
H.SAPIENS
HUMHBC
HOMO SAPIENS
>
In the first example HELPY is a typical SwissProt species alias and HELICOBACTER PYLORI is typical of what might be found in Genpept. A database such as Owl, which contains entries from several sources, would typically use several aliases.
If a program error directs you to the species.txt file likely problems are:
Multiple species entries allow you to group species together in a search. A typical example which restricts the search to the HOMO SAPIENS, BOS TAURUS and SUS SCROFA species is:
[Mammals]
HOMO SAPIENS
BOS TAURUS
SUS SCROFA
>
Line 1 contains the identifier for the multiple species entry as it appears in the HTML input files. The identifier is enclosed by the '[' character and the ']' character as in the example. Every multiple species entry in the HTML input pages should have an entry in the species.txt file.
The other lines should contain the names of the species that you wish to include in the search. These can either be multiple or single species entries in the species.txt file.
Excluded species entries allow you to exclude species from a search. A typical example which includes all species except HOMO SAPIENS, BOS TAURUS and SUS SCROFA is:
]Model Organisms[
HOMO SAPIENS
BOS TAURUS
SUS SCROFA
>
Line 1 contains the identifier for the excluded species entry as it appears in the HTML input files. The identifier is enclosed by the ']' character and the '[' as in the example. Every excluded species entry in the HTML input pages should have an entry in the species.txt file.
The other lines should contain the names of the species that you wish to exclude. The species that you wish to exclude MUST have single species entries in the species.txt file.
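The three entry types can be told apart by the brackets on their first line. The following sketch reads a species.txt-style text and expands a multiple species entry into database aliases. It is an illustration under the layout described above, not the ProteinProspector parser, and the function names are hypothetical.

```python
def parse_species(text):
    """Split species.txt text into single, multiple and excluded entries."""
    single, multiple, excluded = {}, {}, {}
    entry = []
    # A trailing ">" is appended so the last entry is always flushed.
    for line in text.splitlines() + [">"]:
        line = line.strip()
        if line != ">":
            if line:
                entry.append(line)
            continue
        if entry:
            name, rest = entry[0], entry[1:]
            if name.startswith("[") and name.endswith("]"):
                multiple[name[1:-1]] = rest
            elif name.startswith("]") and name.endswith("["):
                excluded[name[1:-1]] = rest
            else:
                single[name] = rest
        entry = []
    return single, multiple, excluded


def aliases_for_group(name, single, multiple):
    """Collect the database aliases of every species in a multiple entry."""
    aliases = set()
    for member in multiple[name]:
        if member in multiple:
            # Multiple entries may refer to other multiple entries.
            aliases |= aliases_for_group(member, single, multiple)
        else:
            aliases.update(single.get(member, []))
    return aliases
```

A species that is listed in a group but has no single entry simply contributes no aliases here; a real check would probably want to report it.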
/home/httpd//cys.txt      
on SunOS UNIX systems.
C:\http\cys.txt      
on Windows NT systems.
In order to require a particular modification to cysteine residues, the programs have to correlate the modification selected in the HTML form with the mass parameters for the particular modification. This is accomplished through an alias list located on the server in the file: cys.txt.
To add or change a particular cys residue you must edit:
     
and the cysteine modification list in the Javascript which controls the appearance
of the cysteine modification in the program HTML input files:
     
Within the file cys.txt an entry for each modification MUST contain 5 lines.
line 1) contains the name of the modification as it appears in the HTML input
pages for each program.
line 2) contains a name for the amino acid, but isn't used anywhere.
line 3) contains the elemental formula for each residue.
lines 4) and 5) contain elemental formulas for side-chains that are used in
calculating d and w ions. If there are no beta
substituents, or they are irrelevant, then enter 0
(zero) on these lines.
Below is an example of the entry for acrylamide in cys.txt:
acrylamide
Acrylamide modified Cysteine
C6 H10 N2 O2 S1
H1
0
Make sure the elements in your modified cysteine residue are present in the file elements.txt. See also, To Add/Change Elements.
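The element check can be sketched in Python. The sketch assumes the five-line entry layout described above and takes the set of known element symbols as an argument rather than reading elements.txt itself; both function names are hypothetical.

```python
import re


def formula_elements(formula):
    """Return the element symbols used in a formula such as 'C6 H10 N2 O2 S1'."""
    return {re.match(r"[A-Za-z]+", token).group() for token in formula.split()}


def missing_elements(entry_lines, known_symbols):
    """List symbols used on lines 3 to 5 of an entry that are not known."""
    used = set()
    for formula in entry_lines[2:5]:
        if formula.strip() != "0":  # 0 means no side-chain formula
            used |= formula_elements(formula)
    return sorted(used - set(known_symbols))
```

An empty result means every element in the entry is already defined.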
If you add a new modified cysteine residue, please submit the modified parameter files for inclusion in subsequent ProteinProspector releases.
/home/httpd//aa.txt
      on SunOS UNIX systems.
C:\http\\aa.txt      
on Windows NT systems.
Detailed information on all amino acids used in the programs is located on the server in the file: aa.txt.
You must edit this file to add or change an amino acid or modify amino acid pK values.
Within this file an entry for an amino acid MUST contain 9 lines:
line 1) contains a name for the amino acid, but isn't used anywhere.
line 2) contains a single letter code for the amino acid.
line 3) contains the elemental formula for each residue.
lines 4) and 5) contain elemental formulas for side-chains that are used in
calculating d and w ions. If there are no beta
substituents, or they are irrelevant, then enter 0
(zero) on these lines.
line 6) contains the pk_C_term for the amino acid.
line 7) contains the pk_N_term for the amino acid.
line 8) contains the pk_acidic_sc for the amino acid.
line 9) contains the pk_basic_sc for the amino acid.
Below is an example of the entry for Isoleucine in aa.txt:
Isoleucine
I
C6 H11 N1 O1
C1 H3
C2 H5
3.55
7.5
n/a
n/a
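The nine-line layout maps naturally onto a small record type. The sketch below assumes that "n/a" marks a pK value that does not apply, as in the Isoleucine example; the class, field and function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

LINES_PER_ENTRY = 9


@dataclass
class AminoAcid:
    name: str
    code: str
    formula: str
    side_chain_1: str
    side_chain_2: str
    pk_c_term: Optional[float]
    pk_n_term: Optional[float]
    pk_acidic_sc: Optional[float]
    pk_basic_sc: Optional[float]


def _pk(value):
    """Convert a pK line to a float, or None when it reads n/a."""
    return None if value.strip().lower() == "n/a" else float(value)


def parse_amino_acid(lines):
    """Build an AminoAcid from the nine lines of one aa.txt entry."""
    if len(lines) != LINES_PER_ENTRY:
        raise ValueError("an aa.txt entry must contain 9 lines")
    text_fields = [line.strip() for line in lines[:5]]
    return AminoAcid(*text_fields, *(_pk(value) for value in lines[5:]))
```

Checking the line count first catches a missing or extra line before it shifts every later field by one position.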
Make sure the elements in your amino acid are present in the file elements.txt. See also, To Add/Change Elements.
If you add a new amino acid, please submit the modified parameter file for inclusion in subsequent ProteinProspector releases.
/home/httpd//n_terms.txt
      on SunOS UNIX systems.
C:\http\\n_terms.txt      
on Windows NT systems.
In order for ProteinProspector programs to designate particular modified termini, the programs must correlate the modified terminus selected in the HTML form with the mass parameters for that terminus. This is accomplished through parameter lists located on the server in the files n_terms.txt and c_terms.txt.
To add or change a particular terminus you must edit the relevant file.
Then you must edit the terminal modifying group lists in the Javascript
which controls the appearance
of the termini in the program HTML input files:
     
Within the files n_terms.txt or c_terms.txt an entry for each terminus MUST contain 2 lines.
line 1) contains the name of the terminus as it appears in
the HTML input page for MS-Product.
line 2) contains the elemental formula for the terminus.
Below is an example of the entry for acetyl in n_terms.txt:
Acetyl
C2 O H3
Make sure the elements in your terminal modifying group are present in the file elements.txt. See also, To Add/Change Elements.
If you add a new terminal modifying group, please submit the modified parameter file for inclusion in subsequent ProteinProspector releases.
/home/httpd//elements.txt
on SunOS UNIX systems.
C:\http\\elements.txt on
Windows NT systems.
Detailed information on all elements used in the programs is located on the server in the elements.txt file. You must edit this file to add or modify an element.
Within the file elements.txt an entry for an element MUST contain 1 line:
The line contains the following information:
a). The symbol for the element.
b). The valency of the element.
c). The number of isotopes listed on the line.
d). A mass/abundance pair for each isotope.
Below is an example of the entry for hydrogen:
H 1 2 1.007825035 .99985 2.014101779 0.00015
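The fields of such a line can be unpacked as shown below. The sketch also derives two quantities that are commonly computed from isotope data, the abundance-weighted average mass and the mass of the most abundant isotope. Whether ProteinProspector computes them this way is an assumption, and the function names are hypothetical.

```python
def parse_element(line):
    """Split an elements.txt line into symbol, valency and isotope pairs."""
    fields = line.split()
    symbol, valency, count = fields[0], int(fields[1]), int(fields[2])
    numbers = [float(value) for value in fields[3:]]
    if len(numbers) != 2 * count:
        raise ValueError("expected a mass/abundance pair for each isotope")
    # Masses sit at even positions, abundances at odd positions.
    isotopes = list(zip(numbers[0::2], numbers[1::2]))
    return symbol, valency, isotopes


def average_mass(isotopes):
    """Abundance-weighted mass, assuming the abundances sum to 1."""
    return sum(mass * abundance for mass, abundance in isotopes)


def monoisotopic_mass(isotopes):
    """Mass of the most abundant isotope."""
    return max(isotopes, key=lambda pair: pair[1])[0]
```

For the hydrogen line above the weighted average comes to roughly 1.00798, while the most abundant isotope has the mass 1.007825035 given on the line.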
If you add a new element, please submit the modified parameter file for inclusion in subsequent ProteinProspector releases.
/home/httpd//enzyme.txt
on SunOS UNIX systems.
C:\http\\enzyme.txt on
Windows NT systems.
Detailed information on all enzymatic digests used in the programs is located on the server in the enzyme.txt file. You must edit this file to add or modify the rules for an enzymatic digest.
Within this file an entry for an enzymatic digest MUST contain 4 lines:
line 1) contains a name for the enzymatic digest;
line 2) contains a list of cleavage amino acids;
line 3) contains a list of exception amino acids (a '-' character indicates no exceptions);
line 4) either C for cleavage on the C terminus side of an amino acid or N for cleavage
on the N terminus side.
Below is an example of the entry for Trypsin:
Trypsin
KR
P
C
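How a four-line rule might drive a digest can be sketched as follows. This is an illustration, not the ProteinProspector algorithm. It assumes that an exception amino acid blocks cleavage when it is the residue on the other side of the cut (for Trypsin, no cut between K or R and a following P), and the function name is hypothetical.

```python
def digest(sequence, cleavage, exceptions, side):
    """Split `sequence` according to one enzyme.txt rule.

    Assumption: an exception residue blocks a cut when it is the neighbour
    on the far side of that cut.
    """
    blocked = "" if exceptions == "-" else exceptions
    cuts = []
    for i, residue in enumerate(sequence):
        if residue not in cleavage:
            continue
        # A C-side rule cuts after residue i; an N-side rule cuts before it.
        cut = i + 1 if side == "C" else i
        if cut <= 0 or cut >= len(sequence):
            continue  # a cut at either end would produce an empty peptide
        neighbour = sequence[cut] if side == "C" else sequence[cut - 1]
        if neighbour not in blocked:
            cuts.append(cut)
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]
```

Called with the Trypsin rule, digest("MKWVTFRPSLK", "KR", "P", "C") cuts after the first K but not after the R that is followed by P.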
Then you must edit the enzyme name list in the Javascript
which controls the appearance of the enzyme in the program HTML input files:
     
You can combine the cleavage rules for two or more enzymes by having them on the same line in the enzyme name list separated by a '/' character. For example to have an option which combines the cleavage rules for CNBr and Trypsin you would need the following line:
<OPTION> Trypsin/CNBr
It is possible to mix enzymes which cleave on the N-terminus side with those that cleave on the C-terminus side.
If you add a new enzymatic digest, please submit the modified parameter file for inclusion in subsequent ProteinProspector releases.
/home/httpd//immonium.txt
on SunOS UNIX systems.
C:\http\\immonium.txt on
Windows NT systems.
The file contains the immonium ion masses and corresponding compositional information for use by ProteinProspector programs.
The first 2 entries in the file are for the immonium tolerance and the minimum fragment ion mass (both in Da). This is followed by a list of immonium ions.
An entry for an immonium ion contains:
1). The mass.
2). The compositional information. In the case where a given mass can
represent more than one amino acid the list of amino acids should be
enclosed in square brackets.
3). Ions labelled as M are major peaks; these are used to include an
amino acid when using immonium ions to extract compositional ions in
MS-Tag and MS-Seq. Minor ions are labelled m and are only likely to
be present alongside major ions. They are reported in the immonium and
related ions section of the MS-Product report.
4). A list of amino acids to exclude if the mass is missing or a dash
(-) character if there are no amino acids to exclude. Excluding amino
acids on the basis of missing peaks is a feature that can be turned
off.
The fields must be separated by spaces or tabs.
The ions must be in mass order.
For example:
60.0 S M -
70.0 [RP] M P
72.0 V M -
73.0 R m -
74.0 T M -
84.0 [KQ] M -
86.0 [IL] M IL
87.0 [NR] M -
88.0 D M -
100.0 R m -
101.0 [KQ] M -
102.0 E M -
104.0 M M -
110.0 H M H
112.0 R M R
120.0 F M -
126.0 P M -
129.0 [KQ] m -
136.0 Y M -
138.0 H m -
159.0 W M -
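Each line of the list can be unpacked into its four fields as in the sketch below, which also checks the mass ordering requirement. The dictionary keys and function names are hypothetical.

```python
def parse_immonium(line):
    """Parse one immonium ion line into a dictionary."""
    mass, composition, label, exclude = line.split()
    return {
        "mass": float(mass),
        # Square brackets mark a mass shared by several amino acids.
        "amino_acids": composition.strip("[]"),
        "major": label == "M",
        "exclude_if_missing": "" if exclude == "-" else exclude,
    }


def in_mass_order(entries):
    """Return True if the parsed entries are sorted by increasing mass."""
    return all(a["mass"] <= b["mass"] for a, b in zip(entries, entries[1:]))
```

Running in_mass_order over the parsed file after an edit is a quick way to confirm that a newly added ion was inserted at the right place.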
Suggestions for improving this scheme are welcome and may be included in subsequent ProteinProspector releases.
/home/httpd//usermod.txt
on SunOS UNIX systems.
C:\http\\usermod.txt on
Windows NT systems.
Detailed information on the user defined modifications used in MS-Fit and MS-Digest is located on the server in the usermod.txt file. You must edit this file to add or modify the rules for user defined modifications.
Within this file an entry for a user defined modification MUST contain 4 lines:
line 1) contains a name for the modification;
line 2) contains the code to be used for the modification in MS-Fit and MS-Digest reports;
line 3) contains an elemental formula for the modification (elements
can be negative - eg Amidation would be N H O-1);
line 4) contains a list of amino acids to check for the modification.
Below is an example of the entry for Phosphorylation of S, T and Y:
Phosphorylation of S, T and Y
PO4
P O3 H
STY
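Formulas with negative counts, such as N H O-1, can be read as in the sketch below. It assumes that a symbol without a number counts once, as the Phosphorylation example suggests. The second function lists the positions in a peptide that carry one of the amino acids on line 4. Both function names are hypothetical.

```python
import re

TOKEN = re.compile(r"([A-Za-z]+)(-?\d+)?")


def parse_formula(formula):
    """Turn 'P O3 H' or 'N H O-1' into a dict of element counts."""
    counts = {}
    for token in formula.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse formula token {token!r}")
        symbol, number = match.groups()
        # Assumption: a symbol with no number stands for one atom.
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def modifiable_positions(peptide, amino_acids):
    """Return the 1-based positions of residues listed in `amino_acids`."""
    return [i for i, residue in enumerate(peptide, start=1) if residue in amino_acids]
```

A negative count simply means the modification removes that many atoms of the element.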
You must then edit the user defined modification name list in the Javascript
which controls the appearance of the User Defined Modification option in the
program HTML input files:
     
/home/httpd//sp_graph.par.txt
on SunOS UNIX systems.
C:\http\\sp_graph.par.txt on
Windows NT systems.
The graph in MS-Isotope is a Java applet which uses the information in the sp_graph.par.txt file to control its appearance.
The file contains comment lines (starting with a # character) explaining the information fields beneath them. The following information is stored in the file:
Colors are specified as 3 integers for the red, green and blue intensities respectively. The intensity values must be between 0 and 255.
A font specification is made up of a font family (Dialog, Helvetica, TimesRoman, Courier or Symbol), a font style identifier (PLAIN, BOLD or ITALIC) and a point size.
/home/httpd//colors.txt
on SunOS UNIX systems.
C:\http\\colors.txt on
Windows NT systems.
The following colors may be defined to override the default values.
Colors are defined as name value pairs separated by white space; the body_background_color is defined below as an example. A two digit hexadecimal number is used to define each of the red, green and blue values (eg. 000000 is black, FFFFFF is white, FF0000 is red, 00FF00 is green, 0000FF is blue, AAAAAA is grey, FFFF00 is yellow, etc).
body_background_color DDFFDD
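The hexadecimal value can be split into its red, green and blue components as in this sketch, which may help when checking a color before adding it to the file. The function name is hypothetical, and skipping lines that do not hold exactly two fields is an assumption.

```python
def parse_colors(text):
    """Map each color name to an (red, green, blue) tuple of 0-255 integers."""
    colors = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: ignore lines that are not name/value pairs
        name, value = fields
        if len(value) != 6:
            raise ValueError(f"{name}: expected six hexadecimal digits")
        # Two hexadecimal digits for each of red, green and blue.
        colors[name] = tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))
    return colors
```

For the example line this gives (221, 255, 221), a pale green.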
/home/httpd//instrument.txt
on SunOS UNIX systems.
C:\http\\instrument.txt on
Windows NT systems.
An entry for an instrument MUST contain at least ONE line; instruments are separated by a line containing only the ">" symbol.
line 1) must contain the instrument name as it appears in the HTML input files. Every instrument in the HTML input pages should have an entry in this file.
This can be followed by optional lines which override the default instrument parameters. The additional lines have the form of name value pairs separated by a space. The possible parameters are listed below:
1). A list of amino acids which lose NH3 in MS/MS fragmentation.
name: nh3_loss
default value: RKNQ
2). A list of amino acids which lose H2O in MS/MS fragmentation.
name: h2o_loss
default value: STED
3). A list of positive charge bearing amino acids.
name: pos_charge
default value: RHKN
4). The maximum internal ion mass.
name: max_internal_ion_mass
default value: 700.0
5). The number of decimal places used when printing out parent ions in reports.
name: parent_precision
default value: 4
6). The number of decimal places used when printing out fragment ions in reports.
name: fragment_precision
default value: 2
7). A list of fragment ions types (one per line) which occur in MS/MS fragmentation.
name: it
possible values: a
a-H2O
a-NH3
a-H3PO4
b
b-H2O
b-NH3
b+H2O
b-H3PO4
b-SOCH4
c
x
y
y-H2O
y-NH3
y-H3PO4
y-SOCH4
Y
z
I Internal ions.
C C-ladder ions.
N N-ladder ions.
i Immonium and low mass ions.
m
d
v
w
h MH-H2O, b-H2O if b, y-H2O if y.
n a-NH3 if a, b-NH3 if b, y-NH3 if y.
B b+H2O if b.
P a-H3PO4 if a, b-H3PO4 if b, y-H3PO4 if y.
S b-SOCH4 if b, y-SOCH4 if y.
MH-H2O
The following ion types are possible in MS-Tag.
a,a-NH3,a-H2O,a-H3PO4,b,b-H2O,b-NH3,b+H2O,b-H3PO4,b-SOCH4,c
y,y-NH3,y-H2O,y-H3PO4,y-SOCH4
I,C,N,h,n,B,P,S
None are defined by default.
Below is an example of the entry for MALDI-TOF:
MALDI-TOF
nh3_loss RKQ
h2o_loss ST
pos_charge RHK
it a-NH3
it a
it b
it b-NH3
it b-H2O
it b+H2O
it y
it y-NH3
it y-H2O
it I
>
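The way optional lines override the defaults can be sketched as follows. The default values are the ones listed above and are kept as strings for simplicity; the function name is hypothetical and the code is an illustration rather than the ProteinProspector parser.

```python
DEFAULTS = {
    "nh3_loss": "RKNQ",
    "h2o_loss": "STED",
    "pos_charge": "RHKN",
    "max_internal_ion_mass": "700.0",
    "parent_precision": "4",
    "fragment_precision": "2",
}


def parse_instruments(text):
    """Return {instrument name: parameters}, with defaults filled in."""
    instruments = {}
    entry = []
    # A trailing ">" is appended so the last entry is always flushed.
    for line in text.splitlines() + [">"]:
        line = line.strip()
        if line != ">":
            if line:
                entry.append(line)
            continue
        if entry:
            params = dict(DEFAULTS, it=[])  # no ion types by default
            for pair in entry[1:]:
                name, value = pair.split(None, 1)
                if name == "it":
                    params["it"].append(value)  # one ion type per line
                else:
                    params[name] = value
            instruments[entry[0]] = params
        entry = []
    return instruments
```

An instrument entry that consists of its name alone therefore ends up with all default values and an empty list of ion types.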
You must then edit the instrument list in the JavaScript which controls
the appearance of the Instrument option in the program HTML input files:
     
/home/httpd//homology.txt
on SunOS UNIX systems.
C:\http\\homology.txt on
Windows NT systems.
An entry for a homology/modified amino acid matrix MUST contain at least TWO lines; homology matrices are separated by a line containing only the ">" symbol.
line 1) Must contain the matrix name as it appears in the html input files. Every matrix in the html input pages should have an entry in this file.
Subsequent lines (of which there must be at least one) should contain the following information separated by a space:
a). an amino acid;
b). a list of amino acids that the amino acid in a) can mutate or be modified to.
Any of the amino acids in b) may be followed by (N) or (C) to denote that the modification can only take place at the N or C terminus of a peptide.
Below are examples of entries for a comprehensive homology option and for an option which allows one unknown amino acid per peptide:
homology
A CDEFGHIKLMNPQRSTVWYmq(N)sty
C ADEFGHIKLMNPQRSTVWYmq(N)sty
D ACEFGHIKLMNPQRSTVWYmq(N)sty
E ACDFGHIKLMNPQRSTVWYmq(N)sty
F ACDEGHIKLMNPQRSTVWYmq(N)sty
G ACDEFHIKLMNPQRSTVWYmq(N)sty
H ACDEFGIKLMNPQRSTVWYmq(N)sty
I ACDEFGHKLMNPQRSTVWYmq(N)sty
K ACDEFGHILMNPQRSTVWYmq(N)sty
L ACDEFGHIKMNPQRSTVWYmq(N)sty
M ACDEFGHIKLNPQRSTVWYmq(N)sty
N ACDEFGHIKLMPQRSTVWYmq(N)sty
P ACDEFGHIKLMNQRSTVWYmq(N)sty
Q ACDEFGHIKLMNPRSTVWYmq(N)sty
R ACDEFGHIKLMNPQSTVWYmq(N)sty
S ACDEFGHIKLMNPQRTVWYmq(N)sty
T ACDEFGHIKLMNPQRSVWYmq(N)sty
V ACDEFGHIKLMNPQRSTWYmq(N)sty
W ACDEFGHIKLMNPQRSTVYmq(N)sty
Y ACDEFGHIKLMNPQRSTVWmq(N)sty
>
Unknown Amino Acid
X ACDEFGHIKLMNPQRSTVWY
>
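A single matrix line can be decoded as in this sketch, which pairs every target code with its optional terminus restriction. Lower-case codes are kept exactly as they appear in the file. The function name is hypothetical.

```python
import re

# One amino acid code, optionally followed by (N) or (C).
TARGET = re.compile(r"([A-Za-z])(?:\(([NC])\))?")


def parse_matrix_line(line):
    """Return the source amino acid and its (target, terminus) pairs."""
    source, targets = line.split()
    pairs = [(code, terminus or None) for code, terminus in TARGET.findall(targets)]
    return source, pairs
```

For the first line of the homology example this gives 24 targets for A, of which only q carries a restriction (the N terminus).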
You must then edit the homology/modified amino acid matrix name list in the Javascript
which controls the appearance of the Search Mode option in the program HTML input files:
     
Edit the computer.txt file.
The following parameters are currently available:
1). The default memory block size used in memory mapping.
name: block_size
default value: 65536
This number is applicable for Windows NT systems and should not be changed.
2). The number of blocks to use as a default memory map size when reading a database.
name: num_blocks
minimum value: 1
default value: 256
maximum value: 16384
The default value maps in 16 MBytes at a time on Windows NT (256 blocks of 65536 bytes). The maximum value corresponds to 1 GByte. You might want to vary this parameter to see if it affects search times. If you have a lot of RAM then a much bigger number would be appropriate.
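The relationship between the two parameters is a simple product, sketched here with the limits given above. The function name is hypothetical.

```python
def memory_map_bytes(num_blocks=256, block_size=65536):
    """Return the memory map size in bytes for the given parameters."""
    if not 1 <= num_blocks <= 16384:
        raise ValueError("num_blocks must be between 1 and 16384")
    return num_blocks * block_size
```

With the defaults this gives 16777216 bytes (16 MBytes); with num_blocks set to 16384 it gives 1073741824 bytes (1 GByte).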