This document provides instructions for DB-Stat and covers the following topics:
- Databases
- Species Filtering
- Species Code Filtering
- Intact Protein MW Filtering
- Intact Protein pI Filtering
- Accession Number Filtering
- Enzyme specificity / Missed cleavages
- Frame Translation in DNA databases
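For DNA databases, each entry has to be translated in a reading frame before any protein statistics can be calculated. The sketch below illustrates frame translation and is not DB-Stat's own code. It assumes the standard genetic code (NCBI translation table 1) and a common frame-numbering convention in which frames 1 to 3 start at offsets 0 to 2 of the forward strand and frames 4 to 6 start at offsets 0 to 2 of the reverse complement. DB-Stat's actual frame numbering may differ, and the function names are hypothetical.

```python
# Standard genetic code (NCBI table 1), codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna: str, frame: int) -> str:
    """Translate `dna` in reading frame 1-6 (assumed numbering, see above).

    Stop codons are written as '*'; codons containing ambiguous bases such
    as N are written as 'X'. An incomplete trailing codon is dropped.
    """
    if frame not in range(1, 7):
        raise ValueError("frame must be between 1 and 6")
    seq = dna.upper()
    if frame > 3:
        seq = reverse_complement(seq)
    offset = (frame - 1) % 3
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate_frame("ATGGCCTAA", 1))  # MA*
```

Building the codon table from the 64-character string keeps the sketch short; a real implementation would also need to support the alternative genetic codes used by some organisms.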
DB-Stat calculates the following statistics for a given database (for DNA databases a reading frame must also be selected):
- the total number of entries in the database;
- the number of entries within the selected molecular weight range;
- the number of entries within the selected pI range;
- the number of entries for the selected species;
- the number of entries that satisfy the molecular weight, species and pI parameters in combination.
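A minimal Python sketch of how these counts relate to one another. It assumes each entry has already been reduced to a hypothetical `Entry` record holding a precomputed molecular weight, pI and species; this is an illustration, not how DB-Stat stores or scans databases. Treating the ranges as inclusive is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical database entry with precomputed properties."""
    accession: str
    species: str
    mw: float  # intact protein molecular weight in Da
    pi: float  # isoelectric point


def database_counts(entries, mw_range, pi_range, species):
    """Count entries passing each filter separately and all filters together.

    mw_range and pi_range are (low, high) tuples treated as inclusive;
    species is a set of accepted species names.
    """
    def in_mw(e):
        return mw_range[0] <= e.mw <= mw_range[1]

    def in_pi(e):
        return pi_range[0] <= e.pi <= pi_range[1]

    def in_species(e):
        return e.species in species

    return {
        "total": len(entries),
        "mw": sum(in_mw(e) for e in entries),
        "pi": sum(in_pi(e) for e in entries),
        "species": sum(in_species(e) for e in entries),
        "combined": sum(in_mw(e) and in_pi(e) and in_species(e) for e in entries),
    }


db = [
    Entry("P1", "HUMAN", 50000.0, 6.0),
    Entry("P2", "MOUSE", 20000.0, 8.5),
    Entry("P3", "HUMAN", 120000.0, 5.2),
]
print(database_counts(db, (10000, 100000), (5.0, 7.0), {"HUMAN"}))
# {'total': 3, 'mw': 2, 'pi': 2, 'species': 2, 'combined': 1}
```

Note that the combined count is not derivable from the individual counts: each entry must be tested against all three criteria at once.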
It also reports the following for the entries matching the given digest, molecular weight, species and pI parameters:
- the index number of the longest protein;
- the number of amino acids in the longest protein;
- the mass of the longest protein;
- the index number of the protein with the most digest fragments;
- the number of digest fragments produced from that protein;
- the total number of digest fragments;
- the average protein mass.
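The digest statistics listed above can be illustrated with a short sketch. It assumes trypsin specificity (cleavage after K or R unless the next residue is P), an optional number of missed cleavages, and monoisotopic residue masses with one water added per protein; DB-Stat may use other enzymes or average masses. The index reported is simply the 0-based position in the input list of filtered proteins, which may not match DB-Stat's own index numbering. All function names are hypothetical, and sequences containing non-standard residue codes (for example X or B) raise a KeyError in this sketch.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def protein_mass(seq):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_digest(seq, missed=0):
    """Cleave after K or R unless the next residue is P; allow up to
    `missed` missed cleavages per peptide."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]
    peptides = []
    for i in range(len(pieces)):
        for j in range(i, min(i + missed + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def digest_stats(proteins, missed=0):
    """Summary statistics for a non-empty list of (already filtered) sequences."""
    digests = [tryptic_digest(p, missed) for p in proteins]
    masses = [protein_mass(p) for p in proteins]
    longest = max(range(len(proteins)), key=lambda i: len(proteins[i]))
    most = max(range(len(proteins)), key=lambda i: len(digests[i]))
    return {
        "longest_index": longest,
        "longest_length": len(proteins[longest]),
        "longest_mass": masses[longest],
        "most_fragments_index": most,
        "most_fragments": len(digests[most]),
        "total_fragments": sum(len(d) for d in digests),
        "average_mass": sum(masses) / len(masses),
    }


print(digest_stats(["MAKGGR", "PEPTIDEKAAKR", "GK"]))
```

Splitting on a zero-width regular expression requires Python 3.7 or later. Raising `missed` increases the total fragment count, because each peptide spanning one or more uncleaved sites is counted in addition to the fully cleaved ones.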
If the corresponding option is selected, DB-Stat also reports a table giving the number of each amino acid in the current search.
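A table of this kind amounts to tallying residues over all sequences in the current search. A minimal sketch, with a hypothetical function name:

```python
from collections import Counter


def amino_acid_table(proteins):
    """Count each residue over all sequences in the current search."""
    counts = Counter()
    for seq in proteins:
        counts.update(seq)  # a string is iterated character by character
    return counts


for residue, n in sorted(amino_acid_table(["MAKGGR", "GK"]).items()):
    print(f"{residue}\t{n}")
```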
A histogram showing the mass distribution of the theoretical peptides is also displayed. This can be useful for assessing the likely mass distribution of peptides produced by a particular digestion strategy. The mass range of the histogram can also be specified.
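A sketch of the binning behind such a histogram. It assumes fixed-width bins over the entered mass range; the bin width is an assumption here, since the text only states that the mass range can be entered. Masses outside the range are ignored.

```python
import math


def mass_histogram(masses, low, high, bin_width):
    """Count masses in fixed-width bins covering [low, high).

    Masses outside the range are ignored. Returns (bin_starts, counts).
    """
    n_bins = math.ceil((high - low) / bin_width)
    counts = [0] * n_bins
    for m in masses:
        if low <= m < high:
            # Clamp guards against floating-point rounding at the top edge.
            counts[min(int((m - low) // bin_width), n_bins - 1)] += 1
    starts = [low + i * bin_width for i in range(n_bins)]
    return starts, counts


print(mass_histogram([500, 750, 1200, 1999.9, 2500], 500, 2000, 500))
# ([500, 1000, 1500], [2, 1, 1])
```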
If the histogram is made up of fewer than 300,000 data points, a high-resolution density plot is also drawn using the R statistics package. The bandwidth of the density plot can be entered, which means, for example, that the distribution of peptide masses within a small mass window can be displayed. The bandwidth (in Da) corresponds to the bw parameter of the R density() function and determines how much the data are smoothed. A smaller bandwidth gives a more detailed (and often noisier) density estimate, while a larger bandwidth gives a smoother estimate but may obscure important features of the data.
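To make the role of the bandwidth concrete, the following sketch evaluates a Gaussian kernel density estimate directly. In R's density() the kernel is scaled so that bw is its standard deviation, and the same convention is used here (bw in Da). The helper `density_grid` mirrors R's default evaluation grid of 512 points extending 3 bandwidths beyond the data (n = 512, cut = 3). Unlike R, which approximates the estimate using binning and an FFT, this sketch sums the kernels exactly, so values may differ slightly; it is an illustration, not the code DB-Stat runs, and the function names are hypothetical.

```python
import math


def density_grid(data, bw, n=512, cut=3):
    """Evenly spaced grid from min(data) - cut*bw to max(data) + cut*bw,
    mirroring the defaults of R's density()."""
    lo = min(data) - cut * bw
    hi = max(data) + cut * bw
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def gaussian_kde(data, bw, grid):
    """Gaussian kernel density estimate; bw is the kernel standard deviation (Da)."""
    norm = 1.0 / (len(data) * bw * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - d) / bw) ** 2) for d in data)
        for x in grid
    ]


def count_local_maxima(y):
    """Number of strict interior local maxima in a sampled curve."""
    return sum(1 for i in range(1, len(y) - 1) if y[i - 1] < y[i] > y[i + 1])


masses = [1000.0, 1000.5, 1500.2]
window = [999.0 + 0.01 * i for i in range(201)]  # 999.00 to 1001.00 Da
for bw in (0.1, 1.0):
    peaks = count_local_maxima(gaussian_kde(masses, bw, window))
    print(f"bw={bw} Da: {peaks} peak(s) in window")
# bw=0.1 Da: 2 peak(s) in window
# bw=1.0 Da: 1 peak(s) in window
```

The usage example shows the trade-off described above: with a 0.1 Da bandwidth the two peptides 0.5 Da apart appear as separate peaks, while a 1.0 Da bandwidth merges them into a single smooth peak.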