DB-Stat Instructions

Description, Instructions, and Tips for DB-Stat

Purpose

This document provides instructions for DB-Stat.

Contents of this document:

Introduction and Background
Amino Acid Statistics
Histogram

Introduction and Background

DB-Stat calculates the following statistics for a given database (for DNA databases a reading frame must also be selected):

the total number of entries in the database;
the number of entries within the selected molecular weight range;
the number of entries within the selected pI range;
the number of entries for the selected species;
the number of entries that satisfy the molecular weight, species and pI parameters simultaneously.
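The counting logic above can be sketched in Python as follows. This is a minimal illustration, not DB-Stat's implementation: the `Entry` record, its fields and the `entry_counts` function are hypothetical, the ranges are assumed to be inclusive, and passing no species is assumed to mean "all species".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical database record; DB-Stat's real entry format is not shown here."""
    index: int
    species: str
    mw: float  # molecular weight, Da
    pi: float  # isoelectric point


def entry_counts(entries, mw_range, pi_range, species: Optional[str] = None):
    """Count entries passing each filter separately and all three together.

    Ranges are (low, high) tuples treated as inclusive; species=None means all species.
    """
    lo_mw, hi_mw = mw_range
    lo_pi, hi_pi = pi_range
    in_mw = [lo_mw <= e.mw <= hi_mw for e in entries]
    in_pi = [lo_pi <= e.pi <= hi_pi for e in entries]
    in_sp = [species is None or e.species == species for e in entries]
    return {
        "total": len(entries),
        "mw": sum(in_mw),
        "pi": sum(in_pi),
        "species": sum(in_sp),
        "combined": sum(a and b and c for a, b, c in zip(in_mw, in_pi, in_sp)),
    }


db = [
    Entry(1, "HUMAN", 25000.0, 6.5),
    Entry(2, "MOUSE", 52000.0, 5.1),
    Entry(3, "HUMAN", 80000.0, 8.9),
]
counts = entry_counts(db, (10000, 60000), (4.0, 7.0), "HUMAN")
print(counts)  # {'total': 3, 'mw': 2, 'pi': 2, 'species': 2, 'combined': 1}
```

Only entry 1 passes all three filters, so the combined count is lower than any single-filter count.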

Using the selected digest, molecular weight, species and pI parameters, DB-Stat also reports:

the index number of the longest protein;
the number of amino acids in the longest protein;
the mass of the longest protein;
the index number of the protein with the most digest fragments;
the number of digest fragments for the protein with the most digest fragments;
the total number of digest fragments;
the average protein mass.
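A minimal sketch of these per-digest statistics, assuming a tryptic digest with no missed cleavages (cleavage after K or R, but not before P) and monoisotopic residue masses. The function names, the input format (a dict from index number to sequence, already filtered by the molecular weight, species and pI parameters) and the use of monoisotopic rather than average masses are assumptions for illustration; DB-Stat's own digest options and mass conventions may differ.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per chain for the free termini


def mass(seq):
    """Monoisotopic mass of an unmodified peptide or protein."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_digest(seq):
    """Cleave after K or R unless the next residue is P; no missed cleavages (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def digest_stats(proteins):
    """proteins: hypothetical dict {index number: sequence}, already filtered."""
    frags = {k: tryptic_digest(s) for k, s in proteins.items()}
    masses = {k: mass(s) for k, s in proteins.items()}
    longest = max(proteins, key=lambda k: len(proteins[k]))
    most = max(frags, key=lambda k: len(frags[k]))
    return {
        "longest_index": longest,
        "longest_length": len(proteins[longest]),
        "longest_mass": masses[longest],
        "most_fragments_index": most,
        "most_fragments_count": len(frags[most]),
        "total_fragments": sum(len(f) for f in frags.values()),
        "average_mass": sum(masses.values()) / len(masses),
    }


stats = digest_stats({1: "GASPK", 2: "AKGRKP", 3: "GG"})
print(stats)
```

In this toy input, entry 2 is both the longest protein and the one with the most fragments (AK, GR, KP); in a real database these are often different entries.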

Amino Acid Statistics

If you select this option, DB-Stat also reports a table giving the number of occurrences of each amino acid for the current search.
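Building such a table amounts to tallying residue letters over every sequence in the current search. A sketch using Python's `collections.Counter`; `amino_acid_table` and the example sequences are hypothetical.

```python
from collections import Counter


def amino_acid_table(sequences):
    """Tally how many times each one-letter residue code occurs across all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)  # updating with a string counts each character
    return dict(sorted(counts.items()))


table = amino_acid_table(["GASPK", "AKGRKP", "GG"])
for aa, n in table.items():
    print(f"{aa}\t{n}")
```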

Histogram

DB-Stat displays a histogram of the mass distribution of the theoretical peptides. This is useful for assessing the likely mass distribution of peptides produced by a particular digestion strategy. The mass range of the histogram can also be entered.
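Conceptually, the histogram is fixed-width binning of theoretical peptide masses over the entered mass range. In this sketch, `mass_histogram` and its `bin_width` parameter are hypothetical (the text above does not say how DB-Stat chooses its bins), and masses outside the half-open range [lo, hi) are ignored.

```python
import math


def mass_histogram(masses, lo, hi, bin_width):
    """Bin peptide masses (Da) into fixed-width bins covering [lo, hi)."""
    n_bins = math.ceil((hi - lo) / bin_width)
    counts = [0] * n_bins
    for m in masses:
        if lo <= m < hi:
            # min() guards against floating-point rounding at the upper edge
            counts[min(int((m - lo) // bin_width), n_bins - 1)] += 1
    return counts


peptide_masses = [650.3, 812.4, 815.0, 1203.6, 2500.1, 4100.0]  # made-up values
hist = mass_histogram(peptide_masses, 500.0, 3000.0, 500.0)
print(hist)  # [3, 1, 0, 0, 1]
```

The 4100.0 Da peptide falls outside the entered range and is not counted.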

If the histogram is built from fewer than 300,000 data points, a high-resolution density plot is also drawn using the R statistics package. The bandwidth of the density plot can be entered so that, for example, the distribution of peptide masses within a small mass window can be displayed. The bandwidth (in Da) corresponds to the bw argument of R's density function and determines how much the data are smoothed: a smaller bandwidth gives a more detailed (and often noisier) density estimate, while a larger bandwidth gives a smoother estimate but may obscure important features of the data.
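The density plot is a kernel density estimate. R's density() uses a Gaussian kernel by default, and bw is the standard deviation of that kernel, so the estimate is f(x) = (1 / (n * bw)) * sum over i of phi((x - m_i) / bw), where phi is the standard normal density and m_1 ... m_n are the peptide masses. The sketch below evaluates this directly; it is an illustration only, since R computes its estimate on a grid with an FFT-based approximation, and `gaussian_kde` and the example masses are hypothetical.

```python
import math


def gaussian_kde(masses, bw, grid):
    """Gaussian kernel density estimate of masses, evaluated at each grid point.

    bw is the kernel standard deviation in Da, the same meaning as R's density(bw = ...).
    Evaluated directly in O(len(masses) * len(grid)) time rather than via R's FFT approach.
    """
    n = len(masses)
    norm = 1.0 / (n * bw * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - m) / bw) ** 2) for m in masses)
        for x in grid
    ]


masses = [1000.0, 1000.5]  # hypothetical peptide masses 0.5 Da apart
grid = [1000.0, 1000.25]   # a peak position and the midpoint between the peaks
narrow = gaussian_kde(masses, 0.05, grid)
wide = gaussian_kde(masses, 1.0, grid)
print(narrow[0] > narrow[1])  # True: two resolved peaks with a dip in between
print(wide[0] < wide[1])      # True: peaks merged into one smooth hump
```

With masses 0.5 Da apart, a 0.05 Da bandwidth resolves two separate peaks, whereas a 1 Da bandwidth merges them into a single smooth peak, which is the trade-off described above.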