Information on Local Protein Prospector Installation

Protein Prospector is available in different packages. Some offer more functionality but, as a result, require additional supporting packages and hence a more complicated install. Below we provide information to help identify which package of Protein Prospector is most suitable for your needs. We also discuss some of the relative benefits of a local installation compared to use of the public website. If after reading this information you are interested in a local installation, please provide answers to the points in the checklist below when contacting us (Prospector email).

Public Website or Local Installation?

Advantages of a local installation

  • Privacy. If you are using a private, proprietary database or you work for a company that does not allow submission of data to a public website, then you need a local installation (note: Protein Prospector is only free for academic use; for-profit companies need to contact http://ita.ucsf.edu, as described in the License agreement at the bottom of this page). Administrators of the public website can access your searches, but we do not do so unless contacted.

  • Processing Power. The public website uses multi-processor searching, but the available cores are shared among submitted searches. Depending on the number of submitted searches, between 2 and 8 cores may be allocated to your search. In general, the public website is likely to be faster than any workstation, but will not compete with large compute clusters.

  • Use of a protein database not on the public website. Batch-Tag does allow submission of User Protein Sequences, and it is possible to upload a whole database through this feature, but this is not recommended for large databases containing more than a few hundred proteins.

  • Searching for peptide modifications not listed as options. Batch-Tag allows User Defined Variable Modifications to be added, but if these modifications are to be considered regularly it is more convenient to have them in the list of variable modification options.

Advantages of using the public website

  • Updates. The public website is regularly updated with new features and bug fixes. We have limited ability to update local installations, especially as installation versions get older.

  • Maintenance. If people break functionality, e.g. by altering parameter files, it may be difficult or impossible to troubleshoot remotely.

  • User Support. Users of the public website can contact the Administrators (Prospector email) and get advice about searches they have performed; e.g. how to improve search parameters; whether to trust certain results...

Check List

  1. Basic Version / Full Version

  2. LINUX (specify version and 32-bit/64-bit) / Windows (specify version) / Both

  3. Web Interface / Command Line

  4. Single Processor / Multi Processor (Batch-Tag only)

  5. Lab Repository Required (Batch-Tag only)

  6. Quantification or Raw Data Viewing Required (Batch-Tag only)

1). The choice between Basic or Full version depends on which Protein Prospector programs you want to run. If you do not require the Batch-Tag and Search Compare programs then you only need the Basic version; otherwise you need the Full version.

2). The software runs on both Windows and LINUX. For Windows there is an Installation Wizard, which has been tested on versions up to Windows 7 (32 and 64 bit) and on Windows 10. For Windows 7 and Windows 10 see these installation notes before installing. The installation has not been tested on Windows 8, but it may work. There is a Windows 7 Basic Version installation video. LINUX installations have been tested on Debian, CentOS, OpenSUSE and Ubuntu. At this stage we suggest using a 64-bit version of LINUX. The LINUX installation instructions are available in the Protein Prospector Installation Manual. It is possible to run Protein Prospector on a Virtual Machine, such as Oracle VirtualBox, or on a cloud computer. Batch-Tag searches will typically run significantly faster on a LINUX server, so this is the recommended option if you want to set up a multi-user environment or have large data processing requirements. However, if you want to perform quantification (see also point 6 below) then you need Windows to extract information from raw data files. The public Protein Prospector website runs on Debian LINUX and uses a separate Windows machine to extract quantification information.

3). Protein Prospector can be run from the command line or via a web interface (like the public website). Both the Basic and Full versions can be run from either interface. You might be interested in running the software from the command line if you want Protein Prospector to be part of a processing pipeline or if you want to run very large Batch-Tag searches on a shared computing cluster. More details about running Protein Prospector from the command line can be found here.

4). To run multi-processor Batch-Tag searches you need to install MPI. This is recommended, but it is also possible to run single-processor Batch-Tag searches.

5). Batch-Tag can access a local repository containing raw data and peak lists. Accessing this allows creation of projects from files in the repository using Batch-Tag, rather than having to upload files using Batch-Tag Web. Not having to upload files makes project creation faster. In addition, most web browsers have an upload limit of around 2 GB. Thus, a lab repository is recommended if you process large, multi-fraction data sets, have multiple users who share data, or if you want to perform quantification, where raw data is required. If you set up a lab repository then it is still possible for users to upload data.

6). Search Compare needs access to the manufacturer's raw data files in order to perform quantification of Batch-Tag searches (label-free or isotope-labeling), or to access the MS precursor data (e.g. to allow the user to check for correct monoisotopic peak selection and the presence of co-eluting precursors within the isolation window). Access to these files requires a Windows instance in your system. You can either have a Windows-only installation or have a LINUX system with an additional Windows system to pick up the quantification requests. The additional Windows system can be either a separate computer or a virtual machine. Raw data from Thermo and SCIEX instruments can currently be accessed by Protein Prospector. Accessing SCIEX data currently requires an installation of Analyst QS 2.0, which only runs on Windows XP, Windows XP x64 or Windows Server 2003.
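
The dependencies between the checklist answers described in points 1 to 6 can be summarised in a short sketch. The Python below is illustrative only and is not part of Protein Prospector; the class `Checklist`, its field names and the function `check_consistency` are hypothetical names chosen for this example. It encodes only rules stated above: items 4 to 6 apply to Batch-Tag, which needs the Full version; quantification needs a Windows system; and a lab repository is recommended when performing quantification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Checklist:
    """Answers to the six checklist points (field names are hypothetical)."""
    full_version: bool       # 1. False = Basic, True = Full
    operating_system: str    # 2. "linux", "windows" or "both"
    web_interface: bool      # 3. False = command line only
    multi_processor: bool    # 4. Batch-Tag only
    lab_repository: bool     # 5. Batch-Tag only
    quantification: bool     # 6. Batch-Tag only


def check_consistency(answers: Checklist) -> List[str]:
    """Return notes about answers that conflict with points 1 to 6."""
    notes = []
    batch_tag_only = {
        "multi-processor searching": answers.multi_processor,
        "a lab repository": answers.lab_repository,
        "quantification or raw data viewing": answers.quantification,
    }
    if not answers.full_version:
        # Batch-Tag and Search Compare are only in the Full version.
        for feature, wanted in batch_tag_only.items():
            if wanted:
                notes.append(f"{feature} applies to Batch-Tag, "
                             "which needs the Full version")
    if answers.quantification and answers.operating_system == "linux":
        notes.append("quantification needs an additional Windows system "
                     "(a separate computer or a virtual machine)")
    if answers.quantification and not answers.lab_repository:
        notes.append("a lab repository is recommended when performing "
                     "quantification")
    return notes


if __name__ == "__main__":
    example = Checklist(full_version=True, operating_system="linux",
                        web_interface=True, multi_processor=True,
                        lab_repository=False, quantification=True)
    for note in check_consistency(example):
        print(note)
```

A helper like this could be run before sending the checklist answers, so that contradictory choices (for example, Basic version together with quantification) are caught early.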

External Package Requirements

1). Apache. Apache is required if you want a web interface rather than a command line interface. On Windows it is also possible to use IIS; however, this option is not supported by the installation wizard.

2). Perl. Perl is required for:

a). Running Batch-Tag searches on LINUX. Perl is not needed to run Batch-Tag searches on Windows.

b). Running the tool autofaindex.pl, which can automatically download FASTA databases and run the FA-Index program to index them. It is also possible to download the files and run FA-Index yourself.

c). Running some MS-Viewer format conversion scripts and the batch MS-Viewer processing script automsviewer.pl.

d). Emailing a user once Batch-Tag searches have finished, which can be arranged via a Perl script.

3). MPI. MPI is required for running multi-processor Batch-Tag searches. It is possible to run Batch-Tag as a single processor process without this package. The openmpi package is suggested for LINUX although mpich2 is also possible. On Windows MPI is installed if required by the installation wizard.

4). MySQL. MySQL is required if you want to run Batch-Tag via a web interface. It keeps track of users, projects (collections of peak lists) and searches.

5). R. R is used by some programs for drawing graphs in the HTML results pages. It is not required for command line operation.

6). Ghostscript. Ghostscript is required by R on LINUX installations. It is included in most standard LINUX distributions.

7). start-stop-daemon. On LINUX the Batch-Tag daemon, which manages and queues Batch-Tag submissions, uses the package start-stop-daemon. If this isn't included in your LINUX distribution it can be compiled from source code. On Windows the Batch-Tag daemon runs as a service.

8). 7z. This is required on LINUX if you want to upload 7z compressed files.

9). MSFileReader. This is required for reading Thermo RAW files. Install the x86 Standalone version.

10). SCIEX Analyst QS 2.0. This is required for reading SCIEX wiff files. To date we have only been able to install this on Windows XP (32 and 64 bit) and Windows Server 2003.
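
The requirements listed above can be condensed into a small lookup, shown here as a rough sketch rather than an official tool. The function `required_packages` and all of its parameter names are hypothetical. It models only the main rules from the list; the optional uses of Perl (autofaindex.pl, the MS-Viewer scripts, email notification) are left out, R is assumed to be wanted whenever the web interface is used, and start-stop-daemon is assumed to be needed whenever Batch-Tag runs on LINUX.

```python
from typing import List


def required_packages(operating_system: str,
                      web_interface: bool,
                      batch_tag: bool,
                      multi_processor: bool = False,
                      upload_7z: bool = False,
                      thermo_raw: bool = False,
                      sciex_wiff: bool = False) -> List[str]:
    """List external packages for one machine ("linux" or "windows").

    Parameter names are hypothetical; only the rules come from the
    External Package Requirements list.
    """
    linux = operating_system == "linux"
    packages = []
    if web_interface:
        packages.append("Apache")           # IIS is an unsupported alternative on Windows
        packages.append("R")                # assumption: graphs in HTML results are wanted
        if linux:
            packages.append("Ghostscript")  # required by R on LINUX
    if batch_tag and linux:
        packages.append("Perl")             # not needed for Batch-Tag on Windows
        packages.append("start-stop-daemon")  # assumption: the Batch-Tag daemon is used
    if batch_tag and multi_processor:
        packages.append("MPI")
    if batch_tag and web_interface:
        packages.append("MySQL")            # tracks users, projects and searches
    if upload_7z and linux:
        packages.append("7z")
    if not linux:
        # Raw data access is only possible on a Windows system.
        if thermo_raw:
            packages.append("MSFileReader")
        if sciex_wiff:
            packages.append("SCIEX Analyst QS 2.0")
    return packages


if __name__ == "__main__":
    print(required_packages("linux", web_interface=True, batch_tag=True,
                            multi_processor=True))
```

For a mixed setup (a LINUX server plus a Windows system for quantification), the function would be called once per machine.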

Compiling Protein Prospector from Source Code

The source code is available for both the Windows and LINUX versions, but is not generally required for installation.

  • If you want to compile the code then the zlib library is necessary.

  • The MPI library is required to compile code for multi-processor Batch-Tag Searches.

  • The MySQL library is necessary for web based Batch-Tag searches.

  • You need to install make and gcc-c++ for LINUX.

  • For Windows we currently use the Microsoft Visual C++ 6 compiler. We have tried newer Visual C++ compilers but they seem to produce slower code.
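
Before attempting a LINUX build it can be useful to confirm that the build tools are on the PATH. The following is a minimal sketch, not part of the Protein Prospector distribution. The executable names are assumptions: `g++` is the command usually provided by the gcc-c++ package, and `mpicxx` is a common name for the MPI C++ compiler wrapper. The sketch checks executables only; it does not verify that the zlib, MPI or MySQL development libraries are installed.

```python
import shutil
from typing import Callable, Dict, Iterable, List, Optional

# Assumed executable names (see the note above).
REQUIRED_TOOLS = ["make", "g++"]
OPTIONAL_TOOLS = ["mpicxx"]  # only for multi-processor Batch-Tag builds

Which = Callable[[str], Optional[str]]


def find_tools(names: Iterable[str],
               which: Which = shutil.which) -> Dict[str, Optional[str]]:
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def missing_tools(names: Iterable[str],
                  which: Which = shutil.which) -> List[str]:
    """Return the tool names that could not be found."""
    return [name for name, path in find_tools(names, which).items()
            if path is None]


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"missing required build tool: {tool}")
    for tool in missing_tools(OPTIONAL_TOOLS):
        print(f"missing optional build tool: {tool}")
```

The `which` argument exists so that the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.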

License

Users need to agree to the following licensing conditions.

Copyright © 2014, The Regents of the University of California ("The Regents")
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Neither the name of the University of California, San Francisco nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
  4. This license is intended for academic and non-profit use only. For commercial licenses, please contact the UCSF Office of Innovation, Technology & Alliances: http://ita.ucsf.edu

THIS SOFTWARE IS PROVIDED BY THE REGENTS ''AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Please give feedback by sending e-mail to Prospector email.