
LAST UPDATE: May 23, 2012 (New release: version 3.0 beta)
PPfold is a new version of pfold, written in Java 6.0. It can predict the consensus secondary structure of RNA alignments
through a stochastic context-free grammar coupled to an evolutionary model. It can also use data from chemical probing experiments
to predict RNA secondary structure.
PPfold is multithreaded and can solve the structure of much longer alignments than pfold, which breaks down on long alignments
because of underflow errors. It is also platform-independent and much more user-friendly than pfold. PPfold can be downloaded
as a standalone program or as a free plugin to the CLC Workbenches. (Scroll down for downloads!)
NEW RELEASE (May 23, 2012): PPfold version 3.0 includes support for data from chemical probing experiments. Download the newest version below.
PPfold has been created as part of the Collaborative Minigrid Project,
in close cooperation between
If you experience any issues, or have questions or comments, email:
Zsuzsanna Sukosd (zs@mb.au.dk)
Instructions and download links
Choose this if you just want to run the program as a standalone application
Version 3.0 beta (current release):
PPfold-v3-0.jar (expect updates over the next two weeks; please send bug reports to zs@mb.au.dk)
(Old versions below.)
Provided that Java is installed on your computer, you can run it in either of two ways:
- From the desktop: Simply double-click the file to start the program. (On some systems, you need to make the file executable first.)
Select the input alignment (it must be in FASTA format). You can optionally also select a phylogenetic tree (in NEWICK format) and/or additional data tracks,
e.g. from chemical probing experiments.
- From the command line: Type java -jar PPfold-v3-0.jar --help for information on command-line arguments and options, and see below for some examples.
Example 1. Fold a simple alignment.
Download the example input file: gca-alignment.fasta
Type (assuming the file is in the same folder as PPfold):
java -jar PPfold-v3-0.jar gca-alignment.fasta
Example 2. Fold an alignment with hard constraints.
Download the example input file: gca-alignment.fasta and the constraint file:
constraint.txt
Type (assuming the files are in the same folder as PPfold):
java -jar PPfold-v3-0.jar gca-alignment.fasta --usedata constraint.txt gca_bovine --force
In this case, PPfold will assume the constraints are for the gca_bovine sequence.
Example 3. Fold an alignment with SHAPE data.
Download the example input file: gca-alignment.fasta and the data file:
shapeexample.txt
Type (assuming the files are in the same folder as PPfold):
java -jar PPfold-v3-0.jar gca-alignment.fasta --usedata shapeexample.txt gca_bovine --dist DEFAULT
In this case, PPfold will assume that shapeexample.txt contains data from a SHAPE experiment on the gca_bovine sequence. PPfold can also handle data
from other chemical probing experiments: in that case, just type the name of the corresponding distribution file instead of DEFAULT.
- Chemical probing data file format:
The data should have the following columns, WITHOUT any header:
position data_value
The position must be specified as sequence (rather than alignment) position. The first position in the sequence is denoted by 1. Not all positions need to have a data value.
The column delimiter is space or tab.
- Distribution file format:
The data should have the following HEADER (exactly):
lower_bound P_density_paired P_density_unpaired
The HEADER is followed by the data. Column 1 is the lower_bound of the experimentally measured value. P_density_paired is the probability of observing an experimental value
between lower_bound in the same row and lower_bound in the next row, given that a nucleotide is paired. P_density_unpaired is the probability of observing an
experimental value between lower_bound in the same row and lower_bound in the next row, given that the nucleotide is unpaired. The spacing of the lower_bound values need not
be constant. The column delimiter is space or tab. For example, the following lines:
0 0.107 0.024
0.01 0.057 0.010
indicate that the probability of observing an experimental value between 0 and 0.01 is 0.107 if the nucleotide is paired, and 0.024 if the nucleotide is unpaired. In other words,
of all paired nucleotides (in known structures) probed with this experimental method, 10.7% will get a measurement value between 0 and 0.01, and of all unpaired nucleotides,
2.4% will get a measurement value between the same bounds. Note that the second and third columns must each sum to 1 over the whole file.
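The two file formats described above can be checked before running PPfold with a small script. The following is a minimal Python sketch, not part of PPfold. The function names (parse_probing_data, parse_distribution, lookup), the tolerance on the column sums, the skipping of blank lines, the requirement that lower_bound values increase, and the treatment of the last row as an open-ended bin are all assumptions made for illustration. The demo values at the bottom are made up and are not real probing data.

```python
from bisect import bisect_right

EXPECTED_HEADER = ["lower_bound", "P_density_paired", "P_density_unpaired"]


def parse_probing_data(text):
    """Parse headerless 'position data_value' lines into {position: value}."""
    data = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split()  # splits on spaces and tabs alike
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns, got: {line!r}")
        position, value = int(fields[0]), float(fields[1])
        if position < 1:
            raise ValueError("positions are sequence positions starting at 1")
        data[position] = value
    return data


def parse_distribution(text, tolerance=1e-3):
    """Parse a distribution file into (lower_bound, paired, unpaired) rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0].split() != EXPECTED_HEADER:
        raise ValueError("header must be: " + " ".join(EXPECTED_HEADER))
    rows = [tuple(float(x) for x in ln.split()) for ln in lines[1:]]
    if not rows or any(len(row) != 3 for row in rows):
        raise ValueError("every data row needs exactly 3 columns")
    bounds = [row[0] for row in rows]
    # Assumption: bins are listed in increasing order of lower_bound.
    if any(b2 <= b1 for b1, b2 in zip(bounds, bounds[1:])):
        raise ValueError("lower_bound values must be strictly increasing")
    # The second and third columns must each sum to 1.
    for column, name in ((1, "P_density_paired"), (2, "P_density_unpaired")):
        total = sum(row[column] for row in rows)
        if abs(total - 1.0) > tolerance:  # tolerance is an arbitrary choice
            raise ValueError(f"{name} sums to {total}, expected 1")
    return rows


def lookup(rows, value):
    """Return (P_paired, P_unpaired) for the bin holding value, else None."""
    bounds = [row[0] for row in rows]
    index = bisect_right(bounds, value) - 1  # last row with lower_bound <= value
    if index < 0:
        return None  # value lies below the first lower_bound
    return rows[index][1], rows[index][2]


# Made-up two-bin distribution and made-up measurements, for illustration only.
demo_distribution = parse_distribution(
    "lower_bound P_density_paired P_density_unpaired\n0 0.6 0.2\n0.5 0.4 0.8\n"
)
demo_data = parse_probing_data("1 0.25\n3\t0.9\n")
for position, value in sorted(demo_data.items()):
    print(position, value, lookup(demo_distribution, value))
```

Note that the two example rows quoted above (0 0.107 0.024 and 0.01 0.057 0.010) are only an excerpt, so passing them alone to parse_distribution fails the sum check.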
Other information that might be useful:
- Running out of memory:
PPfold is a memory-intensive program, so if you try to fold large alignments it might warn about insufficient memory before folding starts,
or crash with an error. To avoid this, provided your machine has enough memory, you can increase what is known as the "Java heap space" with an extra
command-line argument: "-Xmx" followed by the amount of memory you want to allocate to PPfold (Java). For example:
java -Xmx256m -jar PPfold-v3-0.jar gca-alignment.fasta
will run PPfold with 256 MB memory allocated.
- The .jar file is essentially just a .zip, so you can open it in your favourite archive editor. This might be useful in case you want to replace the default distribution for
chemical probing data (dist.dat) with the distribution for your favourite method, or if you want to change the parameter file (matrices.in). For your convenience, the
parameter file used in PPfold can be downloaded here so you can see what it looks like: matrices.in
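Because the .jar is a zip archive, the replacement can also be scripted instead of done in an archive editor. The following is a minimal Python sketch using the standard zipfile module. The function names (find_entries, replace_jar_entry) are hypothetical, and the path of dist.dat inside the jar is not stated here, so the sketch looks the entry up by file name first. Zip entries cannot be overwritten in place with this module, so the sketch writes a modified copy and leaves the original jar untouched.

```python
import os
import zipfile


def find_entries(jar_path, basename):
    """List the entries in the jar whose file name equals basename."""
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist()
                if name.rsplit("/", 1)[-1] == basename]


def replace_jar_entry(jar_path, out_path, entry_name, new_bytes):
    """Write a copy of jar_path to out_path with entry_name replaced."""
    if os.path.abspath(jar_path) == os.path.abspath(out_path):
        raise ValueError("out_path must differ from jar_path")
    with zipfile.ZipFile(jar_path) as src:
        if entry_name not in src.namelist():
            raise KeyError(f"{entry_name!r} not found in {jar_path!r}")
        with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for name in src.namelist():
                # Copy every entry unchanged except the one being replaced.
                payload = new_bytes if name == entry_name else src.read(name)
                dst.writestr(name, payload)
```

A possible use, assuming your own distribution is stored in a file called mydist.dat (a hypothetical name): call find_entries("PPfold-v3-0.jar", "dist.dat") to find the entry path, then pass that path, the contents of mydist.dat as bytes, and a new output file name to replace_jar_entry. The same approach would apply to matrices.in.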
Choose this if you want to integrate it into a CLC Workbench:
PPfold is now an official (free) plugin to the CLC Workbenches. Please use CLC bio's plugin site for download:
PPfold at CLC bio
This will ensure that you always have the latest version compatible with your current installation. Instructions on how to use it can also be found on CLC bio's website.
PPfold (source code, old versions and extras)
Choose this if you want to have a look at the source code, need the extra programs or want to use the old versions:
- Current release: Version 3.0 (BETA).
New features include support for data from chemical probing experiments (and from other sources), various bugfixes,
and a MUCH better graphical user interface.
- Version 2.0/2.01.
New features include maximum likelihood tree estimation, small bugfixes and a better user interface.
Note the changed command-line options (more UNIX-like than in version 1.0).
- Version 1.0. First release