A Python module for parsing trees in the Newick file format.
The module provides a parser for the Newick tree format. It reads trees such as:
(
('Chimp' 1 : 0.052,
'Human' 1 : 0.042) 0.71 : 0.007,
'Gorilla' 1 : 0.060,
('Gibbon' 1 : 0.124,
'Orangutan' 1 : 0.0971) 1 : 0.038
);
where parentheses denote sub-trees and each edge is annotated "bootstrap-value : length"; both the bootstrap value and the length are optional.
Leaf labels (identifiers) must be quoted (using '...') if they contain spaces, but need not be quoted otherwise. Empty labels can be specified either as '' or by simply leaving out the identifier, as in "(,(,,),);".
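The quoting rules above can be illustrated with a small standalone tokenizer. This is only a sketch written for this README, not the tokenizer used inside the newick module; the names TOKEN and tokenize are hypothetical, and the sketch assumes that quoted labels never contain a quote character.

```python
import re

# Hypothetical token pattern, for illustration only (not the module's own).
TOKEN = re.compile(r"""
    \s*(?:
        (?P<punct>[(),:;])        # structural characters
      | '(?P<quoted>[^']*)'       # quoted label, may contain spaces
      | (?P<bare>[^\s(),:;']+)    # unquoted label or number
    )
""", re.VERBOSE)


def tokenize(text):
    """Split a Newick string into (kind, value) pairs."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at position %d" % pos)
        kind = match.lastgroup          # name of the alternative that matched
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


print(tokenize("('Homo sapiens' 1 : 0.042, Chimp);"))
print(tokenize("(,(,,),);"))
```

In the second call every token is punctuation: an omitted identifier produces no token at all, so a parser has to infer the empty leaves from the positions of the commas and parentheses.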
The simplest usage is to load the parser from newick.tree and read the tree from a string:
from newick import parse_tree
print(parse_tree("(A,B);"))
A tree parsed this way can then be traversed and manipulated.
By using "handlers" it is possible to extract information from the tree through callbacks, without first building the entire tree. The following example uses this to calculate the sum of all branch lengths:
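import sys

import newick


class BranchLengthSum(newick.AbstractHandler):
    """Sum the edge lengths reported by the parser."""

    def __init__(self):
        self.sum = 0.0

    def new_edge(self, b, l):
        # b is the bootstrap value, l the edge length; either may be missing.
        if l:
            self.sum += l

    def get_result(self):
        return self.sum


print(newick.parse(sys.stdin.read(), BranchLengthSum()))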
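To make the callback idea concrete without requiring the module to be installed, here is a self-contained sketch of an event-driven scan. It is not the implementation in newick: the names TOKEN and scan are hypothetical, the handler is a plain class instead of a subclass of newick.AbstractHandler, and the sketch assumes that no edge is reported for the root. Whether the real parser makes the same choices is not stated here.

```python
import re

# Hypothetical helper: quoted labels, punctuation, or bare words/numbers.
# Characters that match none of these are silently skipped in this sketch.
TOKEN = re.compile(r"'[^']*'|[(),:;]|[^\s(),:;']+")


class BranchLengthSum(object):
    """Standalone handler that accumulates edge lengths as they arrive."""

    def __init__(self):
        self.sum = 0.0

    def new_edge(self, bootstrap, length):
        if length:
            self.sum += length

    def get_result(self):
        return self.sum


def scan(text, handler):
    """Walk a Newick string once, reporting each edge to the handler."""
    tokens = TOKEN.findall(text)
    pos = [0]  # boxed so the nested functions can advance it

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def take():
        tok = peek()
        pos[0] += 1
        return tok

    def is_value(tok):
        return tok is not None and tok not in ("(", ")", ",", ":", ";")

    def subtree():
        if peek() == "(":
            take()
            edge()
            while peek() == ",":
                take()
                edge()
            if take() != ")":
                raise ValueError("expected ')'")
        elif is_value(peek()):
            take()  # leaf label; if absent, the leaf is simply empty

    def edge():
        subtree()
        bootstrap = length = None
        if is_value(peek()):
            bootstrap = float(take())
        if peek() == ":":
            take()
            length = float(take())
        handler.new_edge(bootstrap, length)  # no tree node is ever stored

    subtree()
    if take() != ";":
        raise ValueError("expected ';'")
    return handler.get_result()


print(scan("(A:1,B:2);", BranchLengthSum()))
```

The scan keeps only the token list and the running total in memory, which is the point of the handler approach: information is extracted as each edge is recognized, and no tree object is built.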
Download the source code and unpack it (tar xzf newick-a.b.c.tar.gz, where a.b.c is the version number of newick). Then run the setup script:
python setup.py install
See python setup.py --help for more details.
Thomas Mailund, <mailund@birc.au.dk>, Bioinformatics Research Center, University of Aarhus.
Time-stamp: "2006-01-26 21:59:50 mailund"