About hmmcomp
*************

The hmmcomp program implements methods to compare left-right hidden
Markov models, e.g. profile hidden Markov models. The methods currently
implemented in hmmcomp are described in:

    "Metrics and similarity measures for hidden Markov models".
    Rune Lyngsø, Christian N. S. Pedersen and Henrik Nielsen.
    To appear in the Proceedings of the 7th International Conference
    on Intelligent Systems for Molecular Biology (ISMB).

Various extensions to these methods are described in:

    "Measures on hidden Markov models".
    Rune Lyngsø, Christian N. S. Pedersen and Henrik Nielsen.
    BRICS Technical Report RS-99-6.

If you have suggestions or comments on the implementation of hmmcomp,
feel free to send an e-mail to cstorm@daimi.au.dk


Installing hmmcomp
******************

To compile hmmcomp you must have the GNU multiple precision arithmetic
library (gmp) installed. If you don't have it, it is available from

    prep.ai.mit.edu/pub/gnu/gmp/gmp-2.0.2.tar.gz

Once gmp is installed, you can build hmmcomp by running make in the
directory containing the hmmcomp source code. This should produce the
executable hmmcomp. Copy it to where you keep executables and you are
ready to compare hidden Markov models.


Using hmmcomp
*************

The compiled program hmmcomp is used as

    hmmcomp [-pv] file1 file2

which compares the two HMMs specified in file1 and file2. The format of
these files is described below. If the option '-p' is given, a
description of the HMMs specified in file1 and file2 is printed. The
option '-v' determines how the result of the comparison is presented.
Using '-v' gives:

    $ hmmcomp -v hmm1.def hmm2.def
    Comparing hmm1.def and hmm2.def
    Co-emmision prob : 0.30375e0
    Similarity1      : 0.8761892655710165528e0
    Similarity2      : 0.8756756756756756757e0
    Distance1        : 0.5028984936217206769e0
    Distance2        : 0.2936835031117682647e0

while omitting it gives:

    $ hmmcomp hmm1.def hmm2.def
    0.30375e0
    0.8761892655710165528e0
    0.8756756756756756757e0
    0.5028984936217206769e0
    0.2936835031117682647e0

If we use A(hmm1,hmm2) to denote the co-emission probability of the two
HMMs specified in file1 and file2 respectively (so that A(hmm1,hmm1) is
the co-emission probability of hmm1 with itself), then the five results
of running hmmcomp correspond to

    Co-emission prob : A(hmm1,hmm2)
    Similarity1      : A(hmm1,hmm2)/sqrt(A(hmm1,hmm1)*A(hmm2,hmm2))
    Similarity2      : 2*A(hmm1,hmm2)/(A(hmm1,hmm1)+A(hmm2,hmm2))
    Distance1        : acos(Similarity1)
    Distance2        : sqrt(A(hmm1,hmm1)+A(hmm2,hmm2)-2*A(hmm1,hmm2))

The theory behind these measures, and the motivation for them, are
described in the papers referenced above.


Description of an HMM
*********************

In hmmcomp an HMM is viewed as a directed acyclic graph in which nodes
corresponding to insert- and match-states (nodes of type GEN) are
allowed to have self-loops. This description coincides with the class of
left-right hidden Markov models, e.g. profile hidden Markov models.

The format used to describe HMMs is intentionally kept simple to make it
easy to convert from existing formats. The current implementation of
hmmcomp does not check whether an input file has the proper format.

We present the input format by an example. Consider the following simple
HMM over the alphabet acgt with four states and a self-loop.

              _
             / \
             \ /
    (0)----->(1)----->(3)
     |                 ^
     |                 |
     `------>(2)-------'

This HMM is described as shown below. Empty lines and lines beginning
with '#' are ignored.
    #
    # Description of simple HMM with four states
    #

    # Alphabet
    acgt

    # Size of the model (number of states)
    4

    # State 0
    #
    # Type (DEL=0 or GEN=1) and degree (number of outgoing transitions)
    0 2
    # Target state and probability of each transition from this state
    1 0.5
    2 0.5

    # State 1
    #
    # Type (DEL=0 or GEN=1) and degree (number of outgoing transitions)
    1 2
    # Target state and probability of each transition from this state
    1 0.25
    3 0.75
    # Emission probs for each symbol in the alphabet (only for GEN states)
    0.1 0.1 0.1 0.7

    # State 2
    #
    # Type (DEL=0 or GEN=1) and degree (number of outgoing transitions)
    1 1
    # Target state and probability of each transition from this state
    3 1
    # Emission probs for each symbol in the alphabet (only for GEN states)
    0.8 0.1 0.05 0.05

    # State 3
    #
    # Type (DEL=0 or GEN=1) and degree (number of outgoing transitions)
    0 0
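
Since hmmcomp does not check its input, it can be useful to check a
description file before running a comparison. The Python sketch below is
not part of hmmcomp. It reads the format described above and reports
obvious problems. The names parse_hmm and check_hmm are hypothetical,
and the sketch rests on several assumptions: the alphabet is a single
token with one character per symbol, values may be split over lines in
any way, leading whitespace before '#' is tolerated, and the outgoing
transition probabilities and the emission probabilities of a state
should each sum to 1.

```python
# Illustrative checker for hmmcomp-style description files (not part of hmmcomp).
SAMPLE = """\
# Alphabet
acgt
# Size of the model (number of states)
4
# State 0: DEL, two outgoing transitions
0 2
1 0.5
2 0.5
# State 1: GEN with a self-loop, followed by emission probabilities
1 2
1 0.25
3 0.75
0.1 0.1 0.1 0.7
# State 2: GEN
1 1
3 1
0.8 0.1 0.05 0.05
# State 3: DEL, no outgoing transitions
0 0
"""

DEL, GEN = 0, 1


def parse_hmm(text):
    """Parse a description into a dict; assumes whitespace-separated values."""
    tokens = []
    for line in text.splitlines():
        stripped = line.strip()
        # Empty lines and comment lines are ignored.
        if stripped and not stripped.startswith("#"):
            tokens.extend(stripped.split())
    values = iter(tokens)
    alphabet = next(values)  # assumption: one character per symbol
    n_states = int(next(values))
    states = []
    for _ in range(n_states):
        kind = int(next(values))
        degree = int(next(values))
        transitions = [(int(next(values)), float(next(values)))
                       for _ in range(degree)]
        # Only GEN states carry emission probabilities.
        emissions = [float(next(values)) for _ in alphabet] if kind == GEN else []
        states.append({"type": kind, "transitions": transitions,
                       "emissions": emissions})
    if next(values, None) is not None:
        raise ValueError("unexpected trailing values after the last state")
    return {"alphabet": alphabet, "states": states}


def check_hmm(hmm, tol=1e-9):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    n = len(hmm["states"])
    for i, state in enumerate(hmm["states"]):
        if state["type"] not in (DEL, GEN):
            problems.append(f"state {i}: unknown type {state['type']}")
        trans = state["transitions"]
        for target, _ in trans:
            if not 0 <= target < n:
                problems.append(f"state {i}: transition target {target} out of range")
            if target == i and state["type"] != GEN:
                problems.append(f"state {i}: self-loop on a non-GEN state")
        if trans:
            total = sum(p for _, p in trans)
            if abs(total - 1.0) > tol:
                problems.append(
                    f"state {i}: transition probabilities sum to {total}, expected 1")
        if state["type"] == GEN:
            total = sum(state["emissions"])
            if abs(total - 1.0) > tol:
                problems.append(
                    f"state {i}: emission probabilities sum to {total}, expected 1")
    return problems


if __name__ == "__main__":
    model = parse_hmm(SAMPLE)
    print(check_hmm(model) or "no problems found")
```

The sketch does not test for cycles other than self-loops. A file that
passes these checks may therefore still describe a model outside the
class that hmmcomp expects.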