Introduction ------------ An HMM M is a probabilistic model which describes a probability distribution for the strings S over some fixed alphabet. A run in M starts in the start-state and follows a Markovian sequence of states until it reaches the end-state. Every time a non-silent state is entered a symbol is emitted according to the emission-probabilities of the state. The run generates the string which is the concatenation of all the emitted symbols. The total probability P_M(S) of HMM M generating string S can be computed using the forward algorithm. The probability of the most likely run which generates S can be computed using the Viterbi algorithm. Implementation project ---------------------- 1. Design a suitable representation of an HMM which makes it possible describe a HMM containing silent and non-silent states. 2. Implement the Viterbi- and Forward-algorithms as presented on slides 25-26. Given an HMM M (in your format) and a string S, your program should compute the probability of the most likely run in M generating S, and P_M(S), respectively. 3. Observe that the forward algorithm on slides 26 assumes that the transition structure of the HMM does not contain cycles of silent states. Design and implement a solution which can compute P_M(S) even if the transition structure of M contains cycles of silent states. Hint: Think of the set of recursions as a set of linear equations. 4. Implement the computation of the collision probability of two HMM's as presented on slide 34 under the assumption of an unrestricted transition structure, i.e. there can be cycles of silent- and non-silent states. You can probably reuse your solution to (3). Make a program which computes the L_2 norm between two HMM's based on your computation of the collision probability. Experimental project -------------------- Investigate and implement heuristics for computing the L_2 norm between HMM's based on sampling. Compare the performance of your heuristic solutions to the exact solution presented on slide 34, i.e. how well does your heuristics approximate the true L_2 distance (you are welcome to use an existing implementation of the exact algorithm on page 34, available via Christian Storm Pedersen). Extend (the best of) your heuristics to compute the L_1 norm or the Kullback-Leibler distance (see slide 39) between two HMM's. How does these distances compare to the L_2 distance? Survey project -------------- Write a survey of recent approaches and applications for computing distances HMM's, e.g. the Kullback-Leibler distance. --------------080902050608040807020805--