A Database of Multiple Alignments

In this exercise we write a database for storing multiple alignments.

Motivation

Until now, whenever we have wanted a multiple alignment of a set of sequences, we have calculated it using ClustalW. For large sets of sequences this can be an expensive computation, so if we have already calculated the multiple alignment for a given set of sequences, we would rather reuse that alignment than recompute it.

In this exercise, we construct a database of multiple alignments that enables us to do that.

Designing the Database

We want the database to contain both the sequences and the multiple alignments. Since we can create the multiple alignments in different ways, depending on the tool used and the parameters given to the tool, we also want some meta-information associated with the alignments.

At the least, it should be possible to look up sequences by GI number, to get all multiple alignments in which a given sequence appears, and to extract all sequences appearing in a given multiple alignment.

EXERCISE DB2X.1: Design the database for multiple alignments. I suggest using three tables: one for the multiple alignments, one for sequences, and one for tools and options.
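
As one possible starting point, the sketch below sets up such a schema with Python's built-in sqlite3 module. All table and column names are assumptions made for illustration. It also adds a fourth linking table, alignment_members, which is not part of the suggestion above: a sequence can appear in many alignments and an alignment contains many sequences, and a link table is one common way to record that.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE sequences (
    gi       INTEGER PRIMARY KEY,   -- NCBI GI number
    sequence TEXT NOT NULL
);
CREATE TABLE tools (
    tool_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,          -- e.g. the name of the alignment program
    options TEXT NOT NULL,          -- options passed to the program
    UNIQUE (name, options)
);
CREATE TABLE alignments (
    alignment_id INTEGER PRIMARY KEY,
    tool_id      INTEGER NOT NULL REFERENCES tools(tool_id),
    alignment    TEXT NOT NULL      -- the alignment itself, stored as text
);
CREATE TABLE alignment_members (    -- which sequences are in which alignment
    alignment_id INTEGER NOT NULL REFERENCES alignments(alignment_id),
    gi           INTEGER NOT NULL REFERENCES sequences(gi),
    PRIMARY KEY (alignment_id, gi)
);
"""


def create_database(path=":memory:"):
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
```

Passing a file name instead of the default ":memory:" keeps the database between runs.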

EXERCISE DB2X.2: Populate the database with a few sequences downloaded from NCBI, and with a few alignments constructed using ClustalW.
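
A minimal sketch of the sequence half of this step, assuming the downloaded records are FASTA text whose header lines start with "gi|<number>|", and assuming a hypothetical sequences table with columns gi and sequence. The two records below are made up for the demonstration and are not real NCBI entries.

```python
import sqlite3


def parse_fasta(text):
    """Yield (gi, sequence) pairs from FASTA text with 'gi|<number>|...' headers."""
    gi, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if gi is not None:
                yield gi, "".join(chunks)
            # The GI number is the second '|'-separated field of the header.
            gi, chunks = int(line[1:].split("|")[1]), []
        elif line:
            chunks.append(line)
    if gi is not None:
        yield gi, "".join(chunks)


def insert_sequences(conn, fasta_text):
    """Insert every record in fasta_text, replacing rows with the same GI."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO sequences (gi, sequence) VALUES (?, ?)",
            parse_fasta(fasta_text),
        )


# Made-up records, for illustration only.
FASTA = """>gi|101|made-up example record
ACGT
ACG
>gi|102|another made-up record
TTGA
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (gi INTEGER PRIMARY KEY, sequence TEXT NOT NULL)")
insert_sequences(conn, FASTA)
print(conn.execute("SELECT gi, sequence FROM sequences ORDER BY gi").fetchall())
# prints [(101, 'ACGTACG'), (102, 'TTGA')]
```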

EXERCISE DB2X.3: Write a function that extracts all multiple alignments where a given sequence, identified by GI number, appears.
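
One way such a lookup might look, assuming a hypothetical layout in which an alignments table holds the alignment text and an alignment_members table records which GI numbers belong to which alignment. The table names, the function name alignments_containing, and the demonstration data are all invented for this sketch.

```python
import sqlite3


def alignments_containing(conn, gi):
    """Return (alignment_id, alignment) rows for alignments that include gi."""
    query = """
        SELECT a.alignment_id, a.alignment
        FROM alignments AS a
        JOIN alignment_members AS m ON m.alignment_id = a.alignment_id
        WHERE m.gi = ?
        ORDER BY a.alignment_id
    """
    return conn.execute(query, (gi,)).fetchall()


# Tiny in-memory demonstration with made-up GI numbers and alignment text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE alignments (alignment_id INTEGER PRIMARY KEY, alignment TEXT);
    CREATE TABLE alignment_members (alignment_id INTEGER, gi INTEGER);
    INSERT INTO alignments VALUES (1, 'aln-one'), (2, 'aln-two');
    INSERT INTO alignment_members VALUES (1, 101), (1, 102), (2, 102), (2, 103);
""")
print(alignments_containing(conn, 102))
# prints [(1, 'aln-one'), (2, 'aln-two')]
```

The GI number is passed as a query parameter rather than pasted into the SQL string, which avoids quoting problems.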

EXERCISE DB2X.4: Write a function that returns a list of the GI numbers of the sequences appearing in a given multiple alignment.
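
The reverse lookup can be sketched in the same way, again assuming a hypothetical alignment_members table that pairs alignment identifiers with GI numbers. The function name gis_in_alignment and the data are made up for illustration.

```python
import sqlite3


def gis_in_alignment(conn, alignment_id):
    """Return the GI numbers in the given alignment, in ascending order."""
    rows = conn.execute(
        "SELECT gi FROM alignment_members WHERE alignment_id = ? ORDER BY gi",
        (alignment_id,),
    )
    return [gi for (gi,) in rows]


# Tiny in-memory demonstration with made-up identifiers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE alignment_members (alignment_id INTEGER, gi INTEGER);
    INSERT INTO alignment_members VALUES (1, 102), (1, 101), (2, 103);
""")
print(gis_in_alignment(conn, 1))
# prints [101, 102]
```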

EXERCISE DB2X.5: Write a function that, given a list of GI numbers, extracts the sequences, calculates a multiple alignment, and inserts that multiple alignment into the database. The function should first check whether the multiple alignment already exists in the database (that is, an alignment of the same sequences computed with the same options for ClustalW).
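
The check-before-compute logic could be sketched as below. This is only one possible design, under several assumptions: the alignments table stores a gi_key column holding the sorted GI numbers joined by commas, so that the same set of sequences always produces the same key; the options are stored as a plain string; and the actual alignment is computed by a caller-supplied function align(sequences, options), which in the exercise would wrap a ClustalW run. The demonstration uses a stand-in aligner and made-up data.

```python
import sqlite3


def get_or_create_alignment(conn, gis, options, align):
    """Return (alignment, was_cached) for the given GI numbers and options.

    `align` is a hypothetical caller-supplied function that takes a list of
    sequences and an options string and returns the alignment as text.
    """
    ordered = sorted(set(gis))
    gi_key = ",".join(str(gi) for gi in ordered)  # order-independent key

    # Reuse the stored alignment if one exists for these sequences and options.
    row = conn.execute(
        "SELECT alignment FROM alignments WHERE gi_key = ? AND options = ?",
        (gi_key, options),
    ).fetchone()
    if row is not None:
        return row[0], True

    # Otherwise fetch the sequences, align them, and store the result.
    sequences = []
    for gi in ordered:
        found = conn.execute(
            "SELECT sequence FROM sequences WHERE gi = ?", (gi,)
        ).fetchone()
        if found is None:
            raise KeyError(f"GI {gi} is not in the database")
        sequences.append(found[0])
    alignment = align(sequences, options)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO alignments (gi_key, options, alignment) VALUES (?, ?, ?)",
            (gi_key, options, alignment),
        )
    return alignment, False


# In-memory demonstration with made-up sequences and a stand-in aligner.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sequences (gi INTEGER PRIMARY KEY, sequence TEXT NOT NULL);
    CREATE TABLE alignments (
        gi_key TEXT NOT NULL, options TEXT NOT NULL, alignment TEXT NOT NULL,
        UNIQUE (gi_key, options)
    );
    INSERT INTO sequences VALUES (101, 'ACGT'), (102, 'ACGA');
""")

calls = []  # records each time the stand-in aligner actually runs


def fake_align(sequences, options):
    calls.append(options)
    return "\n".join(sequences)  # not a real alignment, just a placeholder


first = get_or_create_alignment(conn, [102, 101], "placeholder-options", fake_align)
second = get_or_create_alignment(conn, [101, 102], "placeholder-options", fake_align)
print(first[1], second[1], len(calls))
# prints False True 1
```

Sorting the GI numbers before building the key means the order in which the caller lists them does not matter, so the second call finds the alignment stored by the first.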

Summary

We have built a database of multiple alignments.

In the third mandatory project, this database will be integrated with the code we have developed over the last few weeks.

Time-stamp: "2003-12-01 11:26:04 mailund"