TBiB Q4/2006
A Database of Multiple Alignments
In this exercise we write a database for storing multiple
alignments.
Motivation
Up till now, when we have wanted a multiple alignment of a set of sequences, we have calculated it using ClustalW. For large sets of sequences, this can be an expensive computation, so if we have already calculated the multiple alignment for a given set of sequences, we would rather reuse that alignment than recompute it. In this exercise, we construct a database of multiple alignments that enables us to do that.

Designing the Database
We want the database to contain both the sequences and the multiple alignments. Since we can create the multiple alignments in different ways, depending on the tool used and the parameters given to the tool, we also want some meta-information associated with the alignments. At the least, it should be possible to look up a sequence by its accession number and version number, to get all multiple alignments in which a given sequence appears, and to extract all sequences appearing in a multiple alignment.

EXERCISE DB2X.1: Design the database for multiple alignments. I suggest using three tables: one for the multiple alignments, one for sequences, and one for tools and options.

EXERCISE DB2X.2: Populate the database with a few sequences downloaded from NCBI, and with a few alignments constructed using ClustalW.

EXERCISE DB2X.3: Write a function that extracts all multiple alignments in which a given sequence, identified by accession and version number, appears.

EXERCISE DB2X.4: Write a function that returns a list of the accession and version numbers of the sequences appearing in a given multiple alignment.

EXERCISE DB2X.5: Write a function that, given a list of accession and version numbers, extracts the sequences, calculates a multiple alignment, and inserts that multiple alignment into the database. The function should first check whether the multiple alignment already exists in the database, that is, an alignment of the same sequences computed with the same ClustalW options.
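
As a starting point for the exercises above, here is one possible layout sketched with Python's standard sqlite3 module. Everything in it is an assumption rather than the intended solution: the choice of SQLite, the table and column names, the helper function names, and the extra link table `alignment_members`, which is added to the three suggested tables so that a sequence can appear in many alignments. The alignment itself is stored as an opaque text blob, the options are stored as a plain string, and ClustalW is not invoked here.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative only.
SCHEMA = """
CREATE TABLE sequences (
    id        INTEGER PRIMARY KEY,
    accession TEXT NOT NULL,
    version   INTEGER NOT NULL,
    residues  TEXT NOT NULL,
    UNIQUE (accession, version)
);
CREATE TABLE tools (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    options TEXT NOT NULL,
    UNIQUE (name, options)
);
CREATE TABLE alignments (
    id      INTEGER PRIMARY KEY,
    tool_id INTEGER NOT NULL REFERENCES tools(id),
    content TEXT NOT NULL
);
CREATE TABLE alignment_members (
    alignment_id INTEGER NOT NULL REFERENCES alignments(id),
    sequence_id  INTEGER NOT NULL REFERENCES sequences(id),
    PRIMARY KEY (alignment_id, sequence_id)
);
"""


def create_database(path=":memory:"):
    """Open a database (in memory by default) and create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_sequence(conn, accession, version, residues):
    """Insert one sequence and return its row id."""
    cur = conn.execute(
        "INSERT INTO sequences (accession, version, residues) VALUES (?, ?, ?)",
        (accession, version, residues),
    )
    return cur.lastrowid


def add_alignment(conn, tool, options, content, sequence_ids):
    """Insert an alignment, its tool settings, and its member sequences."""
    conn.execute(
        "INSERT OR IGNORE INTO tools (name, options) VALUES (?, ?)",
        (tool, options),
    )
    tool_id = conn.execute(
        "SELECT id FROM tools WHERE name = ? AND options = ?", (tool, options)
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO alignments (tool_id, content) VALUES (?, ?)",
        (tool_id, content),
    )
    alignment_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO alignment_members (alignment_id, sequence_id) VALUES (?, ?)",
        [(alignment_id, seq_id) for seq_id in sequence_ids],
    )
    return alignment_id


def alignments_containing(conn, accession, version):
    """Return ids of all alignments in which the given sequence appears."""
    rows = conn.execute(
        """SELECT a.id FROM alignments a
           JOIN alignment_members m ON m.alignment_id = a.id
           JOIN sequences s ON s.id = m.sequence_id
           WHERE s.accession = ? AND s.version = ?
           ORDER BY a.id""",
        (accession, version),
    ).fetchall()
    return [row[0] for row in rows]


def sequences_in(conn, alignment_id):
    """Return (accession, version) pairs for the sequences in an alignment."""
    rows = conn.execute(
        """SELECT s.accession, s.version FROM sequences s
           JOIN alignment_members m ON m.sequence_id = s.id
           WHERE m.alignment_id = ?
           ORDER BY s.accession, s.version""",
        (alignment_id,),
    ).fetchall()
    return [(row[0], row[1]) for row in rows]
```

With a layout like this, the existence check in the last exercise could compare the sorted member ids of each alignment made with the same tool and options against the requested set of sequences, and call ClustalW only when no match is found.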

Summary
We have built a database of multiple alignments, with a few Python functions for convenient access to the database. Save this module for later; we will use it in the final mandatory project.