We want to combine the code we have written in the last several weeks' exercises into an integrated web-service. This includes the Clustal W wrapper for creating multiple alignments, the NCBI search code, and the multiple alignment database.
First, we focus on providing sequences to our service.
We want two ways of inputting sequences to our service: directly (by providing the sequence) or by GI number. In both cases, the sequence should be inserted into the underlying database for later use.
EXERCISE PROJECT2.1: Write a function that, given a sequence or a GI number, inserts the sequence into the database if it is not already there. This includes downloading the sequence from NCBI if it is given as a GI number. Decide how to handle sequences without a GI number: How do you give them an identifier? How do you check whether they are already in the database? How do you later extract them from the database?
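One possible answer to the identifier questions, shown only as a sketch: sequences with a GI number get the identifier "gi:<number>", and sequences given directly get "local:<SHA-1 of the normalised sequence>", so the same sequence always maps to the same identifier. The table layout, the function names, and the fetch_by_gi callable (a stand-in for your own NCBI download code) are assumptions made for this illustration, not part of the earlier exercises.

```python
import hashlib
import sqlite3


def make_schema(conn):
    # Hypothetical table layout: one row per sequence, keyed by a text identifier.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences ("
        "seq_id TEXT PRIMARY KEY, sequence TEXT NOT NULL)"
    )


def normalise(sequence):
    # Remove all whitespace and upper-case, so formatting does not change the id.
    return "".join(sequence.split()).upper()


def local_id(sequence):
    # Content-derived identifier for sequences without a GI number.
    digest = hashlib.sha1(normalise(sequence).encode("ascii")).hexdigest()
    return "local:" + digest


def insert_sequence(conn, sequence=None, gi=None, fetch_by_gi=None):
    """Insert the sequence if it is missing and return its identifier.

    fetch_by_gi is a callable taking a GI number and returning the sequence;
    it stands in for the NCBI download code and is only called when needed.
    """
    if gi is not None:
        seq_id = "gi:%s" % gi
    elif sequence is not None:
        seq_id = local_id(sequence)
    else:
        raise ValueError("need a sequence or a GI number")

    found = conn.execute(
        "SELECT 1 FROM sequences WHERE seq_id = ?", (seq_id,)
    ).fetchone()
    if found is None:
        if sequence is None:
            if fetch_by_gi is None:
                raise ValueError("no way to download GI %s" % gi)
            sequence = fetch_by_gi(gi)
        conn.execute(
            "INSERT INTO sequences (seq_id, sequence) VALUES (?, ?)",
            (seq_id, normalise(sequence)),
        )
        conn.commit()
    return seq_id
```

Note that with this scheme a sequence entered directly and the same sequence entered by GI number get two different identifiers; whether that is acceptable is one of the design choices you have to make.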
EXERCISE PROJECT2.2: Write a function that, given a sequence identifier in the framework you decided on above, extracts the sequence from the database.
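A minimal sketch of such a lookup, assuming (hypothetically) that sequences are stored in an SQLite table named sequences with a text identifier column seq_id. The table and column names are illustrative; use whatever your own database layout provides.

```python
import sqlite3


def get_sequence(conn, seq_id):
    """Return the stored sequence for seq_id, or raise KeyError if unknown."""
    row = conn.execute(
        "SELECT sequence FROM sequences WHERE seq_id = ?", (seq_id,)
    ).fetchone()
    if row is None:
        raise KeyError(seq_id)
    return row[0]


# Small in-memory example with one hypothetical entry.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (seq_id TEXT PRIMARY KEY, sequence TEXT NOT NULL)"
)
conn.execute("INSERT INTO sequences VALUES (?, ?)", ("gi:42", "MKVL"))
print(get_sequence(conn, "gi:42"))  # prints MKVL
```

Raising an exception for unknown identifiers, rather than returning an empty string, makes it easier for the CGI scripts to report a meaningful error page.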
We now turn to building multiple alignments from sequences in the database.
As in the multiple alignment database exercises, we want an association between the multiple alignment and the sequences appearing in it.
EXERCISE PROJECT2.3: Write a function that, given a list of sequence identifiers, either extracts a multiple alignment of the sequences (if one exists) or creates the alignment, inserts it into the database, and returns it. This is a variation of exercise DB2X.5; the difference is that we now use the new variant of sequence identifiers rather than GI numbers.
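The "extract if it exists, otherwise create and store" logic can be sketched as below. The sketch assumes a hypothetical layout with one table for alignments and one table associating alignments with their sequences, and it takes the aligner as a callable (align) that stands in for your Clustal W wrapper. Sorting the identifiers gives a key that does not depend on the order in which they were given.

```python
import sqlite3


def make_schema(conn):
    # Hypothetical layout: alignments plus an association table to sequences.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS alignments (
            aln_id INTEGER PRIMARY KEY,
            member_key TEXT UNIQUE NOT NULL,
            alignment TEXT NOT NULL);
        CREATE TABLE IF NOT EXISTS alignment_members (
            aln_id INTEGER NOT NULL REFERENCES alignments(aln_id),
            seq_id TEXT NOT NULL,
            PRIMARY KEY (aln_id, seq_id));
    """)


def get_or_create_alignment(conn, seq_ids, align):
    """Return the stored alignment of seq_ids, building it first if needed.

    align is a callable taking a sorted list of identifiers and returning
    the alignment as text; it stands in for the Clustal W wrapper.
    """
    members = sorted(set(seq_ids))
    key = "\n".join(members)  # order-independent key for this set of sequences

    row = conn.execute(
        "SELECT alignment FROM alignments WHERE member_key = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]

    alignment = align(members)
    cur = conn.execute(
        "INSERT INTO alignments (member_key, alignment) VALUES (?, ?)",
        (key, alignment),
    )
    conn.executemany(
        "INSERT INTO alignment_members (aln_id, seq_id) VALUES (?, ?)",
        [(cur.lastrowid, seq_id) for seq_id in members],
    )
    conn.commit()
    return alignment
```

The association table is what later lets you find all alignments a given sequence appears in.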
EXERCISE PROJECT2.4: Re-do exercises DB2X.3 and DB2X.4 with the new identifiers.
We now only need to write a user interface to the functionality above. This will, naturally, be in the form of CGI scripts.
EXERCISE PROJECT2.5: Write a web-page plus CGI script for populating the database with sequences. This should use the function from PROJECT2.1.
EXERCISE PROJECT2.6: Write a web-page plus CGI script for displaying multiple alignments. The sequences can be given as your database identifiers, as GI numbers, or explicitly as sequence data. All sequences that are not already in the database should be inserted. The alignment should be generated and stored if it is not already in the database, and then displayed on a web-page.
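As a starting point for the CGI side, the sketch below reads the three kinds of input from the query string and renders a simple result page. The form field names (id, gi, seq) and the function names are assumptions made for this example. It reads the QUERY_STRING environment variable directly and parses it with urllib.parse, which covers forms submitted with the GET method only.

```python
import html
import os
from urllib.parse import parse_qs


def parse_request(query_string):
    """Return (kind, value) pairs for the hypothetical fields id, gi and seq."""
    fields = parse_qs(query_string)
    requested = []
    for kind in ("id", "gi", "seq"):
        for value in fields.get(kind, []):
            value = value.strip()
            if value:
                requested.append((kind, value))
    return requested


def render_page(title, alignment_text):
    """Return a complete CGI response: header, blank line, then HTML."""
    body = (
        "<html><head><title>%s</title></head>"
        "<body><h1>%s</h1><pre>%s</pre></body></html>"
        % (html.escape(title), html.escape(title), html.escape(alignment_text))
    )
    return "Content-Type: text/html; charset=utf-8\n\n" + body


def main():
    # In a real script, the requested sequences would be inserted into the
    # database here and the alignment built or looked up before rendering.
    requested = parse_request(os.environ.get("QUERY_STRING", ""))
    listing = "\n".join("%s: %s" % pair for pair in requested)
    print(render_page("Requested sequences", listing), end="")
```

A CGI script built this way would simply call main() at the end of the file. Escaping all user-supplied text with html.escape before writing it into the page avoids broken or malicious markup.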
EXERCISE PROJECT2.7: Update the alignment web-page above so that it now contains links to (scripts that extract) the individual sequences. That is, from the multiple alignment page, it should be possible to click your way to the individual sequences.
EXERCISE PROJECT2.8: Update the sequence-pages from above so they contain links to pages for all the alignments the sequence appears in. That is, the sequence page should contain a link for each alignment the sequence appears in, and the link should call a script that generates the specified multiple alignment page.
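Both kinds of links can be generated with a small helper that URL-encodes the identifier and HTML-escapes the result. The script names sequence.cgi and alignment.cgi and the parameter names id and aln are hypothetical; substitute the names of your own scripts.

```python
import html
from urllib.parse import urlencode


def sequence_link(seq_id, script="sequence.cgi"):
    # Link from an alignment page to the page of one sequence.
    url = "%s?%s" % (script, urlencode({"id": seq_id}))
    return '<a href="%s">%s</a>' % (html.escape(url, quote=True),
                                    html.escape(seq_id))


def alignment_link(aln_id, script="alignment.cgi"):
    # Link from a sequence page to one alignment the sequence appears in.
    url = "%s?%s" % (script, urlencode({"aln": aln_id}))
    return '<a href="%s">Alignment %s</a>' % (html.escape(url, quote=True),
                                              html.escape(str(aln_id)))


def alignment_list(aln_ids):
    # HTML list with one link per alignment, for use on a sequence page.
    items = "".join("<li>%s</li>" % alignment_link(a) for a in aln_ids)
    return "<ul>%s</ul>" % items


print(sequence_link("gi:42"))
# prints <a href="sequence.cgi?id=gi%3A42">gi:42</a>
```

URL-encoding matters here because identifiers may contain characters, such as the colon in the example, that should not appear unescaped in a query string.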
To complete the project, all you have to do now is to write a report briefly describing the design of the web-service and giving a short user's introduction to the service.
The report should explain the main design choices in the code, but not the low-level details. Instead, put the code somewhere we can find it, and include the path to the code in the report.
We have combined a number of previously written modules to build a web-service for multiple alignments. The web-service provides a CGI interface to alignment building, sequence download, and alignment visualisation.
This concludes the third mandatory project.