Tool-Building in Bioinformatics

TBiB Q4/2006


Exercise: CGI Interface to Clustal W

In this exercise, we write a CGI-script interface to the Clustal W module we wrote in the exercises in week 14.

Motivation

The module wrapping Clustal W that we wrote earlier provides a nice interface between scripts and Clustal W, but it does not by itself give users a nicer interface.

In this week's exercise we write a CGI-based user interface for our module. The interface will let the user upload a file containing the sequences to align, and then display the alignment.

The exercise this week is somewhat shorter than the exercises of the last two weeks. This is to let you catch up with the exercises, if you are behind, before taking on the second mandatory project, where you will need the code you wrote this week and in the previous two weeks. Make good use of the extra time you have this week!

The Basic Interface

At the very minimum, a CGI interface to Clustal W should let the user upload sequences and then show their multiple alignment. The easiest way of uploading sequences is through a text area, so:

EXERCISE CGI-CW.1: Write a CGI script that lets the user provide a list of sequences in a text-area input form, and then displays the multiple alignment of the sequences as calculated by Clustal W.
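Before handing the text-area contents to Clustal W, the script should check that it actually received sequences. The sketch below assumes the user pastes FASTA-formatted input (one of the formats Clustal W reads); the name parse_fasta is hypothetical and not part of the week-14 module, and the code is written for Python 3, unlike the Python 2 scripts in these notes.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs.

    A minimal sketch: header lines start with '>', the name is the first
    word of the header, and whitespace inside sequence lines is ignored.
    Raises ValueError on input that does not look like FASTA.
    """
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are common when pasting
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header without a name")
            name, chunks = header[0], []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            # drop internal whitespace and normalise to upper case
            chunks.append("".join(line.split()).upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    if len(records) < 2:
        raise ValueError("need at least two sequences to align")
    if any(not seq for _, seq in records):
        raise ValueError("empty sequence")
    return records
```

In the CGI script this might be called as parse_fasta(form.getfirst("sequences", "")), assuming the text area is named sequences; the resulting pairs can then be converted into whatever input your week-14 module expects.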

For displaying the multiple alignment, you can simply print it between <pre> and </pre> tags, for instance like this:

<pre>
foobar1         AGGTTGTATACTATC
foobar3         AGGTTGT--ACTATC
foobar2         AGGTTGTTTACTATC
foobar4         CGGTTGT--ACTATC
                 ******  ******  </pre> 

which the browser displays as:

foobar1         AGGTTGTATACTATC
foobar3         AGGTTGT--ACTATC
foobar2         AGGTTGTTTACTATC
foobar4         CGGTTGT--ACTATC
                 ******  ******  
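One detail worth handling when printing the alignment inside <pre>: the sequence names come from the user, so characters such as < and & should be HTML-escaped before they reach the page. A minimal Python 3 sketch using the standard html.escape (in Python 2, cgi.escape played a similar role); alignment_to_html and alignment_page are hypothetical helper names.

```python
import html


def alignment_to_html(alignment_text):
    """Wrap Clustal W's plain-text alignment in <pre> tags, escaping
    characters (<, >, &) that HTML would otherwise interpret."""
    return "<pre>\n" + html.escape(alignment_text, quote=False) + "</pre>"


def alignment_page(alignment_text):
    """Complete CGI response: header, blank line, then the HTML page."""
    return ("Content-type: text/html\n\n"
            "<html><head><title>Clustal W alignment</title></head>\n"
            "<body>\n" + alignment_to_html(alignment_text) + "\n</body></html>\n")
```

The CGI script would then simply write sys.stdout.write(alignment_page(text)) once Clustal W has produced text.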

If the user has the sequences in a file, it should not be necessary for him to copy and paste the sequences from the file; he should be able to provide the file directly.

EXERCISE CGI-CW.2: Extend the CGI-script so the user can provide the sequences from a file.
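For the file upload, the HTML form must use method="post" and enctype="multipart/form-data"; with the default encoding the browser sends only the file's name, not its contents. The Python 3 sketch below generates a form with both a text area and a file field; the field names sequences and seqfile and the default script name clustal.py are hypothetical. On the server side, cgi.FieldStorage represents an uploaded file as an item whose .filename is set and whose .value (or .file) gives the contents.

```python
import html


def upload_form(action="clustal.py"):
    """Return an HTML form with both a text area and a file field.

    enctype="multipart/form-data" makes the browser send the contents of
    the chosen file; with the default encoding only its name is sent.
    """
    return ('<form action="%s" method="post" enctype="multipart/form-data">\n'
            '  <p>Paste sequences:<br>\n'
            '    <textarea name="sequences" rows="10" cols="70"></textarea></p>\n'
            '  <p>Or upload a file: <input type="file" name="seqfile"></p>\n'
            '  <p><input type="submit" value="Align"></p>\n'
            '</form>\n') % html.escape(action, quote=True)


def choose_input(text_value, file_value):
    """Prefer an uploaded file over the text area. Under Python 3 the cgi
    module returns uploaded file contents as bytes, so decode them."""
    if file_value:
        if isinstance(file_value, bytes):
            file_value = file_value.decode("ascii", errors="replace")
        return file_value
    return text_value or ""
```

Preferring the file when both are given is just one possible policy; reporting an error when the user fills in both would be equally reasonable.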

While We're Waiting...

Generating the multiple alignment can take some time, and while it runs, the user has no way of knowing whether anything is happening. We should show him a page telling him that the alignment will arrive shortly.

What we want is to immediately return an HTML page saying that we are working on the problem, and then start a background process that generates the "real" HTML page. When the real page has been generated, we want to redirect the user to it.

The redirection can be done using the Refresh header (not part of the HTTP standard, but widely supported by browsers). With it, you specify a number of seconds to wait and then a URL to go to afterwards. Thus, we can show the temporary page using code like this:

print 'Refresh: 5; URL = url-to-redirect-to'
print "Content-type: text/html"
print
print """
<html>
  <title>Just a sec...</title>
  <body>
    <h1>Generating the page, please wait</h1>
  </body>
</html>"""  

After printing the temporary page, we need to do the real work. This must run in the background, and the background process must not hold on to the script's stdout; otherwise the web server keeps waiting for more output, and the temporary page is not displayed until the background job is done.

You run a command as a background job (whether it is another script or a "real" program) by appending an ampersand, &, to it. You detach the background process from the script's stdout by redirecting its output: >/dev/null 2>&1 sends both stdout and stderr to /dev/null.

Thus, to start a background process and let the CGI script finish immediately, you need code like this:

import os
os.system("(the real command) >/dev/null 2>&1 &") 

A complete script, waiting.py, that waits for a command to generate an HTML page could look like this:

#!/usr/local/bin/python

import cgi, os, sys

form = cgi.FieldStorage()
if "realpage" in form:
    # NOTE: fname comes straight from the URL, so a production script
    # must check it before opening or deleting the file it names.
    fname = form.getfirst("realpage")

    try:
        # if the page is ready...
        f = open(fname)

        # show the real page
        print "Content-type: text/html"
        print
        print f.read()
        f.close()

        # and remove temporary file
        os.unlink(fname)
        sys.exit(0) # all done

    except IOError:
        # if the page isn't ready yet
        pass

else:
    # realpage wasn't provided, assume first call so start process
    fname = os.tempnam("/tmp")

    # The shell creates an output file as soon as the command starts, so
    # write to fname.part and rename it to fname only when the command has
    # finished; otherwise a refresh could show a half-written page.
    os.system("(background-process > %s.part; mv %s.part %s) >/dev/null 2>&1 &"
              % (fname, fname, fname))


#  print "waiting" page
print 'Refresh: 5; URL = http://domain/waiting.py?realpage='+str(fname)
print "Content-type: text/html"
print
print """
 <html>
   <title>Just a sec...</title>
   <body>
     <h1>Generating the page, please wait</h1>
   </body>
 </html>"""

The script first checks whether it was given the name of the real page (which it will have been if the background process has already been started) and, if so, tries to open that file. If it can, it reads the page, prints it to the web client, and removes it from the server's file system. If not, it falls through to printing the waiting page again, whose Refresh header makes the browser check back five seconds later.

If no name was given, this is the first call: the script creates a fresh file name (using os.tempnam; see the on-line help, and note that os.tempnam no longer exists in Python 3, where the tempfile module replaces it), starts the background process that writes its HTML output to that file, and tells the web client to wait for it.
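One caution about waiting.py: realpage arrives in the URL, so anyone can ask the script to open, print and delete any file the web server can read, for example with realpage=/etc/passwd. A safer variant, sketched here in Python 3 under the assumption that all result pages live in one dedicated directory, hands the browser only an opaque random token and maps it back to a path after checking its format. PAGE_DIR, new_page_token and page_path_for are hypothetical names.

```python
import os
import re
import secrets

PAGE_DIR = "/tmp/clustal-pages"       # hypothetical directory for result pages
TOKEN_RE = re.compile(r"[0-9a-f]{32}")  # exactly what new_page_token produces


def new_page_token():
    """Return a random, hard-to-guess name for a new result page."""
    return secrets.token_hex(16)


def page_path_for(token, page_dir=PAGE_DIR):
    """Map a token from the query string to a file path, or raise
    ValueError if the token is not one we could have issued."""
    if not TOKEN_RE.fullmatch(token or ""):
        raise ValueError("invalid page token")
    return os.path.join(page_dir, token + ".html")
```

The first call would put new_page_token() in the Refresh URL and start the background process writing to page_path_for(token); later calls only ever open paths returned by page_path_for.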

EXERCISE CGI-CW.3: Modify your Clustal W script to use this interaction pattern.

Extensions to the Interface

Just for fun, we will extend the interface a bit. (The exercises in this section are not much more complicated than those in the previous section, but since they are just for fun, they are all bonus exercises.)

Colouring the Alignment

EXERCISE CGI-CW.4: Above, we printed the generated alignment as plain text between <pre> and </pre> tags. Wouldn't it look better if we highlighted matches, mismatches, and gaps using colours?

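For instance, we could show the alignment above with fully conserved columns in green, columns with mismatches in blue, and columns containing gaps in red: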

foobar1         AGGTTGTATACTATC
foobar3         AGGTTGT--ACTATC
foobar2         AGGTTGTTTACTATC
foobar4         CGGTTGT--ACTATC  

which in HTML is written like this:

foobar1         <b style="color:blue">A</b><b style="color:green">GGTTGT</b><b style="color:red">AT</b><b style="color:green">ACTATC</b>
foobar3         <b style="color:blue">A</b><b style="color:green">GGTTGT</b><b style="color:red">--</b><b style="color:green">ACTATC</b>
foobar2         <b style="color:blue">A</b><b style="color:green">GGTTGT</b><b style="color:red">TT</b><b style="color:green">ACTATC</b>
foobar4         <b style="color:blue">C</b><b style="color:green">GGTTGT</b><b style="color:red">--</b><b style="color:green">ACTATC</b>

EXERCISE CGI-CW.5*: Add this formatting of the output to your script.
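One possible way to produce this markup, sketched in Python 3: classify each alignment column (red if it contains a gap, green if all residues agree, blue otherwise) and merge runs of equally coloured columns into a single <b> element. The function names and the input format, a list of (name, aligned sequence) pairs, are assumptions; parsing Clustal W's output into that form is not shown.

```python
import html

COLOURS = {"gap": "red", "match": "green", "mismatch": "blue"}


def column_classes(seqs):
    """Classify every column of equal-length aligned sequences."""
    classes = []
    for column in zip(*seqs):
        if "-" in column:
            classes.append("gap")
        elif len(set(column)) == 1:
            classes.append("match")
        else:
            classes.append("mismatch")
    return classes


def colour_row(seq, classes):
    """Return seq as HTML, one <b> element per run of same-class columns."""
    parts = []
    start = 0
    for i in range(1, len(seq) + 1):
        # close the current run at the end or when the class changes
        if i == len(seq) or classes[i] != classes[start]:
            parts.append('<b style="color:%s">%s</b>'
                         % (COLOURS[classes[start]], html.escape(seq[start:i])))
            start = i
    return "".join(parts)


def colour_alignment(rows, name_width=16):
    """rows: list of (name, aligned sequence); returns a <pre> block."""
    if len({len(seq) for _, seq in rows}) != 1:
        raise ValueError("aligned sequences must have equal, non-zero count")
    classes = column_classes([seq for _, seq in rows])
    lines = [html.escape(name).ljust(name_width) + colour_row(seq, classes)
             for name, seq in rows]
    return "<pre>\n" + "\n".join(lines) + "\n</pre>"
```

Grouping runs keeps the HTML much shorter than wrapping every single residue in its own element.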

Multiple Sequence Submissions

We have written the script such that the input is given as a list of sequences, either in a text area or in a file. We could also let the user provide sequences one at a time and keep track of the list ourselves until he submits it for alignment.

One way of doing this is to keep the state of the session in a temporary file on the server side and store a session id in a hidden form field in the generated HTML pages.

EXERCISE CGI-CW.6*: Enhance the script in this way.
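A minimal sketch of the bookkeeping, in Python 3, assuming the accumulated sequences are stored as JSON in one file per session inside a server-side directory, with the session id (a random token) carried by a hidden input field in each generated page. The directory layout, file format, field name session and function names are all hypothetical; the sketch does no locking and never deletes old session files, both of which a real script would need.

```python
import json
import os
import re
import secrets
from html import escape

SESSION_RE = re.compile(r"[0-9a-f]{32}")


def _session_path(session_dir, sid):
    # Only accept ids we could have generated, so an id taken from the
    # form cannot be used to reach files outside session_dir.
    if not SESSION_RE.fullmatch(sid or ""):
        raise ValueError("invalid session id")
    return os.path.join(session_dir, sid + ".json")


def save_sequences(session_dir, sid, sequences):
    with open(_session_path(session_dir, sid), "w") as f:
        json.dump(sequences, f)


def load_sequences(session_dir, sid):
    with open(_session_path(session_dir, sid)) as f:
        return [tuple(pair) for pair in json.load(f)]


def new_session(session_dir):
    """Create an empty session file and return its id."""
    sid = secrets.token_hex(16)
    save_sequences(session_dir, sid, [])
    return sid


def add_sequence(session_dir, sid, name, seq):
    """Append one (name, sequence) pair and return the updated list."""
    sequences = load_sequences(session_dir, sid)
    sequences.append((name, seq))
    save_sequences(session_dir, sid, sequences)
    return sequences


def hidden_session_field(sid):
    """HTML for the hidden field that carries the id to the next request."""
    return '<input type="hidden" name="session" value="%s">' % escape(sid)
```

Each submission would read the id with form.getfirst("session"), call add_sequence, and emit a new page containing hidden_session_field(sid) plus a button that starts the alignment.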

Summary

We have written a CGI-script interface to our Clustal W module. For creating multiple alignments, this gives users a friendlier interface than calling the module from scripts.

We will now combine the NCBI searching module from last week's exercise with the Clustal W module and interface in the second mandatory project.

Time-stamp: "2003-11-28 14:44:38 mailund"