Tool-Building in Bioinformatics

TBiB Q4/2006

CGI Scripts

In this lecture we cover CGI (Common Gateway Interface) scripts. We learn:

  • How to create dynamic web content.
  • How to provide input to our scripts through HTML forms.

Supplementary reading: Python--how to program, Chap. 6, pages 194-219, the cgi module manual, and the W3C Forms Manual.

Motivation

Command-line interfaces are appropriate for small programs with simple option-sets, but for programs with more complex interaction, graphical user interfaces (GUIs) are often superior.

One way of providing a program with a graphical user interface is through web-technology: output from the program is provided as HTML and can be displayed in a web-browser, and input to the program can be given by filling out HTML forms.

Advantages of web-based user interfaces include:

Familiarity.
The web-based interface is familiar to anyone who has used a web-browser, which is practically anyone working with a computer.
Remote Access.
An application with a web-interface will just as easily serve a remote user as a local one, provided the web-server permits this.
Sequential Execution.
With a web-based interface, the programmer can, to a high degree, control the order of events in a user interaction. Where a typical (WIMP) GUI lets the user switch between several unrelated interactions--leading to an explosion in possible interaction scenarios--the web-based interface can guide the user, step by step.

Disadvantages include:

Non-Responsiveness.
Interaction with CGI-scripts goes through web clients and servers, and is much slower than interaction with a locally-running program. This makes CGI-scripts inappropriate for e.g. real-time monitoring or arcade-like video games.
Sequential Execution.
Sequential execution is both an advantage and a disadvantage. For the programmer, it is always nice to control the interaction, but the user may prefer more freedom. It depends very much on the application: in some applications, being able to switch between tasks and perform two or more jobs at a time is preferable; in others, the simpler sequential interaction is preferable.

Most of the applications we will use in bioinformatics (excluding interactive visualisation programs) are given some input, perform a lengthy computation or search, and then provide output. For these kinds of applications, the benefits of web-based interfaces outweigh the disadvantages.

Running CGI-Scripts

In accordance with ancient traditions, the first CGI program we write will be "hello, world". This program is shown below. It consists of three parts: the "pound-bang-path" comment, the header, and the body.

#!/usr/local/bin/python

# header
print "Content-type: text/plain"
print

# body
print "Hello, World."  

The pound-bang-path (so called because it consists of a pound (#) and a bang (!) followed by a path) specifies the interpreter used to execute the script. If you have run your scripts with python script-name so far, you have not needed this construction yet. It tells the operating system to execute the file as a Python script: if the execution bit is set for the file (see below), you can run it as just script-name, provided the script is in a directory on your $PATH. When a CGI-script is accessed, the web-server executes it directly, so the path is needed to tell the server which interpreter to use.

Warning!
It is a serious error to forget the header! Your CGI-scripts must start by printing a header specifying the content-type, followed by a blank line. If you forget this, the web-server will report an error (but probably not a meaningful one).

The header contains information for the web-server and web-client. In the hello-world script, the header tells the client that the body of the output is plain text, so the client knows how to display it. Normally, you will specify the content-type as either text/plain, as above, or text/html, when the body is HTML and should be displayed as such.

The header is terminated by a blank line (the second print statement above).

The body is the remainder of the program. The output produced by the body is sent to the web-client to be displayed.
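The following minimal sketch makes the header/body structure concrete. It builds the complete text a CGI-script sends back as a single string instead of printing it piece by piece. The function name cgi_response is hypothetical. Printing its result would give the same output as the hello-world script.

```python
def cgi_response(body, content_type="text/plain"):
    """Return the full CGI output: header, blank line, then body."""
    header = "Content-type: %s\n" % content_type
    # The empty line ("\n") ends the header; everything after it is the body.
    return header + "\n" + body


# The hello-world page as a single string.
page = cgi_response("Hello, World.\n")
```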

Setting Up CGI-Scripts on BiRC/DAIMI

To run CGI-scripts at DAIMI, you must put them in the directory ~/cgi-www/cgi-bin/ and make them executable with the command chmod 755 scriptname.py.

You can then run the script by pointing your browser at http://cgi-www.daimi.au.dk/cgi-yourusername/yourscript.py, where yourusername is your DAIMI user name and yourscript.py is the name of the file containing your script.

EXERCISE CGI.1: Run the hello-world script.

EXERCISE CGI.2: Change the hello-world script so it outputs HTML instead of plain text. What happens if you output HTML, but the header says you output plain text?

There are a few things you must be aware of when running CGI-scripts. The scripts will not be run with your user-id; they will be run as the gopher user (or some other, site-specific, user on other servers). This means that your scripts can only access files that are accessible to that user. This is important to remember whenever your scripts read or write files.

Furthermore, the scripts will not run on your local machine, but on the web-server cgi-www (an alias for cedi). All of your scripts will run on this machine, so if one of them exhausts the resources (memory, CPU, disk-space), all of your scripts will suffer. So be careful when you write your scripts!

Debugging CGI-Scripts

Before we go on with writing CGI-scripts, a few words about debugging should be said.

You need to test your scripts through the web-client/-server setup. You can often do an initial test by running a script on the command-line and checking what it outputs. However, the web-server environment differs from your command-line environment (for example, in the user-id and the environment variables), so you also need to test in the web-server setup.

However, testing a script through the web-server is complicated by the way the server reacts to errors. Often, you will only be told that something weird has happened, but not what.

To get more meaningful error messages, you can use the module cgitb. You do this by importing the module and calling the enable() function.

import cgitb; cgitb.enable() 

When cgitb is enabled, and an error occurs, you will get a web-page explaining the error.

EXERCISE CGI.3: Create a script that uses cgitb, and put in a deliberate error. See what happens when you run it.

You should still test the program on the command line first, whenever possible. The cgitb module will not catch all errors. If the script fails to compile when called (for example, because of a syntax error), cgitb is never enabled, so you will get an error from the web-server but no error trace.
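Where cgitb is not available, or as a complement to it, the same idea can be sketched by hand. Run the script's main work inside a try block and, if anything fails, return a plain-text page with the Python traceback. The names run_safely and broken_main are illustrative; this helper is not part of the cgitb module.

```python
import traceback


def run_safely(main):
    """Call main() and return its output; on any exception, return a
    plain-text page containing the traceback instead."""
    try:
        return main()
    except Exception:
        # format_exc() formats the exception currently being handled.
        return "Content-type: text/plain\n\n" + traceback.format_exc()


def broken_main():
    # A deliberate error, as in EXERCISE CGI.3.
    return 1 / 0


page = run_safely(broken_main)
```

One way to handle the syntax-error case is to keep the real code in a separate module and import it inside main(). A SyntaxError in an imported module is raised at import time, and because SyntaxError is a subclass of Exception, the wrapper catches it.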

Dynamic Content

The hello world script is not particularly interesting. It produces the same output each time it is run, so the same effect could be achieved using a static web page. The reason we are interested in CGI-scripts is that they can provide dynamic content.

Remember that the page displayed when calling a CGI-script is whatever the script outputs. If the script produces different output on different calls, the page has dynamic content.

As a toy-example, we can print the time of day in our script.

#!/usr/local/bin/python

# for debugging purposes
import cgitb; cgitb.enable()

# header
print "Content-type: text/html"
print

# body
import time
print '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.0 Strict//EN"
  "DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>The Time of Day Page</title>
  </head>
  <body>
    <h1>The Time of Day Page</h1>
    <p>
      The time of day: %s
    </p>
  </body>
</html>  
''' % time.ctime(time.time())

The body of this CGI-script is XHTML with a paragraph containing the current time, as returned by time.time() and formatted as a readable string by time.ctime(). The time is inserted into the string printed to stdout using the string format operator %.

Using Templates

In the time-of-day script above, we construct the output by inserting the dynamic content of the page into a static template of the page. This is a nice way of constructing the output, because it separates the control-flow of the script from the output that it constructs. The document produced can easily be seen in the script; it is not intermixed with the python code.

We can take this idea a bit further. We can put the templates and the scripts in different files, such that the templates are read by the script when the script is run. This way the template and script can be easily edited independently, and a web-page author, for instance, can write the HTML pages without worrying about the scripts.

As an example, if the following HTML is saved in template.html:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.0 Strict//EN"
  "DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>%s</title>
  </head>
  <body>
    <h1>%s</h1>
    %s
  </body>
</html> 

then this script creates a page by supplying title, header, and body of the template:

#!/usr/local/bin/python

# for debugging purposes
import cgitb; cgitb.enable()

# header
print "Content-type: text/html"
print

# body
f = open('template.html')
doc = f.read()
f.close()

print doc % ('Hello, World', 'Hello, World', '<p>Howdy!</p>')

It is still not a perfect solution, though. The script is very dependent on the HTML-template; if the order of "dynamic-content-slots" in the template changes, the arguments to % must change.
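One way to reduce this dependence while keeping the % operator is to use named slots, written %(name)s, and supply a dictionary instead of a tuple. The sketch below assumes a template written with such named slots, given as an inline string for brevity. It is an alternative to the string-replacement approach described below, not part of the original example.

```python
# Hypothetical template text; %(name)s marks a named slot.
template = """<html>
  <head><title>%(title)s</title></head>
  <body>
    <h1>%(title)s</h1>
    %(body)s
  </body>
</html>"""

# With a dictionary on the right of %, slots are filled by name,
# so their order in the template no longer matters.
page = template % {"body": "<p>Howdy!</p>", "title": "Hello, World"}
```

A literal % sign in such a template must be written as %%.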

Warning!
When using templates, and when reading files in general, make sure that the file has the right permissions. Remember that the script is not run with your user-id, so the fact that you can read the file does not mean that the script can. And never make the file world-writable!
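The permission rules in this warning can be checked from Python with os.stat and the permission bits in the stat module. The helper check_template_permissions is a hypothetical name. It reports whether a file is readable and writable by "others", the category the web-server user is assumed to fall into. The example creates a throwaway temporary file with mode 644 (rw-r--r--), the mode that chmod 644 gives.

```python
import os
import stat
import tempfile


def check_template_permissions(path):
    """Return (world_readable, world_writable) for the file at path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH), bool(mode & stat.S_IWOTH)


# Example: mode 644 is readable, but not writable, by others.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
os.chmod(demo_path, 0o644)
readable, writable = check_template_permissions(demo_path)
os.remove(demo_path)
```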

A better solution uses string replacement. The template can contain special markup comments that the script replaces with dynamic content. For instance, we can rewrite the example above like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.0 Strict//EN"
  "DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title><!--TITLE--></title>
  </head>
  <body>
    <h1><!--TITLE--></h1>
    <!--BODY-->
  </body>
</html>

The script then fills in the placeholders using string replacement:

#!/usr/local/bin/python

# for debugging purposes
import cgitb; cgitb.enable()

# header
print "Content-type: text/html"
print

# body
f = open('template.html')
doc = f.read()
f.close()

print doc.replace('<!--TITLE-->','Hello, World') \
         .replace('<!--BODY-->','<p>Howdy!</p>')

This example also exploits the fact that, with the string-replacement approach, the template can decide to show the same dynamic content in several places in the document. Here the title appears both in <title> and in <h1>.
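When a template has several placeholders, the chained replace calls can be generalised to a loop over a dictionary. This sketch uses a hypothetical helper name, fill_template. The placeholder strings are whatever the template author chose.

```python
def fill_template(doc, replacements):
    """Replace every occurrence of each placeholder (a dict key) in doc
    with the corresponding value."""
    for placeholder, value in replacements.items():
        doc = doc.replace(placeholder, value)
    return doc


template = "<title><!--TITLE--></title><h1><!--TITLE--></h1><!--BODY-->"
page = fill_template(template, {"<!--TITLE-->": "Hello, World",
                                "<!--BODY-->": "<p>Howdy!</p>"})
```

If a replacement value itself contains another placeholder, the result depends on the order in which the dictionary is traversed. Keep placeholders out of the dynamic content.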

EXERCISE CGI.4: Re-write the time-of-day script to use a template.

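It is not always possible to rely completely on templates. When constructing tables and lists, for instance, the number of HTML elements depends on the data, so the script must generate that HTML; there is no obvious way of writing it in a template file. The script below generates an HTML list of the numbers 0 to 9 and inserts it into the template: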

#!/usr/local/bin/python

# for debugging purposes
import cgitb; cgitb.enable()

# header
print "Content-type: text/html"
print

# body
f = open('template.html')
doc = f.read()
f.close()

list_body = []
for i in xrange(10):
    list_body.append("<li>%d</li>" % i)

list_html = '''
<ul>
  %s
</ul> ''' % '\n  '.join(list_body)

print doc.replace('<!--TITLE-->','List Example') \
         .replace('<!--BODY-->',list_html) 

Whenever possible, though, you should use the template design.
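Tables are a typical case: the number of rows depends on the data. The sketch below uses a hypothetical helper, html_table, which builds an XHTML table from a header and a list of rows. It escapes every cell, as discussed in the next section. Its result can be inserted into a template placeholder in the same way as list_html above. It imports escape from html on Python 3 and falls back to cgi.escape on Python 2.

```python
try:
    from html import escape          # Python 3.2 and later
except ImportError:
    from cgi import escape           # Python 2


def html_table(header, rows):
    """Build an XHTML table from a header tuple and a list of row tuples.
    Every cell is converted with str() and escaped."""
    def row(cells, tag):
        return "<tr>" + "".join("<%s>%s</%s>" % (tag, escape(str(c)), tag)
                                for c in cells) + "</tr>"
    lines = ["<table>", "  " + row(header, "th")]
    lines.extend("  " + row(r, "td") for r in rows)
    lines.append("</table>")
    return "\n".join(lines)


table = html_table(("Name", "Length"), [("seq1", 120), ("seq<2>", 87)])
```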

Escaping Output

When you output HTML, you should be careful to escape special characters such as <, >, and &. When they are not part of the markup, they must be written as &lt;, &gt;, and &amp;, respectively. Consider:

#!/usr/local/bin/python

# for debugging purposes
import cgitb; cgitb.enable()

# header
print "Content-type: text/html"
print

# body
f = open('template.html')
doc = f.read()
f.close()

body = '''
  <p>%s</p>
  <p>%s</p>
''' % ("&lt;b&gt;escaped&lt;/b&gt;","<b>not escaped</b>")

print doc.replace('<!--TITLE-->','Escape example') \
         .replace('<!--BODY-->',body) 

EXERCISE CGI.5: Run the script above and see what happens.

EXERCISE CGI.6: Find the special characters in the script-listing above, and try to write it as escaped HTML. Compare with the source of this page.

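The function escape from the cgi module can do this escaping for you. cgi.escape(s) replaces &, <, and > in the string s, and cgi.escape(s, True) also replaces double quotes with &quot;, which is needed when the string is inserted into an attribute value.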
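The following minimal sketch shows what such escaping involves. The name html_escape is hypothetical; in real scripts, use cgi.escape, or html.escape in Python 3. The main subtlety is the order of the replacements.

```python
def html_escape(s, quote=False):
    """Minimal HTML escaping, similar in spirit to cgi.escape.

    '&' must be replaced first; otherwise the '&' in '&lt;' produced
    by a later replacement would be escaped a second time.
    """
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        # Needed when the text goes inside a double-quoted attribute value.
        s = s.replace('"', "&quot;")
    return s
```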

Calling External Programs

If your script needs to call external programs, as we discussed a few lectures ago, there is an extra complication: If the program called generates output, where will it end up?

If you call a program using the os.system function, and the program generates output, that output will be written to your CGI-script's stdout. It will then either be merged with your HTML or, worse, appear before your HTTP header. This can happen even if you printed the header first, because Python buffers its own output while the program writes directly. In the latter case, the web-server will report an error (and give little indication of what went wrong).

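There are two ways to avoid this. You can use os.popen (or one of its relatives) instead of os.system, so that your script reads the program's output. Alternatively, you can redirect the output of the program in the os.system call to a file or to /dev/null, e.g. os.system("ls > result.txt") or os.system("foo > /dev/null"). If you redirect to a file, remember that it must be writable by the user the web-server runs your script as.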
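The following sketch follows the os.popen approach. It uses the subprocess module (available since Python 2.4) to capture both the output and the error messages of a program, so nothing reaches the script's stdout. The function name run_captured is hypothetical. Passing the command as a list runs the program without a shell, so no shell redirection is involved.

```python
import subprocess
import sys


def run_captured(args):
    """Run an external program and return its stdout as a string.

    stdout and stderr are both read through pipes, so nothing the
    program prints ends up in the CGI-script's own output.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)  # text, not bytes
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("%s failed: %s" % (args[0], err))
    return out


# Example: run a tiny Python program through the current interpreter.
output = run_captured([sys.executable, "-c", "print('hello from a subprocess')"])
```

In this sketch, a non-zero exit status is turned into an exception, which cgitb would then display. Other scripts might instead show err on the page.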

Input to CGI-Scripts

With dynamic content we can provide news-services, monitoring (although not real-time), and such. It is not enough for the kinds of applications we want to build, however; for those we need to provide user input to our scripts.

There are essentially two ways to provide input to a CGI-script: through a query string or through HTML forms.

Query Strings

In the exercises in week 16 we saw URLs of the form http://www.domain.gov/cgi-script?key-value-pairs, e.g. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=search&term=SARS.

We didn't really know what was going on at that point, but we knew that the key-value pairs were passed as input to the script. We are now ready to learn how.

The string after the question-mark in the URL is put in the environment variable QUERY_STRING when the CGI-script is executed. The script can get to it using the environ dictionary in the os module:

os.environ["QUERY_STRING"] 

From this string we can extract the key-value pairs. The cgi module provides two functions for doing that, so we do not have to parse the string manually.

The two functions for parsing the query string are cgi.parse_qs and cgi.parse_qsl. They differ only in how they return the parsed data. parse_qs returns a dictionary that maps each key in the query string to a list of the values given for that key. parse_qsl returns a list of (key, value) pairs, where a key that appears more than once in the query string results in several pairs.

To see the difference, experiment with this script:

#!/usr/local/bin/python

# for debugging purposes
import cgitb; cgitb.enable()

# header
print "Content-type: text/plain"
print

# body
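import cgi, os

# .get() with a default also lets you test the script from the command line,
# where QUERY_STRING is usually not set
qs = os.environ.get("QUERY_STRING", "")
print cgi.parse_qs(qs)
print cgi.parse_qsl(qs)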

EXERCISE CGI.7: Run the script above with a query string. Try with a query string such as foo=bar&foo=baz where the same key appears more than once. What happens with the output?
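In later Python versions, parse_qs and parse_qsl are also available from the urlparse module (Python 2.6 and later) and from urllib.parse (Python 3), and the cgi versions are deprecated. The sketch below does the same parsing with those modules. query_params is a hypothetical helper that takes the environment mapping as an argument, so you can try it with a plain dictionary instead of a real CGI request.

```python
try:
    from urllib.parse import parse_qs, parse_qsl   # Python 3
except ImportError:
    from urlparse import parse_qs, parse_qsl       # Python 2.6 and later


def query_params(environ):
    """Parse QUERY_STRING from an environment mapping such as os.environ.

    Returns both representations: a dict of lists and a list of pairs.
    """
    qs = environ.get("QUERY_STRING", "")
    return parse_qs(qs), parse_qsl(qs)


as_dict, as_pairs = query_params({"QUERY_STRING": "foo=bar&foo=baz&x=1"})
```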

This script, qstring.py, shows how you can use the query string to create dynamic HTML.
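A script can also go the other way and build such URLs itself, for instance to link to another CGI-script with the parameters filled in. The following minimal sketch uses urlencode (from urllib in Python 2, urllib.parse in Python 3) and reconstructs the NCBI example URL from above.

```python
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

# A list of pairs keeps the parameters in the given order.
params = [("db", "nucleotide"), ("cmd", "search"), ("term", "SARS")]
url = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?" + urlencode(params)

# Special characters in values are percent-encoded automatically.
encoded = urlencode([("term", "a&b c")])
```

Because urlencode encodes special characters, a value containing & or a space cannot break the query string.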

HTML Forms

In some cases, the query string is the right way to provide input to a CGI-script, but in by far most cases we want to interact with the script through web-pages, not through URLs.

The way to achieve this is through HTML forms: special markup elements that let you provide input to CGI-scripts.

A form has this general structure:

  <form action="script" method="post or get">
    <!-- controls -->
  </form> 

where the action attribute ("script") is the CGI-script to call when the form is "submitted", and the method attribute ("post or get") is either "post" or "get", depending on how the input should be passed to the CGI-script.

The form element contains a set of "controls" (input markups) that specify how the user provides the actual input to the script; we will return to them shortly. When the form is submitted, the input from all the controls between the form start and end tags is passed, as a whole, to the script specified in the action attribute of the start tag.

The input is sent to the script when a special control, the submit button, is clicked, or when you press Enter in a text field. How the input is sent depends on the method attribute of the form tag.

Submission Method

There are two methods for submitting data: get and post. The main difference between them is that get passes the input in the URL, just like the query string we saw above, while post passes it on the script's stdin. When there are many input controls, or when the input contains long strings or sensitive data, it is therefore better to use post than get, since the data then does not end up in the URL.

Another difference is that the web-client may cache get submissions, but not post submissions. If you reload a page created with a get submission, some browsers will show the cached page with the old content; with a post submission, the server will be contacted again. If you write a script whose output can differ between calls with the same input, you should therefore use post submissions.
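To make the difference concrete, here is a simplified sketch of what happens under the hood for the two methods. The name read_form_data is hypothetical; cgi.parse() and cgi.FieldStorage do this, and much more, for you. The sketch assumes the standard CGI environment variables REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH, and it treats stdin as text. For post, it reads exactly CONTENT_LENGTH bytes instead of reading to end-of-file, since the CGI specification does not require the server to signal end-of-file after the data.

```python
import io

try:
    from urllib.parse import parse_qs   # Python 3
except ImportError:
    from urlparse import parse_qs       # Python 2.6 and later


def read_form_data(environ, stdin):
    """Return the submitted fields as a dict of lists, for get or post.

    get:  the data is in QUERY_STRING.
    post: the data is on stdin; CONTENT_LENGTH gives its size.
    """
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        data = stdin.read(length)
    else:
        data = environ.get("QUERY_STRING", "")
    return parse_qs(data)


# Simulated post request: 23 characters of form data on "stdin".
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "23"}
fields = read_form_data(post_env, io.StringIO(u"data1=hello&data2=world"))
```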

To illustrate the difference between post and get, consider the following script, working on template form.html:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.0 Strict//EN"
  "DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Form Example</title>
  </head>
  <body>
    <h1>Form Example</h1>

    <h2>Submitted data:</h2>
    <!-- PREVIOUS SUBMISSION -->

    <!-- get form -->
    <h2>A "get" form</h2>
    <form action="form.py" method="get">
      Input some text: <input type="text" name="data1"/>
      And some more: <input type="text" name="data2"/>
      <button type="submit">Submit</button>
    </form>

    <!-- post form -->
    <h2>A "post" form</h2>
    <form action="form.py" method="post">
      Input some text: <input type="text" name="data1"/>
      And some more: <input type="text" name="data2"/>
      <button type="submit">Submit</button>
    </form>

    <!-- input-independent dynamic content -->
    <!-- TIME -->

  </body>
</html>

The script, form.py:

#!/usr/local/bin/python

# for debugging purposes
import cgitb; cgitb.enable()

# header
print "Content-type: text/html"
print

# body
f = open('form.html')
doc = f.read()
f.close()

import cgi, os, sys, time
submission_data = '''
  <br/><b>Read from stdin:</b> %s
  <br/><b>os.environ["QUERY_STRING"]:</b> %s
''' % (sys.stdin.read(), os.environ["QUERY_STRING"])

print doc.replace('<!-- PREVIOUS SUBMISSION -->', submission_data) \
         .replace('<!-- TIME -->', time.ctime(time.time()))  

The script generates an HTML page with two forms, one with the get method and one with the post method. When you submit data (by typing it into one of the text fields and pressing Enter, or by clicking one of the submit buttons), the page is updated with the value of the QUERY_STRING environment variable and the text read from stdin.

At the bottom of the page there is a time-stamp that is updated independently of the input data. On some browsers it will not be updated when you refresh the page after a get submission (on others it will).

EXERCISE CGI.8: Run this script and see what happens. Try to refresh the page after a get submission and after a post submission.

Whether the input is given as a get or a post submission, you can let the cgi module parse it, using either the function cgi.parse() or the class cgi.FieldStorage.

The cgi.parse() function returns a dictionary like the one returned by cgi.parse_qs(). cgi.FieldStorage wraps the form in a dictionary-like interface (you can subscript it like a dictionary), but with a lot more functionality. For most of our purposes, it doesn't matter which we use, but you might as well get used to the more general interface.

The template and script below illustrate the use of the FieldStorage class.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.0 Strict//EN"
  "DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>FieldStorage Example</title>
  </head>
  <body>
    <h1>FieldStorage Example</h1>

    <!-- post form -->
    <h2>A Form</h2>
    <form action="fieldstorage.py" method="post">
      Input some text: <input type="text" name="data1"/>
      And some more: <input type="text" name="data2"/>
      <button type="submit">Submit</button>
    </form>

    <h2>Submitted data:</h2>
    <!-- PREVIOUS SUBMISSION -->

  </body>
</html>

The script, fieldstorage.py:

#!/usr/local/bin/python

# for debugging purposes
import cgitb; cgitb.enable()

# header
print "Content-type: text/html"
print

# body
f = open('fieldstorage.html')
doc = f.read()
f.close()

import cgi, string
form = cgi.FieldStorage()

key_val_list = []
for key in form.keys():
    # getvalue returns a string, or a list of strings if the key was
    # submitted more than once; str() handles both cases
    key_val_list.append('<tr><td>%s:</td><td>%s</td></tr>' %
                        (cgi.escape(key), cgi.escape(str(form.getvalue(key)))))

table = '''
<table>
  <tr><th>Key</th><th>Value</th></tr>
 %s
</table>
''' % string.join(key_val_list)

print doc.replace('<!-- PREVIOUS SUBMISSION -->', table)

The script uses the keys() method of the FieldStorage class to get the names of the submitted data fields (the keys of the dictionary, if you used cgi.parse() instead). It then uses form.getvalue(key) to get the submitted value for each key.

The value returned by form.getvalue(key) is either a single string, if only one value was submitted with the given key, or a list of strings, if several were. The subscript operation form[key] behaves similarly: it gives either a single field object, whose value attribute holds the string, or a list of such objects. Writing form[key].value therefore fails when a key was submitted more than once.

Since the type of the value depends on the input, you will usually need to check the type before you use it. In the example above this is not necessary, because we convert the value into a string using Python's str() function, which works on both strings and lists. Often, however, you will have to distinguish between single values and lists of values.

To get the value in a type-safe (or at least type-consistent) way, you can use the methods getfirst(name) and getlist(name). The first always returns a single value (a string, or None if nothing was submitted under that name), and the second always returns a list; that is,

  form.getfirst(key)

will return a single string, while

  form.getlist(key)

will return a list (which might be empty, contain a single string, or contain more).
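If you use cgi.parse() (or parse_qs) instead of FieldStorage, you get a plain dictionary of lists, and equivalent helpers are easy to write. The two functions below are hypothetical helpers that mimic the behaviour of FieldStorage.getfirst and FieldStorage.getlist on such a dictionary. They are not part of the cgi module.

```python
def getfirst(fields, key, default=None):
    """First value submitted for key, or default if none was submitted."""
    values = fields.get(key)
    return values[0] if values else default


def getlist(fields, key):
    """All values submitted for key, always as a list (possibly empty)."""
    return list(fields.get(key, []))


# fields as returned by cgi.parse() or parse_qs: key -> list of strings
fields = {"data1": ["hello"], "ch3": ["option1", "option3"]}
```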

EXERCISE CGI.9: Modify the fieldstorage.py script above to use getfirst and getlist.

Controls

The controls are the markups for letting the user input data. We have already seen two examples above, an input-control and a button-control.

<input type="text" name="data"/>
<button type="submit">Submit</button> 

The input control can be used for a number of different kinds of data entry: text, password, checkbox, radio, submit, reset, file, hidden, image, and button. The button control is a specialised control that provides more markup options than an input-control with type button. A button-control with type "submit" is similar to an input-control with type submit.

There are a number of other controls, but we will only consider one other here, the textarea. The input control will do for most of your uses, and you can look at the form manual yourself if you need more than what it provides.

You have already seen how an input-control of type text works; it lets the user type in text and the CGI-script will get that text as the value associated with the key given as the name attribute of the input tag. If you want to provide a default string, you can specify it using the value-attribute of the input tag:

<input type="text" name="textdata" value="hello"/>

# in the script (form = cgi.FieldStorage()):
if form.has_key("textdata"):
    text_data = form.getfirst("textdata") 
else:
    text_data = '' # empty-string if data not specified. 

The password type works exactly like the text type, except that the text typed into the field is hidden on the web-page. It is only hidden on the page, however; the text is still transmitted as plain text, so this is not exactly a secure password prompt.

EXERCISE CGI.10: Write a CGI-script that prompts a user for a user name and a password and replies by printing the user name and password. (Please ignore how stupid this is, security-wise.)

A textarea control works like an input control of type text, except that it permits input of more than a single line. In addition to the name attribute you must provide a rows and a cols attribute for the number of rows and columns the text area should span:

<textarea name="textdata" rows="20" cols="80">Default text</textarea> 

EXERCISE CGI.11: Write a CGI-script that prompts the user for a FASTA-sequence, to be entered in a text area. When submitted, show the FASTA-sequence with the header in boldface (<b>header</b>). Remember that the '>' in the header must be escaped.

A checkbox is a button that can be either "checked" or not checked, and it can be used for setting boolean flags. Checkboxes are created using an input control with type checkbox. If the box should be checked by default, you specify this by setting the attribute checked to "checked".

<label for="ch1">Not checked by default:</label>
<input id="ch1" type="checkbox" name="ch1"/>
<br/>
<label for="ch2">Checked by default:</label>
<input id="ch2" type="checkbox" name="ch2" checked="checked"/>

# in the script (form = cgi.FieldStorage()):
if form.has_key("ch1"): ch1_text = 'true'
else:                   ch1_text = 'false'
if form.has_key("ch2"): ch2_text = 'true'
else:                   ch2_text = 'false' 

The HTML example above also illustrates the use of label-tags. These can be used to connect a string in the HTML with an input control; the for attribute of the label must match the id attribute of the input control. In the example above, the use of labels makes it possible to toggle the checkbox by clicking the text.

EXERCISE CGI.12: Write a script where you use a label, as above, and a script where you don't. Check the difference by clicking the string next to the checkbox.

You can also use checkboxes to offer several options for a single value. If two or more checked checkboxes share the same name, the FieldStorage will contain a list under that name, with an element for each checked box. By default, the value of a checked checkbox is the string "on", but you can change that using the value attribute:

<label for="ch3_1">Option 1:</label>
<input id="ch3_1" type="checkbox" name="ch3" value="option1"/>
<br/>
<label for="ch3_2">Option 2:</label>
<input id="ch3_2" type="checkbox" name="ch3" value="option2"/>
<br/>
<label for="ch3_3">Option 3:</label>
<input id="ch3_3" type="checkbox" name="ch3" value="option3"/>

# in the script (form = cgi.FieldStorage()):
if form.has_key("ch3"): ch3_text = form.getlist('ch3')
else:                   ch3_text = [] 
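The has_key tests above work because browsers do not send unchecked checkboxes at all; only checked boxes appear in the submitted data. The sketch below illustrates this, together with the shared-name case, on a hypothetical query string. It is parsed with parse_qs into the same dict-of-lists structure that cgi.parse() returns.

```python
try:
    from urllib.parse import parse_qs   # Python 3
except ImportError:
    from urlparse import parse_qs       # Python 2.6 and later

# Hypothetical submission: ch1 unchecked, ch2 checked (default value "on"),
# and two of the three shared-name "ch3" boxes checked.
fields = parse_qs("ch2=on&ch3=option1&ch3=option3")

ch1_checked = "ch1" in fields   # unchecked boxes are simply not sent
ch2_checked = "ch2" in fields
ch3_values = fields.get("ch3", [])
```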

EXERCISE CGI.13: Write a script that lets the user check a number of checkboxes. When submitted, the script should display the names and values of the boxes clicked. Use both checkboxes with unique names and with a shared name.

The radio-button input type works like the checkbox type, except that only one input control with a given name can be checked at a time. That is, with the HTML:

<label for="ra3_1">Option 1:</label>
<input id="ra3_1" type="radio" name="ra3" value="option1"/>
<br/>
<label for="ra3_2">Option 2:</label>
<input id="ra3_2" type="radio" name="ra3" value="option2"/>
<br/>
<label for="ra3_3">Option 3:</label>
<input id="ra3_3" type="radio" name="ra3" value="option3"/> 

only one of the options can be given for ra3.

EXERCISE CGI.14: Write a script that, like in exercise CGI.13, lets the user click a number of input controls, this time radio buttons, and then shows the user which choices were submitted.

The submit and reset input types are for submitting the form (similar to the button we saw earlier) and for resetting the input controls to their original configuration, respectively. You can either use the input control version or the button control version:

<input type="submit"/>
<input type="reset"/> 
<button type="submit">Submit</button>
<button type="reset">Reset</button> 

The image input type is just a variant of the submit button, where you use an image as the button. Alternatively, if you use a button control, you can simply put an image inside a submit button:

<input type="image" src="some-image.gif" alt="Submit"/>
<button type="submit"><img src="some-image.gif" alt="Submit"/></button>

Using the input control type button you can create general push-buttons. These are used together with client-side scripts, and since we have not seen any such, nor will we in this course, we will not consider general push-buttons here.

A hidden input control can be used to store key-value pairs in the form that are not shown on the page and cannot be edited by the user through it. For instance, with the input control:

<input type="hidden" name="key" value="value"/> 

the FieldStorage will associate "value" with "key".

Hidden input controls can be used as a simple way of keeping track of state information between submissions; they can, for example, contain a session id or similar. This is simple, but not particularly secure, and other, more complicated, techniques exist.
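Since the value of a hidden control is written into an attribute, it must be escaped, including double quotes. The following sketch shows a hypothetical helper, hidden_input, that generates such a control. It uses html.escape (Python 3) or cgi.escape (Python 2), both called with quote set to true.

```python
try:
    from html import escape          # Python 3.2 and later
except ImportError:
    from cgi import escape           # Python 2


def hidden_input(name, value):
    """Return an XHTML hidden input; quote=True also escapes '"',
    which matters because the value sits inside a quoted attribute."""
    return '<input type="hidden" name="%s" value="%s"/>' % (
        escape(name, True), escape(str(value), True))


field = hidden_input("session", 'abc"123')
```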

EXERCISE CGI.15: Write a CGI-script that keeps track of how many times a user has pressed the submit button, by storing this number in a hidden input control.

The final input control type we consider is the "file" type. This type lets us select a file on the client side, and provide it to the CGI-script.

You add a file input control the way you would expect:

Upload file: <input name="file" type="file"/> 

and you can then get the value as before:

if form.has_key('file'): file_string = form.getfirst('file')
else:                    file_string = '...nope...' 

There is a catch, however. If you do as above, you will only get the file name, not the file content. Since the web-server is unlikely to have access to the web-client's file system, this is of little use.

The way to fix this is to tell the form that it should bundle the file with the data sent to the web-server. This is done by setting the form-tag attribute enctype to "multipart/form-data".

<form action="fieldstorage.py" method="post" enctype="multipart/form-data"> 

Don't worry too much about it (or read the documentation if you absolutely need to know)--it's just a bit of magic you must do to get the file.

As mentioned above, you can get the file data using form.getfirst('file'). This reads in the entire file and gives it to you as a string. Alternatively, you can get a file-handle to the file through form['file'].file and manipulate it as such. You can also use this attribute to check whether the data provided is a file:

fileitem = form["file"]
if fileitem.file:
    # it's a file: process it line by line instead of reading it all at once
    for line in fileitem.file:
        print cgi.escape(line.rstrip('\n'))

EXERCISE CGI.16: Write a script that lets the user upload a file containing a FASTA-sequence, and display the sequence as in exercise CGI.11.

Summary

We have learnt how to build web-interfaces to our programs: how to use the CGI-protocol to execute scripts to create HTML-pages, and how to use forms to provide input to our scripts.

With this new knowledge, we are now ready to attack this week's exercises, concerning a web-interface to the Clustal W module we wrote earlier. Combined, the exercises for this week and the previous two weeks comprise most of mandatory project number two.

Time-stamp: "2003-12-03 10:02:58 mailund"