Warning!
Designing databases is actually a lot more difficult than you
might think, and we do not have time to cover it properly here. The
material we cover here, however, should be enough to get you
started on simple databases; if you need more complicated
databases you should probably study the database literature a bit more.
Extracting data from databases, as we did last week, is well and good, and what we mainly need, but we must also know how to populate the databases in the first place.
In this lecture, we will learn how to build the database and its tables, and how to insert, delete, and update rows in the tables.
We will also, briefly, cover how to design your tables to reduce (or better, eliminate) redundancy, how to relate tables, and how to keep the relations consistent.
The very first thing we must do, is to create our own database. We need our own because, in general, we are not allowed to update data in foreign databases.
How you create a database at DAIMI is explained at this page: you need to run the muser with the name of your database and the password you want to use for the database.
On a solaris machine, for instance, you would run:
[~ (499)> /users/mysql/solaris/daimi/muser mailund mypassword
Database created, you may administer it using the URL
http://www.daimi.au.dk/MysqlTool
Username: mailund
Password: mypassword
Database: mailund
[~ (500)>
This creates the database `mailund' protected by the `mypassword' password.
You can log into this database using mysql:
[~ (503)> mysql -h mysql.daimi.au.dk -u mailund -p -D mailund Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 649278 to server version: 3.23.54 Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql>
This command logs me into the database (-D) mailund, on host (-h) mysql.daimi.au.dk as user (-u) mailund protected by password (-p).
EXERCISE DB2.1: Create a database. Use your username as its name.
Once created, we can populate the database with tables.
Let's say we want to create a database over experiments that we conduct. To an experiment we want to associate a description of the experiment and the date the experiment was conducted.
We can create a table holding this information using the CREATE TABLE SQL command:
CREATE TABLE experiment ( description TEXT, date DATE );
This creates a table called experiment with two columns, description and date. The description column has type TEXT--which is a string of arbitrary length--and the date column has type DATE. For more types, consult the reference manual.
We can now insert a row into the table using the INSERT INTO command:
INSERT INTO table name VALUES (list of values);
For example:
mysql> INSERT INTO experiment VALUES ('I left the milk in the fridge for six months','1997-12-09');
Query OK, 1 row affected (0.05 sec)
mysql> SELECT * FROM experiment;
+----------------------------------------------+------------+
| description | date |
+----------------------------------------------+------------+
| I left the milk in the fridge for six months | 1997-12-09 |
+----------------------------------------------+------------+
1 row in set (0.00 sec)
mysql> INSERT INTO experiment VALUES ('I created a MySQL database', CURRENT_DATE);
Query OK, 1 row affected (0.03 sec)
mysql> SELECT * FROM experiment;
+----------------------------------------------+------------+
| description | date |
+----------------------------------------------+------------+
| I left the milk in the fridge for six months | 1997-12-09 |
| I created a MySQL database | 2003-11-24 |
+----------------------------------------------+------------+
2 rows in set (0.00 sec)
mysql>
Notice that, in the first insertion, we explicitly set the date; in the second we use the variable CURRENT_DATE which, no surprise, contains the current date.
If you are not sure about the order of the columns, or if you only want to insert values for some of the columns, you can use a variation of the command:
INSERT INTO table name (list of columns) VALUES (list of values);
Here, you explicitly mention the columns, and the order of the columns.
mysql> INSERT INTO experiment (description,date) VALUES ('I updated a database',CURRENT_DATE)
Query OK, 1 row affected (0.03 sec)
mysql> SELECT * FROM experiment;
+----------------------------------------------+------------+
| description | date |
+----------------------------------------------+------------+
| I left the milk in the fridge for six months | 1997-12-09 |
| I created a MySQL database | 2003-11-24 |
| I updated a database | 2003-11-24 |
+----------------------------------------------+------------+
3 rows in set (0.00 sec)
mysql>
EXERCISE DB2.2: Create the experiment table and insert a few rows into it.
Now, say we also want a table containing a list of our lab assistants. We want their name and address, and we can create that table using the following command:
CREATE TABLE lab_assistant ( first_name VARCHAR(20), last_name VARCHAR(20), address VARCHAR(20) );
Here, all the columns are string of length max 20 characters.
We can populate this table using INSERT INTO commands as before.
mysql> CREATE TABLE lab_assistant (
-> first_name VARCHAR(20),
-> last_name VARCHAR(20),
-> address VARCHAR(20)
-> );
Query OK, 0 rows affected (0.05 sec)
mysql> SHOW tables;
+-------------------+
| Tables_in_mailund |
+-------------------+
| experiment |
| lab_assistant |
+-------------------+
2 rows in set (0.00 sec)
mysql> INSERT INTO lab_assistant VALUES ('Frank', 'Enstein', 'Ny Munkegade bl. 450')
Query OK, 1 row affected (0.04 sec)
mysql> SELECT * FROM lab_assistant;
+------------+-----------+----------------------+
| first_name | last_name | address |
+------------+-----------+----------------------+
| Frank | Enstein | Ny Munkegade bl. 540 |
+------------+-----------+----------------------+
1 row in set (0.00 sec)
mysql>
EXERCISE DB2.3: Create the lab_assistant table and insert a few rows. What happens if one of the strings are longer than 20 characters?
From Python, we can execute these SQL commands through cursors, just as we could execute SELECT-queries.
For instance, to insert a row in the experiment table, we could execute the following code:
cursor.execute('''
INSERT INTO experiment (description,date)
VALUES ('I inserted a row from Python',CURRENT_DATE)
''')
EXERCISE DB2.4: Write a Python script that inserts a few rows in the experiment table and a few rows into the lab_assistant table.
EXERCISE DB2.5: Write a script that reads a file of tab-separated lines, and insert these lines as rows into a table.
The idea now, is to relate the lab assistants to the experiments, so we know which assistants were involved with which experiments.
We could, of course, pile everything into a single table that contained both experiment description and the assistants:
CREATE TABLE everything ( ex_description TEXT, ex_date DATE, la_first_name VARCHAR(20), la_last_name VARCHAR(20), la_address VARCHAR(20) );
Warning!
Redundancy is bad! If you represent the same data twice,
you will inevitably update it one place but not the other, and
thus reach an inconsistent state. Therefore, use more tables and
reference between them using keys.
There is a major problem with this, though: The table will have lots of redundant data: The experiment description and date will be repeated for each assistant, and the assistant information will be repeated for each experiment.
This is not just a problem of wasting space resources. Far worse, it makes it very difficult to keep the database consistent.
If an assistant moves, we must find all occurrences of the assistant and update his address. That means finding each experiment he has been involved in.
It takes just a single error to corrupt the database. It is, therefore, a much better idea to represent each piece of data only once, and use keys to relate different pieces of data.
A primary key is a value that is unique for each row, that is, a row can be uniquely identified by its primary key. It is this one-to-one relationship between keys and rows that makes them important.
We can add a column of primary keys to experiment using the ALTER TABLE command, as this:
ALTER TABLE experiment ADD COLUMN experiment_key INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
After executing this command, the experiment table contains a new column, experiment_key that contains unsigned integer values that are automatically assigned, are not allowed to be null, and acts as primary keys, which means that they are required to be unique.
The existing rows will all have been assigned a unique key.
mysql> describe experiment; +----------------+------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------------+------------------+------+-----+---------+----------------+ | description | text | YES | | NULL | | | date | date | YES | | NULL | | | experiment_key | int(10) unsigned | | PRI | NULL | auto_increment | +----------------+------------------+------+-----+---------+----------------+ 3 rows in set (0.01 sec) mysql> select * from experiment; +----------------------------------------------+------------+----------------+ | description | date | experiment_key | +----------------------------------------------+------------+----------------+ | I left the milk in the fridge for six months | 1997-12-09 | 1 | | I created a MySQL database | 2003-11-24 | 2 | | I updated a database | 2003-11-24 | 3 | +----------------------------------------------+------------+----------------+ 3 rows in set (0.00 sec) mysql>
Alternatively, we can delete the existing table, using the DROP TABLE command, and then create a new table where the primary key is included from the start:
DROP TABLE experiment; CREATE TABLE experiment ( experiment_key INT UNSIGNED NOT NULL AUTO_INCREMENT, description TEXT, date DATE, PRIMARY KEY (experiment_key) );
By the way, if you try both approaches, you will see that the ALTER TABLE adds the new column as the last column, while the CREATE TABLE above adds the key in the first column (as it is mentioned first in the creation).
If we re-create the table, and insert the rows from earlier, we get the new order when we select the data:
mysql> DROP TABLE experiment;
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TABLE experiment (
-> experiment_key INT UNSIGNED NOT NULL AUTO_INCREMENT,
-> description TEXT,
-> date DATE,
-> PRIMARY KEY (experiment_key)
-> );
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO experiment (description,date) VALUES ('I left milk in the fridge for six months', '1997-12-09');
Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO experiment (description,date) VALUES ('I created a MySQL database', CURRENT_DATE);
Query OK, 1 row affected (0.04 sec)
mysql> INSERT INTO experiment (description,date) VALUES ('I updated a database', CURRENT_DATE);
Query OK, 1 row affected (0.01 sec)
mysql> SELECT * FROM experiment;
+----------------+------------------------------------------+------------+
| experiment_key | description | date |
+----------------+------------------------------------------+------------+
| 1 | I left milk in the fridge for six months | 1997-12-09 |
| 2 | I created a MySQL database | 2003-11-24 |
| 3 | I updated a database | 2003-11-24 |
+----------------+------------------------------------------+------------+
3 rows in set (0.01 sec)
mysql>
Remember, unless you explicitly mention the columns when inserting rows into the table, the order of the columns is important, so be careful here!
When you insert into a table with an auto-incremented primary key, you should not specify the key column, so it is better to use the INSERT INTO variation that explicitly mentions the columns.
EXERCISE DB2.6: Add a primary key, called lab_assistant_key to the lab_assistant table.
A foreign key, or secondary key, is a column in one table that refers to a key in another table.
Since the lab assistants are represented as rows in a table, with a unique primary key, we can identify a specific lab assistant using his primary key. If we create a table, performed_experiment say, with a column containing lab assistants keys, the column is a secondary, or foreign, key of performed_experiment.
If we define performed_experiment as follows, both columns contain foreign keys, one identifying an experiment, the other an assistant.
CREATE TABLE performed_experiment ( experiment_key INT UNSIGNED NOT NULL, lab_assistant_key INT UNSIGNED NOT NULL );
We can then use this table to relate assistants with experiments, without duplicating experiment or assistant data.
If the experiment and lab_assistant table looks as follows:
mysql> SELECT * FROM experiment; +----------------+------------------------------------------+------------+ | experiment_key | description | date | +----------------+------------------------------------------+------------+ | 1 | I left milk in the fridge for six months | 1997-12-09 | | 2 | I created a MySQL database | 2003-11-24 | | 3 | I updated a database | 2003-11-24 | +----------------+------------------------------------------+------------+ 3 rows in set (0.03 sec) mysql> SELECT * FROM lab_assistant; +------------+-----------+----------------------+-------------------+ | first_name | last_name | address | lab_assistant_key | +------------+-----------+----------------------+-------------------+ | Frank | Enstein | Ny Munkegade bl. 540 | 1 | | Thomas | Mailund | Never-Never Land | 2 | +------------+-----------+----------------------+-------------------+ 2 rows in set (0.05 sec) mysql>
We can associate `Frank Enstein' with the `Milk' experiment and `Thomas Mailund' with the `update database' experiment, by adding (1,1) and (3,2) to the performed_experiment table:
mysql> INSERT INTO performed_experiment VALUES (1,1);
Query OK, 1 row affected (0.04 sec)
mysql> INSERT INTO performed_experiment VALUES (3,2);
Query OK, 1 row affected (0.00 sec)
mysql> SELECT ex.description, ex.date, la.first_name, la.last_name
-> FROM experiment AS ex
-> NATURAL JOIN performed_experiment
-> NATURAL JOIN lab_assistant AS la;
+------------------------------------------+------------+------------+-----------+
| description | date | first_name | last_name |
+------------------------------------------+------------+------------+-----------+
| I left milk in the fridge for six months | 1997-12-09 | Frank | Enstein |
| I updated a database | 2003-11-24 | Thomas | Mailund |
+------------------------------------------+------------+------------+-----------+
2 rows in set (0.02 sec)
mysql>
There is one remaining problem, though: we can also insert pairs that does not correspond to experiments and assistants, and if we delete a row in, say, the experiment table, then we do no automatically delete the corresponding rows in performed_experiment.
mysql> INSERT INTO performed_experiment VALUES (12,13); Query OK, 1 row affected (0.02 sec) mysql>
In principle, you can fix this problem using the FOREIGN KEY command. Unfortunately, that requires the InnoDB table type, which, as far as I can determine, is not supported by the database version installed at DAIMI.
So you are on your own when it comes to ensuring consistency between primary and foreign keys. But, whenever you start working on a new database installation, look for foreign key support!
The DELETE command is used to delete rows. The syntax is:
DELETE FROM table name
WHERE where expression
and the effect is, that all rows satisfying the where expression are deleted from the table.
mysql> SELECT * FROM experiment; +----------------+------------------------------------------+------------+ | experiment_key | description | date | +----------------+------------------------------------------+------------+ | 1 | I left milk in the fridge for six months | 1997-12-09 | | 2 | I created a MySQL database | 2003-11-25 | | 3 | I updated a database | 2003-11-25 | | 4 | I inserted a row from Python | 2003-11-25 | +----------------+------------------------------------------+------------+ 4 rows in set (0.00 sec) mysql> DELETE FROM experiment WHERE experiment_key=4; Query OK, 1 row affected (0.00 sec) mysql> SELECT * FROM experiment; +----------------+------------------------------------------+------------+ | experiment_key | description | date | +----------------+------------------------------------------+------------+ | 1 | I left milk in the fridge for six months | 1997-12-09 | | 2 | I created a MySQL database | 2003-11-25 | | 3 | I updated a database | 2003-11-25 | +----------------+------------------------------------------+------------+ 3 rows in set (0.00 sec) mysql> DELETE FROM experiment WHERE (YEAR(CURRENT_DATE)-YEAR(date)) > 1; Query OK, 1 row affected (0.00 sec) mysql> SELECT * FROM experiment; +----------------+----------------------------+------------+ | experiment_key | description | date | +----------------+----------------------------+------------+ | 2 | I created a MySQL database | 2003-11-25 | | 3 | I updated a database | 2003-11-25 | +----------------+----------------------------+------------+ 2 rows in set (0.01 sec) mysql>
In the last DELETE we used date calculations to select all experiments more than a year old.
EXERCISE DB2.6: Did you see a problem with what I just did? If not, take a look at the performed_experiment column. Do you see a problem now?
Updating rows is done using the UPDATE command. The syntax is as follows:
UPDATE table name
SET col1=expr1 [,col2=expr2 ...]
WHERE where expression
The effect is, that the value of column col1 is set to the result of expr1, the value of column col2 is set to the result of expr2 and so forth, for all the rows satisfying the where expression.
mysql> SELECT * FROM lab_assistant;
+-------------------+------------+-----------+----------------------+
| lab_assistant_key | first_name | last_name | address |
+-------------------+------------+-----------+----------------------+
| 1 | Frank | Enstein | Ny Munkegade bl. 450 |
| 2 | Thomas | Mailund | Never-Never Land |
+-------------------+------------+-----------+----------------------+
2 rows in set (0.05 sec)
mysql> UPDATE lab_assistant
-> SET last_name='Franksen', address='Ny-Ny Munkegade'
-> WHERE lab_assistant_key=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> SELECT * FROM lab_assistant;
+-------------------+------------+-----------+------------------+
| lab_assistant_key | first_name | last_name | address |
+-------------------+------------+-----------+------------------+
| 1 | Frank | Franksen | Ny-Ny Munkegade |
| 2 | Thomas | Mailund | Never-Never Land |
+-------------------+------------+-----------+------------------+
2 rows in set (0.01 sec)
mysql>
You can also use the column-values of the current row in the update expressions. That is, you can use the current values in the row to define the new values as below, where I add a year to all dates more than a year in the past:
mysql> SELECT * FROM experiment;
+----------------+----------------------------------------------+------------+
| experiment_key | description | date |
+----------------+----------------------------------------------+------------+
| 1 | I left the milk in the fridge for six months | 1997-12-09 |
| 2 | I created a MySQL database | 2003-11-25 |
| 3 | I updated a database | 2003-11-25 |
+----------------+----------------------------------------------+------------+
3 rows in set (0.01 sec)
mysql> UPDATE experiment
-> SET date = DATE_ADD(date, INTERVAL 1 YEAR)
-> WHERE (YEAR(CURRENT_DATE)-YEAR(date)) > 1;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> SELECT * FROM experiment;
+----------------+----------------------------------------------+------------+
| experiment_key | description | date |
+----------------+----------------------------------------------+------------+
| 1 | I left the milk in the fridge for six months | 1998-12-09 |
| 2 | I created a MySQL database | 2003-11-25 |
| 3 | I updated a database | 2003-11-25 |
+----------------+----------------------------------------------+------------+
3 rows in set (0.01 sec)
mysql>
EXERCISE DB2.7: Subtract a year from all dates in the experiment.
From Python, deleting and updating is also done through the execute() method on cursors.
Warning!
There might be concurrency problems with this
solution, but lacking transactions
in the installed database, we cannot do much better.
Remember the problem from exercise DB2.6? We deleted an experiment, but did not update the performed_experiment table. From Python we can solve this by first extracting the keys of the rows we wish to delete, and then delete them in a second SQL command.
In the code below, we use two cursors, one for extracting the keys and another for deleting rows. We cannot use the same, because the first cursor should remember the state of the SELECT query, which would be deleted in the DELETE commands. Alternatively, we could have used fetchall() to avoid this problem.
select_cursor = connection.cursor()
delete_cursor = connection.cursor()
select_cursor.execute('''
SELECT experiment_key
FROM experiment
WHERE (YEAR(CURRENT_DATE)-YEAR(date)) > 1
''')
row = select_cursor.fetchone()
while row:
key, = row
delete_cursor.execute('''
DELETE FROM experiment
WHERE experiment_key = %d
''' % key)
delete_cursor.execute('''
DELETE FROM performed_experiment
WHERE experiment_key = %d
''' % key)
row = select_cursor.fetchone()
If, before we execute the python code, the tables look like this:
mysql> SELECT * FROM experiment; SELECT * FROM performed_experiment; ----------------+----------------------------------------------+------------+ | experiment_key | description | date | +----------------+----------------------------------------------+------------+ | 1 | I left the milk in the fridge for six months | 1997-12-09 | | 2 | I created a MySQL database | 2003-11-25 | | 3 | I updated a database | 2003-11-25 | +----------------+----------------------------------------------+------------+ 3 rows in set (0.00 sec) +----------------+-------------------+ | experiment_key | lab_assistant_key | +----------------+-------------------+ | 1 | 1 | | 3 | 2 | +----------------+-------------------+ 2 rows in set (0.05 sec) mysql>
Then, after executing the python code, we have
mysql> SELECT * FROM experiment; SELECT * FROM performed_experiment; +----------------+----------------------------+------------+ | experiment_key | description | date | +----------------+----------------------------+------------+ | 2 | I created a MySQL database | 2003-11-25 | | 3 | I updated a database | 2003-11-25 | +----------------+----------------------------+------------+ 2 rows in set (0.00 sec) +----------------+-------------------+ | experiment_key | lab_assistant_key | +----------------+-------------------+ | 3 | 2 | +----------------+-------------------+ 1 row in set (0.00 sec) mysql>
EXERCISE DB2.8: Write a script that removes all experiments conducted the last month, and subtracts a month from the rest.
We have learnt how to set up a database at DAIMI, how to create tables, and how to modify them.
We can now move on to this weeks exercises and the third mandatory exercise.