Databases Behind Shops
Published on March 2, 2005
You’ve just launched your brand new database-driven Web site. You celebrate this milestone by logging into your database and adding a new row to the “articles” table, telling the world about how wonderful your new site is. You save, and voilà! The article immediately appears in the news section on your site.
You’re pretty pleased with yourself, and schedule a lunch meeting with your biggest clients to boast a little bit. You tell them all about it, and they smile and look impressed and then ask you, “Is your site e-commerce enabled? Can we buy your products online?”
It turns out that your competitors are one step ahead of you. They not only have database-driven sites, but they are using those databases to keep track of their products, customers and orders, and to sell online.
This article is a continuation of Introduction to Databases. We’ll cover queries, relationships, joins and indexes in the context of a very useful example: the database behind an online shop.
Databases contain tables of data. Each table (like “customers”, “products” and “orders”) contains a list of rows, and each row has several columns (“name” and “address” for “customers”; “description” and “price” for “products”, etc). A database table is broadly akin to a worksheet in an Excel spreadsheet.
Structured query language (SQL) is a common language for database applications like Microsoft Access, SQL Server, MySQL, Oracle and many others.
Throughout this article, I will show the SQL statements for creating database tables and manipulating their data. To try these examples yourself, you will need an SQL environment, such as phpMyAdmin, installed on a PHP/MySQL-capable Web server.
Create a database in phpMyAdmin and click the SQL link. You will need to copy-and-paste the SQL statements into the SQL window and click “Go”.
To actually convert these examples into a working e-commerce system, you will also need to make your site communicate with your database using a language like PHP. Take the case of a user entering a search phrase on your site. PHP will take this search phrase and translate it into an SQL command. MySQL will then perform the search and return the results, and PHP will turn these results into HTML to be displayed on your site. This article only deals with the database aspects of this process and not with the PHP or HTML.
Creating a table and inserting data
First, for your new e-commerce–enabled site, you’ll need something to sell. So we’ll create a “products” table in your database and add a few items to it:
CREATE TABLE products (
id INTEGER PRIMARY KEY,
INSERT INTO products VALUES (1, 'Book about art', 'A great book about art by someone famous who knows about art', 12.50);
INSERT INTO products VALUES (2, 'Dinosaur', 'A small toy dinosaur', 2.90);
INSERT INTO products VALUES (3, 'Dinosaur book', 'A book all about dinosaurs', 7.90);
CREATE TABLE statement creates a table called “products” with four columns. The
id column is an integer (a number), and
PRIMARY KEY indicates that it is unique. The “name” column stores a variable length string of up to 255 characters, and the “price” column is a decimal number with two digits to the right of the decimal point.
INSERT statements add data to this table. The four comma-separated values after the
VALUES go into the table’s four columns. Each
INSERT adds one row.
This article will not discuss the two other types of SQL statements which change data:
Simple database queries
Next, you need a way for your customers to search these products. Your site will ask the customer what they are looking for, and this will be converted into an SQL
SELECT statement (a.k.a. database query).
In other words, a database query is simply a request for information. For example:
SELECT name, description
WHERE price > 5
ORDER BY price;
This is the general format of a simple
SELECT statement. The word
SELECT is followed by the names of the columns you are interested in. Then you have
FROM and the table name. You can also have optional conditions (after
WHERE), and specify the way results will be sorted (after
This particular query extracts the name and description of any products costing more than $5, ordered by price in ascending order, i.e. with the cheapest first. This would return the dinosaur book and the book about art.
Now try the following:
WHERE price < 3 AND description LIKE '%dinosaur%';
In this query, the
* asks the database to return all columns from the products table. There are also two
WHERE conditions. The database will check every single row in the products table, and only return rows which satisfy both conditions—a price less than $3 and a description containing “dinosaur.”
The latter condition uses the
LIKE operator. This allows for string matching and uses the percentage sign as a wild card. So the condition above means “check if the description column contains any text at all, followed by the letters 'dinosaur', followed by any text at all.” This query will return only the toy dinosaur.
The following query uses parentheses to group together different conditions:
WHERE price < 10 OR (name LIKE '%book%' AND description LIKE '%book%')
ORDER BY name DESC;
This returns the dinosaur book, the dinosaur toy, and the book about art. The toy matches the first condition (
price < 10) and the two books match both the second and third conditions. The results are returned in descending order by name.
Along with your products, you’ll also need a place in your database to store information about your customers and orders. When a customer comes to your site and decides to place an order, you’ll need to store that person’s name and address, and the order date and total.
There are two ways to achieve this. You can store all the information in one table, an “orders” table with columns for the order id, order date, total, customer name and customer address. Alternatively, you can put the customer data in a “customers” table and the order data in an “orders” table, and link them together.
The first method is easier because it requires only one table. However, it leads to data duplication. For instance, if the same person (say Luisa Smith of 123 Main St.) places several orders on your site, then you will end up with her name and address several times in the “orders” table.
And when you prepare to mail a brochure to all your customers, you will have to filter out all the duplicates so that Luisa only gets one brochure. Furthermore, she might have written her address as “123 Main St” some times and other times as “123 Main Street,” so your filtering will probably have to be done manually to make sure you don’t miss any data entry anomalies.
The second method is definitely better, especially for a site where people are likely to come back order again and again (a very good thing). However, it is slightly more involved and necessitates your first database relationship.
In our database, there will be a relationship between the “customers” and “orders” tables. In particular, one customer can have many orders, but one order is only ever associated with one customer. In database parlance, this is known as a “one-to-many” relationship.
If you were to draw a database diagram, it would look something like this:
The line with the arrows indicates a relationship going from one customer to many (the infinity symbol) orders. Note that one order does not have many customers. The “many” in a one-to-many relationships means “any number.” So a customer could have no orders at all, or just one, or 564.
Later we will look at many-to-many relationships, but first we’ll see how to represent a one-to-many relationship in a database table.
First we will create the “customers” table, giving each customer an ID number, then insert some sample customers:
CREATE TABLE customers (
id INT PRIMARY KEY,
INSERT INTO customers VALUES (1, 'Luisa Smith', '3 Main St, Tallinn, Estonia');
INSERT INTO customers VALUES (2, 'Julie Half', '17 Yep Terrace, Glasgow, Scotland');
INSERT INTO customers VALUES (3, 'Frank Hubert', '123 Upper Rd, Dublin, Ireland');
Now we’ll create the “orders” table, adding a “customerid” column. The “customerid” column will refer to the ID number of the customer who made the order.
CREATE TABLE orders (
id INT PRIMARY KEY,
INSERT INTO orders VALUES (1, 1, '2004-11-01', 10.80);
INSERT INTO orders VALUES (2, 1, '2004-11-02', 12.50);
INSERT INTO orders VALUES (3, 3, '2004-11-01', 14.50);
INSERT INTO orders VALUES (4, 3, '2004-11-02', 25.00);
Order numbers 1 and 2 both have a customer ID of 1. This means that both these orders were made by customer 1, Luisa Smith.
Note that some database software allows you to make this relationship more explicit by labeling the “customerid” column as a
FOREIGN KEY in the “orders” table, such that this foreign key refers to the customer’s
PRIMARY KEY. Also, in a real system, the customer data would come from a form on your site, and some of the order data would be computed and entered automatically.
So now you’ve got your first few customers and orders (fictional, unfortunately), and you want to create a report to see who your customers are and how much money they have spent.
You could run a query on your orders table:
SELECT id, customerid, orderdate, total FROM orders ORDER BY orderdate;
And for each order, you could run another query to look up the customer’s name and address:
SELECT name FROM customers WHERE id = 1;
SELECT name FROM customers WHERE id = 3;
Then combine all that information to make your report. Alternatively, you could join the data from both tables in a single database query:
SELECT orders.id, orderdate, total, name
FROM orders INNER JOIN customers ON customerid = customers.id
ORDER BY orderdate;
This query returns four rows of data, showing the date and total for each of your orders, and the name of the customer who placed the order. It uses the
INNER JOIN ... ON syntax to combine the data from the two tables by matching the “customerid” column of the “orders” table with the “id” column of the “customers” table.
1, 2004-11-01, 10.80, Luisa Smith
2, 2004-11-01, 14.50, Frank Hubert
3, 2004-11-02, 12.50, Luisa Smith
4, 2004-11-02, 25.00, Frank Hubert
For some of the columns in the query above, we also specified the table name by writing tablename-period-columnname (“orders.id”). You are only required to do this when there might be a conflict. For instance, both tables have an “id” column, so the database needs to know that we want to display only the “id” column from the “orders” table.
How about if you wanted this information from a customer perspective rather than an orders perspective?
Try this query:
SELECT customers.id, name, address, orderdate, total
FROM customers INNER JOIN orders ON customers.id = customerid
ORDER BY name;
You will get the same four rows as before, but you’ll be missing one customer. You won’t get any information about Julie Half because she didn’t place an order. What you need in this case is a
SELECT customers.id, name, orders.id, orderdate, total
FROM customers LEFT JOIN orders ON customers.id = customerid
ORDER BY customers.id, orders.id;
LEFT JOIN, every row from the table on the left (“customers”) is shown, and, if possible, this data is combined with the data from the table on the right (“orders”):
1, Luisa Smith, 1, '2004-11-01', 10.80
1, Luisa Smith, 2, '2004-11-02', 12.50
2, Julie Half, null, null, null
3, Frank Hubert, 3, '2004-11-01', 14.50
3, Frank Hubert, 4, '2004-11-02', 25.00
nulls” indicate missing order data.
Joins are especially powerful when you are able to group some of the rows together and compute statistics. SQL provides a
GROUP BY function for doing just this.
Try the following query:
SELECT customers.id, name, COUNT(orders.id), SUM(orders.total)
FROM customers LEFT JOIN orders ON customers.id = customerid
GROUP BY customers.id, name
ORDER BY customers.id
This will return the id and name of all customers, along with how many orders they have made and how much they have spent in total. Because it is a
LEFT JOIN, it also shows that Julie Half did not make any orders.
1, Luisa Smith, 2, 23.30
2, Julie Half, 0, 0.00
3, Frank Hubert, 2, 39.50
Internally, the database computes this by running a query similar to the last query in the previous section, which returned five rows. It then groups together the rows which have the same values in the “customers.id” and “name”columns (i.e. one group for each distinct customer). It then computes the aggregate functions (
SUM) on these grouped rows and displays the results. Note that the aggregate functions return zero when there are no values to aggregate, as in the case of Julie Half.
If you try to run this query without a
GROUP BY clause, the database will complain that
SUM can only be used when there’s a
There is one big piece still missing from our e-commerce system—we don’t have a way of representing what each order contains. For that, we will need a “many-to-many” relationship between the orders table and the products table.
This is because one order can contain many products, and one product can be in many orders. It’s like two one-to-many relationships operating in opposite directions.
Adding in this relationship, our database diagram now looks like this:
Unfortunately, we can’t represent a many-to-many relationship in our database simply by adding more columns to the existing tables.
Specifically, adding a “productid” column to the orders table and an “orderid” column to the products table would not work. Then we’d have a sort of one-to-one relationship, where each order referred to only one product, and each product could be in only one order.
What we need is another table that links the orders and products together:
The “orderitems” table essentially breaks the tricky many-to-many relationship into two separate one-to-many relationships. An “orderitem” is stored every time someone adds a product (an item) to an order. One order can have many order items, and one product can be ordered many times.
The SQL to create this intermediary table and fill it with some data is as follows. The table also has a quantity property, so we can store how many of each item is required in each order.
CREATE TABLE orderitems (
quantity INT DEFAULT 1
INSERT INTO orderitems VALUES (1, 2, 1);
INSERT INTO orderitems VALUES (1, 3, 1);
INSERT INTO orderitems VALUES (2, 1, 1);
INSERT INTO orderitems VALUES (3, 2, 5);
INSERT INTO orderitems VALUES (4, 1, 2);
In a real online shopping basket, you insert a row into the “orderitems” table whenever someone clicks on the “Add to Cart” button on a product page. If it was a brand-new customer adding the first item, you could automatically generate a new row in the “customers” table (with a new customer ID) and a new row in the orders table (with a new order ID). The new row in the “orderitems” table would use the product’s ID number and the order’s ID number.
Data duplication revisited
These are all the tables you need to create a fully functioning shopping cart system on your site. In fact, we’re actually storing too much information. The total column in the orders table isn’t really necessary because this figure can be computed (using a mega-query similar to the one below) from the data in all the other tables.
However, it may still be a good idea to leave it in there because it provides a record of how much the order cost at the time it was ordered. For example, if a year from now the toy dinosaur goes on sale for $2.50 instead of $2.90, then when you run your totaling query, it would appear that all the previous orders were discounted as well.
So, even though it duplicates data, it should stay.
Indexing and Optimizing
Before we move on to the grand finale—a SQL statement to outdo them all—we have to consider the speed of queries.
Although all of your queries will run very quickly when there are only four orders in the database, with 10,000 they might grind to halt. This is because the database software has to do an awful lot of cross checking and looking up to retrieve the results.
If this is the case, you can vastly improve performance by adding databases indexes. Like the index at the back of a technical book, the database uses indexes internally to look up your data much faster.
You will generally want to index any column that appears in the
ON part of your query, unless it is already designated as a
PRIMARY KEY (because those are automatically indexed). In our case, this means the “orders.customerid,” “orderitems.orderid” and “orderitems.productid” columns.
You can add an index to an existing table using the following SQL statement:
ALTER TABLE orders ADD INDEX customeridindex (customerid);
This creates an index for the “customerid” column in the orders table and will make queries which utilize this column much faster.
There are also many other types of optimizations and shortcuts that you can use to make your database run faster. For example,
TRIGGERs and stored procedures allow the database to perform actions that are often left to PHP or ASP.
And now for one more great big whooping database statement which will use all four tables at once to find out how many of each item each person has ordered:
SELECT customers.name, products.name, SUM(orderitems.quantity)
FROM customers LEFT JOIN orders ON customers.id=orders.customerid
LEFT JOIN orderitems ON orders.id=orderitems.orderid
LEFT JOIN products ON orderitems.productid=products.id
GROUP BY customers.name, products.name
ORDER BY customers.name;
Because of the grouping we have specified, there will be one result row for each customer and product they ordered, and one for each customer who didn’t order anything. And if one customer ordered the same item a few different times, the statement would add up the quantities for us.
So now imagine your next lunch date. Your new fantastic e-commerce–enabled online shopping extravaganza has just been launched. Your biggest client asks you, “Can we buy your products online yet?”
You don’t need to dodge the question, become very interested in the table cloth or order another round of drinks.
You can calmly reach into your bag and pull out a single sheet of paper, with the database statement above printed out in nice big letters.
“Yes” you can say, “and I know how it works.” Add a few well-placed acronyms to your explanation and they’ll be buying from you in no time.