Exposes five common database problems that arise in PHP applications—including database schema design, database access, and business logic code that uses the database—and their solutions.
If only one way to use a database was correct...
There are many ways you can create database design, database access, and database-based PHP business logic code, but they usually end up wrong. This article explains five common problems that arise in database design and PHP code that access the database, and how to fix them when you encounter them.
Problem 1: Using MySQL directly
A common problem is that older PHP code uses the mysql_ function directly to access the database. Listing 1 shows how to access the database directly.
Listing 1. Access/get.php
<?php
function get_user_id( $name )
{
$db = mysql_connect( 'localhost', 'root', 'password' );
mysql_select_db( 'users' );
$res = mysql_query( "SELECT id FROM users WHERE login='".$name."'" );
while( $row = mysql_fetch_array( $res ) ) { $id = $row[0]; }
return $id;
}
var_dump( get_user_id( 'jack' ) );
?>
Note that the mysql_connect function is used to access the database. Also note the query, which uses string concatenation to add the $name parameter to the query.
There are two good alternatives to this technology: the PEAR DB module and the PHP Data Objects (PDO) classes. Both provide abstractions from specific database selections. As a result, your code can run on IBM® DB2®, MySQL, PostgreSQL, or any other database you want to connect to without much tweaking.
Another value of using the PEAR DB module and the PDO abstraction layer is that you can use the ? operator in SQL statements. Doing so makes SQL easier to maintain and protects your application from SQL injection attacks.
Alternative code using PEAR DB is shown below.
Listing 2. Access/get_good.php
<?php
require_once("DB.php");
function get_user_id( $name )
{
$dsn = 'mysql://root:password@localhost/users';
$db =& DB::Connect( $dsn, array() );
if (PEAR::isError($db)) { die($db->getMessage()); }
$res = $db->query( 'SELECT id FROM users WHERE login=?',array( $name ) ) ;
$id = null;
while( $res->fetchInto( $row ) ) { $id = $row[0]; }
return $id;
}
var_dump( get_user_id( 'jack' ) );
?>
Note that all direct uses of MySQL have been eliminated, except for the database connection string in $dsn. Additionally, we use the $name variable in SQL through the ? operator. Then, the query data is sent in through the array at the end of the query() method.
Problem 2: Not using the auto-increment feature
Like most modern databases, MySQL has the ability to create auto-increment unique identifiers on a per-record basis. Beyond that, we'll still see code that first runs a SELECT statement to find the largest id, then increments that id by 1, and finds a new record. Listing 3 shows an example bad pattern.
Listing 3. Badid.sql
DROP TABLE IF EXISTS users;
CREATE TABLE users (
id MEDIUMINT,
login TEXT,
password TEXT
);
INSERT INTO users VALUES ( 1, 'jack', 'pass' );
INSERT INTO users VALUES ( 2, 'joan', 'pass' );
INSERT INTO users VALUES (1, 'jane', 'pass' );
Here the id field is simply specified as an integer. So, although it should be unique, we can add any value, as shown in the several INSERT statements following the CREATE statement. Listing 4 shows the PHP code to add users to this type of schema.
Listing 4. Add_user.php
<?php
require_once("DB.php");
function add_user( $name, $pass )
{
$rows = array();
$dsn = 'mysql://root:password@localhost/bad_badid';
$db =& DB::Connect( $dsn, array() );
if (PEAR::isError($db)) { die($db->getMessage()); }
$res = $db->query( "SELECT max(id) FROM users" );
$id = null;
while( $res->fetchInto( $row ) ) { $id = $row[0]; }
$id += 1;
$sth = $db->prepare( "INSERT INTO users VALUES(?,?,?) " );
$db->execute( $sth, array( $id, $name, $pass ) );
return $id;
}
$id = add_user( 'jerry', 'pass' );
var_dump( $id );
?>
The code in add_user.php first performs a query to find the maximum value of id. The file then runs an INSERT statement with the id value increased by 1. This code will fail in a race condition on a heavily loaded server. Plus, it's also inefficient.
So what is the alternative? Use the auto-increment feature in MySQL to automatically create a unique ID for each insert. The updated schema looks like this.
Listing 5. Goodid.php
DROP TABLE IF EXISTS users;
CREATE TABLE users (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
login TEXT NOT NULL,
password TEXT NOT NULL,
PRIMARY KEY(id)
);
INSERT INTO users VALUES ( null, 'jack', 'pass' );
INSERT INTO users VALUES (null, 'joan', 'pass' );
INSERT INTO users VALUES ( null, 'jane', 'pass' );
We added the NOT NULL flag to indicate that the field must not be null. We also added the AUTO_INCREMENT flag to indicate that the field is auto-incrementing, and the PRIMARY KEY flag to indicate that the field is an id. These changes speed things up. Listing 6 shows the updated PHP code to insert the user into the table.
Listing 6. Add_user_good.php
<?php
require_once("DB.php");
function add_user( $name, $pass )
{
$dsn = 'mysql://root:password@localhost/good_genid';
$db =& DB::Connect( $dsn, array() );
if (PEAR::isError($db)) { die($db->getMessage()); }
$sth = $db->prepare( "INSERT INTO users VALUES(null,?,?)" );
$db->execute( $sth, array( $name, $pass ) );
$res = $db->query( "SELECT last_insert_id()" );
$id = null;
while( $res->fetchInto( $row ) ) { $id = $row[0]; }
return $id;
}
$id = add_user( 'jerry', 'pass' );
var_dump( $id );
?>
Now instead of getting the maximum id value, I directly use the INSERT statement to insert the data, and then use the SELECT statement to retrieve the id of the last inserted record. The code is much simpler and more efficient than the original version and its associated patterns.
Question 3: Using multiple databases
Occasionally, we will see an application where each table is in a separate database. This is reasonable in very large databases, but for general applications this level of partitioning is not required. Additionally, relational queries cannot be performed across databases, which takes away from the whole idea of using a relational database, not to mention that it would be more difficult to manage tables across multiple databases. So, what should multiple databases look like? First, you need some data. Listing 7 shows such data divided into four files.
Listing 7. Database file
Files.sql:
CREATE TABLE files (
id MEDIUMINT,
user_id MEDIUMINT,
name TEXT,
path TEXT
);
Load_files.sql:
INSERT INTO files VALUES ( 1, 1, 'test1.jpg', 'files/test1.jpg' );
INSERT INTO files VALUES (2, 1, 'test2.jpg', 'files/test2.jpg' );
Users.sql:
DROP TABLE IF EXISTS users;
CREATE TABLE users (
id MEDIUMINT,
login TEXT,
password TEXT
);
Load_users.sql:
INSERT INTO users VALUES ( 1, 'jack', 'pass' );
INSERT INTO users VALUES (2, 'jon', 'pass' );
In the multi-database version of these files, you should load the SQL statement into one database and then load the users SQL statement into another database. The PHP code used to query the database for files associated with a specific user is shown below.
Listing 8. Getfiles.php
<?php
require_once("DB.php");
function get_user( $name )
{
$dsn = 'mysql://root:password@localhost/bad_multi1';
$db =& DB::Connect( $dsn, array() );
if (PEAR::isError($db)) { die($db->getMessage()); }
$res = $db->query( "SELECT id FROM users WHERE login=?",array( $name ) ) ;
$uid = null;
while( $res->fetchInto( $row ) ) { $uid = $row[0]; }
return $uid;
}
function get_files( $name )
{
$uid = get_user( $name );
$rows = array();
$dsn = 'mysql://root:password@localhost/bad_multi2';
$db =& DB::Connect( $dsn, array() );
if (PEAR::isError($db)) { die($db->getMessage()); }
$res = $db->query( "SELECT * FROM files WHERE user_id=?",array( $uid ) ) ;
while( $res->fetchInto( $row ) ) { $rows[] = $row; }
return $rows;
}
$files = get_files( 'jack' );
var_dump( $files );
?>
The get_user function connects to the database containing the users table and retrieves the ID of a given user. The get_files function connects to the files table and retrieves the file rows associated with a given user.
A better way to do all of these things is to load the data into a database and then execute a query, such as the one below.
Listing 9. Getfiles_good.php
<?php
require_once("DB.php");
function get_files( $name )
{
$rows = array();
$dsn = 'mysql://root:password@localhost/good_multi';
$db =& DB::Connect( $dsn, array() );
if (PEAR::isError($db)) { die($db->getMessage()); }
$res = $db->query("SELECT files.* FROM users, files WHERE
users.login=? AND users.id=files.user_id",
array( $name ) );
while( $res->fetchInto( $row ) ) { $rows[] = $row; }
return $rows;
}
$files = get_files( 'jack' );
var_dump( $files );
?>
The code is not only shorter, but also easier to understand and efficient. Instead of executing two queries, we execute one query.
Although this question may sound far-fetched, in practice we usually conclude that all tables should be in the same database unless there is a very compelling reason.
Question 4: Not using relationships
Relational databases are different from programming languages in that they do not have array types. Instead, they use relationships between tables to create a one-to-many structure between objects, which has the same effect as an array. One problem I've seen in applications is where engineers try to use the database like a programming language, by creating arrays using text strings with comma-separated identifiers. See the pattern below.
Listing 10. Bad.sql
DROP TABLE IF EXISTS files;
CREATE TABLE files (
id MEDIUMINT,
name TEXT,
path TEXT
);
DROP TABLE IF EXISTS users;
CREATE TABLE users (
id MEDIUMINT,
login TEXT,
password TEXT,
files TEXT
);
INSERT INTO files VALUES ( 1, 'test1.jpg', 'media/test1.jpg' );
INSERT INTO files VALUES (2, 'test1.jpg', 'media/test1.jpg' );
INSERT INTO users VALUES (1, 'jack', 'pass', '1,2' );
A user in the system can have multiple files. In programming languages, an array should be used to represent the files associated with a user. In this example, the programmer chooses to create a files field that contains a comma-separated list of file ids. To get a list of all files for a particular user, the programmer must first read the rows from the users table, then parse the text of the files and run a separate SELECT statement for each file. The code is shown below.
Listing 11. Get.php
<?php
require_once("DB.php");
function get_files( $name )
{
$dsn = 'mysql://root:password@localhost/bad_norel';
$db =& DB::Connect( $dsn, array() );
if (PEAR::isError($db)) { die($db->getMessage()); }
$res = $db->query( "SELECT files FROM users WHERE login=?",array( $name ) ) ;
$files = null;
while( $res->fetchInto( $row ) ) { $files = $row[0]; }
$rows = array();
foreach( split( ',',$files ) as $file )
{
$res = $db->query( "SELECT * FROM files WHERE id=?",
array( $file ) );
while( $res->fetchInto( $row ) ) { $rows[] = $row; }
}
return $rows;
}
$files = get_files( 'jack' );
var_dump( $files );
?>
The technology is slow, difficult to maintain, and does not make good use of the database. The only solution is to re-architect the schema to convert it back to traditional relational form as shown below.
Listing 12. Good.sql
DROP TABLE IF EXISTS files;
CREATE TABLE files (
id MEDIUMINT,
user_id MEDIUMINT,
name TEXT,
path TEXT
);
DROP TABLE IF EXISTS users;
CREATE TABLE users (
id MEDIUMINT,
login TEXT,
password TEXT
);
INSERT INTO users VALUES ( 1, 'jack', 'pass' );
INSERT INTO files VALUES ( 1, 1, 'test1.jpg', 'media/test1.jpg' );
INSERT INTO files VALUES (2, 1, 'test1.jpg', 'media/test1.jpg' );
Here, each file is related to the user in the file table through the user_id function. This may go against the grain of anyone who thinks of multiple files as arrays. Of course, arrays do not reference the objects they contain—in fact, vice versa. But in a relational database, that's how it works, and queries are much faster and simpler because of it. Listing 13 shows the corresponding PHP code.
Listing 13. Get_good.php
<?php
require_once("DB.php");
function get_files( $name )
{
$dsn = 'mysql://root:password@localhost/good_rel';
$db =& DB::Connect( $dsn, array() );
if (PEAR::isError($db)) { die($db->getMessage()); }
$rows = array();
$res = $db->query("SELECT files.* FROM users,files WHERE users.login=?
AND users.id=files.user_id",array( $name ) );
while( $res->fetchInto( $row ) ) { $rows[] = $row; }
return $rows;
}
$files = get_files( 'jack' );
var_dump( $files );
?>
Here, we make a query to the database to get all the rows. The code is not complex, and it uses the database as it was intended.
Question 5: n+1 Pattern
I can’t tell you how many times I’ve seen large applications where the code first retrieves a few entities (let’s say customers) and then goes back and forth to retrieve them one by one to get each entity’s Details. We call this n+1 mode because the query is executed so many times - one query retrieves the list of all entities, and then one query is executed for each of the n entities. This is not a problem when n=10, but what about n=100 or n=1000? Then there are bound to be inefficiencies. Listing 14 shows an example of this pattern.
Listing 14. Schema.sql
DROP TABLE IF EXISTS authors;
CREATE TABLE authors (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
name TEXT NOT NULL,
PRIMARY KEY(id)
);
DROP TABLE IF EXISTS books;
CREATE TABLE books (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
author_id MEDIUMINT NOT NULL,
name TEXT NOT NULL,
PRIMARY KEY(id)
);
INSERT INTO authors VALUES ( null, 'Jack Herrington' );
INSERT INTO authors VALUES ( null, 'Dave Thomas' );
INSERT INTO books VALUES ( null, 1, 'Code Generation in Action' );
INSERT INTO books VALUES ( null, 1, 'Podcasting Hacks' );
INSERT INTO books VALUES ( null, 1, 'PHP Hacks' );
INSERT INTO books VALUES ( null, 2, 'Pragmatic Programmer' );
INSERT INTO books VALUES ( null, 2, 'Ruby on Rails' );
INSERT INTO books VALUES ( null, 2, 'Programming Ruby' );
The pattern is reliable and there are no errors in it. The problem lies in the code that accesses the database to find all books by a given author, as shown below.
Listing 15. Get.php
<?php
require_once('DB.php');
$dsn = 'mysql://root:password@localhost/good_books';
$db =& DB::Connect( $dsn, array() );
if (PEAR::isError($db)) { die($db->getMessage()); }
function get_author_id( $name )
{
global $db;
$res = $db->query( "SELECT id FROM authors WHERE name=?",array( $name ) );
$id = null;
while( $res->fetchInto( $row ) ) { $id = $row[0]; }
return $id;
}
function get_books( $id )
{
global $db;
$res = $db->query( "SELECT id FROM books WHERE author_id=?",array( $id ) );
$ids = array();
while( $res->fetchInto( $row ) ) { $ids []= $row[0]; }
return $ids;
}
function get_book( $id )
{
global $db;
$res = $db->query( "SELECT * FROM books WHERE id=?", array( $id ) );
while( $res->fetchInto( $row ) ) { return $row; }
return null;
}
$author_id = get_author_id( 'Jack Herrington' );
$books = get_books( $author_id );
foreach( $books as $book_id ) {
$book = get_book( $book_id );
var_dump( $book );
}
?>
If you look at the code below, you might be thinking, "Hey, this is really clear and simple." First, get the author id, then get the list of books, and then get the information about each book. Yes, it's clear and simple, but is it efficient? The answer is no. See how many queries are executed just to retrieve Jack Herrington's books. Once to get the id, another time to get the list of books, then perform a query per book. Three books require five queries!
The solution is to use a function to perform a large number of queries as shown below.
Listing 16. Get_good.php
<?php
require_once('DB.php');
$dsn = 'mysql://root:password@localhost/good_books';
$db =& DB::Connect( $dsn, array() );
if (PEAR::isError($db)) { die($db->getMessage()); }
function get_books( $name )
{
global $db;
$res = $db->query("SELECT books.* FROM authors,books WHERE books.author_id=authors.id AND authors.name=?",
array( $name ) );
$rows = array();
while( $res->fetchInto( $row ) ) { $rows []= $row; }
return $rows;
}
$books = get_books( 'Jack Herrington' );
var_dump( $books );
?>
Retrieving the list now requires a quick, single query. This means I will most likely have to have several methods of these types with different parameters, but there's really no choice. If you want to have a scalable PHP application, you must use the database efficiently, which means smarter queries.
The problem with this example is that it's a little too clear. Generally speaking, these types of n+1 or n*n problems are much more subtle. And they only appear when the database administrator runs Query Profiler on the system when the system has performance issues.
Conclusion
Databases are powerful tools, and like all powerful tools, you can abuse them if you don't know how to use them correctly. The trick to identifying and solving these problems is to better understand the underlying technology. I've long heard business logic writers complain that they don't want to have to understand databases or SQL code. They use the database as an object and wonder why performance is so poor.
They fail to realize how important understanding SQL is to transforming a database from a difficult necessity into a powerful alliance. If you use databases every day but are not familiar with SQL, then please read The Art of SQL. This is a well-written and practical book that can guide you to a basic understanding of databases.