Today, developers continue to develop and deploy applications using the LAMP (Linux®, Apache, MySQL, and PHP/Perl) architecture. However, server administrators often have little control over the applications themselves because someone else wrote the applications. This three-part series discusses a number of server configuration issues that can impact application performance. This article, the third and final part of this series, will focus on tuning the database layer for maximum efficiency.
Regarding MySQL tuning,
there are three methods to speed up the running speed of the MySQL server. The order of efficiency from low to high is:
Replace the problematic hardware. Tune the settings of the MySQL process. Optimize the query.
Replacing problematic hardware is often our first consideration, primarily because databases can take up a lot of resources. But this solution only goes so far. In fact, you can often double the speed of your central processing unit (CPU) or disk, and increase the memory by 4 to 8 times.
The second method is to tune the MySQL server (also known as mysqld). Tuning this process means allocating memory appropriately and letting mysqld know what type of load it will be subjected to. Speeding up disk operation is not as important as reducing the number of disk accesses required. Similarly, ensuring that the MySQL process operates correctly means that it spends more time servicing queries than it does on background tasks such as working with temporary disk tables or opening and closing files. Tuning mysqld is the focus of this article.
The best approach is to ensure that the query has been optimized. This means that appropriate indexes are applied to the table and queries are written in a way that takes full advantage of MySQL capabilities. Although this article does not cover query tuning (a topic that has been covered in many books), it does configure mysqld to report queries that may need tuning.
Although the order has been assigned to these tasks, you still need to pay attention to the hardware and mysqld settings to properly tune the query. It's okay if the machine is slow, I've seen very fast machines fail due to heavy load when running well-designed queries because mysqld was occupied by a lot of busy work and could not serve the query.
Logging slow queries
In a SQL server, data tables are stored on disk. Indexes provide the server with a way to find specific rows of data in a table without having to search the entire table. When the entire table must be searched, it is called a table scan. Generally speaking, you may only want to obtain a subset of the data in the table, so a full table scan will waste a lot of disk I/O, and therefore a lot of time. This problem is compounded when data must be joined, because multiple rows of data on both sides of the join must be compared.
Of course, table scans don't always cause problems; sometimes it's more efficient to read the entire table than to select a subset of the data (the query planner in the server process is used to make these decisions). If the index is used inefficiently, or cannot be used at all, it will slow down queries, and this problem will become more significant as the load on the server and the size of the table increase. Queries that take longer to execute than a given time range are called slow queries.
You can configure mysqld to log these slow queries to an appropriately named slow query log. Administrators will then review this log to help them determine which parts of the application require further investigation. Listing 1 shows the configuration that needs to be done in my.cnf to enable slow query logging.
Listing 1. Enabling MySQL slow query log
[mysqld]; enable the slow query log, default 10 secondslog-slow-queries; log queries taking longer than 5 secondslong_query_time = 5; log queries that don't use indexes even if they take less than long_query_time; MySQL 4.1 and newer onlylog- queries-not-using-indexes
These three settings are used together to log queries that take more than 5 seconds to execute and do not use indexes. Please note the warning about log-queries-not-using-indexes: you must be using MySQL 4.1 or higher. Slow query logs are saved in the MySQL data directory and are named hostname-slow.log. If you wish to use a different name or path, you can use log-slow-queries = /new/path/to/file in my.cnf to achieve this.
Reading slow query logs is best done through the mysqldumpslow command. By specifying the path to the log file, you can see a sorted list of slow queries, along with the number of times they occur in the log file. A very useful feature is that mysqldumpslow removes any user-specified data before comparing the results, so different calls to the same query are counted as one; this can help identify the query that requires the most work.
Caching queries
Many LAMP applications rely heavily on databases but execute the same queries over and over again. Every time a query is executed, the database must perform the same job - analyze the query, determine how to execute it, load the information from disk, and return the results to the client. MySQL has a feature called the query cache, which stores query results (which will be used later) in memory. In many cases, this will greatly improve performance. The problem, though, is that query caching is disabled by default.
Add query_cache_size = 32M to /etc/my.conf to enable a 32MB query cache.
Monitoring the Query Cache
After enabling the query cache, it is important to understand whether it is being used effectively. MySQL has several variables you can look at to understand what is going on in the cache. Listing 2 shows the status of the cache.
Listing 2. Displaying query cache statistics
mysql> SHOW STATUS LIKE 'qcache%';+-------------------------+------------+ | Variable_name | Value |+-------------------------+------------+| Qcache_free_blocks | 5216 | | Qcache_free_memory | 14640664 || Qcache_hits | 2581646882 || Qcache_inserts | 360210964 || Qcache_lowmem_prunes | 281680433 || 7 || Qcache_total_blocks | 47042 |+------------- -----------+----------------+8 rows in set (0.00 sec)
The explanation of these items is shown in Table 1.
Table 1. MySQL query cache variable variable name description
Qcache_free_blocks The number of adjacent memory blocks in the cache. A large number indicates there may be fragments. FLUSH QUERY CACHE defragments the cache to obtain a free block.
Qcache_free_memory Free memory in cache.
Qcache_hits is incremented each time a query hits in the cache.
Qcache_inserts is incremented each time a query is inserted. The miss ratio is the number of hits divided by the number of insertions; subtract this value from 1 to get the hit rate. In the above example, approximately 87% of the queries hit the cache.
Qcache_lowmem_prunes The number of times the cache ran out of memory and had to be purged to make room for more queries. This number is best viewed over a long period of time; if the number is growing, it may indicate severe fragmentation or low memory. (The free_blocks and free_memory above can tell you which case it is).
Qcache_not_cached The number of queries that are not suitable for caching, usually because they are not SELECT statements.
Qcache_queries_in_cache The number of queries (and responses) currently cached.
Qcache_total_blocks Number of blocks in the cache.
Often the difference can be seen by displaying these variables a few seconds apart, which can help determine whether the cache is being used efficiently. Running FLUSH STATUS can reset some counters, which can be very helpful if the server has been running for a while.
It's very tempting to use a very large query cache and expect to cache everything. Because mysqld must perform maintenance on the cache, such as performing pruning when memory becomes low, the server may get bogged down trying to manage the cache. As a rule, if the FLUSH QUERY CACHE takes a long time, the cache is too large.
Enforcing Limits
You can enforce limits in mysqld to ensure that system load does not cause resource exhaustion. Listing 3 shows some important resource-related settings in my.cnf.
Listing 3. MySQL resource settings
set-variable=max_connections=500set-variable=wait_timeout=10max_connect_errors = 100
The maximum number of connections is managed in the first line. Similar to MaxClients in Apache, the idea is to ensure that only the number of connections allowed by the service is made. To determine the maximum number of connections currently established on the server, execute SHOW STATUS LIKE 'max_used_connections'.
Line 2 tells mysqld to terminate any connections that have been idle for more than 10 seconds. In a LAMP application, the time it takes to connect to the database is usually the time it takes the web server to process the request. Sometimes, if the load is too heavy, the connection will hang and take up connection table space. If you have multiple interactive users or use persistent connections to the database, setting this value lower is not advisable!
The last line is a safe method. If a host has problems connecting to the server and retries many times before giving up, the host will be locked and cannot be run until after FLUSH HOSTS. By default, 10 failures are enough to cause a lockout. Changing this value to 100 will give the server enough time to recover from the problem. If the connection cannot be established after 100 retries, then using a higher value won't help much and it may not connect at all.
Buffers and Caching
MySQL supports over 100 adjustable settings; but fortunately, mastering a few will satisfy most needs. To find the correct values for these settings, you can view the status variables through the SHOW STATUS command, which can determine whether mysqld is operating as we expect. The memory allocated to buffers and caches cannot exceed the memory available on the system, so tuning usually requires some compromise.
MySQL tunable settings can apply to the entire mysqld process or to individual client sessions.
Server-side settings
Each table can be represented as a file on disk, which must be opened first and then read. To speed up the process of reading data from files, mysqld caches these open files up to a maximum number specified by table_cache in /etc/mysqld.conf. Listing 4 shows a way to display activity related to opening a table.
Listing 4. Displaying activities that open tables
mysql> SHOW STATUS LIKE 'open%tables';+---------------+-------+| Variable_name | Value |+-------- -------+-------+| Open_tables | 5000 || Opened_tables | 195 |+---------------+------- +2 rows in set (0.00 sec)
Listing 4 shows that there are currently 5,000 tables open and 195 tables need to be opened because there are no available file descriptors in the cache (since the statistics have been cleared earlier, there may be only 5,000 open tables). 195 open record cases). If Opened_tables increases rapidly with re-running the SHOW STATUS command, it indicates that the cache hit rate is insufficient. If Open_tables is much smaller than the table_cache setting, the value is too large (but having room to grow is never a bad thing). For example, use table_cache = 5000 to adjust the table's cache.
Similar to the table cache, there is also a cache for threads. mysqld spawns threads as needed when receiving connections. On a busy server where connections change quickly, caching threads for later use can speed up the initial connection.
Listing 5 shows how to determine whether enough threads are cached.
Listing 5. Displaying thread usage statistics
mysql> SHOW STATUS LIKE 'threads%';+-------------------+--------+| Variable_name | Value |+---- ---------------+--------+| Threads_cached | 27 || Threads_connected | 15 || Threads_created | 838610 || Threads_running | 3 |+---- ---------------+--------+4 rows in set (0.00 sec)
The important value here is Threads_created, this value is incremented every time mysqld needs to create a new thread. If this number increases rapidly when executing successive SHOW STATUS commands, you should try to increase the thread cache. For example, you can use thread_cache = 40 in my.cnf to achieve this.
The key buffer holds the index block of the MyISAM table. Ideally, requests for these blocks should come from memory rather than from disk. Listing 6 shows how to determine how many blocks were read from disk and how many were read from memory.
Listing 6. Determining keyword efficiency
mysql> show status like '%key_read%';+-------------------+-----------+| Variable_name | Value |+ ------------------+-----------+| Key_read_requests | 163554268 || Key_reads | 98247 |+------- ------------+-----------+2 rows in set (0.00 sec)
Key_reads represents the number of requests hitting the disk, and Key_read_requests is the total number. The miss ratio is the number of read requests that hit disk divided by the total number of read requests - in this case approximately 0.6 misses in memory for every 1,000 requests. If the number of disk hits exceeds 1 per 1,000 requests, you should consider increasing the keyword buffer. For example, key_buffer = 384M will set the buffer to 384MB.
Temporary tables can be used in more advanced queries where data must be saved to a temporary table before further processing (such as a GROUP BY clause); ideally, the temporary table is created in memory. But if the temporary table becomes too large, it needs to be written to disk. Listing 7 gives statistics related to temporary table creation.
Listing 7. Determining the use of temporary tables
mysql> SHOW STATUS LIKE 'created_tmp%';+--------------------------+-------+| Variable_name | Value | +-------------------------+-------+| Created_tmp_disk_tables | 30660 || Created_tmp_files | 2 || Created_tmp_tables | 32912 | +---------------------+-------+3 rows in set (0.00 sec)
Created_tmp_tables will be increased each time a temporary table is used; Created_tmp_disk_tables will also be increased for disk-based tables. There are no strict rules for this ratio, as it depends on the query involved. Watching Created_tmp_disk_tables over time will show you the ratio of disk tables created and you can determine the efficiency of your setup. Both tmp_table_size and max_heap_table_size control the maximum size of temporary tables, so make sure both values are set in my.cnf.
Per-session settings
The following settings are specific to each session. Be very careful when setting these numbers because when multiplied by the number of connections that may exist, these options represent a large amount of memory! You can modify these numbers within a session through code, or modify these settings in my.cnf for all sessions.
When MySQL must sort, it allocates a sort buffer to hold the rows of data as it is read from disk. If the data to be sorted is too large, the data must be saved to a temporary file on disk and sorted again. If the sort_merge_passes status variable is large, this indicates disk activity. Listing 8 shows some status counter information related to sorting.
Listing 8. Displaying sort statistics
mysql> SHOW STATUS LIKE "sort%";+-------------------+---------+| Variable_name | Value |+--- ----------------+---------+| Sort_merge_passes | 1 || Sort_range | 79192 || Sort_rows | 2066532 || Sort_scan | 44006 |+-- ------------------+---------+4 rows in set (0.00 sec)
If sort_merge_passes is large, it means you need to pay attention to sort_buffer_size. For example, sort_buffer_size = 4M sets the sort buffer to 4MB.
MySQL also allocates some memory to read the table. Ideally, an index provides enough information to read in only the rows you need, but sometimes a query (poorly designed or due to the nature of the data) needs to read a large amount of data from the table. To understand this behavior, you need to know how many SELECT statements were run and how many times the next row of data in the table needed to be read (rather than accessed directly through the index). The command to achieve this functionality is shown in Listing 9.
Listing 9. Determining table scan ratio
mysql> SHOW STATUS LIKE "com_select";+---------------+--------+| Variable_name | Value |+--------- ------+--------+| Com_select | 318243 |+----------+--------+1 row in set (0.00 sec)mysql> SHOW STATUS LIKE "handler_read_rnd_next";+-----------------------+----------- +| Variable_name | Value |+-----------------------+-----------+| Handler_read_rnd_next | 165959471 |+- -----------------------+----------+1 row in set (0.00 sec)
Handler_read_rnd_next / Com_select results in a table scan ratio - in this case 521:1. If the value exceeds 4000, you should check read_buffer_size, for example read_buffer_size = 4M. If this number exceeds 8M, it's time to discuss tuning these queries with the developers!
3 Essential Tools
Although the SHOW STATUS command can be very useful when understanding a specific setup, you will also need some tools to interpret the large amounts of data provided by mysqld. There are three tools that I find essential; you can find links to them in the Resources section.
Most system administrators are familiar with the top command, which provides a continuously updated view of the CPU and memory consumed by tasks. mytop emulates top; it provides a view of all connected clients and the queries they are running. mytop also provides a live and historical data on keyword buffer and query cache efficiency, as well as statistics on running queries. This is a useful tool to see what's going on in your system (say within 10 seconds), you can get a view of server health information and show any connections that are causing problems.
mysqlard is a daemon connected to the MySQL server, responsible for collecting data every 5 minutes and storing them in a Round Robin Database in the background. There is a Web page that displays data such as table cache usage, keyword efficiency, connected clients, and temporary table usage. Although mytop provides a snapshot of server health information, mysqlard provides long-term health information. As a bonus, mysqlard uses some of the information it has gathered to give some suggestions on how to tune the server.
Another tool for collecting SHOW STATUS information is mysqlreport. Its reporting is far more complex than mysqlard because every aspect of the server needs to be analyzed. This is a great tool for tuning your server because it performs appropriate calculations on state variables to help determine which problems need to be fixed.
Conclusion
This article introduced some basic knowledge of tuning MySQL and concluded this 3-part series on tuning LAMP components. Tuning largely involves understanding how components work, determining if they are working properly, making some adjustments, and re-evaluating. Each component - Linux, Apache, PHP or MySQL - has various requirements. Understanding each component individually can help reduce bottlenecks that can slow down your application.