Querying is the most commonly used operation in database technology. The process of query operation is relatively simple. First, the query SQL statement is issued from the client. After receiving the SQL statement sent by the client, the database server executes the SQL statement and then returns the query results to the client. Although the process is very simple, different query methods and database settings will have a great impact on query performance.
Therefore, this article discusses query optimization techniques commonly used in MySQL. Discussions include: improving query speed through query buffering; MySQL's automatic optimization of queries; index-based sorting; detection of unreachable queries and using various query options to improve performance.
1. Improve query speed through query buffering
Generally, when we use SQL statements to query, the database server will execute this SQL statement every time it receives SQL from the client. But when the exact same SQL statement is received within a certain interval (such as within 1 minute), it will be executed in the same way. Although this can ensure the real-time nature of the data, most of the time, the data does not require complete real-time, which means there can be a certain delay. If this is the case, executing the exact same SQL in a short period of time is not worth the gain.
Fortunately, MySQL provides us with the query buffering function (query buffering can only be used in MySQL 4.0.1 and above). We can improve query performance to a certain extent through query caching.
We can set the query buffer through the my.ini file in the MySQL installation directory. The setting is also very simple, just set query_cache_type to 1. After setting this attribute, before executing any SELECT statement, MySQL will check in its buffer whether the same SELECT statement has been executed. If so, and the execution result has not expired, then the query result will be returned directly to client. But when writing SQL statements, please note that MySQL's query buffer is case-sensitive. The following two SELECT statements are as follows: SELECT * from TABLE1
SELECT * FROM TABLE1
The two SQL statements above are completely different SELECTs for query buffering. Moreover, the query buffer does not automatically handle spaces. Therefore, when writing SQL statements, you should try to reduce the use of spaces, especially the spaces at the beginning and end of SQL (because the query cache does not automatically intercept the spaces at the beginning and end).
Although not setting up a query buffer may sometimes cause performance losses, there are some SQL statements that need to query data in real time, or are not used frequently (perhaps executed once or twice a day). This requires turning off buffering. Of course, you can turn off the query cache by setting the value of query_cache_type, but this will permanently turn off the query cache. MySQL 5.0 provides a method to temporarily turn off the query cache: SELECT SQL_NO_CACHE field1, field2 FROM TABLE1
Since the above SQL statement uses SQL_NO_CACHE, regardless of whether this SQL statement has been executed before, the server will not search in the buffer and will execute it every time.
We can also set query_cache_type in my.ini to 2, so that the query cache will be used only after SQL_CACHE is used. SELECT SQL_CALHE * FROM TABLE1
2. MySQL’s automatic optimization of queries
Indexes are very important for databases. Indexes can be used to improve performance during queries. But sometimes using indexes can reduce performance. We can look at the following SALES table: CREATE TABLE SALES
(
ID INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
NAME VARCHAR(100) NOT NULL,
PRICE FLOAT NOT NULL,
SALE_COUNT INT NOT NULL,
SALE_DATE DATE NOT NULL,
PRIMARY KEY(ID),
INDEX (NAME),
INDEX (SALE_DATE)
)
Suppose there are millions of pieces of data stored in this table, and we want to query the average price of product number 1000 in 2004 and 2005. We can write the following SQL statement: SELECT AVG(PRICE) FROM SALES
WHERE ID = 1000 AND SALE_DATE BETWEEN '2004-01-01' AND '2005-12-31';
If the quantity of this product is very large, it accounts for almost 50% or more of the records in the SALES table. Then using the index on the SALE_DATE field to calculate the average is a bit slow. Because if you use an index, you have to sort the index. When there are very many records that meet the conditions (such as accounting for 50% or more of the records in the entire table), the speed will slow down, so it is better to scan the entire table. Therefore, MySQL will automatically decide whether to use the index for query based on the proportion of data that meets the conditions in the entire table.
For MySQL, the index is not used when the proportion of the above query results to the records in the entire table is about 30%. This proportion is derived by MySQL developers based on their experience. However, the actual scale value will vary depending on the database engine used.
3. Index-based sorting
One of MySQL's weaknesses is its sorting. Although MySQL can query approximately 15,000 records in 1 second, MySQL can only use at most one index when querying. Therefore, if the WHERE condition already occupies the index, the index will not be used in sorting, which will greatly reduce the speed of the query. We can look at the following SQL statement: SELECT * FROM SALES WHERE NAME = “name” ORDER BY SALE_DATE DESC;
The index on the NAME field has been used in the WHERE clause of the above SQL, so the index will no longer be used when sorting SALE_DATE. In order to solve this problem, we can create a composite index on the SALES table: ALTER TABLE SALES DROP INDEX NAME, ADD INDEX (NAME, SALE_DATE)
In this way, the speed will be greatly improved when using the above SELECT statement to query. But be careful, when using this method, make sure there is no sorting field in the WHERE clause. In the above example, you cannot use SALE_DATE to query. Otherwise, although the sorting is faster, there is no separate index on the SALE_DATE field, so the query will Slow down.
4. Detection of unreachable queries
When executing SQL statements, you will inevitably encounter some conditions that must be false. The so-called must-false condition is that no matter how the data in the table changes, this condition is false. Such as WHERE value < 100 AND value > 200. We can never find a number that is both less than 100 and greater than 200.
If you encounter such query conditions, it is unnecessary to execute such SQL statements. Fortunately, MySQL can automatically detect this situation. For example, we can look at the following SQL statement: SELECT * FROM SALES WHERE NAME = “name1” AND NAME = “name2”
The above query statement searches for records whose NAME is equal to both name1 and name2. Obviously, this is an unreachable query, and the WHERE condition must be false. Before MySQL executes the SQL statement, it will first analyze whether the WHERE condition is an unreachable query. If so, the SQL statement will no longer be executed. To verify this. We first use EXPLAIN to test the following SQL: EXPLAIN SELECT * FROM SALES WHERE NAME = "name1"
The above query is a normal query. We can see that the table item in the execution information data returned by EXPLAIN is SALES. This shows that MySQL operates SALES. Look at the following statements again: EXPLAIN SELECT * FROM SALES WHERE NAME = “name1” AND NAME = “name2”
We can see that the table item is empty, which means that MySQL has not operated on the SALES table.
5. Use various query selections to improve performance
In addition to the normal use of the SELECT statement, MySQL also provides us with many options that can enhance query performance. As mentioned above, SQL_NO_CACHE and SQL_CACHE, which are used to control query buffering, are two of the options. In this section, I'll introduce a few commonly used query options.
1. STRAIGHT_JOIN: Force connection order
When we connect two or more tables for query, we don't need to care about which table MySQL connects first and which table it connects last. All of this is determined by a connection sequence determined by MySQL through a series of internal calculations and evaluations. In the following SQL statements, TABLE1 and TABLE2 are not necessarily connected to each other: SELECT TABLE1.FIELD1, TABLE2.FIELD2 FROM TABLE1, TABLE2 WHERE …
If developers need to manually intervene in the order of connections, they must use the STRAIGHT_JOIN keyword, such as the following SQL statement: SELECT TABLE1.FIELD1, TABLE2.FIELD2 FROM TABLE1 STRAIGHT_JOIN TABLE2 WHERE …
As can be seen from the above SQL statement, STRAIGHT_JOIN is used to force MySQL to join tables in the order of TABLE1 and TABLE2. If you think it is more efficient to join in your own order than the order recommended by MySQL, you can use STRAIGHT_JOIN to determine the connection order.
2. Intervene in index usage and improve performance
The use of indexes has been mentioned above. Under normal circumstances, MySQL will decide whether to use an index and which index to use when querying. But in some special cases, we want MySQL to use only one or a few indexes, or we do not want to use a certain index. This requires using some query options of MySQL to control the index.
Limit the scope of using indexes
Sometimes we create many indexes in the data table. When MySQL selects the index, these indexes are all considered. But sometimes we want MySQL to only consider a few indexes instead of all indexes. This requires using USE INDEX to set the query statement. SELECT * FROM TABLE1 USE INDEX (FIELD1, FIELD2) …
As can be seen from the above SQL statement, no matter how many indexes have been established in TABLE1, MySQL only considers the indexes established on FIELD1 and FIELD2 when selecting indexes.
Limit the range of indexes that are not used
If we have many indexes to consider and few unused indexes, we can use IGNORE INDEX for reverse selection. In the above example, the index that is considered is selected, while using IGNORE INDEX is to select the index that is not considered. SELECT * FROM TABLE1 IGNORE INDEX (FIELD1, FIELD2) …
In the above SQL statement, only the indexes on FIELD1 and FIELD2 in the TABLE1 table are not used.
Force the use of an index
The above two examples both provide MySQL with a choice, which means that MySQL does not have to use these indexes. Sometimes we hope that MySQL must use a certain index (since MySQL can only use one index when querying, it can only force MySQL to use one index). This requires the use of FORCE INDEX to complete this function. SELECT * FROM TABLE1 FORCE INDEX (FIELD1) …
The above SQL statement only uses the index built on FIELD1, not the indexes on other fields.
3. Use temporary tables to improve query performance
When there is a lot of data in the result set of our query, we can force the result set into a temporary table through the SQL_BUFFER_RESULT. option, so that the MySQL table lock can be quickly released (so that other SQL statements can query these records) ) and can serve large recordsets to clients for long periods of time. SELECT SQL_BUFFER_RESULT * FROM TABLE1 WHERE …
Similar to the SQL_BUFFER_RESULT. option, there is SQL_BIG_RESULT. This option is generally used for grouping or DISTINCT keywords. This option notifies MySQL that, if necessary, the query results will be placed in a temporary table, or even sorted in the temporary table. SELECT SQL_BUFFER_RESULT FIELD1, COUNT(*) FROM TABLE1 GROUP BY FIELD1
6. Conclusion
There is also a "20/80 principle" in programming, that is, 20% of the code takes 80% of the time. The same goes for database application development. The optimization of database applications focuses on the execution efficiency of SQL. The focus of data query optimization is to make the database server read less data from the disk and read pages sequentially instead of non-sequentially.