MySQL strategies to improve the efficiency of data loading operations

Author：Eve Cole Update Time：2010-01-12 18:11:11

This article introduces MySQL's strategies for improving the efficiency of data loading operations. Many times you are concerned with optimizing SELECT queries because they are the most commonly used queries, and determining how to optimize them is not always straightforward. Loading data into a database is relatively straightforward. Many times you are concerned with optimizing SELECT queries because they are the most commonly used queries, and determining how to optimize them is not always straightforward. Loading data into a database is relatively straightforward. However, there are strategies that can be used to improve the efficiency of data loading operations, the basic principles of which are as follows:

Bulk loading is faster than single-row loading because the index cache does not need to be flushed after each record is loaded; it can be flushed after the batch of records is loaded.

Loading a table without an index is faster than loading after an index. If there are indexes, not only must records be added to the data files, but each index must be modified to reflect the addition of the new records.

Shorter SQL statements are faster than longer SQL statements because they involve less analysis on the server side and because they are faster to send over the network from the client to the server. Some of these factors may seem trivial (especially the last one), but if you're loading a large amount of data, even small factors can make a big difference in the results. We can use the above general principles to derive several practical conclusions about how to load data fastest:

LOAD DATA (in all its forms) is more efficient than INSERT because it loads rows in batches. Index refreshes are fewer, and the server only has to parse and interpret one statement instead of several.

LOAD DATA is more efficient than LOAD DATA LOCAL. With LOAD DATA, the file must be located on the server and must have FILE permissions, but the server can read the file directly from disk. With LOAD DATA LOCAL, the client reads the file and sends it over the network to the server, which is slow.

If you must use INSERT, you should use a form that allows multiple rows to be specified in a single statement, such as:

The more rows you can specify in a statement, the better. This reduces the number of statements required and reduces the amount of index refreshes. If you use mysqldump to generate a database backup file, you should use the --extended-insert option so that the dump file contains multi-line INSERT statements. You can also use --opt (optimization), which enables the --extended-insert option. Conversely, use of the --complete-insert option to mysqldump should be avoided; this option causes INSERT statements to be single-line, take longer to execute, and require more analysis than statements generated without the --complete-insert option.

Use compressed client/server protocols to reduce network data traffic. For most MySQL clients, this can be specified with the --compress command line option. It is generally only used on slower networks because compression requires a lot of processor time.

Let MySQL insert default values; do not specify columns in the INSERT statement that will be assigned default values in any way. On average, this results in shorter statements and reduces the number of characters sent over the network to the server. Additionally, statements containing fewer values require less analysis and transformation by the server.

If the table is indexed, you can use bulk inserts (LOAD DATA or multi-row INSERT statements) to reduce index overhead. This minimizes the impact of index updates because the index only needs to be refreshed when all rows have been processed, rather than after each row.

If you need to load a large amount of data into a new table, you should create the table and load it when it is not indexed, and then create the index after loading the data. This is faster. Creating the index once (rather than modifying it once per row) is faster.

If you drop or disable an index before loading, recreating or enabling the index after loading the data may make the load faster. If you want to use a delete or disable strategy for data loading, be sure to do some experiments to see if it's worth it (if you're loading a small amount of data into a large table, rebuilding and indexing may take longer than loading the data) ).

DROP INDEX and CREATE INDEX can be used to drop and rebuild indexes. An alternative is to disable and enable indexes using myisamchk or isamchk. This requires an account on the MySQL server host with write access to the table files. To disable table indexes, enter the corresponding database directory and execute one of the following commands:

Use myisamchk for MyISAM tables with index files with a .MYI extension, and isamchk for ISAM tables with index files with an .ISM extension. After loading data into the table, activate the index as follows:

If you decide to use index disabling and activation, you should use the table repair locking protocol described in Chapter 13 to prevent the server from changing locks at the same time (although the table is not repaired at this time, it is modified like a table repair process, so you need to use same locking protocol).

The data loading principles described above also apply to fixed queries related to clients that need to perform different operations. For example, you generally want to avoid running long SELECT queries on frequently updated tables. Long-running SELECT queries can create a lot of contention and reduce writer performance. One possible solution is to first store the records in a temporary table and then periodically add the records to the main table if the writes are primarily INSERT operations. This is not a feasible approach if immediate access to new records is required. But this method can be used as long as they are not accessed for a short period of time. There are two benefits to using temporary tables. First, it reduces contention with the SELECT query statement on the main table and, therefore, executes faster. Second, the total time to load records from the temporary table into the main table is less than the total time to load the records separately; the corresponding index cache only needs to be refreshed at the end of each batch load, rather than after each row load. One application of this strategy is to access the MySQL database from the Web page of the Web server. In this scenario, there may not be a higher level of authority that guarantees immediate entry of the record into the main table.

If the data is not exactly the kind of single record that would be inserted in the event of a system shutdown, another strategy to reduce index refreshes is to use the DELAYED_KEY_WRITE table creation option for MyISAM tables (which may be possible if MySQL is used for some data entry work). This will happen). This option causes the index cache to be refreshed only occasionally, rather than after every insert.

If you want to take advantage of delayed index refresh server-wide, just start mysqld with the --delayed-key-write option. In this scenario, index block writes are delayed until the block must be flushed to make room for other index values, until a flush-tables command is executed, or until the index table is closed.