This article looks at a practical problem that administrators often face when initializing a database: how to import large volumes of data into the database system. SQL Server provides several bulk import and export tools for this purpose. For example, administrators can use the bcp utility to bulk export data, bulk import data, and generate format files, and the BULK INSERT statement can be used to import data directly from a data file into a database table or non-partitioned view.
Although SQL Server provides these utilities, administrators still need to approach the work carefully in practice. Specifically, the following aspects deserve attention when importing and exporting bulk data.
1. Use the bulk import and export tools provided by the system whenever possible
The utilities mentioned above share a common feature: they apply certain optimizations to bulk data during the import process, such as standardizing the format of the data and reducing the time needed for import and export. However, they differ from ordinary data import and export tools in one respect: bulk import operations traditionally do not support comma-separated value (CSV) files, which are often simply called text files. An administrator could use other tools to import such files, but I generally do not recommend this, because those tools do not apply the optimizations described above. Instead, I suggest using an intermediate tool such as Access to first convert the data in the text file into an ordinary table form or a supported delimited format, and then use the tools above to import it into the system. Although this adds some work, it helps ensure the quality of the bulk data. For this reason, when users run into this kind of problem in my own work, I strongly recommend that they rely on the tools provided by the system for bulk import and export.
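As a rough illustration, the sketch below uses the BULK INSERT statement to load a tab-delimited text file into a staging table. The table and file path (dbo.SalesStaging, D:\Import\sales_data.txt) are hypothetical placeholders, and the staging table is assumed to already exist with columns that match the file.

    -- Minimal BULK INSERT sketch; names and paths are placeholders
    BULK INSERT dbo.SalesStaging
    FROM 'D:\Import\sales_data.txt'
    WITH (
        FIELDTERMINATOR = '\t',   -- tab-delimited character data
        ROWTERMINATOR   = '\n',
        FIRSTROW        = 2,      -- skip a header row if the file has one
        TABLOCK                   -- a table lock lets the engine optimize the load
    );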
In addition, using format files to improve the consistency of bulk data is also a good choice. Several of the bulk import and export tools mentioned above support a dedicated format file that stores the format information for each field in the data file; the format file can also contain information about the corresponding database table. Format files can provide all the formatting information needed to bulk export data from, and bulk import data into, an instance of the database. In plain terms, a format file offers a flexible way to interpret the layout of a data file during import and to lay out the data during export. This flexibility removes the need to write special-purpose code to interpret the data or to reformat it to meet the particular needs of the database or an external application. If format files are used well, users can export or import bulk data directly in the required format without additional format adjustments afterwards.
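As a sketch of this workflow, a format file can be generated from an existing table with bcp and then referenced in a BULK INSERT statement. Again, the database, table, server, and file names below are assumed placeholders.

    -- Generate a character-mode format file for an existing table (run at a command prompt):
    --   bcp MyDatabase.dbo.SalesStaging format nul -c -f SalesStaging.fmt -T -S MyServer
    -- Then let the format file drive the import:
    BULK INSERT dbo.SalesStaging
    FROM 'D:\Import\sales_data.txt'
    WITH (
        FORMATFILE = 'D:\Import\SalesStaging.fmt',
        TABLOCK
    );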
2. Choose an appropriate database recovery model
As everyone knows, any change a user makes in the database is recorded in the transaction log, and bulk import and export is no exception. However, because the volume of data is large, a bulk import generates a correspondingly large amount of transaction log. For this reason, I recommend that before importing bulk data, users choose a suitable recovery model for the database. My own approach is that when a user needs to import bulk data, it is best to switch to the bulk-logged recovery model, and only return to the original model once the import has finished.
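A minimal sketch of this switch is shown below, assuming a database named MyDatabase that normally runs under the full recovery model; the backup path is likewise a placeholder.

    -- Switch to the bulk-logged recovery model for the duration of the import
    ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;

    -- ... run the bulk import here (bcp or BULK INSERT) ...

    -- Switch back to the full recovery model once the import completes
    ALTER DATABASE MyDatabase SET RECOVERY FULL;

    -- Take a log backup so the log chain fully covers the bulk-logged work
    BACKUP LOG MyDatabase TO DISK = 'D:\Backup\MyDatabase_log.trn';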
This is mainly because the bulk-logged recovery model supports bulk import work particularly well. Compared with fully logged recovery (the full recovery model), the bulk-logged recovery model only minimally logs bulk operations. It therefore protects bulk operations against hardware failure, provides better performance, and takes up the least log space. Using bulk-logged recovery also helps prevent the transaction log from running out of space, because bulk operations are only minimally logged rather than written row by row. The bulk-logged model is intended as a temporary supplement to the full recovery model and is especially useful when performing bulk operations on tables without indexes.
However, the bulk-logged recovery model also carries certain risks. It increases the exposure to data loss for these bulk-copy operations, because it prevents the database system from capturing the changes made by each transaction individually. If a log backup contains bulk-logged operations, you cannot restore to a point in time within that log backup; you can only restore the entire log backup. In addition, under the bulk-logged recovery model, if a log backup covers any bulk operations, it contains both the log records and the data pages changed by those operations, which is essential for capturing the results of the bulk-logged work. Capturing the changed data extents can make log backups very large. Furthermore, backing up the log requires access to the data files that contain the bulk-logged transactions; if any affected database file is inaccessible, the transaction log cannot be backed up and all operations committed in that log will be lost. The bulk-logged recovery model is therefore not the safest logging mode.
3. Consider whether to temporarily drop the indexes on the target table
An index is a special kind of file, and its role in the database is very important. Put simply, if the database is compared to a book, then an index is like the book's table of contents: it contains reference pointers to all the records in the data table. There is no doubt that indexes can improve database performance, but they are not beneficial in every situation. In some special cases they reduce the performance of certain operations, such as the import of bulk data.
An index speeds up data retrieval, but it slows down data modification, because every time a record is modified or inserted, the indexes must be updated as well. In other words, if one million records are inserted, the indexes are updated one million times. Clearly, during a bulk import, indexes consume a great deal of database resources and thus reduce the performance of the database. If the destination table has indexes, they not only slow down the bulk import but also degrade the performance of other users' normal access to the database.
For this reason, my suggestion is that if the table to be loaded contains little or no data compared with the amount being imported, it is best to drop or disable the indexes first to improve the performance of the bulk import, and re-create or rebuild them once the import is done (a sketch follows below). However, if the target table already contains a large amount of data and the data to be imported is similar in size or smaller, there is no need to remove the indexes; doing so would have the opposite effect, because the time the database system spends rebuilding the indexes may exceed the time saved during the bulk import. In that case the administrator loses more than is gained by removing the indexes from the target table.
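A minimal sketch of disabling and rebuilding indexes around a load is shown below; the index and table names are hypothetical. Note that disabling a clustered index makes the table inaccessible, so this approach applies to nonclustered indexes.

    -- Disable a nonclustered index on the target table before the load
    ALTER INDEX IX_SalesStaging_CustomerID ON dbo.SalesStaging DISABLE;

    -- ... run the bulk import here ...

    -- Rebuild the index after the load completes
    ALTER INDEX IX_SalesStaging_CustomerID ON dbo.SalesStaging REBUILD;

    -- Or rebuild every index on the table at once
    ALTER INDEX ALL ON dbo.SalesStaging REBUILD;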
4. Back up the database immediately after the data import
Just as when creating database objects, administrators must back up the database promptly after importing bulk data into the database system. Even with the help of the system's bulk import tools, this import work is still tedious and time-consuming, so once the bulk data has been imported successfully, the administrator should back up the database without delay. What I want to remind everyone of here is that the appropriate backup method often differs depending on the recovery model in use.
My suggestion is that if the database is using the simple recovery model, the administrator should perform a full or differential backup immediately after the bulk import completes (if time permits, a full backup is best). If the database is using the bulk-logged or full recovery model, and there is little time or a concern that a full backup would affect users' access, then performing only a log backup is sufficient. If the database server has not yet gone into production (that is, no users are working on it), a full database backup is the safer choice.
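The corresponding backup statements are sketched below; the database name and backup paths are placeholders.

    -- Full backup immediately after the import (e.g. under the simple recovery model)
    BACKUP DATABASE MyDatabase TO DISK = 'D:\Backup\MyDatabase_full.bak' WITH INIT;

    -- Differential backup as a faster alternative when a full backup is too disruptive
    BACKUP DATABASE MyDatabase TO DISK = 'D:\Backup\MyDatabase_diff.bak' WITH DIFFERENTIAL;

    -- Log backup when running under the full or bulk-logged recovery model
    BACKUP LOG MyDatabase TO DISK = 'D:\Backup\MyDatabase_log.trn';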
5. Common mistakes
Two errors are particularly common during bulk data import.
First, the file is supplied in an unsupported format. As mentioned above, the bulk import tools provided by the database traditionally do not support CSV text files, so administrators need to convert such files beforehand. Second, be aware that hidden characters can cause problems. Many programs and text editors insert hidden characters, usually at the end of the data file, and during a bulk import these can cause unpredictable problems such as unexpected null-character errors. This mistake is easy to avoid: the database administrator simply needs to look for and remove all hidden characters before importing the data. In fact, this problem arises not only when importing bulk data but also when importing small amounts of data.
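When tracking down rows that fail because of format problems or stray characters, it may help to have the load write the offending rows to an error file. The sketch below uses the ERRORFILE option of BULK INSERT; the table, paths, and error threshold are assumed placeholders.

    -- Capture rows that cannot be loaded (e.g. because of hidden characters) in an error file
    BULK INSERT dbo.SalesStaging
    FROM 'D:\Import\sales_data.txt'
    WITH (
        FIELDTERMINATOR = '\t',
        ROWTERMINATOR   = '\n',
        ERRORFILE       = 'D:\Import\sales_data_errors.log',
        MAXERRORS       = 10      -- stop the load if more than 10 rows fail
    );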