SQL performance optimization is a constant challenge for programmers, and the pattern is familiar: while developing a project, everything feels fast when we test it ourselves, but once the project goes live and data piles up, the system gets slower and slower for customers. Apart from framework problems and poorly written code, the main culprit is usually SQL that has never been optimized.
Because I work at a small company, I do a bit of everything, and I think that sometimes treating the symptoms is more practical than chasing the root cause. Here are the issues I pay attention to:
1. Design tables sensibly, especially primary keys. If a table will hold a very large amount of data, its primary key should carry no business meaning — a purely surrogate value, like Oracle's ROWID, SQL Server's GUID, or Hibernate's UUID. Small data-dictionary tables can be handled more flexibly and do not necessarily need a physical primary key. In primary key design, composite primary keys are generally best avoided.
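As a quick sketch of this idea (the table and column names here are invented for illustration, not from the project above), a surrogate key on a large table keeps the key free of business meaning:

```sql
-- A meaningless surrogate key never has to change when business rules change.
CREATE TABLE sf_customer (
    customerid  NUMBER(10)    PRIMARY KEY,  -- surrogate key, no business meaning
    idcardno    VARCHAR2(18),               -- business identifier, deliberately NOT the PK
    name        VARCHAR2(50)
);

-- The same idea in SQL Server would use a GUID, e.g.:
-- CREATE TABLE sf_customer (
--     customerid UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID(), ... );
```

For a small dictionary table, a natural key (for example a short code column) is often fine, which is the flexibility mentioned above.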
2. Index sensibly. An index is a powerful tool for speeding up data queries, but don't add one to every column. An index works like a book's table of contents: if almost every entry in the contents has the same title, how quickly can you find the specific content you want? An indexed column does not have to be unique, but it should not contain too many identical values. Furthermore, every extra index costs you elsewhere: in my experience the TEMP tablespace grows, exports and imports into another database get slower, and you may find the UNDOTBS01 tablespace becoming abnormally large. An index is a double-edged sword and must be applied judiciously.
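To make the table-of-contents analogy concrete, here is a hedged sketch using the tables from the example later in this article (the index names are my own invention):

```sql
-- Good candidate: customerid is highly selective (few duplicate values),
-- so the index narrows the search dramatically.
CREATE INDEX idx_receivedfee_cust ON sf_receivedfee (customerid);

-- Poor candidate: dealsign holds only 0 or 1. Either value matches a huge
-- share of the rows, so the optimizer will often ignore such an index,
-- while you still pay its storage and maintenance cost.
-- CREATE INDEX idx_receivedfee_sign ON sf_receivedfee (dealsign);  -- usually not worth it
```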
3. I have read some very professional articles about SQL optimization online, but I was never able to apply them in my projects. Instead, I kept experimenting during the project and discovered some basic principles of my own. Personally, I think SQL optimization comes down to a single principle: narrow the query scope as much as possible. Doing so always improves efficiency, and since Oracle already optimizes the SQL we write, narrowing the scope is the part that is left to us. When I say this, everyone immediately thinks of indexing as the tool for improving query speed — and indeed it is, but it is just one means, and it too stems from the principle of narrowing the query scope.
Most of the SQL that needs optimizing involves multi-table joins, and these come in two shapes: horizontal joins and vertical joins. Vertical joins are the ones we use most. A horizontal join applies when two tables have essentially the same column structure and some rows of one table should be appended to the rows of the other — Rows + Rows. A vertical join takes some columns to be queried from table A and some from table B, then stitches the two side by side using their common part — Columns + Columns.
Horizontal join statement: select a.column1, a.column2 from tableA a union all select b.column1, b.column2 from tableB b
Note that in a horizontal join the number of columns must be the same and the corresponding columns must have matching data types; you can think of the tables being unioned as exact copies of one another. Someone may ask: what if the tables I want to merge have different columns, or one column is missing entirely? Then you can use the following method.
select d.dname, d.loc from dept1 d union all select '' dname, e.loc from dept e — look at the '' dname part: when a column has no counterpart in the other table, you can substitute a placeholder (here an empty string) for the missing column, and the two halves can still be merged.
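Formatted for readability, the padding trick from the statement above looks like this (dept1 and dept are the same example tables; note that in Oracle an empty string literal is treated as NULL, which is usually exactly what you want for a missing column):

```sql
-- dept1 has both dname and loc; dept (in this example) lacks dname,
-- so a placeholder keeps the column counts and types aligned.
SELECT d.dname, d.loc
  FROM dept1 d
UNION ALL
SELECT '' AS dname,   -- placeholder for the missing column ('' is NULL in Oracle)
       e.loc
  FROM dept e;
```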
Vertical join statement: select a.column1, a.column2, b.column3, b.column4 from tableA a full outer join tableB b on a.aid = b.bid where ... — this is the full outer join form. It is indeed very fast, but you may not like the result, because it returns some rows you may not want to see at all. Normally we use left outer join and right outer join far more. The difference between the two is that a left outer join keeps every row of the table on the left of the join field after on, while a right outer join does the opposite. You can also write them as left join and right join. In my experience, outer joins are comparatively fast.
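The left/right distinction described above can be sketched with the same placeholder tables (tableA, tableB, and the aid/bid columns are generic names, as in the statement above):

```sql
-- LEFT OUTER JOIN: every row of tableA is kept; rows of tableB with
-- no match come back as NULLs in b.column3.
SELECT a.column1, b.column3
  FROM tableA a
  LEFT OUTER JOIN tableB b ON a.aid = b.bid;

-- RIGHT OUTER JOIN is the mirror image: every row of tableB is kept,
-- and unmatched rows of tableA appear as NULLs.
SELECT a.column1, b.column3
  FROM tableA a
  RIGHT OUTER JOIN tableB b ON a.aid = b.bid;
```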
The way to speed up a vertical join query is to nest the queries. The following is an actual example from my project:
select c.customerid, c.receivedmoney, c.tollcollector, c.receiveddate, c.yearmonth,
       c.receivedlatefee, c.receivedfee, c.receivedappend, c.jmman, c.jmmoney,
       c.name, d.chargeint
from (
    select a.customerid, a.receivedmoney, a.tollcollector, a.receiveddate, a.yearmonth,
           a.receivedlatefee, a.receivedfee, a.receivedappend, a.jmman, a.jmmoney, b.name
    from (
        select rf.customerid, rf.receivedmoney, rf.tollcollector, rf.receiveddate,
               rf.yearmonth, rf.receivedlatefee, rf.receivedfee, rf.receivedappend,
               rf.jmman, rf.jmmoney
        from sf_receivedfee rf
        where rf.electriccompanyid = '1000000001'
          and rf.dealsign = 0
          and rf.yearmonth in (200811, 200901, 200903, 200804, 200805, 200806, 200807)
          and rf.customerid = 1000052545
    ) a
    left outer join (
        select xe.employeeid, xe.name from xt_employee xe
    ) b on a.tollcollector = b.employeeid
) c
left outer join (
    select cp.chargeint, cp.customerid from sf_chargeprotocol cp
    where cp.customerid = 1000052545
) d on c.customerid = d.customerid
You can see that in this example we first filter the records we need out of each table, using nearly identical conditions, and only then join the filtered results. In actual use I found this nearly 60 times faster than a direct join query. The SQL is ugly and hard to read, but it does solve the performance problem. The principle is still the same: narrow the scope first, then join. If we join first and filter afterwards, we are effectively merging two whole tables and then fetching data out of the result by condition — you can imagine the speed.
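The contrast the example relies on can be shown side by side in a reduced form (using two of the tables and conditions from the query above; this is a sketch of the pattern, not the full production query):

```sql
-- Join first, filter afterwards: the join touches both full tables
-- before any condition narrows the result.
SELECT rf.receivedmoney, cp.chargeint
  FROM sf_receivedfee rf
  LEFT OUTER JOIN sf_chargeprotocol cp ON rf.customerid = cp.customerid
 WHERE rf.customerid = 1000052545
   AND rf.dealsign = 0;

-- Narrow first, then join: each inline view shrinks its table to the
-- handful of rows needed, and the join runs on two tiny result sets.
SELECT a.receivedmoney, d.chargeint
  FROM (SELECT customerid, receivedmoney
          FROM sf_receivedfee
         WHERE customerid = 1000052545 AND dealsign = 0) a
  LEFT OUTER JOIN
       (SELECT customerid, chargeint
          FROM sf_chargeprotocol
         WHERE customerid = 1000052545) d
    ON a.customerid = d.customerid;
```

One caveat worth adding: modern Oracle versions often push predicates into views automatically, so the gap between the two forms varies — measure both on your own data before committing to the ugly version.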