Author: Rob Howard Translation: Tropical fish
This article discusses:
· General ASP.NET performance secrets
· Useful tips and tricks to improve ASP.NET performance
· Recommendations for using databases in ASP.NET
· Caching and background processing in ASP.NET
Writing a web application using ASP.NET is incredibly simple. It's so simple that many developers don't take the time to build their applications to perform well. In this article, I recommend 10 tips for writing high-performance web applications. I won't limit my discussion to ASP.NET applications, because ASP.NET applications are only a subset of Web applications. This article is not intended to be a definitive guide to optimizing the performance of web applications - a full book could easily do that. Instead, we should consider this article a good starting point.
Before I became a workaholic, I did a lot of rock climbing. Before any big climb, I would first study the routes in a guidebook and read the recommendations from people who had made the trip before. But no matter how well written a guidebook is, you need actual climbing experience before attempting a particularly challenging climb. Similarly, you can only learn how to write high-performance Web applications when you are faced with fixing performance problems or running a high-throughput site.
My personal experience comes from working as an infrastructure Program Manager on the ASP.NET team at Microsoft, maintaining and managing www.asp.net, and helping architect Community Server, which combined several well-known ASP.NET applications (ASP.NET Forums, .Text, and nGallery) into one platform. I'm sure some of the tips that have helped me will be useful to you as well.
You should consider separating your application into several logical layers. You may have heard of 3-tier (or n-tier) architecture. These are typically prescribed architectural patterns that physically divide functionality across processes and/or hardware. If the system needs more scale, more hardware can easily be added. However, that brings the performance penalty associated with process and machine hops, so it should be avoided. So, whenever possible, run the ASP.NET pages and their associated components together in the same application.
Because of code separation and boundaries between layers, using Web services or remoting can reduce performance by 20% or more.
The data layer is a bit different, because it is usually better to have hardware dedicated to the database. However, the cost of the process hop to the database is still high, so performance in the data layer should be the first thing you consider when optimizing your code.
Before investing time in fixing your application's performance problems, make sure you profile the application to discover the root cause of the problem. Key performance counters (such as the one that indicates the percentage of time spent performing garbage collection) are also very useful for finding out where an application spends the majority of its time, though the places where time is spent are often quite unintuitive.
In this article I discuss two kinds of performance improvements: large optimizations, such as using the ASP.NET cache, and small optimizations that repeat themselves. These small optimizations are sometimes the most interesting: you make a small change to code that is called thousands and thousands of times. With a large optimization, you may see a big jump in overall performance. With a small one, you might shave only a few microseconds off a given request, but compounded across all the requests per day, it can produce a surprisingly large improvement.
Performance in the data layer
When you start optimizing the performance of an application, there is a litmus test you can use to prioritize: does the code access the database? If so, how often? Note that the same test can be applied to code that uses Web services or remoting, but I won't cover those in this article.
If a database request is required on a particular code path and you see other areas you would like to optimize first, such as string manipulation, stop and perform the litmus test first. Unless you have a truly awful performance problem on your hands, your time will be better spent optimizing the time it takes to connect to the database, the amount of data returned, and the round trips you make to and from the database.
Now that I've covered the information in general, let's look at 10 tips to help your application perform better. I'll start with those that have the most obvious impact on improving performance.
Tip 1 - Return multiple result sets
Take a look at your database code to see if you have request paths that access the database more than once. Each such round trip reduces the number of requests your application can serve per second. By returning multiple result sets in a single database request, you can reduce the overall time consumed by database communication. After you reduce the number of requests your database server has to manage, you also make your system more scalable.
While you can generally use dynamic SQL statements to return multiple result sets, I prefer to use stored procedures. It is debatable whether business logic should live in a stored procedure, but I think that if the logic in a stored procedure can constrain the data returned (reducing the size of the data set, the time spent on the network, and the need to filter data in the logic layer), then it is a good thing.
Using a SqlCommand instance and its ExecuteReader method to populate strongly typed business classes, you can move the result set pointer forward by calling NextResult. Figure 1 shows a sample conversation that populates several ArrayLists with typed classes. Returning only the data you need from the database also significantly reduces the memory allocations on your server.
// read the first resultset
reader = command.ExecuteReader();

// read the data from that resultset
while (reader.Read()) {
    suppliers.Add(PopulateSupplierFromIDataReader(reader));
}

// read the next resultset
reader.NextResult();

// read the data from that second resultset
while (reader.Read()) {
    products.Add(PopulateProductFromIDataReader(reader));
}
Tip 2 - Paginated data access
ASP.NET's DataGrid provides a great capability: support for data paging. When pagination is set up in the DataGrid, a specific number of results will be displayed at a time. Additionally, a paging UI for navigating between results is displayed at the bottom of the DataGrid. The paginated UI allows you to navigate forward or backward through the displayed data, displaying a specific number of results per page.
But there is a problem: when you use the DataGrid's paging, all of the data has to be bound to the grid. For example, your data layer must return all of the data, and then the DataGrid filters out everything except the records to be displayed for the current page. If 100,000 records are returned when you page through the DataGrid, 99,975 records are thrown away on each request (assuming a page size of 25). As the number of records grows, application performance suffers badly because more and more data must be returned on every request.
One way to write better paging code is to use stored procedures. Figure 2 shows a sample stored procedure that pages through the Orders table in the Northwind database. Basically, all you need to do here is pass in the page index and the page size. The database calculates the appropriate result set and returns it.
CREATE PROCEDURE northwind_OrdersPaged
(
    @PageIndex int,
    @PageSize int
)
AS
BEGIN
DECLARE @PageLowerBound int
DECLARE @PageUpperBound int
DECLARE @RowsToReturn int

-- First set the rowcount
SET @RowsToReturn = @PageSize * (@PageIndex + 1)
SET ROWCOUNT @RowsToReturn

-- Set the page bounds
SET @PageLowerBound = @PageSize * @PageIndex
SET @PageUpperBound = @PageLowerBound + @PageSize + 1

-- Create a temp table to store the select results
CREATE TABLE #PageIndex
(
    IndexId int IDENTITY (1, 1) NOT NULL,
    OrderID int
)

-- Insert into the temp table
INSERT INTO #PageIndex (OrderID)
SELECT
    OrderID
FROM
    Orders
ORDER BY
    OrderID DESC

-- Return total count
SELECT COUNT(OrderID) FROM Orders

-- Return paged results
SELECT
    O.*
FROM
    Orders O,
    #PageIndex PageIndex
WHERE
    O.OrderID = PageIndex.OrderID AND
    PageIndex.IndexID > @PageLowerBound AND
    PageIndex.IndexID < @PageUpperBound
ORDER BY
    PageIndex.IndexID

END
In Community Server, we wrote a paging server control to handle all of this data paging. You'll notice that I used the idea discussed in Tip 1 and return two result sets from one stored procedure: the total number of records and the requested data.
The total number of records returned can vary depending on the query being executed. For example, a WHERE clause can be used to constrain the data returned. We must know the total number of records in order to calculate the total number of pages to display in the paging UI. For example, if there are 1,000,000 total records and a WHERE clause filters them down to 1,000 records, the paging logic needs to know that total in order to render the paging UI correctly.
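To make this concrete, here is a minimal sketch of how the data layer might call the Figure 2 stored procedure and consume both result sets; the connection string, pageIndex, pageSize, orders collection, and PopulateOrderFromIDataReader helper are assumptions for illustration, not part of the original samples.

// Requires System.Data and System.Data.SqlClient.
using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand("northwind_OrdersPaged", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.Add("@PageIndex", SqlDbType.Int).Value = pageIndex;
    command.Parameters.Add("@PageSize", SqlDbType.Int).Value = pageSize;

    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        // First result set: the total record count, used to render the paging UI.
        reader.Read();
        int totalRecords = reader.GetInt32(0);

        // Second result set: the requested page of orders.
        reader.NextResult();
        while (reader.Read()) {
            orders.Add(PopulateOrderFromIDataReader(reader));
        }
    }
}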
Tip 3 - Connection pooling
Establishing a TCP connection between your web application and SQL Server can be an expensive operation. Developers at Microsoft have been utilizing connection pooling for some time, allowing them to reuse connections to databases. Rather than establishing a new TCP connection for each request, a new connection is established only when there is no available connection in the connection pool. When the connection is closed, it is returned to the connection pool - it still maintains the connection to the database, rather than completely destroying the TCP connection.
Of course you need to be careful about leaking connections. Always close your connections when you are finished with them. I repeat: no matter what anyone says about the garbage collection mechanism in the Microsoft .NET Framework, always explicitly call Close or Dispose on your connection when you are finished with it. Do not trust the common language runtime (CLR) to clean up and close your connection for you at a predetermined time. The CLR will eventually destroy the class and force the connection closed, but you have no guarantee of when garbage collection on the object will actually happen.
To get the best use out of connection pooling, you need to follow a couple of rules. First, open the connection, do the work, and then close the connection. It's fine to open and close the connection multiple times on each request if you have to (and optimally you apply Tip 1), rather than keeping the connection open and passing it around through different methods. Second, use the same connection string (and the same thread identity, if you are using integrated authentication). If you don't use the same connection string, for example customizing the connection string based on the logged-in user, you won't get the same optimization value that connection pooling provides. And if you use integrated authentication while impersonating a large set of users, pooling will also be much less effective. The .NET CLR data performance counters can be useful when attempting to track down any performance issues related to connection pooling.
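As a minimal sketch of that open-late, close-early pattern (the connection string and query are placeholders), the using statement guarantees the connection is returned to the pool even if an exception is thrown:

// Requires System.Data.SqlClient. The connection string and query are placeholders.
using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand("SELECT COUNT(*) FROM Orders", connection))
{
    connection.Open();                              // drawn from the pool when one is available
    int orderCount = (int)command.ExecuteScalar();  // do the work
}                                                   // Dispose closes the connection and returns it to the pool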
Whenever your application connects to a resource running in another process, such as a database, you should optimize by focusing on the time spent connecting to the resource, the time spent sending and retrieving data, and the number of round trips. Optimizing any kind of process hop in your application is the first step toward achieving better performance.
The application layer contains the logic that connects to your data layer and transforms data into meaningful class instances and business processes. For example, in Community Server, this is where you populate a Forums or Threads collection and apply business rules such as permissions; most importantly, it is where the caching logic is performed.
Tip 4 - ASP.NET Cache API
The first thing to do, before you write the first line of application code, is to architect the application layer to take maximum advantage of ASP.NET's cache feature.
If your component runs within an ASP.NET application, you simply reference System.Web.dll in your application project. When you need to access the cache, use the HttpRuntime.Cache property (this object can also be accessed through Page.Cache and HttpContext.Cache).
There are a few guidelines for caching data. First, if the data can be used more than once, it is a good candidate for caching. Second, if the data is general rather than specific to a particular request or user, it is a great candidate for caching. If the data is user- or request-specific but long lived, it can still be cached, but it may be used less frequently. Third, an often overlooked rule is that sometimes you can cache too much. Generally, on an x86 machine, you want to run a process with no more than 800MB of private bytes to reduce the chance of out-of-memory errors, so caching should have limits. In other words, you may need to reuse the result of a computation, but if that computation takes ten parameters, you might attempt to cache on the permutations of those ten parameters, and that could get you into trouble. The most common fault in ASP.NET is out-of-memory errors caused by over-caching, especially of large data sets.
The cache has several great features that you need to know about. First, the cache implements a least-recently-used algorithm, allowing ASP.NET to force a cache purge, automatically removing unused items from the cache, when memory is running low. Second, the cache supports dependencies that can force expiration; these include time, keys, and files. Time is often used, but ASP.NET 2.0 introduces a new and more powerful invalidation type: database cache invalidation. This means items are automatically removed from the cache when data in the database changes. For more information on database cache invalidation, see Dino Esposito's Cutting Edge column in the July 2004 issue of MSDN Magazine. For a look at the architecture of the cache, see Figure 3.
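As a minimal sketch of the common check-then-insert pattern against the Cache API; the cache key, the five-minute expiration, and the LoadSuppliersFromDatabase helper are assumptions used only for illustration:

// Requires System.Web, System.Web.Caching, and System.Collections.
ArrayList suppliers = (ArrayList)HttpRuntime.Cache["Suppliers"];
if (suppliers == null)
{
    suppliers = LoadSuppliersFromDatabase();   // hypothetical data-layer call
    HttpRuntime.Cache.Insert(
        "Suppliers",                           // cache key
        suppliers,                             // data to cache
        null,                                  // no cache dependency
        DateTime.Now.AddMinutes(5),            // absolute expiration
        Cache.NoSlidingExpiration);            // no sliding expiration
}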
Tip 5 — Per-request caching
Earlier in this article, I mentioned that small improvements to frequently traversed code paths can lead to large overall performance gains. Of these small improvements, one is definitely my favorite and I call it per-request caching.
The Cache API is designed to cache data for a long period of time, or until some condition is met, but per-request caching means caching data only for the duration of that request. A particular code path is accessed frequently during each request, but the data needs to be fetched, applied, modified, or updated only once. That sounds a bit abstract, so let's look at a concrete example.
In the Community Server forums application, every server control used on a page requires personalization data to determine which skin to use, which style sheet to use, and other personalization details. Some of that data can be cached for a long time, but some of it, such as the skin used for the controls, is fetched once per request and then reused several times during that request.
To achieve per-request caching, use the ASP.NET HttpContext. An HttpContext instance is created for every request and is accessible anywhere during the request through the HttpContext.Current property. The HttpContext class has a special Items collection property; objects and data added to this Items collection are cached only for the duration of the request. Just as you can use the cache to store frequently accessed data, you can use HttpContext.Items to store data that is used only on a per-request basis. The logic behind it is simple: the data is added to the HttpContext.Items collection when it doesn't exist, and on subsequent lookups, the data found in HttpContext.Items is simply returned.
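A minimal sketch of the pattern; the item key, method name, and the GetSkinFromDatabase helper are assumptions for illustration:

// Requires System.Web.
public static string GetCurrentSkin()
{
    HttpContext context = HttpContext.Current;
    string skin = context.Items["CurrentSkin"] as string;
    if (skin == null)
    {
        skin = GetSkinFromDatabase();          // hypothetical expensive lookup, done once
        context.Items["CurrentSkin"] = skin;   // cached only for the duration of this request
    }
    return skin;
}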
Tip 6 — Background Processing
The path through your code should be as fast as possible, right? But there may be times when you find that you are performing a very resource-intensive task on every request, or once every n requests. Sending email or parsing and validating incoming data are some examples.
When we tore apart ASP.NET Forums 1.0 and re-architected it into what became Community Server, we found that the code path for adding a new post was extremely slow. Each time a new post was added, the application first had to make sure there were no duplicate posts, then it had to parse the post with a "bad word" filter, parse the emoticons in the post, tokenize and index the post, add the post to the appropriate queue if required, validate any attachments, and finally, once posted, immediately send email notifications to all subscribers. Clearly, there is a lot involved.
After some research, we found that most of the time was spent in the indexing logic and in sending the emails. Indexing a post was a very time-consuming operation, and we discovered that the built-in System.Web.Mail functionality would connect to an SMTP server and send the emails serially. As the number of subscribers to a particular post or topic area grew, the AddPost function took longer and longer to execute.
Neither indexing nor sending email needed to happen on every request. Ideally, we wanted to batch these operations, indexing 25 posts at a time or sending all the emails every five minutes. We decided to reuse code that I had used to prototype database cache invalidation, which eventually made its way into Visual Studio 2005.
The Timer class in the System.Threading namespace is very useful, but not very well known in the .NET Framework, at least among Web developers. Once created, the Timer invokes the specified callback on a thread from the ThreadPool at a configurable interval. This means you can set up code to execute even when there are no incoming requests to your ASP.NET application, an ideal situation for background processing. You can perform operations such as indexing or sending email from this background process, too.
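Here is a minimal sketch of that pattern, assuming a five-minute interval and a hypothetical ProcessQueuedWorkItems method; the full, downloadable sample mentioned below is more complete:

// Requires System and System.Threading.
public class BackgroundWork
{
    // Keep a reference so the timer is not garbage collected.
    private static Timer timer;

    public static void Start()
    {
        // Invoke the callback on a ThreadPool thread every five minutes.
        timer = new Timer(new TimerCallback(ProcessQueuedWorkItems), null,
                          TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(5));
    }

    private static void ProcessQueuedWorkItems(object state)
    {
        // Index queued posts in batches, send pending email notifications, and so on.
    }
}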
There are a couple of problems with this technique, however. If your application domain unloads, the timer instance stops firing its events. In addition, because the CLR has a hard gate on the number of threads per process, you can get into a situation on a heavily loaded server where timers may not have threads available to complete on and can be somewhat delayed. ASP.NET tries to minimize the chances of this happening by reserving a certain number of free threads in the process and using only a portion of the total threads for request processing. However, if you have a lot of asynchronous work, this can still be an issue.
There isn't enough room here for this code, but you can download an easy-to-understand example at www.rob-howard.net . Check out the slides and demos from the Blackbelt TechEd 2004 presentation.
Tip 7 — Page Output Caching and Proxy Servers
ASP.NET is your presentation layer (or should be); it consists of pages, user controls, server controls (HttpHandlers and HttpModules), and the content they generate. If you have an ASP.NET page that generates output (HTML, XML, images, or any other data), and this code generates the same output for every request, then you have a great candidate for page output caching.
By adding the following line to the top of the page:
<%@ OutputCache Duration="60" VaryByParam="none" %>
You can effectively generate the output for the page once and then reuse it multiple times for up to 60 seconds, at which point the page is re-executed and the output is once again added to the ASP.NET cache. This behavior can also be accomplished using some lower-level programmatic APIs. There are several configurable settings for output caching, such as the VaryByParam attribute just mentioned. VaryByParam is required, and it allows you to specify the HTTP GET or HTTP POST parameters by which to vary the cache entries. For example, setting VaryByParam="Report" caches output separately for default.aspx?Report=1 and default.aspx?Report=2. Additional parameters can be specified in a semicolon-separated list.
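For example, a sketch of the directive for such a report page (the page itself is hypothetical):

<%@ OutputCache Duration="60" VaryByParam="Report" %>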
Many people don't realize that when output caching is used, the ASP.NET page also generates a set of HTTP headers that flow downstream to caching servers, such as those used by Microsoft Internet Security and Acceleration Server or by Akamai. When the HTTP cache headers are set, the documents can be cached on these network resources, and client requests can be satisfied without having to go back to the origin server.
Using page output caching, then, does not make your application any more efficient, but it can reduce the load on your server, because downstream caching technology caches the documents. Of course, this can only be used for anonymous content; once it goes downstream, you will never see those requests again, and you can no longer perform authentication to prevent access to it.
Tip 8 — Run IIS 6.0 (even just to use the kernel cache)
If you are not running IIS 6.0 (Windows Server 2003), you are missing out on some great performance enhancements in the Microsoft Web server. In Tip 7, I discussed output caching. In IIS 5.0, a request comes through IIS and then into ASP.NET; when caching is involved, an HttpModule in ASP.NET receives the request and returns the contents from the cache.
If you are using IIS 6.0, there is a nice little feature called the kernel cache that doesn't require any code changes to ASP.NET. When a request is output-cached by ASP.NET, the IIS kernel cache receives a copy of the cached data. When a request comes in from the network driver, a kernel-level driver (with no context switch to user mode) receives the request, flushes the cached data to the response if it is cached, and completes execution. This means that when you use kernel-mode caching with IIS and ASP.NET output caching, you will see unbelievable performance results. During the Visual Studio 2005 development of ASP.NET, I was the program manager responsible for ASP.NET performance. The developers did the real work, but I got to see all the reports on a daily basis. The kernel-mode caching results were always the most interesting. The most typical characteristic was the network saturated with requests/responses while IIS itself ran at only about 5% CPU utilization. It's astonishing! There are of course other reasons to use IIS 6.0, but kernel-mode caching is the most obvious one.
Tip 9 — Use Gzip Compression
Although using gzip is not necessarily a server-performance trick (you may actually see an increase in CPU usage), gzip compression can reduce the number of bytes sent by the server. This results in a perceived increase in page speed as well as reduced bandwidth usage. Depending on the data being sent, how well it can be compressed, and whether the client browser supports it (IIS will only send gzip-compressed content to clients that support gzip compression, such as Internet Explorer 6.0 and Firefox), your server can serve more requests per second. In fact, almost any time you reduce the amount of data returned, you increase requests per second.
The good news is that gzip compression is built into IIS 6.0, and it performs much better than the gzip compression used in IIS 5.0. Unfortunately, when trying to turn on gzip compression in IIS 6.0, you may not be able to find the setting anywhere in the IIS Properties dialog. The IIS team built excellent gzip capability into the server but forgot to include an administrative UI for enabling it. To enable gzip compression, you have to dig into the innards of the XML configuration settings of IIS 6.0 (which is not for the faint of heart). Incidentally, credit goes to Scott Forsyth of OrcsWeb for helping me surface this issue for the www.asp.net servers hosted at OrcsWeb.
This article won't describe the steps; please read Brad Wilson's article at IIS6 Compression. There is also a Knowledge Base article on enabling compression for ASPX at Enable ASPX Compression in IIS. Note, however, that due to implementation details, dynamic compression and kernel caching are mutually exclusive in IIS 6.0.
Tip 10 — Server Control View State
View state is a fancy name for ASP.NET storing some state data in a hidden input field inside the generated page. When the page is posted back to the server, the server can parse, validate, and apply this view state data back to the page's control tree. View state is a very powerful capability because it allows state to be persisted with the client, and it requires no cookies or server memory to save that state. Many ASP.NET server controls use view state to persist settings made during interactions with elements on the page, for example, saving the current page being displayed when paging through data.
The use of view state has some drawbacks, however. First, it increases the total payload of the page, both when served and when posted back. There is also additional overhead incurred when serializing and deserializing the view state data that is posted back to the server. Finally, view state increases the memory allocations on the server.
Several server controls tend to make excessive use of view state even in cases where it is not needed, the most notable being the DataGrid. The default behavior of the ViewState property is enabled, but if you don't need it, you can turn it off at the control or page level. Within a control, simply set the EnableViewState property to false, or set it globally within the page using this setting:
<%@ Page EnableViewState="false" %>
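At the control level, the same property is set on the control tag itself; for example, a sketch for a DataGrid (the ID is a placeholder):

<asp:DataGrid id="OrdersGrid" runat="server" EnableViewState="false" />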
If you don't post back the page, or always regenerate the controls on the page on every request, you should disable view state at the page level.
Summary
I've given you some tips that I find helpful when writing high-performance ASP.NET applications. As I mentioned earlier in this article, this is a preliminary guide and not the final word on ASP.NET performance. (For information about improving ASP.NET application performance, see Improving ASP.NET Performance.) The best way to solve a specific performance problem can only be found through your own experience. However, these tips should give you some good guidance on your journey. In software development, there are few absolutes; every application is unique.
See the sidebar "Common Performance Myths."
Rob Howard is the founder of Telligent Systems, specializing in high-performance Web applications, knowledge base management, and collaboration systems. Rob was previously employed by Microsoft, where he helped design the infrastructure for ASP.NET 1.0, 1.1, and 2.0. You can contact Rob at [email protected].
Original link: http://msdn.microsoft.com/msdnmag/issues/05/01/ASPNETPerformance/default.aspx