It's incredible how easy it is to write Web applications with ASP.NET. Because of this simplicity, many developers don't take the time to structure their applications for better performance. In this article, I'll cover 10 tips for writing high-performance Web applications. I'm not limiting these suggestions to ASP.NET applications, because ASP.NET is only one part of a Web application. This article is not intended to be a definitive guide to performance-tuning Web applications; an entire book could easily be devoted to the subject. Consider this article a good starting point.
Before I became a workaholic, I used to enjoy rock climbing. Before any big climb, I would start by studying the routes in the guidebooks and reading the recommendations of climbers who had been there before. But no matter how good the guidebook is, you need real rock climbing experience before attempting a particularly challenging climb. Similarly, you can only learn how to write high-performance Web applications when you're faced with fixing performance problems or running a high-throughput site.
My personal experience comes from working as an Infrastructure Program Manager on the ASP.NET team at Microsoft, where I ran and managed www.ASP.NET and helped architect Community Server, the next version of several well-known ASP.NET applications (ASP.NET Forums, .Text, and nGallery combined into one platform). I'm sure some of the tips that helped me will help you too.
You should consider dividing your application into several logical layers. You may have heard the term 3-tier (or n-tier) physical architecture. These are typically prescribed architectural approaches that physically separate functionality across processes and/or hardware. When the system needs to scale, more hardware can easily be added. There is a performance hit associated with process and machine hops, however, so they should be avoided. Whenever possible, run ASP.NET pages and their related components together in the same application.
Because of the separation of code and the boundaries between layers, using Web services or remoting within the same application will decrease performance by 20 percent or more.
The data layer is a bit different because it is usually better to have hardware dedicated to the database. However, the cost of the process hop to the database is still high, so the data layer is the first place to look when optimizing your code.
Before you dive into performance fixes for your application, make sure you profile the application to identify the specific issues. Key performance counters, such as the one that indicates the percentage of time spent performing garbage collection, are also very useful for finding out where an application spends the majority of its time. Yet where time is spent is often quite unintuitive.
This article describes two types of performance improvements: large optimizations, such as using the ASP.NET Cache, and small optimizations that repeat themselves. These small optimizations are sometimes the more interesting ones: you make a tiny change to code that gets called thousands and thousands of times. With a large optimization, you may see a big jump in overall performance. With a small one, you might only save a few milliseconds on a given request, but added up across all requests every day, it can amount to an enormous improvement.
Data Layer Performance
When it comes to performance-tuning an application, there is a litmus test you can use to prioritize your work: does the code access the database? If so, how often? Note that this same test can also be applied to code that uses Web services or remoting, but this article does not cover those.
If a database request is required in a particular code path and you see other areas, such as string manipulation, that you want to optimize first, stop and perform the litmus test. Unless you have a severe performance problem, your time is better spent trying to optimize the time spent in and connected to the database, the amount of data being returned, and the frequency of round trips to and from the database.
With this general information in mind, let's look at ten tips that may help improve application performance. First, I'll talk about the changes that might make the biggest difference.
Tip 1 — Return Multiple Result Sets
Look carefully at your database code to see if there are request paths that go to the database more than once. Each of those round trips decreases the number of requests per second your application can serve. By returning multiple result sets in a single database request, you can cut the total time spent communicating with the database. You'll also make the system more scalable, since you reduce the work the database server has to do managing requests.
Although you can use dynamic SQL to return multiple result sets, my preferred method is to use stored procedures. There is some debate as to whether business logic should reside in a stored procedure, but my opinion is that if the logic in a stored procedure can constrain the data returned (reducing the size of the dataset, the time spent on the network, and the sifting of data in the logic layer), it is a good thing.
When populating strongly typed business classes using a SqlCommand instance and its ExecuteReader method, you can move the result set pointer forward by calling NextResult. Figure 1 shows a sample conversation that populates several ArrayLists with typed classes. Returning only the data you need from the database will additionally reduce memory allocations on the server.
Figure 1 Extracting Multiple Resultsets from a DataReader
// read the first resultset
reader = command.ExecuteReader();

// read the data from that resultset
while (reader.Read()) {
    suppliers.Add(PopulateSupplierFromIDataReader(reader));
}

// read the next resultset
reader.NextResult();

// read the data from that second resultset
while (reader.Read()) {
    products.Add(PopulateProductFromIDataReader(reader));
}
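The snippet in Figure 1 assumes that the command, the reader, and the two ArrayLists already exist. As a minimal sketch of the surrounding plumbing (the stored procedure name getSuppliersAndProducts, the connectionString variable, and the Populate helpers are placeholders, not code from the actual application), the call might be wired up like this:

// Requires System.Data, System.Data.SqlClient, and System.Collections.
ArrayList suppliers = new ArrayList();
ArrayList products = new ArrayList();
using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand("getSuppliersAndProducts", connection))
{
    // hypothetical stored procedure that returns two resultsets
    command.CommandType = CommandType.StoredProcedure;
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        // first resultset: suppliers
        while (reader.Read())
            suppliers.Add(PopulateSupplierFromIDataReader(reader));
        // second resultset: products
        reader.NextResult();
        while (reader.Read())
            products.Add(PopulateProductFromIDataReader(reader));
    }
}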
Tip 2 — Paginated Data Access
The ASP.NET DataGrid exposes a nice capability: data paging support. When paging is enabled in the DataGrid, a fixed number of records is shown at a time, and a paging UI is rendered at the bottom of the DataGrid so you can navigate forward and backward through the records.
There is, however, a downside. Paging with the DataGrid requires all of the data to be bound to the grid. That is, your data layer must return all of the data, and the DataGrid then filters the records it displays based on the current page. If 100,000 records are returned when you page through the DataGrid, 99,975 records are discarded on each request (assuming a page size of 25). As the number of records grows, application performance suffers because more and more data must be sent on each request.
An excellent way to write better performing pagination code is to use stored procedures. Figure 2 shows an example stored procedure for paging the Orders table in the Northwind database. In short, all you have to do at this point is pass the page index and page size. The appropriate result set is then calculated and returned.
Figure 2 Paging Through the Orders Table
CREATE PROCEDURE northwind_OrdersPaged
(
    @PageIndex int,
    @PageSize int
)
AS
BEGIN

DECLARE @PageLowerBound int
DECLARE @PageUpperBound int
DECLARE @RowsToReturn int

-- First set the rowcount
SET @RowsToReturn = @PageSize * (@PageIndex + 1)
SET ROWCOUNT @RowsToReturn

-- Set the page bounds
SET @PageLowerBound = @PageSize * @PageIndex
SET @PageUpperBound = @PageLowerBound + @PageSize + 1

-- Create a temp table to store the select results
CREATE TABLE #PageIndex
(
    IndexId int IDENTITY (1, 1) NOT NULL,
    OrderID int
)

-- Insert into the temp table
INSERT INTO #PageIndex (OrderID)
SELECT
    OrderID
FROM
    Orders
ORDER BY
    OrderID DESC

-- Return total count
SELECT COUNT(OrderID) FROM Orders

-- Return paged results
SELECT
    O.*
FROM
    Orders O,
    #PageIndex PageIndex
WHERE
    O.OrderID = PageIndex.OrderID AND
    PageIndex.IndexID > @PageLowerBound AND
    PageIndex.IndexID < @PageUpperBound
ORDER BY
    PageIndex.IndexID

END
In Community Server, we wrote a paging server control to do all of the data paging. As you'll see, I'm using the ideas discussed in Tip 1 and returning two result sets from one stored procedure: the total number of records and the requested data.
The total number of records returned can vary depending on the query being executed. For example, a WHERE clause can be used to constrain the data returned. To calculate the total number of pages to show in the paging UI, you must know the total number of records to be returned. For example, if there are 1,000,000 records in total and a WHERE clause filters that down to 1,000 records, the paging logic needs to know that total in order to render the paging UI correctly.
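To tie Tips 1 and 2 together, here is a rough sketch (not the Community Server paging control itself; the orders collection, the connectionString variable, and the PopulateOrderFromIDataReader helper are illustrative assumptions) of how a data access method might call northwind_OrdersPaged and consume both result sets, the total count and the requested page:

// Requires System.Data, System.Data.SqlClient, and System.Collections.
ArrayList orders = new ArrayList();
int totalRecords = 0;
using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand("northwind_OrdersPaged", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.Add("@PageIndex", SqlDbType.Int).Value = pageIndex;
    command.Parameters.Add("@PageSize", SqlDbType.Int).Value = pageSize;
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        // first resultset: the total number of orders, used to size the pager
        reader.Read();
        totalRecords = reader.GetInt32(0);
        // second resultset: only the requested page of orders
        reader.NextResult();
        while (reader.Read())
            orders.Add(PopulateOrderFromIDataReader(reader));
    }
}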
Tip 3 — Connection Pooling
Setting up the TCP connection between your Web application and SQL Server can be an expensive operation. Developers have been able to take advantage of connection pooling for some time now, allowing them to reuse connections to the database. Rather than setting up a new TCP connection on every request, a new connection is set up only when one is not available in the connection pool. When a connection is closed, it is returned to the pool, where it remains connected to the database, instead of the TCP connection being torn down completely.
Of course, you need to watch out for leaking connections. Always close your connections when you're done with them. To repeat: no matter what anyone says about garbage collection within the Microsoft® .NET Framework, always call Close or Dispose explicitly on a connection when you are finished with it. Do not trust the common language runtime (CLR) to clean up and close your connection for you at a predetermined time. The CLR will eventually destroy the class and force the connection closed, but there is no guarantee when garbage collection of the object will actually occur.
To use connection pooling optimally, there are a couple of rules to live by. First, open the connection, do the work, and then close the connection. It's fine to open and close the connection multiple times on each request if you have to (optimally you apply Tip 1), rather than keeping the connection open and passing it in and out of different methods. Second, use the same connection string (and the same thread identity if you are using integrated authentication). If you don't use the same connection string, for example by customizing it based on the logged-in user, you won't get the optimization value that connection pooling provides. And if you use integrated authentication while impersonating a large set of users, pooling effectiveness will also drop significantly. The .NET CLR data performance counters can be very useful when trying to track down any performance issues related to connection pooling.
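As a simple illustration of the open-the-connection-late, close-it-early rule (a sketch only; the connection string and query are placeholders), the using statement guarantees the connection is closed and returned to the pool even if an exception is thrown:

// Requires System.Data.SqlClient. The same connection string must be
// used each time for pooled connections to be reused.
using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand("SELECT COUNT(*) FROM Orders", connection))
{
    connection.Open();                     // drawn from the pool if one is available
    int orderCount = (int)command.ExecuteScalar();
}                                          // closed here and returned to the pool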
Whenever your application connects to a resource running in another process, such as a database, you should optimize by focusing on the time spent connecting to that resource, the time spent sending or retrieving data, and the number of round trips. Optimizing any kind of process hop in your application is the first place to start to achieve better performance.
The application layer contains the logic that connects to your data layer and transforms data into meaningful class instances and business processes. In Community Server, for example, this is where you populate a Forums or Threads collection and apply business rules such as permissions; most importantly, it is where the caching logic is performed.
Tip 4 — ASP.NET Caching API
Before writing a line of application code, one of the first things to do is to structure the application layer to take maximum advantage of ASP.NET caching capabilities.
If your component is to run in an ASP.NET application, you only need to include a reference to System.Web.dll in the application project. When you need to access the cache, use the HttpRuntime.Cache property (this object is also accessible through Page.Cache and HttpContext.Cache).
There are a few rules for caching data. First, if the data can be used more than once, it's a good candidate for caching. Second, if the data is general rather than specific to a given request or user, it's also a good candidate for caching. If the data is user- or request-specific but is long-lived, it can still be cached, although it may not be used as frequently. Third, an often overlooked rule is that you can sometimes cache too much. Generally, on an x86 machine, you want to run a process with no more than 800MB of private bytes in order to reduce the chance of an out-of-memory error. Therefore, the cache should be bounded. In other words, you may be able to reuse the result of a computation, but if that computation takes 10 parameters, you might attempt to cache on 10 permutations of those parameters, which could get you into trouble. One of the most common support requests for ASP.NET is out-of-memory errors caused by over-caching, especially of large datasets.
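When data meets the first two rules, a simple get-or-populate pattern against the Cache API covers most cases. The following sketch is illustrative only; the cache key, the five-minute absolute expiration, and the LoadSuppliersFromDatabase helper are assumptions, not values from the article:

// Requires System.Web, System.Web.Caching, and System.Collections.
ArrayList suppliers = HttpRuntime.Cache["Suppliers"] as ArrayList;
if (suppliers == null)
{
    // cache miss: hit the database once, then cache the result
    suppliers = LoadSuppliersFromDatabase();
    // absolute expiration of five minutes; ASP.NET can still evict the
    // entry earlier if memory pressure forces a cache purge
    HttpRuntime.Cache.Insert("Suppliers", suppliers, null,
        DateTime.Now.AddMinutes(5), Cache.NoSlidingExpiration);
}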
The cache has several great features that you need to know about. First, the cache implements a least-recently-used algorithm, allowing ASP.NET to force a cache purge, automatically removing unused items from the cache, when memory runs low. Second, the cache supports expiration dependencies that can force invalidation; these include time, key, and file dependencies. Time is often used, but ASP.NET 2.0 introduces a new and more powerful invalidation type: database cache invalidation. This refers to automatically removing entries from the cache when data in the database changes. For more information on database cache invalidation, see Dino Esposito's Cutting Edge column in the July 2004 issue of MSDN Magazine. To understand the architecture of the cache, see the diagram below.
Tip 6 — Background Processing
The path through your code should be as fast as possible, right? There may be times, though, when a task performed on each request, or once every n requests, is very expensive. Sending out e-mail or parsing and validating incoming data are some examples.
When tearing apart ASP.NET Forums 1.0 and re-architecting what became Community Server, we found that the code path for adding a new post was painfully slow. Each time a post was added, the application first needed to ensure there were no duplicate posts, then it had to parse the post using a "bad words" filter, parse the post for emoticons, tokenize and index the post, add the post to the appropriate queues when required, validate attachments, and finally, once posted, immediately send e-mail notifications to any subscribers. Clearly, a lot is involved.
After some research, we found that most of the time was spent in the indexing logic and in sending e-mail. Indexing posts was a very time-consuming operation, and we discovered that the built-in System.Web.Mail functionality would connect to an SMTP server and send the e-mails serially. As the number of subscribers to a particular post or topic area increased, the AddPost function took longer and longer to execute.
Neither indexing nor sending e-mail needed to happen on each request. Ideally, we wanted to batch this work, indexing 25 posts at a time or sending all the e-mails every five minutes. We decided to reuse code that had previously been written to prototype database cache invalidation, the feature that eventually made it into Visual Studio® 2005.
The Timer class in the System.Threading namespace is very useful, though not well known in the .NET Framework, at least among Web developers. Once created, the Timer invokes the specified callback on a thread from the ThreadPool at a configurable interval. This means you can set up code to execute without an incoming request to your ASP.NET application, an ideal situation for background processing. You can do work such as indexing or sending e-mail in this background process too.
There are a couple of problems with this technique, though. If your application domain unloads, the timer instance stops firing its events. In addition, since the CLR has a hard gate on the number of threads per process, a heavily loaded server can get into a situation where the timer has no thread available to run on, which causes delays. ASP.NET tries to minimize the chances of this happening by reserving a certain number of free threads in the process and using only a portion of the total threads for request processing. However, if you have a lot of asynchronous work going on, this can still be an issue.
There isn't enough room here for this code, but you can download a readable example at www.rob-howard.net. Check out the slides and demos from the Blackbelt TechEd 2004 presentation.
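For readers who just want the shape of the technique, here is a bare-bones sketch (the five-minute interval and the method names are illustrative assumptions; this is not the Community Server code). In an ASP.NET application, Start might be called from Application_Start in Global.asax:

// Requires System.Threading. Keep a static reference so the Timer is not
// garbage collected; remember it stops firing if the AppDomain unloads.
private static Timer backgroundTimer;

public static void Start()
{
    // invoke DoBackgroundWork on a ThreadPool thread every five minutes,
    // independent of any incoming request
    backgroundTimer = new Timer(new TimerCallback(DoBackgroundWork),
                                null, TimeSpan.Zero, TimeSpan.FromMinutes(5));
}

private static void DoBackgroundWork(object state)
{
    // batch the expensive work here, for example indexing queued posts
    // and sending any pending notification e-mails
}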
Tip 7 — Page Output Caching and Proxy Servers
ASP.NET is your presentation layer (or should be); it consists of pages, user controls, server controls (HttpHandlers and HttpModules), and the content they generate. If you have an ASP.NET page that generates output, whether HTML, XML, images, or any other data, and this code generates the same output for every request, you have a great candidate for page output caching.
Simply add this line to the top of your page:
<%@ OutputCache Duration="60" VaryByParam="none" %>
This effectively generates the output for the page once and reuses it multiple times for up to 60 seconds, at which point the page is re-executed and the output is once again added to the ASP.NET cache. The same behavior can also be accomplished through some lower-level programmatic APIs. There are several configurable settings for output caching, such as the VaryByParam attribute just described. VaryByParam just happens to be required, but it also allows you to specify HTTP GET or HTTP POST parameters by which to vary the cache entries. For example, simply setting VaryByParam="Report" caches output separately for default.aspx?Report=1 and default.aspx?Report=2. Additional parameters can be specified in a semicolon-separated list.
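The lower-level programmatic equivalent goes through the HttpCachePolicy API on the response. The following sketch mirrors the 60-second window and a VaryByParam="Report" setting from the examples above; the specific values are only illustrations:

// Set from within page code; requires System.Web.
Response.Cache.SetCacheability(HttpCacheability.Public);
Response.Cache.SetExpires(DateTime.Now.AddSeconds(60));
Response.Cache.SetValidUntilExpires(true);
// vary the cached entry by the Report query string parameter
Response.Cache.VaryByParams["Report"] = true;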
Many people don't realize that when output caching is used, ASP.NET pages also generate a set of HTTP headers that downstream caching servers can use, such as those employed by Microsoft Internet Security and Acceleration Server or by Akamai. When these HTTP cache headers are set, documents can be cached on those network resources, and client requests can be satisfied without having to go back to the origin server.
Using page output caching, then, does not make your application more efficient, but it can reduce the load on your server because downstream caching technology caches the documents. Of course, this can only be anonymous content; once it goes downstream, you'll never see those requests again and can no longer perform authentication to prevent access to it.
Tip 8 — Run IIS 6.0 (If Only for Kernel Caching)
If you are not running IIS 6.0 (Windows Server 2003), you are missing out on some great performance enhancements in the Microsoft Web server. In Tip 7, I discussed output caching. In IIS 5.0, a request comes through IIS and then into ASP.NET. When caching is involved, an HttpModule in ASP.NET receives the request and returns the content from the cache.
If you are using IIS 6.0, there is a nice little feature called kernel caching that doesn't require any code changes to ASP.NET. When a request is output-cached by ASP.NET, the IIS kernel cache receives a copy of the cached data. When a request comes in from the network driver, a kernel-level driver (with no context switch to user mode) receives the request, flushes the cached data to the response if it is cached, and completes execution. This means that when you use kernel-mode caching with IIS and ASP.NET output caching, you will see unbelievable performance results. During the development of ASP.NET for Visual Studio 2005, I was the program manager responsible for ASP.NET performance. The developers did the real work, but I got to see all the reporting every day. The kernel-mode caching results were always the most interesting. The most common characteristic was the network being saturated with requests/responses while IIS ran at only about five percent CPU utilization. It was amazing! There are certainly other reasons to use IIS 6.0, but kernel-mode caching is the most obvious one.
Tip 9 — Use Gzip Compression
Although using gzip is not necessarily a server-performance trick (you may see an increase in CPU usage), gzip compression can reduce the number of bytes sent by the server. This gives the perception of faster pages and also cuts down on bandwidth usage. Depending on the data being sent, how well it can be compressed, and whether the client browser supports it (IIS will only send gzip-compressed content to clients that support it, such as Internet Explorer 6.0 and Firefox), your server can serve more requests per second. In fact, just about any time you reduce the amount of data returned, you increase requests per second.
The good news is that gzip compression is built into IIS 6.0, and it performs much better than the gzip compression used in IIS 5.0. Unfortunately, when trying to turn on gzip compression in IIS 6.0, you may not be able to find the setting in the properties dialog of IIS. The IIS team built great gzip capabilities into the server, but neglected to include an administrative UI for enabling it. To enable gzip compression, you have to dig into the XML configuration settings of IIS 6.0 (which isn't for the faint of heart). By the way, the credit goes to Scott Forsyth of OrcsWeb, who helped me figure this out for the www.asp.net servers hosted by OrcsWeb.
Rather than describing the steps here, I'll point you to Brad Wilson's article at IIS6 Compression. There is also a Knowledge Base article on enabling compression for ASPX at Enable ASPX Compression in IIS. Note, however, that due to implementation details, dynamic compression and kernel caching are mutually exclusive on IIS 6.0.
Tip 10 — Server Control View State
View state is a fancy name for ASP.NET storing some state data in a hidden input field inside the generated page. When the page is posted back to the server, the server can parse, validate, and apply this view state data back to the page's tree of controls. View state is a very powerful capability since it allows state to be persisted with the client, and it requires no cookies or server memory to save this state. Many ASP.NET server controls use view state to persist settings made during interactions with elements on the page, for example, saving the current page being displayed when paging through data.
There are a number of drawbacks to the use of view state, however. First, it increases the total payload of the page both when served and when requested. There is also additional overhead incurred when serializing and deserializing the view state data that is posted back to the server. Lastly, view state increases the memory allocations on the server.
Several server controls, the most well known of which is the DataGrid, tend to make excessive use of view state even in cases where it is not needed. The default behavior of the ViewState property is on, but you can turn it off at the control or page level if you don't need it. Within a control, simply set the EnableViewState property to false, or set it globally within the page using this setting:
<%@ Page EnableViewState="false" %>
If you are not doing postbacks in a page, or you always regenerate the controls on the page on each request, you should disable view state at the page level.
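For example, assuming a DataGrid declared on the page with the ID ordersGrid and rebound on every request, the same setting can also be applied from code-behind (a sketch; the control name and the GetOrders helper are hypothetical):

// turn off view state for a control that is re-bound on each request
ordersGrid.EnableViewState = false;
ordersGrid.DataSource = GetOrders();   // hypothetical data access call
ordersGrid.DataBind();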
I've given you some tips that I find helpful when writing high-performance ASP.NET applications. As I mentioned earlier in this article, this is a preliminary guide and not the final word on ASP.NET performance. (For information about improving ASP.NET application performance, see Improving ASP.NET Performance.) The best way to solve a specific performance problem can only be found through your own experience. However, these tips should give you some good guidance on your journey. In software development, there are few absolutes; every application is unique.