1 Introduction
The WWW is one of the most popular applications on the Internet. Its rapid growth has caused network congestion and server overload, leading to longer client access delays and degraded WWW service quality. Caching is considered one of the most effective ways to reduce server load, alleviate network congestion, and enhance the scalability of the WWW. Its basic idea is to exploit the temporal locality of client accesses: a copy of content a client has accessed is stored in a cache, and when the same content is accessed again it is served from that copy rather than fetched from the hosting site.
Web content can be cached at the client, at proxy servers, and on the server side. Research shows that caching can significantly improve WWW performance [1][2] and brings the following benefits:
(1) Reduce network traffic, thereby alleviating network congestion;
(2) Reduce client access delay, for two main reasons: ① content cached at a proxy server can be obtained directly from the proxy rather than from the remote server, reducing transmission delay; ② even content that is not cached can be obtained faster, because network congestion and server load are reduced;
(3) Since part of the client's request content can be obtained from the proxy, the load on the remote server is reduced;
(4) If the remote server cannot respond to the client's request due to remote server failure or network failure, the client can obtain a cached copy of the content from the proxy, which enhances the robustness of the WWW service.
Web caching systems also bring the following problems:
(1) The content a client obtains through a proxy may be stale;
(2) On a cache miss, the client's access latency increases because of the additional proxy processing overhead. A Web cache system should therefore be designed to maximize the cache hit rate and minimize the miss penalty;
(3) A proxy may itself become a bottleneck. An upper limit on the number of clients served and a lower limit on service efficiency should therefore be set for each proxy, so that a proxy-based system is at least as efficient as clients connecting directly to remote servers.
Extensive and in-depth research has been carried out on Web caching systems and their optimization, and this work focuses mainly on the role of proxies.
2 Ideal Characteristics of a Web Caching System
An ideal Web caching system should have the following characteristics:
(1) Speed: the caching system should effectively reduce client access delay;
(2) Robustness: robustness means availability; clients want the Web service to be available at all times;
(3) Transparency: the caching system should be transparent to clients; the only effects clients observe should be faster responses and better availability;
(4) Scalability: the caching system should scale well as network size and density continue to grow;
(5) Efficiency: the caching system should impose as little overhead on the network as possible;
(6) Adaptability: the caching system should adapt to dynamic changes in client requests and in the network environment; adaptability involves cache management, cache routing, and proxy configuration, and is crucial to achieving good cache performance;
(7) Stability: the schemes adopted by the caching system should not introduce instability into the network;
(8) Load balancing: an ideal caching scheme should distribute load evenly across the network, so that no proxy or server becomes a bottleneck or hot spot and degrades the performance of part, or all, of the system;
(9) Handling of heterogeneity: as networks grow in scale and coverage, they span a range of hardware and software architectures; the caching system should adapt to these different network architectures;
(10) Simplicity: simple schemes are easy to implement and widely accepted; an ideal Web caching scheme should be simple and easy to configure.
To achieve the above characteristics, a Web caching system must solve the following problems:
(1) Cache architecture: how caching proxies are organized and deployed in the network;
(2) Proxy cooperation: how proxies cooperate with one another; cooperating proxies can increase the hit rate and improve the performance of the caching system;
(3) Cache routing: when a request misses in one caching proxy, how to forward it to other caching proxies;
(4) Cache replacement: when cache space runs out, which cached content to evict;
(5) Cache consistency: keeping cached content fresh and preventing it from becoming stale;
(6) Content prefetching: how a proxy decides which content to prefetch from the server or from other proxies to reduce client access delay;
(7) Load balancing: how to eliminate "hot spot" phenomena in the network;
(8) Cacheable content: what kinds of content can be cached.
When designing a web caching system, the above issues must be addressed.
3 Overview of Web Caching Solutions
3.1 Web cache architecture
The performance of a Web caching system depends on the size of its client population: the larger the population, the higher the probability that cached content will be requested again. Groups of caches that cooperate with one another can increase the hit rate and improve the performance of the caching system, so the architecture should ensure that proxies can cooperate effectively. Typical cache architectures are hierarchical, distributed, and hybrid.
Figure 1 Web caching system architecture diagram
3.1.1 Hierarchical cache architecture
The Harvest project [3] first proposed a hierarchical Web caching architecture, in which caches are deployed at multiple levels of the network, as shown in Figure 1(a). For simplicity, assume four levels: the bottom level, the local level, the regional level, and the wide-area level. The bottom level is the client/browser cache. When the client cache cannot satisfy a request, the request is forwarded to the local-level cache; if it is still not satisfied, it is forwarded to the regional-level cache and then, if necessary, to the wide-area-level cache. If the request cannot be satisfied at any level, it is finally forwarded to the origin server. The server's response then travels top-down to the client, leaving a copy in each intermediate cache along the way. Subsequent requests for the same content are forwarded bottom-up until they are satisfied at some level of the hierarchy.
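The bottom-up lookup and top-down copy-on-response behavior described above can be sketched as follows. This is a minimal illustration under simplifying assumptions; the class and variable names are hypothetical, not drawn from Harvest or any cited system.

```python
class CacheLevel:
    """One tier in the hierarchy (client, local, regional, or wide-area)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent   # next tier up; None for the top level
        self.store = {}

    def get(self, url, origin):
        # Hit at this tier: serve the cached copy directly.
        if url in self.store:
            return self.store[url]
        # Miss: forward the request upward, or to the origin server at the top.
        if self.parent is not None:
            content = self.parent.get(url, origin)
        else:
            content = origin[url]
        # The response travels back down, leaving a copy at every tier.
        self.store[url] = content
        return content

origin = {"/index.html": "<html>...</html>"}
wide = CacheLevel("wide-area")
regional = CacheLevel("regional", parent=wide)
local = CacheLevel("local", parent=regional)
client = CacheLevel("client", parent=local)

client.get("/index.html", origin)       # misses at every level, fetched from origin
assert "/index.html" in regional.store  # a copy now sits in each middle tier
```

Note how a second request for the same URL from any client below `regional` would now be satisfied at an intermediate tier, which is exactly the bandwidth saving the hierarchy provides.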
A hierarchical caching architecture uses bandwidth efficiently, and Web content with high request rates can be distributed across the network quickly. However, the architecture also has shortcomings [4]:
(1) Building a hierarchy requires deploying cache servers at key access points in the network, and the cache servers must cooperate with one another;
(2) Each level of the hierarchy introduces additional delay;
(3) High-level caches may become bottlenecks and introduce long queuing delays;
(4) Multiple copies of the same content are stored in different caches, so the cache space utilization of the system as a whole is low.
3.1.2 Distributed cache architecture
To address the shortcomings of the hierarchical structure, some researchers have proposed distributed cache architectures, in which there are only bottom-level caches, as shown in Figure 1(b). In the distributed Web caching structure of [5], there is no intermediate cache layer above the local level; the local caches cooperate with one another to handle misses. To decide which local cache a request for missed content should be forwarded to, each local cache keeps a copy of the directory information describing what the other local caches hold, so that on a miss the request can be forwarded accurately to the appropriate local cache. The Cache Array Routing Protocol (CARP) [6] is a distributed caching scheme that partitions the URL space and assigns each partition to a loosely coupled group of caches. Each cache may store only Web content whose URLs are assigned to it, so the cache to which a request should be forwarded can be determined directly from the requested URL.
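CARP's URL-space partitioning can be illustrated with a small sketch. It assumes an MD5-based highest-random-weight scheme in the spirit of CARP; the function and member names are illustrative, and the real protocol's hash computation differs in detail.

```python
import hashlib

def carp_route(url, members):
    """Pick the cache responsible for `url`: combine the URL with each
    member's name, hash the result, and choose the member with the
    highest score (highest-random-weight style hashing)."""
    def score(member):
        digest = hashlib.md5((member + url).encode()).hexdigest()
        return int(digest, 16)
    return max(members, key=score)

members = ["cache-a", "cache-b", "cache-c"]
owner = carp_route("http://example.com/page", members)
# Every client computes the same owner for the same URL, so each object
# is cached by exactly one member of the array -- no duplicate copies.
assert carp_route("http://example.com/page", members) == owner
assert owner in members
```

A design benefit of scoring per member, rather than hashing the URL onto a fixed slot table, is that adding or removing one cache remaps only the URLs that cache owned, instead of reshuffling the whole URL space.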
In a distributed cache structure, most network traffic occurs at the lower levels of the network, where it is less likely to cause congestion. Cache space utilization is high, load can be shared more evenly, and fault tolerance is better. However, a large-scale distributed caching system may face several problems: a large number of connections, high bandwidth requirements, and difficult management [4].
3.1.3 Hybrid cache architecture
The hybrid architecture is shown in Figure 1(c): caches at the same level adopt a distributed structure and cooperate with one another. The Internet Cache Protocol (ICP), designed by the Harvest group, supports obtaining content from a parent cache or from the neighbor cache with the smallest RTT.
3.1.4 Optimization of the cache architecture
Research shows [4] that, compared with a distributed structure, a hierarchical cache architecture has shorter connection times, so caching smaller documents in intermediate-level caches can reduce access latency; a distributed structure has shorter transmission times and higher bandwidth utilization. The ideal scheme combines the two, exploiting the strengths of each to reduce both connection time and transmission time.
3.2 Cache routing
For scalability, most caching systems spread a large number of caches across the Internet. The biggest problem this raises is how to quickly locate the cache that holds the desired content: the cache routing problem. It resembles network routing but cannot be solved in the same way. Traditional network routing relies on address clustering (hierarchical address representation makes clustering possible), but in the WWW, documents with the same URL prefix or server address prefix may not be requested by the same clients, so routing addresses are hard to cluster and the cache routing table would grow unmanageably large. In addition, cached content is constantly updated, and stale cache routing information leads to misses. To reduce the miss penalty, an ideal cache routing algorithm should route a client's request to the next proxy that both has a high probability of a hit and lies on, or close to, the network path from the client to the server.
3.2.1 Cache routing table methods
Malpani et al. [7] organize a group of caches such that when a client's request reaches a designated cache, the requested content is returned if that cache holds it; otherwise the request is forwarded via IP multicast to the other caches in the group, and the cache holding the content responds to the client. If no cache in the group holds the content, the request is forwarded to the origin server. The Harvest caching system [3] organizes caches into a hierarchy and uses the Internet Cache Protocol (ICP) for resolution: on a miss, a lower-level cache first queries its sibling caches for the content before forwarding the request to the parent cache, to avoid overloading the top-level caches. The adaptive Web caching system [8] builds a cache tree for each server; the caches in a tree are organized into overlapping multicast groups, and a request obtains the cached content through these groups. Because a different cache tree is built for each server, there is no root-node overload problem, and self-configuration and robustness are good. However, requests for content with low request rates may traverse many caches, incurring high inter-cache communication overhead; the authors suggest bounding the number of caches a request may traverse to address this.
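The miss-resolution order used by ICP (local cache first, then siblings, then parent or origin) can be sketched as follows. The function and parameter names (`resolve`, `parent_fetch`) are hypothetical, and the real protocol exchanges UDP query/reply messages rather than inspecting dictionaries.

```python
def resolve(url, local_cache, siblings, parent_fetch):
    """ICP-style resolution sketch: serve from the local cache if possible,
    otherwise query sibling caches, and only go to the parent (or origin)
    when no sibling holds the object."""
    if url in local_cache:
        return local_cache[url], "local hit"
    for sibling in siblings:
        if url in sibling:
            return sibling[url], "sibling hit"
    doc = parent_fetch(url)
    local_cache[url] = doc        # keep a copy for future requests
    return doc, "parent fetch"

local = {}
siblings = [{"/a": "A"}, {"/b": "B"}]
doc, how = resolve("/b", local, siblings, lambda u: "ORIGIN")
assert (doc, how) == ("B", "sibling hit")
doc, how = resolve("/c", local, siblings, lambda u: "ORIGIN")
assert (doc, how) == ("ORIGIN", "parent fetch")
```

Querying siblings before the parent is precisely what keeps the top of the hierarchy from becoming the bottleneck described in Section 3.1.1.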
3.2.2 Hash function method
The Cache Array Routing Protocol (CARP) [6] uses a hash function over the array membership list and the URL to determine exactly which cache a Web object should be fetched from or stored in. In Summary Cache [9], each proxy keeps a compact summary of the URLs cached by every other proxy in its group and checks these summaries when forwarding a client request to decide which proxy the request should go to. To limit overhead, the summaries are updated only periodically. Experiments show that this system significantly reduces inter-cache messages, bandwidth consumption, and protocol-induced CPU overhead, while maintaining a cache hit rate nearly identical to ICP's.
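Summary Cache represents each proxy's summary as a Bloom filter: membership tests may yield false positives but never false negatives, so a negative answer safely avoids a useless query to that peer. The sketch below is a simplified, MD5-based illustration under assumed parameters (`bits`, `hashes`), not the paper's implementation.

```python
import hashlib

class BloomSummary:
    """Compact digest of a peer proxy's cached URLs (Summary Cache idea)."""
    def __init__(self, bits=1024, hashes=4):
        self.bits = bits
        self.hashes = hashes
        self.array = [False] * bits

    def _positions(self, url):
        # Derive `hashes` bit positions from independent salted hashes.
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{url}".encode()).hexdigest()
            yield int(digest, 16) % self.bits

    def add(self, url):
        for p in self._positions(url):
            self.array[p] = True

    def maybe_contains(self, url):
        # True may be a false positive; False is always correct.
        return all(self.array[p] for p in self._positions(url))

summary = BloomSummary()
summary.add("http://example.com/page")
assert summary.maybe_contains("http://example.com/page")
```

The space saving is the point: a few bits per cached URL, instead of storing full URL directories for every peer, is what lets the summaries be exchanged and stored cheaply.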
3.3 Cache replacement algorithm
The cache replacement algorithm is an important factor in the performance of a proxy caching system; a good replacement algorithm yields a higher hit rate. The algorithms proposed so far fall into three categories:
(1) Traditional replacement algorithms and their direct extensions. Representative algorithms include: ① LRU (Least Recently Used): evict the least recently used content from the cache; ② LFU (Least Frequently Used): evict the least frequently accessed content from the cache; ③ the algorithm proposed by Pitkow/Recker [10]: if all content in the cache was cached on the same day, evict the largest document; otherwise evict according to LRU.
(2) Replacement algorithms based on key characteristics of cached content. Representative algorithms include: ① Size [10]: evict the largest content from the cache; ② LRU-MIN [11]: minimize the number of documents evicted. Let S be the size of the document to be cached; cached documents of size at least S are evicted according to LRU; if no document is of size at least S, LRU is applied to documents of size at least S/2, and so on; ③ LRU-Threshold [11]: the same as LRU, except that documents exceeding a size threshold are never cached; ④ Lowest Latency First [12]: evict the document with the smallest access latency.
(3) Cost-based replacement algorithms. These algorithms evaluate the objects in the cache with a cost function and choose the victim according to its cost value. Representative algorithms include: ① Hybrid [12]: assign each cached object a utility function and evict the object with the smallest utility value; ② Lowest Relative Value [13]: evict the object with the lowest relative value; ③ Least Normalized Cost Replacement (LCNR) [14]: use an inference function over document access frequency, transmission time, and size to choose the victim; ④ the algorithm of Bolot et al. [15]: use a weighted inference function over document transmission time, size, and last access time to choose the victim; ⑤ Size-Adjusted LRU (SLRU) [16]: order cached objects by the ratio of cost to size and evict the object with the smallest ratio.
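As a concrete baseline, the LRU policy from category (1) can be sketched in a few lines; the size- and cost-aware policies of categories (2) and (3) differ only in how the victim is chosen. The class name and capacity are illustrative.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU replacement: on overflow, evict the least recently used
    document. OrderedDict keeps entries in recency order for us."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, url):
        if url not in self.items:
            return None                    # cache miss
        self.items.move_to_end(url)        # mark as most recently used
        return self.items[url]

    def put(self, url, doc):
        if url in self.items:
            self.items.move_to_end(url)
        self.items[url] = doc
        if len(self.items) > self.capacity:
            self.items.popitem(last=False) # evict the LRU entry

cache = LRUCache(capacity=2)
cache.put("/a", "A")
cache.put("/b", "B")
cache.get("/a")          # /a becomes most recently used
cache.put("/c", "C")     # evicts /b, the least recently used
assert cache.get("/b") is None and cache.get("/a") == "A"
```

A cost-based variant such as SLRU would replace the `popitem(last=False)` line with a scan for the entry minimizing a cost/size ratio.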
In short, much work has been devoted to cache replacement algorithms in pursuit of a maximal hit rate. However, the performance of a replacement algorithm depends largely on the characteristics of WWW accesses, and no single algorithm outperforms all others under every Web access pattern.
3.4 Cache consistency
A Web caching system reduces access latency, but it has a side effect: the cached copies served to clients may be stale. A cache consistency mechanism is therefore needed to ensure that cached content is updated and validated in a timely fashion, so that clients receive up-to-date content.
There are currently two main types of cache consistency: strong cache consistency and weak cache consistency.
3.4.1 Strong cache consistency
(1) Client validation: on every access, the proxy treats its cached copy as potentially stale and attaches an If-Modified-Since header to the request it sends to the server. If the content has changed since the given time, the server sends the updated content to the proxy and ultimately to the client; if the content has not been modified, the server returns a 304 response, indicating that the document is unchanged and the cached copy is still valid.
(2) Server invalidation: when the server detects that some content has changed, it sends invalidation messages to all clients that recently requested, and may have cached, that content [17]. This requires the server to keep a list of the clients that accessed each piece of content, which becomes impractical when the number of clients is large. The list itself can also become stale, causing the server to send invalidations to clients that no longer cache the content.
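The client-validation exchange of (1) above can be sketched as the following decision logic, with timestamps as plain integers. This illustrates the If-Modified-Since / 304 semantics only; it is not a real HTTP implementation, and the function name is hypothetical.

```python
def revalidate(cached_last_modified, server_last_modified,
               cached_body, server_body):
    """Return the (status, body) the proxy ends up with after sending
    If-Modified-Since with its cached timestamp."""
    if server_last_modified <= cached_last_modified:
        return 304, cached_body    # not modified: cached copy still valid
    return 200, server_body        # modified: updated content replaces the copy

status, body = revalidate(100, 100, "old", "old")
assert status == 304               # unchanged document, tiny response
status, body = revalidate(100, 150, "old", "new")
assert (status, body) == (200, "new")
```

The cost of this scheme is visible in the sketch: even the 304 case requires a round trip to the server on every access, which is why the weak-consistency mechanisms below were developed.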
3.4.2 Weak cache consistency
(1) Adaptive TTL (Time To Live) [18]: adjust a document's time-to-live by observing its update history. If a document has not been modified for a long time, it tends not to change soon; a document's lifetime is therefore set to a fraction of its current "age" (the current time minus the time of last modification). Adaptive TTL can keep the probability of serving a stale document below about 5%. Most proxy servers use this mechanism, but a consistency mechanism based on document lifetimes cannot guarantee the validity of cached content.
(2) Piggyback invalidation mechanism
Krishnamurthy et al. proposed piggyback invalidation to improve the efficiency of cache consistency, in three variants: ① Piggyback Cache Validation (PCV) [19]: exploit the requests a proxy already sends to a server; when the proxy issues a request, it piggybacks a list of cached, possibly stale, content from that server for validation; ② Piggyback Server Invalidation (PSI) [20]: when the server responds to a proxy, it piggybacks a list of the content that has changed since the proxy's last access; the proxy invalidates that content and can extend the cache lifetime of everything else it holds; ③ a PSI/PCV hybrid [21]: choose between the two mechanisms according to the time elapsed since the proxy's last invalidation, using PSI when the interval is small and PCV otherwise. The rationale is that when the interval is small, the list of invalidations piggybacked by PSI is small, but as the interval grows, the cost of sending invalidations exceeds the cost of requesting validation.
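The adaptive TTL rule of (1) above amounts to a one-line computation. The `factor` and `minimum` values below are illustrative assumptions, not prescribed by [18], and the function name is hypothetical.

```python
def adaptive_ttl(now, last_modified, factor=0.1, minimum=60):
    """Adaptive TTL sketch: a document that has been stable for a long time
    is assumed likely to stay stable, so its time-to-live is a fraction
    (`factor`) of its current age, with a floor of `minimum` seconds."""
    age = now - last_modified
    return max(minimum, factor * age)

# A document last modified 10 days ago gets roughly a 1-day TTL at factor 0.1,
# while a freshly modified document falls back to the 60-second floor.
assert adaptive_ttl(now=100, last_modified=0) == 60
assert abs(adaptive_ttl(now=864000, last_modified=0) - 86400) < 1e-6
```

The weakness the text notes is also visible here: nothing stops the origin document from changing one second after the TTL is assigned, so validity is probabilistic, not guaranteed.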
3.5 Content prefetching
Web caching improves Web performance, but research shows [22] that, regardless of the caching scheme used, the maximum cache hit rate is usually no greater than 40-50%. To raise the hit rate further, prefetching has been introduced. Prefetching is essentially an active form of caching: using prior knowledge of a client's accessed content or access patterns, the system predicts, while serving the current request, what the client will request next, and uses the idle time between client requests to load the predicted content into the cache, hiding latency and improving service quality.
Early research focused on prefetching between browsers/clients and Web servers; with the introduction of proxies, interest shifted to prefetching between proxies and servers. Research shows that prefetching can effectively reduce client access latency, but it remains controversial for two reasons:
(1) Prefetching has stringent real-time requirements. It mainly exploits the interval between client requests, which is generally under one minute [23]; if the prefetching task cannot be completed within this time, prefetching becomes pointless. The efficiency of the prefetching algorithm therefore matters greatly.
(2) Prefetching reduces client response time at the cost of increased server load and network traffic, so its accuracy matters greatly as well. A prefetching model must also take the client's access characteristics, server load, and network traffic into account when deciding how many documents to prefetch; prefetching that ignores these factors can be counterproductive.
In short, a good prefetching model must combine high efficiency and high accuracy at low cost; further research on both fronts is needed.
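One simple way to predict the next request, as prefetching models require, is a first-order Markov model over observed request sequences. The class below is an illustrative sketch under that assumption, not the algorithm of any cited system.

```python
from collections import defaultdict, Counter

class NextRequestPredictor:
    """Count transitions between successively requested URLs; the proxy
    could prefetch the most frequent successor of the current request
    during the client's idle interval."""
    def __init__(self):
        self.transitions = defaultdict(Counter)
        self.previous = None

    def observe(self, url):
        # Record the transition previous -> url, then advance.
        if self.previous is not None:
            self.transitions[self.previous][url] += 1
        self.previous = url

    def predict(self, url):
        successors = self.transitions[url]
        if not successors:
            return None                      # nothing learned yet
        return successors.most_common(1)[0][0]

predictor = NextRequestPredictor()
for url in ["/a", "/b", "/a", "/b", "/a", "/c"]:
    predictor.observe(url)
assert predictor.predict("/a") == "/b"   # /a -> /b seen twice, /a -> /c once
```

Both concerns from the text map directly onto this sketch: `predict` must be fast enough to run inside the sub-minute request interval, and a low-confidence prediction (a near-tie in the counters) should arguably suppress prefetching rather than waste server load and bandwidth.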
3.6 Load balancing
When many clients fetch data or services from one server at the same time, a hot spot arises, degrading the server's performance or even causing it to fail. Most current approaches to this problem use some replication strategy to store the requested content at multiple points in the network, distributing the load across multiple servers (or proxies) [24] so that no single server becomes a bottleneck.
3.7 Cacheable content
A proxy can play several roles. Besides data caching, it can perform connection caching and computation caching. Connection caching uses persistent connections between client and proxy and between proxy and server to reduce the cost of TCP connection establishment and the slow-start overhead on the sending server, thereby cutting client access latency [25]. Computation caching lets a Web server migrate part of its services to proxies to relieve server bottlenecks. One application is dynamic data caching: dynamic data are cached at proxies, and part of the computation is migrated there as well, with the proxies generating and maintaining the cached dynamic data, which improves clients' performance when fetching dynamic data.
4 Issues That Require Further Research
A great deal of research has been done on Web caching and fruitful results have been achieved, but several issues still require further study, including:
(1) Client access patterns: studying client access patterns enables better cache management and content prefetching and can improve the cache hit rate;
(2) Dynamic data caching: an important reason current Web cache hit rates are low is that a considerable fraction of content (private data, authorized data, dynamic data, and so on) cannot be cached. Making more data cacheable, and reducing the delay clients experience when accessing non-cacheable pages, are key issues in improving Web performance;
(3) Web traffic characteristics: the efficiency of a caching system rests on the temporal locality of Web access streams and on good cache management strategies; understanding the load generated by Web clients is of great significance for designing and provisioning Web services;
(4) Proxy configuration: good proxy configuration is crucial to Web performance. The ideal criteria for a configuration strategy include self-organization, efficient routing, load balancing, and stable behavior; this issue requires further research.
In short, the frontier of research on improving Web performance lies in developing caching schemes that are scalable, robust, adaptive, stable, and efficient, and that can be well configured in current and future networks.
Wang Shike Wu Ji Jin Shiyao
(State Key Laboratory of Parallelism and Distribution, School of Computer Science, National University of Defense Technology, Changsha 410073)