Text/Tao Gang
Handling large file downloads in web applications has always been notoriously difficult, so for most sites, woe befalls the user whose download is interrupted. It no longer has to be that way: you can make your ASP.NET application support resumable downloads of large files. With the method described in this article, you can track the download process, which lets you handle dynamically created files, all without old-school ISAPI dynamic link libraries or unmanaged C++ code.
Offering files for download over the Internet could hardly be easier, right? Just copy the downloadable file into your web application directory, publish the link, and let IIS do all the related work. However, serving files can be more than a pain in the neck: you may not want the whole world to have access to your data, you may not want your server cluttered with hundreds of static files, and you may even want to offer temporary files that are created only when the client starts the download.
Unfortunately, it is not possible to achieve these effects using IIS's default response to download requests. So in general, to gain control over the download process, developers link to a custom .aspx page where they check the user's credentials, create the downloadable file, and push the file to the client with the following code:
Response.WriteFile
Response.End()
And this is where the real trouble arises.
What's the problem?
The WriteFile method looks perfect: it streams the file's binary data to the client. What we didn't know until recently is that WriteFile is a notorious memory hog that loads the entire file into the server's RAM before serving it (in fact, it takes up twice the size of the file). For large files, this can cause memory problems on the server and possibly make the ASP.NET process recycle. In June 2004, Microsoft released a patch that solved the problem. This patch is now part of .NET Framework 1.1 Service Pack 1 (SP1).
This patch introduces the TransmitFile method, which reads a disk file into a smaller memory buffer and then begins transferring the file. Although this solves the memory and recycling problems, it is still unsatisfactory. You have no control over the response lifecycle. You have no way of knowing whether the download completed correctly or was interrupted, and (if you created temporary files) you have no way of knowing if and when you should delete them. To make matters worse, if the download does fail, TransmitFile starts again from the beginning of the file on the client's next attempt.
One possible solution, using the Background Intelligent Transfer Service (BITS), is not feasible for most sites because it would defeat the purpose of staying independent of the client's browser and operating system.
The basis for a satisfactory solution comes from Microsoft's first attempt to solve the memory problem caused by WriteFile (see Knowledge Base article 812406). That article demonstrated a chunked download process that reads data from a file stream. Before the server sends each chunk of bytes to the client, it uses the Response.IsClientConnected property to check whether the client is still connected. If the connection is still open, it continues sending bytes from the stream; otherwise it stops, so that the server does not send unnecessary data.
This is the approach we take, especially when downloading temporary files. If IsClientConnected returns False, you know that the download was interrupted and that you should keep the file; if the process completes successfully, you can delete the temporary file. Additionally, to resume an interrupted download, all you need to do is start sending from the point where the client connection failed during the last download attempt.
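The chunked loop described above can be sketched outside of ASP.NET as follows. This is only an illustration under stated assumptions, not the Knowledge Base code: the SendInChunks function and its isConnected delegate are hypothetical names, the delegate stands in for Response.IsClientConnected, and the lambda syntax requires a newer VB.NET compiler than the .NET 1.1 one the article targets.

```vbnet
Imports System
Imports System.IO

Module ChunkedSendSketch
    ' Copies input to output in chunks of chunkSize bytes.
    ' isConnected is a hypothetical stand-in for Response.IsClientConnected.
    ' Returns True if the whole stream was sent, False if the client went away.
    Function SendInChunks(ByVal input As Stream, ByVal output As Stream, _
                          ByVal isConnected As Func(Of Boolean), _
                          ByVal chunkSize As Integer) As Boolean
        Dim buffer(chunkSize - 1) As Byte
        Dim bytesRead As Integer = input.Read(buffer, 0, buffer.Length)
        While bytesRead > 0
            ' Check the connection before every chunk, as the article describes
            If Not isConnected() Then
                Return False
            End If
            output.Write(buffer, 0, bytesRead)
            bytesRead = input.Read(buffer, 0, buffer.Length)
        End While
        Return True
    End Function

    Sub Main()
        Dim source As New MemoryStream(New Byte(99999) {})
        Dim sink As New MemoryStream()
        Dim finished As Boolean = SendInChunks(source, sink, Function() True, 25000)
        Console.WriteLine("Finished: " & finished.ToString() & _
                          ", bytes sent: " & sink.Length.ToString())
    End Sub
End Module
```

The return value is what makes tracking possible: a False result marks the download as interrupted, so a temporary file would be kept for a later resume.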
HTTP Protocol and Header Support
The HTTP protocol defines headers that can be used to handle interrupted downloads. With a small number of HTTP headers, you can enhance your download process to comply fully with the HTTP protocol specification. This specification, through its byte ranges, provides all the information needed to resume an interrupted download.
Here's how it works. First, if the server supports client-side resumable downloads, it sends the Accept-Ranges header in the initial response. The server also sends an entity tag header (ETag), which contains a unique identifying string.
The code below shows some of the headers IIS sends to the client in response to an initial download request, which passes the details of the requested file to the client.
HTTP/1.1 200 OK
Connection: close
Date: Tue, 19 Oct 2004 15:11:23 GMT
Accept-Ranges: bytes
Last-Modified: Sun, 26 Sep 2004 15:52:45 GMT
ETag: "47febb2cfd76c41:2062"
Cache-Control: private
Content-Type: application/x-zip-compressed
Content-Length: 2844011
After receiving these headers, if the download is interrupted, IE sends the ETag value and a Range header back to the server in subsequent download requests. The code below shows some of the headers IE sends to the server when trying to resume an interrupted download.
GET
These headers indicate that IE cached the entity tag provided by IIS and sent it back to the server in the If-Range header. This is a way to ensure that the download is resumed from exactly the same file. Unfortunately, not all browsers work the same way. Other HTTP headers a client may send to validate the file are If-Match, If-Unmodified-Since, and Unless-Modified-Since. Apparently, the specification is not explicit about which headers client software must support, or which headers must be used. Therefore, some clients send no validation headers at all, while IE uses only If-Range and Unless-Modified-Since. You had better check all of these headers in your code. That way, your application complies with the HTTP specification to a high degree and works with a variety of browsers. The Range header specifies the requested byte range; in this case, it is the starting point from which the server should resume the file stream.
When IIS receives a request to resume a download, it sends back a response containing the following headers:
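As a minimal sketch of the If-Range check, the function below decides whether a range request may be answered with a partial response. The name CanResume and its parameters are assumptions made for illustration and are not part of the sample code. The sketch compares entity tags only, so an If-Range value that carries a date simply fails to match, which falls back to a full download.

```vbnet
Imports System

Module IfRangeSketch
    ' ifRangeHeader: raw If-Range value from the request (Nothing if absent).
    ' currentEntityTag: the server's current tag for the file, without quotes.
    ' Returns True if the download may resume, False if it must restart.
    Function CanResume(ByVal ifRangeHeader As String, _
                       ByVal currentEntityTag As String) As Boolean
        If String.IsNullOrEmpty(ifRangeHeader) Then
            ' The client sent no validator, so honor the Range header as it is
            Return True
        End If
        ' Strip surrounding whitespace and the quotes around the tag
        Dim sentTag As String = ifRangeHeader.Trim().Trim(""""c)
        Return String.Equals(sentTag, currentEntityTag, StringComparison.Ordinal)
    End Function

    Sub Main()
        Dim currentTag As String = "47febb2cfd76c41:2062"
        Console.WriteLine(CanResume("""47febb2cfd76c41:2062""", currentTag))
        Console.WriteLine(CanResume("""some-older-tag""", currentTag))
    End Sub
End Module
```

Falling back to a full download on a mismatch is the safe choice, because the client never ends up stitching together bytes from two different versions of a file.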
HTTP/1.1 206 Partial Content
Content-Range: bytes 822603-2844010/2844011
Accept-Ranges: bytes
Last-Modified: Sun, 26 Sep 2004 15:52:45 GMT
ETag: "47febb2cfd76c41:2062"
Cache-Control: private
Content-Type: application/x-zip-compressed
Content-Length: 2021408
Note that this response has a slightly different HTTP status than the response to the original download request: the resumed download returns 206, while the original download returned 200. This indicates that what is being passed over the wire is a partial file. This time, the Content-Range header indicates the exact number and position of the bytes being sent.
IE is very picky about these headers. If the initial response does not contain the ETag header, IE never attempts to resume the download. Other clients I've tested don't use the ETag header; they simply rely on the file name and the requested range, and use the Last-Modified header if they try to validate the file at all.
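The numbers in the two responses can be checked with a little arithmetic. The sketch below uses hypothetical helper names (RangeLength, ContentRangeValue) and the values from the headers above: the file has 2844011 bytes and the client asks to resume at byte 822603.

```vbnet
Imports System

Module ContentRangeSketch
    ' Number of bytes in the inclusive range first..last
    Function RangeLength(ByVal first As Long, ByVal last As Long) As Long
        Return last - first + 1
    End Function

    ' Value of the Content-Range header for a single-range 206 response
    Function ContentRangeValue(ByVal first As Long, ByVal last As Long, _
                               ByVal total As Long) As String
        Return "bytes " & first.ToString() & "-" & last.ToString() & _
               "/" & total.ToString()
    End Function

    Sub Main()
        Dim total As Long = 2844011
        Dim first As Long = 822603
        ' Byte positions are zero-based, so the last byte is total - 1
        Dim last As Long = total - 1
        Console.WriteLine("Content-Range: " & ContentRangeValue(first, last, total))
        Console.WriteLine("Content-Length: " & RangeLength(first, last).ToString())
    End Sub
End Module
```

Because both ends of the range are inclusive, the length is last minus first plus one, which gives the 2021408 bytes reported in the Content-Length header of the 206 response.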
An in-depth understanding of the HTTP protocol
The header information shown in the previous section is sufficient to make the solution for resuming downloads work, but it does not completely cover the HTTP specification.
The Range header can ask for multiple ranges in a single request, a feature called "multipart ranges". This is not to be confused with segmented downloading, which almost all download tools use. These tools claim to increase download speeds by opening two or more concurrent connections, each requesting a different range of the file.
Multipart ranges do not open multiple connections, but they allow client software to request, for example, the first ten and the last ten bytes of a file in a single request/response cycle.
To be honest, I've never found a piece of software that uses this feature. But I refuse to write "it is not fully HTTP compliant" into the code's disclaimer; omitting this feature would surely invoke Murphy's Law. In any case, multipart ranges are used in email transmissions to separate header information, plain text, and attachments.
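To make the range syntax concrete, here is a minimal sketch of a parser for Range values such as "bytes=0-9,-10" (the first ten and the last ten bytes). It is not the article's ParseRequestHeaderRange function; the name ParseRanges, its return type, and its error handling are assumptions chosen for illustration.

```vbnet
Imports System
Imports System.Collections.Generic

Module RangeParserSketch
    ' Parses a Range header value against a file of fileLength bytes.
    ' Returns a list of {first, last} pairs (inclusive byte positions),
    ' or Nothing if the header is malformed or a range cannot be satisfied.
    Function ParseRanges(ByVal header As String, _
                         ByVal fileLength As Long) As List(Of Long())
        Const Prefix As String = "bytes="
        If header Is Nothing OrElse _
           Not header.StartsWith(Prefix, StringComparison.Ordinal) Then Return Nothing
        Dim result As New List(Of Long())
        For Each part As String In header.Substring(Prefix.Length).Split(","c)
            Dim bounds() As String = part.Trim().Split("-"c)
            If bounds.Length <> 2 Then Return Nothing
            Dim first As Long
            Dim last As Long
            If bounds(0) = "" Then
                ' Suffix form "-N": the last N bytes of the file
                Dim count As Long
                If Not Long.TryParse(bounds(1), count) OrElse count <= 0 Then Return Nothing
                first = Math.Max(0L, fileLength - count)
                last = fileLength - 1
            Else
                If Not Long.TryParse(bounds(0), first) Then Return Nothing
                If bounds(1) = "" Then
                    ' Open-ended form "N-": from byte N to the end of the file
                    last = fileLength - 1
                ElseIf Not Long.TryParse(bounds(1), last) Then
                    Return Nothing
                End If
                ' A range may not reach past the end of the file
                last = Math.Min(last, fileLength - 1)
            End If
            If first < 0 OrElse first > last Then Return Nothing
            result.Add(New Long() {first, last})
        Next
        Return result
    End Function

    Sub Main()
        For Each r As Long() In ParseRanges("bytes=0-9,-10", 2844011)
            Console.WriteLine(r(0).ToString() & "-" & r(1).ToString())
        Next
    End Sub
End Module
```

A resume request from a browser is just the single open-ended case, such as "bytes=822603-", so the same parser covers both ordinary resumes and multipart ranges.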
Sample Code
We now know how the client and server exchange headers to make resumable downloads possible. Combining this knowledge with the idea of streaming a file in chunks, you can add reliable download management to your ASP.NET applications.
The way to gain control of the download process is to intercept the download request from the client, read the headers, and respond appropriately. Before .NET, you had to write an ISAPI (Internet Server API) application to implement this functionality, but the .NET Framework provides the IHttpHandler interface that, when implemented in a class, lets you intercept and process requests using nothing but .NET code. This means your application has full control over the download process and can respond to it without involving IIS's automatic functions.
The sample code includes a custom HttpHandler class (ZIPHandler) in the HttpHandler.vb file. ZIPHandler implements the IHttpHandler interface and handles requests for all .zip files.
In order to test the sample code, you need to create a new virtual directory in IIS and copy the source files there. Create a file called download.zip in this directory (please note that IIS and ASP.NET cannot handle downloads larger than 2GB, so make sure your file does not exceed this limit). Configure your IIS virtual directory to map the .zip extension through aspnet_isapi.dll.
HttpHandler class: ZIPHandler
After you map the .zip extension to ASP.NET, every time the client requests a .zip file from the server, ASP.NET calls the ProcessRequest method of the ZIPHandler class (see download code).
The ProcessRequest method first creates an instance of the custom FileInformation class (see download code), which encapsulates the status of the download (such as in progress, interrupted, etc.). The example hardcodes the path to the download.zip sample file into the code. If you apply this code to your own application, you will need to modify it to open the requested file.
' Use objRequest to detect which file was requested, and use the file to open objFile.
' For example objFile = New Download.FileInformation(<full file name>)
objFile = New Download.FileInformation( _
objContext.Server.MapPath("~/download.zip"))
Next, the program validates the request using the HTTP headers described above (if those headers were provided in the request). If a validation check fails, the response is terminated immediately and the appropriate StatusCode value is sent.
If Not (objRequest.HttpMethod.Equals(HTTP_METHOD_GET) OrElse _
        objRequest.HttpMethod.Equals(HTTP_METHOD_HEAD)) Then
    ' Currently, only the GET and HEAD methods are supported
    objResponse.StatusCode = 501 ' Not Implemented
ElseIf Not objFile.Exists Then
    ' The requested file could not be found
    objResponse.StatusCode = 404 ' Not Found
ElseIf objFile.Length > Int32.MaxValue Then
    ' The file is too large
    objResponse.StatusCode = 413 ' Request Entity Too Large
ElseIf Not ParseRequestHeaderRange(objRequest, alRequestedRangesBegin, alRequestedRangesend, _
        objFile.Length, bIsRangeRequest) Then
    ' The Range request contains invalid entries
    objResponse.StatusCode = 400 ' Bad Request
ElseIf Not CheckIfModifiedSince(objRequest, objFile) Then
    ' The entity has not been modified
    objResponse.StatusCode = 304 ' Not Modified
ElseIf Not CheckIfUnmodifiedSince(objRequest, objFile) Then
    ' The entity has been modified since the requested date
    objResponse.StatusCode = 412 ' Precondition Failed
ElseIf Not CheckIfMatch(objRequest, objFile) Then
    ' The entity does not match the request
    objResponse.StatusCode = 412 ' Precondition Failed
ElseIf Not CheckIfNoneMatch(objRequest, objResponse, objFile) Then
    ' The entity does match the If-None-Match request.
    ' The response code is set inside the CheckIfNoneMatch function
Else
    ' Preliminary checks were successful
The ParseRequestHeaderRange function in these preliminary checks (see download code) checks whether the client requested a file range (which means a partial download). If a range was requested, the method sets bIsRangeRequest to True. If the requested range is invalid (a range value that exceeds the file size or contains an illogical number), the function returns False and the handler answers with status 400. For a range request, the CheckIfRange method then verifies the If-Range header.
If the requested range is valid, the code calculates the size of the response message. If the client requested multiple ranges, the response size includes the length of the multipart headers.
If the If-Range value sent by the client cannot be validated, the program handles the download request as an initial request rather than a partial download, sending a new download stream starting from the top of the file.
If bIsRangeRequest AndAlso CheckIfRange(objRequest, objFile) Then
    ' This is a range request.
    ' If the Range arrays contain more than one entry, it is also a multipart range request
    bMultipart = CBool(alRequestedRangesBegin.GetUpperBound(0) > 0)

    ' Loop through each range to get the entire response length
    For iLoop = alRequestedRangesBegin.GetLowerBound(0) To alRequestedRangesBegin.GetUpperBound(0)
        ' The length of the content (for this range)
        iResponseContentLength += Convert.ToInt32(alRequestedRangesend( _
            iLoop) - alRequestedRangesBegin(iLoop)) + 1

        If bMultipart Then
            ' For a multipart range request, add the length of the intermediate headers to be sent
            iResponseContentLength += MULTIPART_BOUNDARY.Length
            iResponseContentLength += objFile.ContentType.Length
            iResponseContentLength += alRequestedRangesBegin(iLoop).ToString.Length
            iResponseContentLength += alRequestedRangesend(iLoop).ToString.Length
            iResponseContentLength += objFile.Length.ToString.Length
            ' 49 is the length of the line breaks and other required characters in a multipart download
            iResponseContentLength += 49
        End If
    Next iLoop

    If bMultipart Then
        ' For a multipart range request, we must also add
        ' the length of the last intermediate header that will be sent
        iResponseContentLength += MULTIPART_BOUNDARY.Length
        ' 8 is the length of the dashes and line breaks
        iResponseContentLength += 8
    Else
        ' Not a multipart download, so the initial HTTP headers must specify the response range
        objResponse.AppendHeader(HTTP_HEADER_CONTENT_RANGE, "bytes " & _
            alRequestedRangesBegin(0).ToString & "-" & _
            alRequestedRangesend(0).ToString & "/" & _
            objFile.Length.ToString)
    End If

    ' Range response
    objResponse.StatusCode = 206 ' Partial Content
Else
    ' This is not a range request, or the requested range's entity tag does not match the current entity tag,
    ' so start a new download.
    ' The size of the complete file equals the content length
    iResponseContentLength = Convert.ToInt32(objFile.Length)
    ' Return the normal OK status
    objResponse.StatusCode = 200
End If
Next, the server must send several important response headers, such as the content length, the ETag, and the file's content type:
' Write the content length into the response
objResponse.AppendHeader(HTTP_HEADER_CONTENT_LENGTH, iResponseContentLength.ToString)
' Write the last modified date into the response
objResponse.AppendHeader(HTTP_HEADER_LAST_MODIFIED, objFile.LastWriteTimeUTC.ToString("r"))
' Tell the client software that we accept range requests
objResponse.AppendHeader(HTTP_HEADER_ACCEPT_RANGES, HTTP_HEADER_ACCEPT_RANGES_BYTES)
' Write the file's entity tag into the response (enclosed in quotes)
objResponse.AppendHeader(HTTP_HEADER_ENTITY_TAG, """" & objFile.EntityTag & """")
' Write the content type into the response
If bMultipart Then
    ' Multipart messages have this special type.
    ' In this example, the actual MIME type of the file is written into the response later
    objResponse.ContentType = MULTIPART_CONTENTTYPE
Else
    ' A single-part message carries the file's own content type
    objResponse.ContentType = objFile.ContentType
End If
Everything you need for downloading is ready and you can start downloading files. You will use a FileStream object to read chunks of bytes from a file. Set the State property of FileInformation instance objFile to fsDownloadInProgress. As long as the client remains connected, the server reads chunks of bytes from the file and sends them to the client. For multipart downloads, this code sends specific header information. If the client disconnects, the server sets the file status to fsDownloadBroken. If the server completes sending the requested range, it sets the status to fsDownloadFinished (see download code).
FileInformation Auxiliary Class
In the ZIPHandler section you will find that FileInformation is an auxiliary class that encapsulates download status information (such as downloading, interrupted, etc.).
To create an instance of FileInformation, you need to pass the path to the requested file to the constructor of the class:
Public Sub New(ByVal sPath As String)
m_objFile = New System.IO.FileInfo(sPath)
End Sub
FileInformation uses the System.IO.FileInfo object to obtain file information, which is exposed as properties of the object (such as whether the file exists, the full name of the file, size, etc.). This class also exposes a DownloadState enumeration, which describes the various states of the download request:
Enum DownloadState
    ' Clear: no download in progress; the file may be maintained
    fsClear = 1
    ' Locked: a dynamically created file must not be changed
    fsLocked = 2
    ' In Progress: the file is locked and a download is in progress
    fsDownloadInProgress = 6
    ' Broken: the file is locked, a download was in progress, but it was canceled
    fsDownloadBroken = 10
    ' Finished: the file is locked and the download has completed
    fsDownloadFinished = 18
End Enum
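The numeric values are worth a second look: 6, 10, and 18 are 4, 8, and 16 with the value of fsLocked (2) added, which matches the comments saying that the file is locked in each of those states. The article does not say that the values are meant to be tested as bit flags, so the following is a sketch under that assumption, and IsLocked is a hypothetical helper that is not part of the sample code.

```vbnet
Imports System

Module DownloadStateSketch
    Enum DownloadState
        fsClear = 1
        fsLocked = 2
        fsDownloadInProgress = 6
        fsDownloadBroken = 10
        fsDownloadFinished = 18
    End Enum

    ' Hypothetical helper: True if the "locked" bit (value 2) is set in the state
    Function IsLocked(ByVal state As DownloadState) As Boolean
        Return (CInt(state) And CInt(DownloadState.fsLocked)) <> 0
    End Function

    Sub Main()
        For Each state As DownloadState In [Enum].GetValues(GetType(DownloadState))
            Console.WriteLine(state.ToString() & " locked: " & IsLocked(state).ToString())
        Next
    End Sub
End Module
```

Read this way, a single bit test tells maintenance code whether a file may be changed, regardless of which download state it is in.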
FileInformation also provides the EntityTag property. This value is hard-coded in the example code because the example uses only one download file, and that file never changes. In a real application, however, you will provide multiple files, or even create files dynamically, and your code must provide a unique EntityTag value for each file. Additionally, this value must change every time the file is changed or modified. This enables client software to verify that the chunks of bytes it has already downloaded are still up to date. Here is the part of the sample code that returns the hard-coded EntityTag value:
Public ReadOnly Property EntityTag() As String
    ' The EntityTag is used in the initial (200) response to the client
    ' and in resume requests from the client
    Get
        ' Create a unique string for the file.
        ' Note that as long as the file does not change, this string must stay the same.
        ' However, if the file is changed or modified, this string must change.
        Return "MyExampleFileID"
    End Get
End Property
A simple and generally safe enough EntityTag might consist of the file name and the date the file was last modified. Whatever method you use, you must ensure that this value is truly unique and cannot be confused with the EntityTag of any other file. In my own applications, I name dynamically created files by client, customer, and zip index, and store the GUID used as the EntityTag in a database.
The ZIPHandler class reads and sets the public State property. After completing the download, it sets State to fsDownloadFinished, at which point you can delete the temporary file. In general, you need to call the Save method here to persist the state.
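A tag built from the file name and the last-modified date could look like the sketch below. BuildEntityTag and its tag format are assumptions made for illustration; the sample code returns a hard-coded string instead. The timestamp is expected in UTC so that the tag does not depend on the server's time zone.

```vbnet
Imports System

Module EntityTagSketch
    ' Builds a tag from the file name and its last write time.
    ' The tag changes whenever the name or the timestamp changes.
    Function BuildEntityTag(ByVal fileName As String, _
                            ByVal lastWriteTimeUtc As DateTime) As String
        Return fileName.ToLowerInvariant() & ":" & lastWriteTimeUtc.Ticks.ToString("x")
    End Function

    Sub Main()
        Dim modified As New DateTime(2004, 9, 26, 15, 52, 45, DateTimeKind.Utc)
        Console.WriteLine(BuildEntityTag("download.zip", modified))
        ' One second later, the file gets a different tag
        Console.WriteLine(BuildEntityTag("download.zip", modified.AddSeconds(1)))
    End Sub
End Module
```

A scheme like this is unique only as long as file names are unique across the site, which is why a stored GUID is the more robust choice for dynamically created files.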
Public Property State() As DownloadState
Get
Return m_nState
End Get
Set(ByVal nState As DownloadState)
m_nState = nState
' Optional action: You can delete the file automatically at this time.
' If the status is set to Finished, you no longer need this file.
' If nState =DownloadState.fsDownloadFinished Then
'Clear()
'Else
'Save()
'End If
Save()
End Set
End Property
ZIPHandler should call the Save method whenever the file state changes, so that the state is persisted and can be shown to the user later. You can also use Save to store an EntityTag you created yourself. Do not keep the file state and EntityTag value in the Application, Session, or Cache objects: the information must outlive all of them.
Private Sub Save()
    ' Save the download state of the file to a database or an XML file.
    ' Of course, if you don't create the file dynamically, you don't need to save this state.
End Sub
As mentioned earlier, the example code only handles an existing file (download.zip), but you can further enhance this program to create the requested file as needed.
When testing the sample code, your local system or LAN may be too fast for you to interrupt the download, so I recommend that you use a slow connection (throttling the site's bandwidth in IIS is one way to simulate this) or put the server on the Internet.
Downloading files on the client is still a struggle. An incorrect or misconfigured web cache server operated by an ISP can cause large file downloads to fail, including poor download performance or early session termination. If the file size exceeds 255MB, you should encourage customers to use third-party download management software, although some recent browsers have basic download managers built in.
If you wish to extend the example code further, it may be helpful to consult the HTTP specification. You can establish MD5 checksums for downloads, adding them using the Content-MD5 header to provide a way to verify the integrity of downloaded files. The sample code does not involve other HTTP methods except GET and HEAD.
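As a starting point for the checksum idea, the value of a Content-MD5 header is the Base64 encoding of the 128-bit MD5 digest of the content. The sketch below computes it for any stream; ComputeContentMd5 is a hypothetical name, and the sample code contains nothing comparable.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography
Imports System.Text

Module ContentMd5Sketch
    ' Returns the Base64-encoded MD5 digest of the stream's remaining content
    Function ComputeContentMd5(ByVal data As Stream) As String
        Using hasher As MD5 = MD5.Create()
            Return Convert.ToBase64String(hasher.ComputeHash(data))
        End Using
    End Function

    Sub Main()
        Dim content As New MemoryStream(Encoding.ASCII.GetBytes("example file content"))
        Console.WriteLine("Content-MD5: " & ComputeContentMd5(content))
    End Sub
End Module
```

For dynamically created files, the digest could be computed once when the file is created and stored alongside the EntityTag, rather than being recomputed on every request.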