Web applications are essentially made up of functions that map inputs (requests) to outputs (responses).

This article looks at a way to store the relationship between request and response and to pre-generate responses, thus reducing the resources needed to fulfill requests.

A conventional website usually consists of a mix of static and dynamic content. From the developer's standpoint, this fact presents a series of decisions. Which pages should be built dynamically by the server application, and which should be "static" (an .html file in a web directory)? Answering this question can lead to an awkward partitioning of the site's pages, thus complicating the development effort. However, the fact that certain content is unlikely to change does not preclude it from being stored in a table and dynamically built into pages. In fact, this sort of normalization of the site's content can be a good move for the long-term maintenance of the site.

However, this flexibility comes at the cost of a back-end dynamic hit for every request. This cost is negligible for a low-traffic site, but can be prohibitive if the traffic picks up (which is usually the goal, right?). Also, it can be difficult to manage the Internet search engine registration for a site that is largely dynamic. This article discusses a method whereby one can enjoy the benefits of a completely dynamic site while still serving static pages to users when appropriate.

Typically, we view our dynamic web applications as services that quickly dish out customized content based on user requests. Consider looking at your application in a slightly different way. What if the resulting content were stored in a reusable file (or memo field) instead of being pushed out the HTTP stream? If you look at it "outside the box" for a moment, you can see that your application can actually be a very efficient tool for generating static content.

Obviously, there are limitations to this approach. We can't pre-generate static HTML content for every possible user request in a typical application. For example, if a user-defined search interface is offered, the results for those searches will have to be built dynamically. However, there is often core content that changes little, if at all, from day to day. Furthermore, there is usually a subset of content that is requested much more than any other (for example, the site's home page). Finally, there is often some content that can be quite resource intensive to generate. Tables based on complex database queries are a good example. In this article, we will target for pre-generation any content that falls into one or more of these categories.

There are three issues at hand. First, the static pages must be generated (and likely re-generated when updates to the site's data are made). Second, the intra-site links on both the static and dynamic pages must be synchronized, so that the user is diverted to an appropriate static page whenever possible. Finally, the system should be resilient and stable during design changes and the inevitable switching of pages from dynamic to static, or vice-versa, as the site evolves.

Let's look first at the basic concept, which can be applied to many different situations. Then, we'll give a sample implementation of the concept using the West Wind Web Connection framework.

The Concept

The big idea here is to think of your site as a mathematical function that produces a consistent result for a given input. The inputs that influence the result produced by a dynamic site can include query string parameters, browser variables, form field values, cookies, session information, and so forth. Your site will produce responses in accordance with the values of each of these. The key is to maintain a record of the mapping between each possible request and the result associated with it.

A simple example

Suppose you have a site that includes a home page containing a link to the library page, where a list of book categories is presented. Each category allows a drill-down link to a dynamically generated list of books in that category. It is not too hard to imagine the books table that would drive this site. Let's keep it simple and say that the table is denormalized to some extent and includes a category field. So, to generate the base categories page, some sort of SELECT DISTINCT cCategory ... query is needed. Then, each category's listing page would be driven by something like: SELECT cTitle, cAuthor... WHERE cCategory=lcCategory, etc... Also note that the home page may or may not include dynamic content, but it will probably need to be based on a similar layout to the other pages.

Given this scenario, imagine that the books table gets updated weekly. Every hit to the library categories and category listings will generate the exact same response between book table updates. In Figure 1, this would be represented by the same arrow being drawn and the same dynamic content being generated repeatedly. If we could capture this mapping, generate a static response for these common mappings and point the links to the static response files rather than the back-end application, we could reduce the load on this application significantly. For now, we'll take the somewhat simplistic view that these requests are determined by the query string alone. We will also simplify by generating complete .html files, thus completely eliminating the need for a back-end database hit. Here is a very simple example of what this mapping might look like:

Figure 1 - Web applications can be viewed as functions that provide a mapping from a "request set" to a "response set".

The idea then, is to present the user with links that look like:

<a href="library-history.html">  

which will serve the static file, rather than

<a href="wc.dll?library~history"> 

which will serve the same result, but will require the back-end application to do the work.

Clearly, this mapping scheme would need to be expanded to include named parameters (which are not order specific), but the idea would be the same. Also note that by focusing on query string, we are potentially missing many aspects of the request space, some of which will require the application logic. IIS can resolve requests for static files for us, but inspecting a cookie or a browser variable typically requires a little more work.
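One way to cope with named parameters that can arrive in any order is to reduce each query string to a canonical lookup key before storing or searching the mapping. The following is only a minimal sketch: CanonicalQS is a hypothetical helper (it is not part of Web Connection), and it assumes Visual FoxPro 9, where ALINES() accepts a flags argument and a parse character. The result is meant to be used as a lookup key, not as a URL.

```foxpro
*-- Hypothetical helper, not part of Web Connection.
*-- Assumes VFP 9 (ALINES() with nFlags and a parse character).
FUNCTION CanonicalQS(tcQS)
LOCAL laPairs[1], lnCount, lnI, lcKey
lcKey = ""
IF EMPTY(tcQS)
   RETURN lcKey
ENDIF
*-- split on "&", trimming each element
lnCount = ALINES(laPairs, LOWER(ALLTRIM(tcQS)), 1, "&")
*-- sort so that parameter order no longer matters
ASORT(laPairs)
FOR lnI = 1 TO lnCount
   IF !EMPTY(laPairs[lnI])
      lcKey = lcKey + IIF(EMPTY(lcKey), "", "&") + laPairs[lnI]
   ENDIF
ENDFOR
RETURN lcKey
ENDFUNC
```

With this in place, "b=2&a=1" and "a=1&b=2" both reduce to the same key. Note that the sketch lower-cases the whole string, which is only appropriate if parameter values are not case sensitive in your application.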

However, note that we are not limited to generating .html files here. We could just as easily generate .asp or .wcs files, which contain the bulk of the dynamic content already pre-generated (see Figure 2). Then, simple pre-generated scripts can resolve the final nuances of the response, based on other aspects of the request. The application has to do some work, but an expensive backend data query is avoided. And, if the application is used to pre-generate the static files, we are in a position to maintain the look and feel of the site from one source, namely the application code, rather than a mix of application code and static html files.

Figure 2 - Determining a sub-set of responses that can be pre-generated, and mapping requests for that content to static files when possible, can substantially reduce the system resources required by a web application.

Web Connection Implementation Example

In order to generate a static page instead of a standard HTML response, we will need to have our content written to a file instead of into the server's HTTP output stream. How you implement this will depend in part on how your application is designed. An example that will work with West Wind Web Connection right out of the box is provided here. However, it is possible that this exact approach will not be the best approach for your application. Build yours to suit your needs.

I will point out one important design issue here. The implementation of alternative result destinations (in our case, an html or script file rather than the HTTP output stream) may be much more elegant if your application code is not peppered with raw Response.Write() calls. It is much better to break the content generation down into a reasonable granularity and build strings for each piece of content. Then, supply your application with a central method (or better, class) that builds the final result from the constituent strings and sends it to the appropriate location (HTTP stream, static file, etc). This approach provides many potential hook points at which you can read content from alternate sources and divert it to alternate destinations. We won't assume this approach in this sample code, but it is something to consider if you are building a new application.
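To make the idea of a central output class a little more concrete, here is a rough sketch. The class name PageBuilder and its methods are hypothetical and exist only for illustration; a real application would likely add headers, templates and error handling. The usage code comes first because, in a .prg file, class definitions must follow the executable code.

```foxpro
*-- Usage: build the page from pieces, then pick a destination
loPage = CREATEOBJECT("PageBuilder")
loPage.AddPiece("<html><body>")
loPage.AddPiece("<h1>History Books</h1>")
loPage.AddPiece("</body></html>")

*-- send to a static file (file name is only an example) ...
loPage.Deliver("library-history.html")
*-- ... or get the string back for the HTTP stream:
*-- Response.Write(loPage.Deliver())

*-- Hypothetical class: collects content pieces and delivers
*-- the assembled page to a file or back to the caller
DEFINE CLASS PageBuilder AS Custom
   cContent = ""

   PROCEDURE AddPiece(tcPiece)
      This.cContent = This.cContent + tcPiece
   ENDPROC

   *-- tcFile empty or omitted: return the page as a string;
   *-- otherwise write it to the named file
   FUNCTION Deliver(tcFile)
      IF EMPTY(tcFile)
         RETURN This.cContent
      ENDIF
      STRTOFILE(This.cContent, tcFile)
      RETURN ""
   ENDFUNC
ENDDEFINE
```

Because every piece of content passes through one object, switching a page between the HTTP stream and a static file becomes a decision made in a single place.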

Two aspects of implementation

We have two basic tasks at hand. First, we need a way to generate the static pages. Second, we need a way to generate links to those static pages. Both of these tasks will be accomplished with the help of our request-response mapping table. For this example, we will accommodate script maps as well as process method calls based on the query-string.

We require a mechanism to maintain the relationship between possible query-strings and static file names. This can be done via a simple lookup table with three fields: cScriptMap, cQS and cFileName. The cQS field holds the query string, and if needed, cScriptMap holds the script map assignment for that process method. The cFileName field holds the name of the static file that corresponds to that query string.

Using our library page example, and the "history" book category, sample records from our table may look like this:

Note that there is some redundancy here, since this mapping allows multiple ways to access the same result. Depending on how your application request pathing is set up, you will likely need only a subset of these records.
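For experimenting, the lookup table can be mocked up as a cursor. The field names below are the ones described above; the field widths and the rows themselves are assumptions made purely for illustration, based on the "history" category example, and in a real site this would be a permanent table.

```foxpro
*-- Field names come from the article;
*-- widths and rows are illustrative assumptions
CREATE CURSOR ResourceLinks ;
   (cScriptMap C(30), cQS C(100), cFileName C(60))

*-- positional query string, no script map
INSERT INTO ResourceLinks (cScriptMap, cQS, cFileName) ;
   VALUES ("", "library~history", "library-history.html")

*-- named parameter form of the same request
INSERT INTO ResourceLinks (cScriptMap, cQS, cFileName) ;
   VALUES ("", "library~&category=history", ;
           "library-history.html")

*-- script map form (guessed layout of the two fields)
INSERT INTO ResourceLinks (cScriptMap, cQS, cFileName) ;
   VALUES ("library.code", "category=history", ;
           "library-history.html")
```

All three rows point at the same static file, which is the redundancy mentioned above.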

We now need to arrange things so that, if the book data has not changed, requests to

http://www.yoursite.com/wc.dll?library~history

http://www.yoursite.com/wc.dll?library~&category=history

http://www.yoursite.com/library.code~&category=history

http://www.yoursite.com/library-history.html

will produce exactly the same result. The difference will be that the last request does not involve any web application effort; the web-server simply dishes up the html file.

These static files can be generated in response to a web request (possibly password protected), thus allowing pages to be updated by a simple web hit. Here is a sample Web Connection process class method, coded in Visual FoxPro, for generating the static pages:

***********************************************
* MyProcess::GenerateStaticPages
***********************************************
FUNCTION GenerateStaticPages()
LOCAL lcOutput, lcFileName, lcHTMLPath, lcMethod

lcHTMLPath = Server.oConfig.oMyProcess.cHTMLPagePath

*-- assuming ResourceLinks is open

SELECT * from ResourceLinks INTO CURSOR GenStatic

SCAN
  lcFileName = cFileName
  REQUEST.cQueryString  = Alltrim(cQS)
  REQUEST.ParseQueryString()
  IF EMPTY(cScriptMap)
     lcMethod = REQUEST.QueryString[2]
  ELSE
     lcMethod = JUSTSTEM(cScriptMap)
  ENDIF
  RESPONSE.Reset()
  EVALUATE([this.]+lcMethod+[()])
  lcOutput = RESPONSE.GetOutput()
  *--- strip the content-type header
  lcOutput = SUBSTR(lcOutput,AT([<],lcOutput))
  STRTOFILE(lcOutput, ;
            ADDBS(lcHTMLPath)+ ;
            lcFileName)
ENDSCAN
USE IN GenStatic		
RESPONSE.Reset()
Response.Write("Static pages generated.")
ENDFUNC

The site must generate <a href="..." links in accordance with the location of the content. And, we must do this both on the static pages (since the user may travel from one static page to another) and on purely dynamic pages (since we want to direct the user to the static content whenever possible). For example, consider our history book listing page, which is accessed via the following query string:

library~history 

In each and every place we wish to possibly offer a static page in place of a dynamic one, we will generate the link via:

ResolveLink("library~history")

For example, in code:

lcHTML = [<a href="]+ ;
This.ResolveLink("library~history")+ ;
[">History Books</a>]

Or, in a script:

<a href="
<%=PROCESS.ResolveLink("library~history")%>">
History Books</a>

The ResolveLink method determines the proper href attribute for that particular query string and, if possible, directs the link to the static resource. In this case, we would get something like:

<a href="library-history.html">History Books</a>

Let's look at the Web Connection process class method for resolving the links:

***********************************************
* MyProcess::ResolveLink
***********************************************
FUNCTION ResolveLink( tcQS, tcSM)
LOCAL lcQS, lcSM, lcHTMLPath

*-- normalize the parameters for the lookup;
*-- the script map (tcSM) is optional
lcQS = LOWER(ALLTRIM(tcQS))
lcSM = IIF(EMPTY(tcSM), ;
           '', LOWER(ALLTRIM(tcSM)))
lcHTMLPath = ;
  Server.oConfig.oMyProcess.cHTMLPagePath

*-- assuming ResourceLinks is open
SELECT ResourceLinks

LOCATE FOR ;
  LOWER(ALLTRIM(cQS)) == lcQS ;
    AND LOWER(ALLTRIM(cScriptMap)) == lcSM

*-- trim the field so the href carries no padding
IF FOUND() AND FILE(ADDBS(lcHTMLPath)+ ;
          ALLTRIM(ResourceLinks.cFileName))
  RETURN ALLTRIM(ResourceLinks.cFileName)
ELSE
  *-- no static page: link to the application
  RETURN IIF(EMPTY(tcSM),[wc.dll],ALLTRIM(tcSM))+ ;
         IIF(EMPTY(tcQS),[],[?]+ALLTRIM(tcQS))
ENDIF
ENDFUNC

At this point, we are pretty much done. A call to GenerateStaticPages() will re-generate content for all the query strings listed in the ResourceLinks table. And, ResolveLink() will create href attributes for all dynamic hits and will direct the user to the static content when it exists. So long as we remember to regenerate the content when the site's underlying data changes, the user's experience will be the same, and we will have a much lighter load on the application. If we always use ResolveLink() when producing href attributes throughout the application, we can add or remove pages from the static generation list by updating the ResourceLinks table as needed.

Improvements and Extensions

If updates to your site's data are frequent, you might consider automating the static page generation by firing that method in a separate process on a set interval, or in response to certain events. Also, consider that stock .html files need not be the final word here. Your application could just as easily produce .asp or .wcs files or other server-side scripted files. Thus, these "semi-static" pages could be processed on demand with the resource-intensive work already done. This approach can be helpful if small customizations are required, for example dynamically generating a link to a .css file based on the client's browser.
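If regeneration is automated, it helps to have a cheap test for whether a given static page is out of date. One simple possibility, sketched below, is to compare file timestamps. IsStale is a hypothetical helper and the file names in the usage lines are only examples. Keep in mind that a table file's timestamp may not change until buffered changes are flushed or the table is closed, so treat this as a starting point rather than a reliable change detector.

```foxpro
*-- Usage (file names are examples only)
IF IsStale("library-history.html", "books.dbf")
   *-- trigger regeneration here, for example by
   *-- requesting the GenerateStaticPages method
ENDIF

*-- Hypothetical helper: .T. when the static file is missing
*-- or older than the table file that feeds it
FUNCTION IsStale(tcStaticFile, tcTableFile)
IF !FILE(tcStaticFile)
   RETURN .T.
ENDIF
*-- FDATE() with 1 as the second argument returns a DateTime
RETURN FDATE(tcStaticFile, 1) < FDATE(tcTableFile, 1)
ENDFUNC
```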

Also, note that generating entire pages (.html or scripts) is only one option. Certain resource intensive sub-page content could be pre-generated and stored in memo fields. This partial content may be an HTML Table, a <DIV> section, a menu bar, or some other portion of a page. The application could draw on these pre-generated content 'snippets' to build pages. The partial content could also be stored in a presentation-independent manner using XML. This way, user-based customizations (suppression of certain fields, user-based color schemes, etc.) could be provided as the HTML was built from the pre-generated XML result. In practice, it is likely that an approach that encapsulates a spectrum of static content storage options, from completely static .html files to memo field based XML query results, will be best for large-scale applications.
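As a small illustration of the memo field idea, the sketch below stores a pre-generated fragment under a key and retrieves it later while a page is being built. The Snippets cursor, its field names and the key value are all assumptions made for this example; a real application would use a permanent table and store the actual generated HTML or XML.

```foxpro
*-- Hypothetical snippet store;
*-- names and widths are assumptions
CREATE CURSOR Snippets ;
   (cKey C(40), mContent M, tBuilt T)

*-- store a pre-generated fragment (placeholder markup)
INSERT INTO Snippets (cKey, mContent, tBuilt) ;
   VALUES ("categories-table", ;
           "<table>...</table>", DATETIME())

*-- later, fetch the fragment while building a page
LOCAL lcTable
SELECT Snippets
LOCATE FOR ALLTRIM(cKey) == "categories-table"
lcTable = IIF(FOUND(), Snippets.mContent, "")
```

The tBuilt field records when each fragment was generated, which gives the application something to compare against when deciding whether a fragment needs to be rebuilt.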

Special thanks to Randy Pearson (randyp@cycla.com) for his excellent advice and ideas on this article, and on content caching in general.