Chapter 3

Designing CGI Applications

by Jeffry Dwight


A CGI application is much more like a system utility than a full-blown application. In general, scripts are task-oriented rather than process-oriented. That is, a CGI application has a single job to do-it initializes, does its job, and then terminates. This makes it easy to chart data flow and program logic. Even in a GUI environment, the application doesn't have to worry much about being event-driven: The inputs and outputs are defined, and the program will probably have a top-down structure with simple subroutines.

Programming is a discipline, an art, and a science. The mechanics of the chosen language, coupled with the parameters of the operating system and the CGI environment, make up the science. The conception, the execution, and the elegance (if any) can be either art or science. But the discipline isn't subject to artistic fancy and is platform-independent. This chapter deals mostly with programming discipline, concentrating on how to apply that discipline to your CGI scripts.

Chapter 4, "Understanding Basic CGI Elements," covers script elements in detail. In particular, you'll find a complete discussion of environment variables and parsing. I'll touch on these issues briefly, but only as they relate to script structure and planning.

In this chapter, I'll cover the following topics:

  CGI script structure
  Planning your script
  Standard CGI environment variables
  CGI script portability
  CGI libraries
  CGI limitations

CGI Script Structure

When your script is invoked by the server, the server passes information to the script in one of two ways: GET or POST. These two methods are known as request methods. The request method used is passed to your script via the environment variable called-appropriately enough-REQUEST_METHOD.

URL Encoding
The HTTP 1.0 specification calls for URL data to be encoded in such a way that it can be used on almost any hardware and software platform. Information specified this way is called URL-encoded; almost everything passed to your script by the server will be URL-encoded.
Parameters passed as part of QUERY_STRING or PATH_INFO will take the form variable1=value1&variable2=value2 and so forth, for each variable defined in your form.
Variables are separated by the ampersand (&). If you want to send a real ampersand, it must be escaped-that is, encoded as a two-digit hexadecimal value representing the character. Escapes are indicated in URL-encoded strings by the percent sign (%). Thus, %25 represents the percent sign itself. (25 is the hexadecimal, or base 16, representation of the ASCII value for the percent sign.) All characters above 127 (7F hex) or below 33 (21 hex) are escaped. This includes the space character, which is escaped as %20. Also, the plus sign (+) needs to be interpreted as a space character.
Before your script can deal with the data, it must parse and decode it. Fortunately, these are fairly simple tasks in most programming languages. Your script scans through the string looking for an ampersand. When an ampersand is found, your script chops off the string up to that point and calls it a variable. The variable's name is everything up to the equal sign in the string; the variable's value is everything after the equal sign. Your script then continues parsing the original string for the next ampersand, and so on, until the original string is exhausted.
After the variables are separated, you can safely decode them, as follows:
  1. Replace all plus signs with spaces.
  2. Replace all %## (percent sign followed by two hex digits) with the corresponding ASCII character.
It's important that the script scan through the string linearly rather than recursively because the characters the script decodes may be plus signs or percent signs.
When the server passes data to your script with the POST method, the script checks the environment variable called CONTENT_TYPE. If CONTENT_TYPE is application/x-www-form-urlencoded, your data needs to be decoded before use.
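In C, the decode step might look something like the following sketch. The function name url_decode is mine, and the string is assumed to have already been split into individual values on the ampersands and equal signs:

```c
#include <ctype.h>
#include <stdlib.h>

/* Decode one URL-encoded value in place: '+' becomes a space, and
   %XX becomes the character with that hexadecimal value. Note the
   single linear pass -- decoded characters are never re-examined,
   so a decoded '%' or '+' can't be mistaken for an escape. */
void url_decode(char *s)
{
    char *out = s;
    while (*s) {
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' && isxdigit((unsigned char)s[1])
                             && isxdigit((unsigned char)s[2])) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

Running "Bob%20%26%20Carol+Ltd." through this yields "Bob & Carol Ltd.".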

The basic structure of a CGI application is simple and straightforward: initialization, processing, output, and termination. Because this chapter deals with concepts, flow, and programming discipline, I'll use pseudocode rather than a specific language for the examples.

Ideally, a script has the following form (with appropriate subroutines for do-initialize, do-process, and do-output):

  1. Program begins
  2. Call do-initialize
  3. Call do-process
  4. Call do-output
  5. Program ends

Real life is rarely this simple, but I'll give the nod to proper form while acknowledging that you'll seldom see it.

Initialization

The first thing your script must do when it starts is determine its input, environment, and state. Basic operating-system environment information can be obtained the usual way: from the system registry in NT, from standard environment variables in UNIX, from INI files in Windows, and so forth.

State information will come from the input rather than the operating environment or static variables. Remember: Each time CGI scripts are invoked, it's as if they've never been invoked before. The scripts don't stay running between calls. Everything must be initialized from scratch, as follows:

  1. Determine how the script was invoked. Typically, this involves reading the environment variable REQUEST_METHOD and parsing it for the word GET or the word POST.

NOTE
Although GET and POST are the only currently defined operations that apply to CGI, you may encounter PUT or HEAD from time to time if your server supports them and the user's browser uses them. PUT was offered as an alternative to POST, but never received approved RFC status and isn't in general use. HEAD is used by some browsers to retrieve just the headers of an HTML document and isn't applicable to CGI programming; other oddball request methods may be out there, too. Your code should check explicitly for GET and POST and refuse anything else. Don't assume that if the request method isn't GET then it must be POST, or vice versa.

  2. Retrieve the input data. If the method was GET, you must obtain, parse, and decode the QUERY_STRING environment variable. If the method was POST, you must check QUERY_STRING and also parse STDIN. If the CONTENT_TYPE environment variable is set to application/x-www-form-urlencoded, the stream from STDIN needs to be decoded too.

The following is the initialization phase in pseudocode:

retrieve any operating system environment values desired
allocate temporary storage for variables
if environment variable REQUEST_METHOD equals "GET" then
        retrieve contents of environment variable QUERY_STRING;
        if QUERY_STRING is not null, parse it and decode it;
else if REQUEST_METHOD equals "POST" then
        retrieve contents of environment variable QUERY_STRING;
        if QUERY_STRING is not null, parse it and decode it;
        retrieve value of environment variable CONTENT_LENGTH;
        if CONTENT_LENGTH is greater than zero, read CONTENT_LENGTH bytes from STDIN;
        parse STDIN data into separate variables;
        retrieve contents of environment variable CONTENT_TYPE;
        if CONTENT_TYPE equals application/x-www-form-urlencoded, then decode parsed variables;
else if REQUEST_METHOD is neither "GET" nor "POST" then
        report an error;
        deallocate temporary storage;
        terminate
end if

Processing

After initializing its environment by reading and parsing its input, the script is ready to get to work. What happens in this section is much less rigidly defined than during initialization. During initialization, the parameters are known (or can be discovered), and the tasks are more or less the same for every script you'll write. The processing phase, however, is the heart of your script, and what you do here will depend almost entirely on the script's objectives.

  1. Process the input data. What you do here will depend on your script. For instance, you may ignore all the input and just output the date; you may spit back the input in neatly formatted HTML; you may hunt up information in a database and display it; or you may do something never thought of before. Processing the data means, generally, transforming it somehow. In classical data processing terminology, this is called the transform step because in batch-oriented processing, the program reads a record, applies some rule to it (transforming it), and then writes it back out. CGI programs rarely, if ever, qualify as classical data processing, but the idea is the same. This is the stage of your program that differentiates it from all other CGI programs-where you take the inputs and make something new from them.
  2. Output the results. In a simple CGI script, the output is usually just a header and some HTML. More complex scripts might output graphics, graphics mixed with text, or all the information necessary to call the script again with some additional information. A common and rather elegant technique is to call a script once by using GET, which can be done from a standard <a href> tag. The script senses that it was called with GET and creates an HTML form on the fly-complete with hidden variables and code necessary to call the script again, this time with POST.

Row, Row, Row Your Script…
In the UNIX world, a character stream is a special kind of file. STDIN and STDOUT are character streams by default. The operating system helpfully parses streams for you, making sure that everything going through is proper 7-bit ASCII or an approved control code.

Seven-bit? Yes. For HTML, this doesn't matter. However, if your script sends graphical data, using a character-oriented stream means instant death. The solution is to switch the stream over to binary mode. In C, you do this with the setmode() function: setmode(fileno(stdout), O_BINARY). You can change horses in midstream with the complementary setmode(fileno(stdout), O_TEXT). A typical graphics script will output the headers in character mode, and then switch to binary mode for the graphical data.

In the NT world, streams behave the same way for compatibility reasons. A nice simple \n in your output gets converted to \r\n for you when you write to STDOUT. This doesn't happen with regular NT system calls, such as WriteFile(); you must specify \r\n explicitly if you want CRLF.

Alternate words for character mode and binary mode are cooked and raw, respectively-those in the know will use these terms instead of the more common ones.

Whatever words you use and on whatever platform, there's another problem with streams: by default, they're buffered, which means that the operating system hangs onto the data until a line-terminating character is seen, the buffer fills up, or the stream is closed. This means that if you mix buffered printf() statements with unbuffered fwrite() or fprintf() statements, things will probably come out jumbled, even though they may all write to STDOUT. Printf() writes buffered to the stream; file-oriented routines output directly. The result is an out-of-order mess.

You may lay the blame for this straight at the feet of the god known as Backward Compatibility. Beyond the existence of many old programs, streams have no reason to default to buffered and cooked. These should be options that you turn on when you want them-not turn off when you don't. Fortunately, you can propitiate this god of Backward Compatibility with the simple incantation setvbuf(stdout, NULL, _IONBF, 0), which turns off all buffering for the STDOUT stream.

Another solution is to avoid mixing types of output statements; even so, that won't make your cooked output raw, so it's a good idea to turn off buffering anyway. Many servers and browsers are cranky and dislike receiving input in drabs and twaddles.
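The incantation, plus the Windows-only setmode() call (spelled _setmode in newer compilers), might be wrapped in a helper like this sketch (the function name make_raw is mine):

```c
#include <stdio.h>
#ifdef _WIN32
#include <io.h>
#include <fcntl.h>
#endif

/* Switch a stream to "raw": no buffering, and (on Windows, where
   text streams cook \n into \r\n) binary mode. Returns 0 on success.
   A graphics script would call make_raw(stdout) after printing its
   headers and before writing any image data. */
int make_raw(FILE *f)
{
    if (setvbuf(f, NULL, _IONBF, 0) != 0)
        return -1;                        /* buffering off */
#ifdef _WIN32
    if (_setmode(_fileno(f), _O_BINARY) == -1)
        return -1;                        /* no \n -> \r\n translation */
#endif
    return 0;
}
```

On UNIX there's no text/binary distinction, so only the buffering needs turning off there.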

NOTE
Those who speak mainly UNIX will frown at the term CRLF, while those who program on other platforms might not recognize \n or \r\n. CRLF, meet \r\n. \r is how C programmers specify a carriage return (CR) character; \n is how C programmers specify a line feed (LF) character. (That's Chr$(10) for LF and Chr$(13) for CR to you Basic programmers.)

The following is a pseudocode representation of a simple processing phase whose objective is to recapitulate all the environment variables gathered in the initialization phase:

output header "content-type: text/html\n"
output required blank line to terminate header "\n"
output "<HTML>"
output "<H1>Variable Report</H1>"
output "<UL>"
for each variable known
        output "<LI>"
        output variable-name
        output "="
        output variable-value
loop until all variables printed
output "</UL>"
output "</HTML>"

This has the effect of creating a simple HTML document containing a bulleted list. Each item in the list is a variable, expressed as name=value.
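In C, the report might look like this sketch (variable_report is an illustrative name; the names and values arrays are assumed to come from the initialization phase):

```c
#include <stdio.h>

/* Write the variable report from the pseudocode above to 'out'.
   'names' and 'values' are parallel arrays of 'count' strings. */
void variable_report(FILE *out, char **names, char **values, int count)
{
    int i;
    fputs("Content-type: text/html\n", out);
    fputs("\n", out);               /* required blank line ends the header */
    fputs("<HTML>\n<H1>Variable Report</H1>\n<UL>\n", out);
    for (i = 0; i < count; i++)
        fprintf(out, "<LI>%s=%s\n", names[i], values[i]);
    fputs("</UL>\n</HTML>\n", out);
}
```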

Termination

Termination is nothing more than cleaning up after yourself and quitting. If you've locked any files, you must release them before letting the program end. If you've allocated memory, semaphores, or other objects, you must free them. Failure to do so may result in a "one-shot wonder" of a script-one that works only the first time, but breaks on every subsequent call. Worse yet, your script may hinder, or even break, other scripts or even the server itself by failing to free up resources and release locks.

On some platforms-most notably Windows NT and, to a lesser extent, UNIX-your file handles and memory objects are closed and reclaimed when your process terminates. Even so, it's unwise to rely on the operating system to clean up your mess. For instance, under NT, the behavior of the file system is undefined when a program locks all or part of a file and then terminates without releasing the locks.

Make sure that your error-exit routine-if you have one (and you should)-knows about your script's resources and cleans up just as thoroughly as the main exit routine does.

Planning Your Script

Now that you've seen a script's basic structure, you're ready to learn how to plan a script from the ground up.

In the old days (circa 1950), planning a program meant reams of paper, protractors, rulers, chalkboards, punch tape, stacks of cards, and endless cups of coffee during endless meetings with other white-coated technicians, each of whom could parse your machine code and compare cycles and pre-fetch queue efficiency in his head. Programs emerging from this method of planning tended to be brutal, short, and ugly-but amazingly efficient.

Later on, in the 1970s, planning a program meant reading dozens of weighty tomes, each of which went on at great length and obscurity about discipline, flow charts, data flow versus program logic, the foolishness (or wisdom) of top-down design, inheritablity, reusability, encapsulation, data integrity, and so forth. One then drank endless cups of coffee while attending endless meetings and shouting quotations from the books at the other participants.

Finally, someone would notice that the customer was about to sign with another company, and nip off to write the program while everyone else was still arguing. The resultant program was almost always put into use immediately and got the job done. The only thing with a longer life cycle than a program written this way is the ongoing, raging discussion about its inadequacies and lack of adherence to proper rules of structure.

In still more recent times, circa 1980, programs got designed by folks wearing blue jeans, sandals, glasses, and long hair. They seldom attended meetings, and if they did, they never paid attention. They doodled, talked to themselves, kept odd hours, and drank either Mountain Dew or mineral water, depending on whether they worked in California or not. They were known to consume mass quantities of almost raw red meat, and to call for pizza at least once a day-often, first thing in the morning.

Sometimes they appeared to be working, but it was hard to tell because the lights were always off in their offices. Eventually they emerged with a program-arrived at by mystical processes unknown to common man-and when asked for the documentation, would tap their foreheads and smile knowingly, and then go home to sleep. These programs provided the foundation of the modern software industry. Scary, huh?

In the 1990s, program design most often happens after a short meeting or two to discuss requirements and milestones in a non-smoking, caffeine-free, hypoallergenic environment. Then someone gets assigned to chart the data flow and program logic, usually using Visio or some such program to drag perfect little boxes and lines around on-screen until the project manager is happy. Another person or team usually codes the project, translating each of the little boxes into a single, simple subroutine that performs a single, simple task. Then the documentation and user-training team moves in, and the programmers can go home until the next project.

Although each approach has its advantages and drawbacks, CGI scripts benefit most from the 1980s model with the discipline of the 1970s, the documentation of the 1990s, and a dash of efficiency from the old days. Follow these steps:

  1. Take your time defining the program's task. Think it through thoroughly. Write it down and trace the program logic. (Doodling is fine; Visio is overkill.) When you're satisfied that you understand the input and output and the transform process you'll have to do, proceed.
  2. Order a pizza and a good supply of your favorite beverage, lock yourself in for the night, and come out the next day with a finished program. The glasses, jeans, and long hair are optional accessories. Don't forget to document your code while writing it.
  3. Test, test, test. Use every browser known to mankind and every sort of input you can think of. Especially test for the situations in which users enter 32K of data in a 10-byte field or they enter control codes where you're expecting plain text.
  4. Document the program as a whole, too--not just the individual steps within it--so that others who have to maintain or adapt your code will understand what you were trying to do.

Step 1, of course, is this section's topic, so let's look at that process in more depth:

NOTE
Programmers use semaphores to coordinate among multiple programs, multiple instances of the same program, or even among routines within a single program. Some operating systems have support for semaphores built-in; others require the programmers to develop a semaphore strategy.
In the simplest sense, a semaphore is like a toggle switch whose state can be checked: Is the switch on? If so, do this; if not, do that. Often, files are used as semaphores (does the file exist? If so, do this; if not, do that). A more sophisticated method is to try to lock a file for exclusive access (if you can get the lock, do this; if not, wait a bit and try again).
In CGI programming, semaphores are used most often to coordinate among multiple instances of the same CGI script. If, for instance, your script must update a file, it can't assume that the file is available at all times. What if another instance of the same script is in the middle of updating the file right then? The second process must wait until the first one is finished, or else the file will become hopelessly corrupted. The solution is to use a semaphore. Your script checks to make sure that the semaphore is clear. If not, it goes into a short loop, checking the semaphore periodically. After the semaphore is clear, it sets the semaphore so that no other program will interfere. It then performs its critical section--in this case, writing to a file--and clears the semaphore again. Other instances can then each take a turn. The semaphore thus provides a way to manage concurrency safely.
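Here's a minimal POSIX sketch of the file-as-semaphore approach described above, using O_CREAT|O_EXCL so that creating the lock file is an atomic test-and-set (the function names and the crude one-second retry policy are mine):

```c
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

/* A file used as a semaphore: O_CREAT|O_EXCL guarantees that only one
   process can create the file, so creating it is "set" and unlinking
   it is "clear". Returns 0 once the semaphore is acquired. */
int acquire_semaphore(const char *path, int max_tries)
{
    int tries;
    for (tries = 0; tries < max_tries; tries++) {
        int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
        if (fd != -1) {
            close(fd);
            return 0;          /* semaphore was clear; now it's set */
        }
        if (errno != EEXIST)
            return -1;         /* real error, not a held semaphore */
        sleep(1);              /* semaphore set; wait a bit and retry */
    }
    return -1;                 /* gave up */
}

void release_semaphore(const char *path)
{
    unlink(path);
}
```

The critical section-writing to the shared file-goes between the acquire and release calls.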

NOTE
An early-out algorithm is one that tests for the exception, or least-significant case, and exits with a predefined answer rather than exercising the full algorithm to determine the answer. For example, division algorithms usually test for a divide-by-two operation and do a shift instead of a divide.
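A tiny illustration in C (extending the divide-by-two case to any power-of-two divisor is my generalization; the divisor is assumed to be nonzero):

```c
/* Early-out division: handle the trivial and power-of-two cases with
   an early exit, falling through to a real divide only when needed.
   The divisor d is assumed to be nonzero. */
unsigned int div_early_out(unsigned int n, unsigned int d)
{
    if (d == 1)
        return n;                       /* earliest out: nothing to do */
    if ((d & (d - 1)) == 0) {           /* power of two? */
        unsigned int shift = 0;
        while ((1u << shift) < d)
            shift++;
        return n >> shift;              /* shift instead of divide */
    }
    return n / d;                       /* general case */
}
```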

Standard CGI Environment Variables

Here's a brief overview of the standard environment variables you're likely to encounter. Each server implements the majority of them consistently, but there are variations, exceptions, and additions. In general, you're more likely to find a new, otherwise undocumented variable than to find a documented variable omitted. The only way to be sure, though, is to check your server's documentation.

Chapter 4, "Understanding Basic CGI Elements," deals with each variable in some depth. This section is taken from the NCSA specifications and is the closest thing to a "standard" that you'll find. In case you've misplaced the URL for the NCSA CGI specification, here it is again:

http://www.w3.org/hypertext/WWW/CGI/

The following environment variables are set each time the server launches an instance of your script, and are private and specific to that instance:

NOTE
AUTH_TYPE and REMOTE_USER are set only after a user successfully authenticates (usually via a user name and password) his identity to the server. Hence, these variables are useful only when restricted areas are established, and then only in those areas.

CGI Script Portability

CGI programmers face two portability issues: platform independence and server independence. By platform independence, I mean the capability of the code to run without modification on a hardware platform or operating system different from the one for which it was written. Server independence is the capability of the code to run without modification on another server using the same operating system.

Platform Independence

The best way to keep your CGI script portable is to use a commonly available language and avoid platform-specific code. It sounds simple, right? In practice, this means using either C or Perl and not doing anything much beyond formatting text and outputting graphics.

Does this leave Visual Basic, AppleScript, and UNIX shell scripts out in the cold? Yes, I'm afraid so-for now. However, platform independence isn't the only criterion to consider when selecting a CGI platform. There's also speed of coding, ease of maintenance, and ability to perform the chosen task.

Certain types of operations simply aren't portable. If you develop for 16-bit Windows, for instance, you'll have great difficulty finding equivalents on other platforms for the VBX and DLL functions you use. If you develop for 32-bit Windows NT, you'll find that all your asynchronous Winsock calls are meaningless in a UNIX environment. If your shell script does a system() call to launch grep and pipe the output back to your program, you'll find nothing remotely similar in the NT environment. And AppleScript is good only on Macs-period!

If one of your mandates is the capability to move code among platforms with a minimum of modification, you'll probably have the best success with C. Write your code using the standard functions from the ANSI C libraries, and avoid making other operating system calls. Unfortunately, following this rule will limit your scripts to very basic functionality. If you wrap your platform-dependent code in self-contained routines, however, you minimize the work needed to port from one platform to the next. As you saw earlier in the section "Planning Your Script," when talking about encapsulation, a properly designed program can have any module replaced in its entirety without affecting the rest of the program. Using these guidelines, you may have to replace a subroutine or two, and you'll certainly have to recompile; however, your program will be portable.

Perl scripts are certainly easier to maintain than C programs, mainly because there's no compile step. You can change the program quickly when you figure out what needs to be changed. And there's the rub: Perl is annoyingly obtuse, and the libraries tend to be much less uniform-even between versions on the same platform-than do C libraries. Also, Perl for NT is fairly new and still quirky (as if anything related to Perl can be called more quirky than another part).

If, however, you dream of bit masks, think two-letter code words are more descriptive than named functions, and believe in your heart that programming syntax should be as convoluted and chock full of punctuation as possible, then you and Perl are soul mates. You won't have much trouble porting your application among platforms once you identify the platform-dependencies and find (or write) libraries for the standard functions.

Server Independence

Far more important than platform independence (unless you're writing scripts only for your own pleasure) is server independence. Server independence is fairly easy to achieve, but for some reason seems to be a stumbling block to beginning script writers. To be server independent, your script must run without modification on any server using the same operating system. Only server-independent programs can be useful as shareware or freeware, and without a doubt, server independence is a requirement for commercial software.

Most programmers think of obvious issues, such as not assuming that the server has a static IP address. The following are some other rules of server independence that, although obvious once stated, nevertheless get overlooked time and time again:

CGI Libraries

When you talk about CGI libraries, there are two possibilities: libraries of code you develop and want to reuse in other projects, and publicly available libraries of programs, routines, and information.

Personal Libraries

If you follow the advice given earlier in the "Planning Your Script" section about writing your code in a black-box fashion, you'll soon discover that you're building a library of routines that you'll use over and over. For instance, after you puzzle out how to parse out URL-encoded data, you don't need to do it again. And when you have a basic main() function written, it will probably serve for every CGI program you ever write. This is also true for generic routines, such as querying a database, parsing input, and reporting runtime errors.

How you manage your personal library depends on the programming language you use. With C and assembler, you can precompile code into actual .lib files, with which you can then link your programs. Although possible, this likely is overkill for CGI and doesn't work for interpreted languages, such as Perl and Visual Basic. (Although Perl and VB can call compiled libraries, you can't link with them in a static fashion the way you can with C.) The advantage of using compiled libraries is that you don't have to recompile all your programs when you change code in the library. If the library is loaded at runtime (a DLL), you don't need to change anything. If the library is linked statically, all you need to do is relink.

Another solution is to maintain separate source files and simply include them with each project. You might have a single, fairly large file that contains the most common routines while putting seldom-used routines in files of their own. Keeping the files in source format adds a little overhead at compile time, but not enough to worry about-especially when compared to the time savings you gain by writing the code only once. The disadvantage of this approach is that when you change your library code, you must recompile all your programs to take advantage of the change.

Nothing can keep you from incorporating public-domain routines into your personal library either. As long as you make sure that the copyright and license allow you to use and modify the source code without royalties or other stipulations, you should strip out the interesting bits and toss them into your library.

Well-designed and well-documented programs provide the basis for new programs. If you're careful to isolate the program-specific parts into subroutines, there's no reason not to cannibalize an entire program's structure for your next project.

You can also develop platform-specific versions of certain subroutines and, if your compiler will allow it, automatically include the correct ones for each type of build. At the worst, you'll have to manually specify which subroutines you want.

The key to making your code reusable this way is to make it as generic as possible. Not so generic that, for instance, a currency printing routine needs to handle both yen and dollars, but generic enough that any program that needs to print out dollar amounts can call that subroutine. As you upgrade, swat bugs, and add capabilities, keep each function's inputs and outputs the same, even when you change what happens inside the subroutine. This is the black-box approach in action. By keeping the calling convention and the parameters the same, you're free to upgrade any piece of code without fear of breaking older programs that call your function.

Another technique to consider is using function stubs. Say that you decide eventually that a single routine to print both yen and dollars is actually the most efficient way to go. But you already have separate subroutines, and your old programs wouldn't know to pass the additional parameter to the new routine. Rather than go back and modify each program that calls the old routines, just "stub out" the routines in your library so that the only thing they do is call the new, combined routine with the correct parameters. In some languages, you can do this by redefining the routine declarations; in others, you actually need to code a call and pay the price of some additional overhead. But even so, the price is far less than that of breaking all your old programs.
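A sketch of the stub technique in C, using hypothetical currency routines (the function names are mine): the old dollar and yen formatters keep their original calling conventions and simply forward to the new combined routine, so no caller has to change.

```c
#include <stdio.h>

/* The new, combined routine: one formatter for any currency. */
int format_currency(char *buf, size_t size,
                    const char *symbol, double amount)
{
    return snprintf(buf, size, "%s%.2f", symbol, amount);
}

/* Stubs preserving the old calling conventions: old programs call
   these exactly as before and never know the implementation moved. */
int format_dollars(char *buf, size_t size, double amount)
{
    return format_currency(buf, size, "$", amount);
}

int format_yen(char *buf, size_t size, double amount)
{
    return format_currency(buf, size, "JPY ", amount);
}
```

The stubs cost one extra call apiece-a far smaller price than breaking every old program.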

Public Libraries

The Internet is rich with public-domain sample code, libraries, and precompiled programs. Although most of what you'll find is UNIX-oriented (because it has been around longer), there's nevertheless no shortage of routines for Windows NT.

Here's a list of some of the best sites on the Internet with a brief description of what you'll find at each site. This list is far from exhaustive. Hundreds of sites are dedicated to, or contain information about, CGI programming. Hop onto your Web browser and visit your favorite search engine. Tell it to search for "CGI" or "CGI libraries" and you'll see what I mean. To save you the tedium of wading through all the hits, I've explored them for you. The following are the ones that struck me as most useful:

I could go on listing sites forever, it seems, but that's enough to get you started.

CGI Limitations

By far, the biggest limitation of CGI is its statelessness. As you learned in Chapter 1, "Introducing CGI," an HTTP Web server doesn't remember callers between requests. In fact, what appears to the user as a single page may actually be made up of dozens of independent requests-either all to the same server or to many different servers. In each case, the server fulfills the request, then hangs up and forgets the user ever dropped by.

The capability to remember what a caller was doing the last time through is called remembering the user's state. HTTP, and therefore CGI, doesn't maintain state information automatically. The closest things to state information in a Web transaction are the user's browser cache and a CGI program's cleverness. For example, if a user leaves a required field empty when filling out a form, the CGI program can't pop up a warning box and refuse to accept the input. The program's only choices are to output a warning message and ask the user to click the browser's back button, or to output the entire form again, filling in the values of the fields that were supplied and letting the user try again, either correcting mistakes or supplying the missing information.
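Re-emitting a form field with the user's earlier input preserved might look like this C sketch (the function name is mine, and the value is assumed to be HTML-escaped already):

```c
#include <stdio.h>

/* Emit a text field pre-filled with the value the user already
   supplied, so the re-output form comes back populated rather than
   empty. The value must already be HTML-escaped by the caller. */
int emit_filled_field(char *buf, size_t size,
                      const char *name, const char *value)
{
    return snprintf(buf, size,
                    "<INPUT TYPE=\"text\" NAME=\"%s\" VALUE=\"%s\">",
                    name, value ? value : "");
}
```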

There are several workarounds for this problem, none of them terribly satisfactory. One idea is to maintain a file containing the most recent information from all users. When a new request comes through, hunt up the user in the file and assume the correct program state based on what the user did the last time. The problems with this idea are that it's very hard to identify a Web user, and a user may not complete the action, yet visit again tomorrow for some other purpose. An incredible amount of effort has gone into algorithms to maintain state only for a limited time period-a period that's long enough to be useful, but short enough not to cause errors. However, these solutions are terribly inefficient and ignore the other problem-identifying the user in the first place.

You can't rely on the user to provide his identity. Not only do some want to remain anonymous, but even those who want you to know their names can misspell it from time to time. Okay, then, what about using the IP address as the identifier? Not good. Everyone going through a proxy uses the same IP address. Which particular employee of Large Company, Ltd., is calling at the moment? You can't tell. Not only that, but many people these days get their IP addresses assigned dynamically each time they dial in. You certainly don't want to give Joe Blow privileges to Jane Doe's data just because Joe got Jane's old IP address this time.

The only reliable form of identity mapping is that provided by the server, using a name-and-password scheme. Even so, users simply won't put up with entering a name and password for each request, so the server caches the data and uses one of those algorithms mentioned earlier to determine when the cache has gone invalid.

Assuming that the CEO of your company hasn't used his first name or something equally guessable as his password, and that no one has rifled through his secretary's drawer or looked at the yellow sticky note on his monitor, you can be reasonably sure that when the server tells you it's the CEO, then it's the CEO. So then what? Your CGI program still has to go through hoops to keep your CEO from answering the same questions repeatedly as he queries your database. Each response from your CGI program must contain all the information necessary to go backward or forward from that point. It's ugly and tiresome, but necessary.

The second main limitation inherent in CGI programs is related to the way the HTTP spec is designed around delivery of documents. HTTP was never intended for long exchanges or interactivity. This means that when your CGI program wants to do something, such as generate a server-pushed graphic, it must keep the connection open. It does this by pretending that multiple images are really part of the same image.

The poor user's browser keeps displaying its "connection active" signal, thinking it's still in the middle of retrieving a single document. From the browser's point of view, the document just happens to be extraordinarily long. From your script's point of view, the document is actually made up of dozens-perhaps hundreds-of separate images, each one funneled through the pipe in sequence and marked as the next part of a gigantic file that doesn't really exist anywhere.

Perhaps when the next iteration of the HTTP specification is released, and when browsers and servers are updated to take advantage of a keep-alive protocol, we'll see some real innovation. In the meantime, CGI is what it is, warts and all. Although CGI is occasionally inelegant, it's nevertheless still very useful-and a lot of fun.