Chapter 4 Understanding Basic CGI Elements

by Bill Schongar

CONTENTS

CGI Behind the Scenes
Environment Variables: Information for the Taking
Dealing with URL-Encoded Information
- Encoding
- Decoding (Parsing) Routines
Use Your Header…
Returning Output to the Users
- STDOUT
- File-Based Output

Using CGI programs is somewhat like ordering a pizza and having it delivered: you call, someone makes it, and then someone sends the pizza to your place. With CGI, you send a request, the server processes it, and you get back the results. The whole goal is that someone (or something) else is supposed to take care of processing the information that you send: Do you want extra cheese? Pepperoni and/or sausage? Anchovies? All the instructions and conditions you send have to be considered as part of the whole operation; otherwise, you have no use for what gets delivered to you.

Whether giving instructions to a pizza place or sending a registration form through CGI, the process is the same: You initiate a conversation to tell someone, or something, what you want done. The main difference, however, is that the pizza place normally doesn't keep you on hold while someone makes your pizza.

The information you send as part of your request to whomever (or whatever) processes it determines the output. To make sure that you're understood, you have to communicate clearly and pass on information that makes sense to the receiving end. The basic elements of CGI that hold the information and keep track of what format it's in are available to help you with that process. In Chapter 3, "Designing CGI Applications," you learned how to plan your application; the chapter also introduced you to some of the basic CGI elements involved in that planning process. In this chapter, you look at some more specifics of those elements and a few others, including

Understanding environment variables
Retrieving data from environment variables
Parsing information
Formatting for output

CGI Behind the Scenes

The Common Gateway Interface, or CGI, is really nothing more than a standard communication method that makes sure that information between the client and the server gets sent in an understandable manner. Imagine that everyone in the world, regardless of language, used a standard form. It could be a form used for job applicants, a vacation request, a pizza order, or a grocery list; the actual purpose wouldn't matter. What would matter is that anyone who looked at that form would recognize it and could understand what data it contained. You wouldn't have to be able to pick out the word "name" in 35 different languages to find another person's name on that form. Although the language would vary, you would know that the name goes in a specific box, and you could pick out that box. If the form had a common format, language wouldn't be as much of a barrier.

Virtually Satisfy Your Appetite

Often, the first implementations of a neat or useful server technology come in a funny form. Knowing that the "average" programmer diets on pizza and soda, is it any surprise that CGI is alive and well, both for delivering food and for checking whether the machine downstairs has any cold sodas left?

Although it won't curb your appetite, you can see one of the earliest (and still coolest) hybrids of CGI and pizza at http://www.ecst.csuchico.edu/~pizza/, home of the original Internet Pizza Server. If it makes you hungry, don't worry: it has links to places such as Little Caesars Virtual Pizza Page, where you can not only see the pizza but also get it delivered by the closest Little Caesars franchise.

If you need a drink to go with your pizza and live somewhere near Rochester, New York, check out the Coke machine(s) in the Computer Science House at the Rochester Institute of Technology. Maybe you can convince a resident to buy you a drink over the modem. Find out how (and why) at http://www.csh.rit.edu/proj/drink.html.

In the case of CGI, the common format is defined by a sequence of steps: the server receives a request, the script is executed, the script reads in the data, the script processes the data, and the script sends back output. At each step, the elements that the step sets (or can set) are in use.

The first step, when a request is sent through CGI, involves the server doing all the front-end work in gathering data for you. This step takes care of two things at once:

It puts the information into predefined holding areas.
It formats the data.

All you have to do is look in the storage areas, pick and choose what you want, and use it in your program. First, you need to know what information is stored where, and at that point, you encounter environment variables.

Environment Variables: Information for the Taking

When the Common Gateway Interface gathers information for you, the amount of information it gathers is extensive: not only the information that's directly related to your application, but also information about the current state of the session environment, such as who's executing the program, where they're doing it from, and how they're doing it. In fact, more than a dozen distinct pieces of environment information are available every time a CGI application executes.

To store all this information, the server places it in the environment variables of the process that runs your CGI program, so your program (and anything it starts) can read the data like any other environment variable. Just as you might have a PATH or a HOME environment variable, now you have environment variables telling you what script is being executed and from where it's being executed.

So what gets set and why? Each one of the many pieces of information has its own purpose, and it may or may not be used by your application. So to make sure that the server doesn't skip anything that might be of use, it records all the information it can get its hands on. You've already been briefly introduced to a variety of environment variables in the previous chapter, but they can be broken down further so that you can look at what part of the process they assist in. Three distinct sets of environment variables exist, if grouped by purpose. The first one of these groups is called server-specific variables.

Server-Specific Environment Variables

When it records information, the server starts with itself. Server-specific variables, summarized in table 4.1, record information such as the port the server is running on, the name of the server software, the protocol being used to process requests, and the version of the CGI specification the server conforms to.

Table 4.1 Server-Specific Environment Variables

Variable	Purpose
`GATEWAY_INTERFACE`	CGI version that the server complies with. Example: CGI/1.1.
`SERVER_NAME`	Server's IP address or host name. Example: www.yourhost.com.
`SERVER_PORT`	Port on the server that received the HTTP request. Example: 80 (most servers).
`SERVER_PROTOCOL`	Name and version of the protocol being used by the server to process requests. Example: HTTP/1.0.
`SERVER_SOFTWARE`	Name (and, normally, version) of the server software being run. Example: Purveyor / v1.1 Windows NT.

In general, the information provided by the server-specific environment variables isn't much use to your application because it's almost always the same. The real exception takes place when you have a script that can be accessed by multiple servers or by a server that supports virtual addressing (one server responding to multiple IP addresses). For instance, if your server is of a commercial nature, you might have one machine running several virtual servers on different IP addresses to provide a unique server name for each customer.

The outline of each server-specific variable has already been shown in Chapter 3, "Designing CGI Applications," but for reference, the following is part of the output of a Perl script that you look at later. The script simply echoes the contents of the environment variables in the order you examine them in this chapter; here they appear as they do when the script echo.pl is run on a Windows NT system. You'll see the code for echo.pl a little later in the chapter, after you examine the different variables.

Gateway Interface: CGI/1.1

Server Protocol: HTTP/1.0

Server Name: bills.aimtech.com

Server Port: 80

Server Software: Purveyor / v1.1 Windows NT

After the server has the chance to describe itself to your program, it moves on to the meat of the information: the components directly related to the user's request.

Request-Specific Environment Variables

Unlike the information about the server, which rarely changes, the information for each request is dynamic, varying not only by which script is called but also by data sent and the user who sent it. At one point or another, all this information may be of use to a script you write, but three basic environment variables are always important to any script: REQUEST_METHOD, CONTENT_LENGTH, and QUERY_STRING. The latter two are used in different situations:

CONTENT_LENGTH tells a script handling a POST request how many bytes of input to read.
QUERY_STRING is the data passed when a GET request is used.

The combination of variables tells you how the request was sent, determines how much information was available, and can provide you with the information itself. Unless your script accepts no input, you'll be using these three variables quite a bit. Table 4.2 outlines these variables, as well as the other request-specific environment variables.

Table 4.2 Request-Specific Environment Variables

Variable	Purpose
`AUTH_TYPE`	Authentication scheme used by the server (`NULL` if no authentication is present). Example: `Basic`.
`CONTENT_FILE`	File used to pass data to a CGI program (Windows HTTPd/WinCGI only). Example: `c:\temp\324513.dat`.
`CONTENT_LENGTH`	Number of bytes passed to standard input (STDIN) as content from a `POST` request. Example: 9.
`CONTENT_TYPE`	Type of data being sent to the server. Example: `application/x-www-form-urlencoded` (data from a form).
`OUTPUT_FILE`	File name to be used as the location for expected output (Windows HTTPd/WinCGI only). Example: `c:\temp\132984.dat`.
`PATH_INFO`	Additional relative path information passed to the server after the script name, but before any query data. Example: `/scripts/forms`.
`PATH_TRANSLATED`	Same information as `PATH_INFO`, but with virtual paths translated into absolute directory information. Example: `/users/webserver/scripts/forms`.
`QUERY_STRING`	Data passed as part of the URL, consisting of anything after the ? in the URL. Example: `part1=hi&part2=there`.
`REMOTE_ADDR`	End user's IP address. Example: 127.0.0.1.
`REMOTE_USER`	User name, if authorization was used. Example: `jen`.
`REQUEST_LINE`	The full HTTP request line provided to the server. (Availability varies by server.) Example: `GET /ssi2.htm HTTP/1.0`.
`REQUEST_METHOD`	Specifies whether data for the HTTP request was sent as part of the URL (`GET`) or directly to STDIN (`POST`).
`SCRIPT_NAME`	Virtual path of the CGI script being run. Example: `/scripts/echo.cgi`.

Of all these variables, REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH, and PATH_INFO are the most commonly used. They determine how you get your information, what it is, and where to get it, and they pass on locations that may be needed for processing that data. The following sections look at them roughly in order of how often they're used.

REQUEST_METHOD

When you try to determine how data has been sent to your application, the method of the request is the first thing you need to identify. If you're using a form, you can choose which data-sending method is used; if you're using a direct link such as <a href=/scripts/myscript.pl?data>, your script is invoked with the GET method.

Identifying REQUEST_METHOD is necessary for any application except one type: a program that requires no input. If your application is a random-link generator or a link to output a dynamically generated file that doesn't depend on what the user inputs, you don't need to know whether it was sent via GET or POST, because your program doesn't require any input. It might want to read the other environment variables, but there's no input data to parse, just a semi-fixed output; the end result doesn't depend on any data from users, just on their action of executing it.

Most CGI applications do take input, though, so getting the data from the link or user is the next thing on your list of processes. For that, you need either QUERY_STRING (for GET) or CONTENT_LENGTH (for POST).

NOTE

Besides GET and POST, REQUEST_METHOD can hold other values, including DELETE, HEAD, LINK, and UNLINK. These other values aren't commonly used, but in case you do encounter them, you'll want to provide a fall-back case for dealing with them, as discussed in Chapter 3, "Designing CGI Applications."
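To make that flow concrete, here is a minimal Perl 5 sketch of checking REQUEST_METHOD first and falling back for anything other than GET or POST. The subroutine name get_raw_input is hypothetical, and answering unsupported methods with a 405 status is one reasonable choice, not a requirement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decide where the raw (still URL-encoded) input comes from, based on
# REQUEST_METHOD. Returns undef for methods this script doesn't handle.
sub get_raw_input {
    my ($method, $env, $fh) = @_;
    $method = '' unless defined $method;
    if ($method eq 'GET') {
        # GET: the data is in QUERY_STRING
        return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
    } elsif ($method eq 'POST') {
        # POST: read up to CONTENT_LENGTH bytes from the input stream
        my $len  = $env->{CONTENT_LENGTH} || 0;
        my $data = '';
        read($fh, $data, $len) if $len > 0;
        return $data;
    }
    # Fall-back case: HEAD, DELETE, LINK, UNLINK, and anything else
    return undef;
}

my $raw = get_raw_input($ENV{REQUEST_METHOD}, \%ENV, \*STDIN);
if (defined $raw) {
    print "Content-Type: text/plain\n\n";
    print "Raw input: $raw\n";
} else {
    # Tell the client this method isn't supported by the script
    print "Status: 405 Method Not Allowed\n";
    print "Content-Type: text/plain\n\n";
    print "This script accepts only GET and POST.\n";
}
```

A single read() call can return fewer bytes than requested, so a production script would loop until CONTENT_LENGTH bytes have arrived.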

QUERY_STRING

The data passed with the GET method is normally limited in size, because QUERY_STRING holds all of it in the server's environment space. When your application receives this data, it comes URL encoded. That means it's in the form of ordered pairs of information, with an equal sign (=) joining the two elements of a pair, an ampersand (&) separating one pair from the next, a plus sign (+) taking the place of each space, and special characters encoded as hexadecimal values. A sample from a form with multiple named elements might produce a full request that looks like this:


http://server.host.com/script.pl?field1=data1&field2=data2+more+data+from+field2&field3=data3

The server automatically trims QUERY_STRING to include only the information after the question mark (?). So, for that request, QUERY_STRING would be as follows:


field1=data1&field2=data2+more+data+from+field2&field3=data3

Interpreting this URL-encoded information just requires a parsing routine in your script to break up these pairs. In "Dealing with URL-Encoded Information" later in this chapter, you'll see how that parsing can be done and become a little more familiar with URL encoding.
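As a preview, the following sketch performs only the splitting step on the sample QUERY_STRING above; it deliberately leaves + signs and %XX escapes untouched. The subroutine split_pairs is a hypothetical name used for illustration.

```perl
use strict;
use warnings;

# Break a URL-encoded string into [name, value] pairs.
# This only splits; it does not turn + into spaces or decode %XX escapes.
sub split_pairs {
    my ($data) = @_;
    my @pairs;
    foreach my $pair (split /&/, $data) {
        # Split on the first = only, in case the value itself contains one
        my ($name, $value) = split /=/, $pair, 2;
        push @pairs, [$name, defined $value ? $value : ''];
    }
    return @pairs;
}

my $qs = 'field1=data1&field2=data2+more+data+from+field2&field3=data3';
foreach my $p (split_pairs($qs)) {
    print "$p->[0] => $p->[1]\n";
}
```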

CONTENT_LENGTH

When the POST method is used, CONTENT_LENGTH is set to the number of URL-encoded bytes being sent to the standard input (STDIN) stream. This variable is essential to your application because you can't count on an end-of-file (EOF) marker arriving as part of the input stream. If you were to look for EOF in your script, you could just keep looping (or waiting), never knowing when you were supposed to stop processing, unless you put other checks in place. If you use CONTENT_LENGTH, your application can read until that number of bytes has arrived and then stop gracefully. The data read from STDIN follows the same URL-encoding rules of ordered pairs and character replacement as QUERY_STRING and can be parsed the same way.
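A minimal sketch of reading POST data by byte count rather than waiting for EOF might look like this. It assumes Perl 5, and the helper read_post_data is hypothetical; it loops because a single read() call can return fewer bytes than requested.

```perl
use strict;
use warnings;

# Read up to $length bytes of POST data from a filehandle, looping
# until the count is reached or the stream runs out early.
sub read_post_data {
    my ($fh, $length) = @_;
    my $data = '';
    while (length($data) < $length) {
        # Append at the current end of $data (the 4th argument is the offset)
        my $got = read($fh, $data, $length - length($data), length($data));
        die "Error reading input: $!" unless defined $got;
        last if $got == 0;    # input ended early; stop instead of looping forever
    }
    return $data;
}

if (defined $ENV{REQUEST_METHOD} && $ENV{REQUEST_METHOD} eq 'POST') {
    binmode(STDIN);
    my $input = read_post_data(\*STDIN, $ENV{CONTENT_LENGTH} || 0);
    print "Content-Type: text/plain\n\n";
    print "Read ", length($input), " bytes\n";
}
```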

NOTE

When deciding which method (GET or POST) is best suited to your application, consider the amount of data being passed. GET relies on passing all data through QUERY_STRING and thus can be limited in size. For large amounts of data, POST, which delivers the data through STDIN without that restriction, is a much better choice.

PATH_INFO

Another thing that you can include in the URL sent to the server is path information. If you place this data after the script but before the query string, your application can use this additional information to access files in alternate locations.

For instance, if you have a script that might need to search in either /docs/november or /docs/december, you can pass in the different paths, and the server automatically knows the location of these files relative to the root data directory for your server. So if you use the URL http://www.xyz.com/scripts/search.cgi/docs/december?value=abc, PATH_INFO would be /docs/december. The companion variable PATH_TRANSLATED can give you the actual path to the files based on PATH_INFO, instead of just the relative path. So /docs/december might translate on your server as /users/webserver/marketing/docs/december. Using this variable saves you the work of having to figure out the path for yourself.
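The following sketch shows one way a search script might choose its directory from these two variables. The helper search_dir is hypothetical, and rejecting any path that contains .. is an added safety precaution of this sketch, not something the server does for you.

```perl
use strict;
use warnings;

# Pick the directory to search from PATH_INFO / PATH_TRANSLATED.
# Rejects paths containing ".." so a user can't climb out of the
# server's document tree (an extra check, not done by the server).
sub search_dir {
    my ($path_info, $path_translated) = @_;
    return undef unless defined $path_info && length $path_info;
    return undef if $path_info =~ /\.\./;
    # Prefer the server-translated absolute path when it's available
    return defined $path_translated ? $path_translated : $path_info;
}

my $dir = search_dir($ENV{PATH_INFO}, $ENV{PATH_TRANSLATED});
print "Content-Type: text/plain\n\n";
print defined $dir ? "Searching in $dir\n" : "No usable path given\n";
```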

Other Variables

In addition to the primary variables, some other data could come in quite handy in your application. Looking at each individual environment variable is a good idea because you'll become familiar with just what purpose the variable is designed for, as well as what other purpose you could find for it.

REMOTE_ADDR tells you where a user is calling you from by providing his or her IP address. In case your script forgot, it can see what its name is (by using SCRIPT_NAME). Path information can be passed to your program to reference data files in alternate locations, and on servers that provide it, you can see the complete HTTP request line the client sent (by using REQUEST_LINE). Whether you use the information is up to you, but it's there for the taking.

Client-Specific Environment Variables

Last but not least is information that comes from the software the user accessed the script with. To identify these pieces of information uniquely, the variables are all prefixed with HTTP_. This information gives you background details such as the type of software the user is running and the page he or she came from. Table 4.3 shows three of the most commonly used client-specific variables: HTTP_ACCEPT, HTTP_REFERER, and HTTP_USER_AGENT.

Table 4.3 Common Client-Specific (HTTP_) Environment Variables

HTTP_ Variable	Purpose
`ACCEPT`	Lists the MIME types the client will accept in the response
`REFERER`	Identifies the URL of the document that gave the link to the current document
`USER_AGENT`	Identifies the client software, normally including version information

The formats of these HTTP header variables look like the following:

HTTP_ACCEPT: */*,image/gif,image/x-xbitmap

HTTP_REFERER: http://server.host.com/previous.html

HTTP_USER_AGENT: Mozilla/1.1N (Windows, I 32-bit)

These variables open up some interesting possibilities. For instance, certain browsers support special formatting (tables, backgrounds, and so on) that you might want to take advantage of to make your output look its best. You can use the HTTP_USER_AGENT value, for example, to determine whether your script has been accessed using one of those browsers, and modify the output accordingly. However, because some browsers accessing your script may not set the HTTP_USER_AGENT field to a value you're expecting, make sure that you include a default case that will apply if you can't isolate what type of browser is being used.
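Here's a sketch of that idea, including the default case. The rule that treats Mozilla/2 and later as the "enhanced" browser is a placeholder chosen only for illustration, not a statement about which browsers support which features; page_style is a hypothetical name.

```perl
use strict;
use warnings;

# Choose an output style from HTTP_USER_AGENT. The pattern is an
# illustrative placeholder; anything unrecognized (or missing) gets
# the plain default.
sub page_style {
    my ($agent) = @_;
    return 'plain' unless defined $agent;
    return 'tables' if $agent =~ /^Mozilla\/[2-9]/;
    return 'plain';    # default case for unexpected browsers
}

my $style = page_style($ENV{HTTP_USER_AGENT});
print "Content-Type: text/html\n\n";
if ($style eq 'tables') {
    print "<TABLE BORDER><TR><TD>Fancy layout</TD></TR></TABLE>\n";
} else {
    print "<P>Simple layout</P>\n";
}
```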

In addition to the variables listed in table 4.3 are other HTTP environment variables, but you're much less likely to run into browsers that set these fields with any regularity until newer browsers integrate them and people then migrate to the newer browsers. For reference, though, table 4.4 shows some other client-specific environment variables that you may want to examine.

Table 4.4 Additional Client-Specific (HTTP_) Environment Variables

HTTP_ Variable	Purpose
`ACCEPT_ENCODING`	Lists what types of encoding schemes are supported by the client
`ACCEPT_LANGUAGE`	Identifies the ISO code for the language that the client is looking to receive
`AUTHORIZATION`	Identifies verified users
`CHARGE_TO`	Sets up automatic billing (for future use)
`FROM`	Lists the client's e-mail address
`IF_MODIFIED_SINCE`	Accompanies `GET` request to return data only if the document is newer than the date specified
`PRAGMA`	Sets up server directives or proxies for future use

NOTE

Not every browser fills out the same HTTP_ variables. If you make your application dependent on any, you can run into problems. Be sure to verify support of HTTP_ environment variables for the browsers you're concerned about.

If you want to know for sure whether a specific browser sets certain HTTP_ headers (because new versions and new browsers are always released), you can find out in two ways.

First, you can look at the survey of browsers located at http://www.halcyon.com/htbin/browser-survey. This list shows a large number of browsers, ordered by name and version, with an output page for each that shows the headers that they send.

If you have a browser that didn't make it onto that list, or you want to make your own survey, you need to write a script that checks the HTTP_ headers you're interested in. The script itself doesn't have to be complex: it just echoes back the environment variables you're interested in, and you then access that script with the browsers you want to check.

The next section provides two short examples for performing these checks, one in Perl (version 4) and one in UNIX sh script. You also can use these scripts to check any environment variable you're interested in just by changing the variable names that are used.

Scripts to Check Environment Variables

Without too much work, you can write your own simple scripts to check for the existence of specific environment variables. Because all the environment variables are read into a program in the same way, it doesn't matter whether you're checking for a server-specific variable, a request-set variable, or even a client-set variable-the methodology is the same.

The scripts in listings 4.1 and 4.2 are simple cases for checking whatever variables you're interested in. They demonstrate how similar the functions are in two different scripting languages.

Listing 4.1 Checking Variables with a Perl 4 Script

#!/usr/bin/perl
# A Generic Environment Variable checker (adjust the path above to your Perl)
# Header first; the second \n produces the blank line that ends the header
print "Content-Type: text/html\n\n";
print "<P>Browser Software: $ENV{'HTTP_USER_AGENT'}</P>\n";
print "<P>Originating Page: $ENV{'HTTP_REFERER'}</P>\n";
#... and so on...

# Or list every environment variable, sorted by name
print STDOUT "<UL>\n";
foreach $var (sort keys(%ENV)) {
      print STDOUT "<LI>$var: $ENV{$var}\n";
}
print STDOUT "</UL>\n";

TIP

If you want to see all the environment variables, you can let Perl cycle through them for you, rather than have to identify each one uniquely. Not only is the code smaller, but you don't run the risk of mistyping or forgetting a variable you may have wanted to know about.

Listing 4.2 Checking Variables with a sh Script

#!/bin/sh
echo "Content-Type: text/plain"
echo ""
echo "Browser Software: $HTTP_USER_AGENT"
echo "Originating Page: $HTTP_REFERER"

Dealing with URL-Encoded Information

After you find all the data you want and are ready to let your program do some interpretation and processing, you need to take the information and break it up into manageable parts first. To do that, you need to know how the data is formatted.

Encoding

As you learned previously, data is formatted in ordered pairs, regardless of where it goes: QUERY_STRING or STDIN. The benefit is that this pairing and replacement, called URL encoding, allows you to use a common routine to evaluate this data regardless of the method. All you have to be aware of are the reserved characters that are used as part of URL encoding and the format that's used to pass values representing those reserved characters for literal use (see table 4.5).

Table 4.5 Reserved Characters Used in Encoding

Character	Name	Purpose
`+`	Plus sign	Stands in for a space
`=`	Equal sign	Joins named fields and their values
`&`	Ampersand	Strings together joined pairs
`%`	Percent sign	Denotes hexadecimal value to follow

Suppose that you want to send a plus sign as part of the data. Sending the literal character, like any reserved character, is out of the question. Instead, you send its hexadecimal value (the reason for having the % sign as a reserved value). To be passed correctly, a hexadecimal value is always formatted as %XX, where XX represents the hexadecimal value of a specific ASCII character. For instance, the plus sign is passed as %2b. In the parsing routine, you need to check for the % sign and the two hexadecimal digits that follow it, and then use the functionality of your scripting language to change them back into a literal character.
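A decoding routine along those lines might look like the following minimal sketch (url_decode is a hypothetical name). Note the order: + is turned into a space before the %XX escapes are decoded, so an encoded plus sign (%2b) survives as a literal +.

```perl
use strict;
use warnings;

# Decode one URL-encoded name or value: first turn + back into spaces,
# then turn each %XX (two hex digits) into the character it stands for.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

print url_decode('pepperoni%2Bsausage+and+cheese'), "\n";
# prints: pepperoni+sausage and cheese
```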

Now, before you worry about how you're expected to deal with all this on the client side, you should know one thing: the browser does it automatically, so you don't have to do a thing. The only possible exception is if you're creating an explicit link to a CGI program, such as the following:


<a href="/scripts/myscript.cgi?value1=abcde&value2=more+info">

Here, because you're setting up exactly what gets passed, you have to do the formatting yourself. This isn't too common, but occasionally you may want to use it for a dynamic process that can't take advantage of server-side includes.
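If you do build such links by hand, a small encoding helper keeps you from passing reserved characters by accident. This is a sketch that assumes single-byte (ASCII) data; url_encode and the link text are hypothetical.

```perl
use strict;
use warnings;

# Encode a value for a hand-built link: letters, digits, and a few safe
# punctuation marks pass through, other characters become %XX, and
# spaces become + (converted last, so a real + is already %2B by then).
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.\- ])/sprintf("%%%02X", ord($1))/ge;
    $s =~ tr/ /+/;
    return $s;
}

my $href = '/scripts/myscript.cgi?value1=' . url_encode('abcde')
         . '&value2=' . url_encode('more info');
print qq(<a href="$href">Run the script</a>\n);
# prints: <a href="/scripts/myscript.cgi?value1=abcde&value2=more+info">Run the script</a>
```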

Decoding (Parsing) Routines

Rather than write your parsing routine from scratch, you can use one of the many routines available for general use in almost every language. Their authors have already done the work for you, which is always a benefit.

One of the more popular libraries for Perl, cgi-lib.pl by Steven E. Brenner, takes care of this work for you. This library turns an otherwise tedious task into a simple one. For instance, reading and parsing the input becomes as simple as


require 'cgi-lib.pl';
&ReadParse(*input);

You now have the values in the associative array %input, ready to bend to your will. Behind the scenes, the subroutine ReadParse has broken down all the ordered pairs: each field name has become a key in %input, and the field's value is stored under that key. To get a better understanding of how the code works, look at the ReadParse source itself (in listing 4.3). Like much well-written code, it was already commented by its creator, but in some places additional comments have been added off to the side for further clarification.

Listing 4.3 cgi-lib Source Code

# Source for CGI-LIB.PL, by Steven E. Brenner:
# ReadParse
# Reads in GET or POST data, converts it to unescaped text, and
# puts one key=value in each member of the list "@in"
# Also creates key/value pairs in %in, using '\0' to separate
# multiple selections
# If a variable-glob parameter (e.g., *cgi_input) is passed to
# ReadParse, information is stored there, rather than in $in, @in,
# and %in.
sub ReadParse {
  local (*in) = @_ if @_;
  local ($i, $loc, $key, $val);
  # Read in text                    #Checks the data-sending method
  if ($ENV{'REQUEST_METHOD'} eq "GET") {
    $in = $ENV{'QUERY_STRING'};
  } elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
    read(STDIN,$in,$ENV{'CONTENT_LENGTH'});
         #Reads in CONTENT_LENGTH bytes of STDIN
  }
  @in = split(/&/,$in);       #Splits ordered pairs at the "&" sign
  foreach $i (0 .. $#in) {    #Processes ordered pairs
    # Convert plus's to spaces
    $in[$i] =~ s/\+/ /g;
    # Split into key and value.
    ($key, $val) = split(/=/,$in[$i],2); # splits on the first =.

    # Convert %XX from hex numbers to alphanumeric
    $key =~ s/%(..)/pack("c",hex($1))/ge;
    $val =~ s/%(..)/pack("c",hex($1))/ge;
    # Associate key and value
    $in{$key} .= "\0" if (defined($in{$key})); # \0 is the multiple
    $in{$key} .= $val;                         # separator
  }
  return 1; # just for fun
}

If you want to use cgi-lib.pl, you'll find the library on the CD-ROM accompanying this book.

Many other libraries are available in almost every scripting language. Because countless other users already rely on them, using code that already has the functionality you were looking to create can also cut testing time from your busy schedule. You can find these libraries by searching for CGI Library in most search engines, or use the list provided in Chapter 3, "Designing CGI Applications."

CAUTION

If you're planning to write your own parsing routine, be very careful in how you do it. Fixed-size buffers that user input can overflow, and open-ended functions that could run user-supplied data as code (such as Perl's eval statement), can let someone into your system. When in doubt, don't let it through.
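As a rough illustration of that advice, the sketch below refuses oversized input and accepts a field only if it matches an allow list. The 10,000-byte limit, the allowed character set, and the names length_ok and clean_field are all assumptions to adjust for your application; the idea is to reject what you don't expect rather than try to repair it.

```perl
use strict;
use warnings;

# Example limit (an assumption): refuse input longer than this many bytes
my $MAX_INPUT = 10_000;

# True only if the declared CONTENT_LENGTH is a plain, bounded number.
sub length_ok {
    my ($len) = @_;
    return defined $len && $len =~ /^\d+$/ && $len <= $MAX_INPUT;
}

# Accept a field only if every character is on an allow list
# (letters, digits, underscore, hyphen, period, @, space), 1 to 100 long.
# Anything else is rejected (undef) rather than "cleaned up".
sub clean_field {
    my ($value) = @_;
    return undef unless defined $value;
    return $value =~ /^([-\w.\@ ]{1,100})$/ ? $1 : undef;
}
```

Even values that pass checks like these should never be handed to eval or to a shell command.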

Use Your Header…

Just as header information accompanies the incoming data, a header must let the server and client know what kind of information is being sent back. This response header can be one of three types: content-type, location, or status.

NOTE

When using headers in your script, you must separate the header from the body (if any) of your response with a blank line to make sure that it's interpreted correctly. Otherwise, you end up with a somewhat ordered mess instead of a correctly returned document.
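A minimal Perl sketch of that rule: the header line ends with a newline, and a second newline produces the blank line before the body. The helper build_response is hypothetical.

```perl
use strict;
use warnings;

# Build a parsed-header CGI response: one header line, then the
# blank line (the second "\n"), then the body.
sub build_response {
    my ($type, $body) = @_;
    return "Content-Type: $type\n\n$body";
}

print build_response('text/html',
    "<HTML><BODY><P>Your order has been received.</P></BODY></HTML>\n");
```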

Non-Parsed Headers

In certain cases, you may not want your application to rely on the server to process your program's response. Whether due to overhead or some special response that's easier to do outside your server's interpretation, the decision to use non-parsed header (NPH) files places a little more work on your shoulders.

To function as an NPH return, the output data from your program must contain a complete HTTP response. That means providing the HTTP version and status code, the general header, response header, entity header, and entity body. What does all that mean in plain English? Well, look at an example of NPH output, taken from the original NCSA CGI documentation, and add some comments to it:


HTTP/1.0 200 OK           #HTTP Version, Status Code
Server: NCSA/1.0a6        #General Header
Content-type: text/plain  #Entity Header

Text goes here...         #Entity Body

As you can see, it's pretty straightforward once the terms are cleared up. All the client really wants to know is what protocol the response conforms to, if there's a status message it needs to concern itself with (such as errors), what type of data it's receiving, and what the data is. The main restriction is that the file name of the CGI application must begin with nph- to specify that the server shouldn't parse the return.
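Here is a sketch of a Perl script producing NPH output similar to the example above. The file name nph-hello.pl is hypothetical, and the script assumes the server sets SERVER_PROTOCOL and SERVER_SOFTWARE (falling back to placeholders if it doesn't). Because the server passes NPH output through untouched, the sketch ends each line with a carriage return and line feed, as HTTP expects.

```perl
#!/usr/bin/perl
# Hypothetical file name: nph-hello.pl (the nph- prefix tells the server
# to send this output straight to the client without parsing it)
use strict;
use warnings;

$| = 1;    # unbuffered output, so data goes out as soon as it's printed

# A complete HTTP response: status line, headers, blank line, body.
sub nph_response {
    my ($body) = @_;
    my $protocol = $ENV{SERVER_PROTOCOL} || 'HTTP/1.0';
    my $server   = $ENV{SERVER_SOFTWARE} || 'unknown';
    return "$protocol 200 OK\r\n"            # HTTP version, status code
         . "Server: $server\r\n"             # general header
         . "Content-type: text/plain\r\n"    # entity header
         . "\r\n"                            # blank line ends the headers
         . $body;                            # entity body
}

print nph_response("Text goes here...\n");
```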

There are no hard-and-fast rules about when to use NPH output. It can be a good candidate when it suits what you want to do and your server is normally under very heavy load, because the server no longer has to parse your program's response. For the most part, however, the CGI libraries and sample applications that you'll come across prefer to leave a little of that work to the server.

Content-Type Header

The most common response header is content-type, which tells the client software to expect some data of a specific type, based on the supported MIME (Multipurpose Internet Mail Extensions) types. These types are covered in detail in Chapter 10, "Using MIME with CGI," and are outlined in table 4.6. One of the more common content types to be returned in your CGI application is text/html, meaning that you're sending back an HTML document, so it should be interpreted as one, with all tags and other elements converted for display.

Table 4.6 Common MIME Types

Type	Category
application	Application data, such as a compressed file
audio	Audio data, such as RealAudio
image	Image data, such as a counter
text	Text-based information, including plain and HTML
video	Video data (MPEG, AVI, QuickTime)

Location Header

If you were to create a random link program, you probably wouldn't want the results to come back as an HTML page with a URL link that says, "Click here to go to the link that has been selected randomly." You would want users to make one selection that says "Random Link," and automatically be taken to that link after selecting it. The same would hold true for some search programs in which you might have only one possible match, or if you have a page to return if a function fails. In any of these cases, your best bet would be to use a location header.

As the name implies, the location header specifies that the data you're returning is a pointer to another location, normally a full URL. It's in the format of Location: http://server.host.com/document.
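For the random-link case, a sketch might pick a URL and print only the Location header followed by the blank line. The list of destinations simply reuses URLs that appear earlier in this chapter, and redirect_to is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical list of destinations for a "Random Link" button
my @links = (
    'http://server.host.com/document',
    'http://www.csh.rit.edu/proj/drink.html',
    'http://www.ecst.csuchico.edu/~pizza/',
);

# A redirect needs only the Location header and the blank line.
sub redirect_to {
    my ($url) = @_;
    return "Location: $url\n\n";
}

# rand(@links) returns a number from 0 up to (but not including) the list size
print redirect_to($links[int(rand(@links))]);
```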

TIP

A number of browsers support enhanced HTML formatting commands that you may want to take advantage of, but the commands may create formatting problems if the user doesn't have a browser with those particular enhancements. You can use HTTP_USER_AGENT to determine the type of browser client, and then redirect the user to an appropriately formatted page with the Location: header.

Status Header

The Status header is the basic mechanism for returning error codes. It carries a three-digit code followed by a short reason phrase, as in Status: 404 Not Found. If you don't have specific pages to use when returning an error, you can simply send one of the standard codes and let the server pass the error on to the client, which interprets it. Table 4.7 lists some common status codes.

Table 4.7 Some Common Status Codes

Code	Result	Description
200	OK	The request was carried out with no problems.
202	Accepted	The request has been accepted but is still being processed.
301	Moved Permanently	The document has been moved to a new location.
302	Found	The document is on the server but at a different location.
400	Bad Request	The request's syntax was bad.
401	Unauthorized	The document is restricted and requires user authentication (see AUTH_TYPE).
403	Forbidden	The request was forbidden, due to access rights or other reasons.
404	Not Found	The server couldn't find a document matching the request.
500	Internal Server Error	The server unexpectedly failed to carry out the request (or your Perl script is missing a ;).
503	Service Unavailable	The server is overloaded and can't process any more requests right now.
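To send one of these codes yourself, print a Status header before the other headers. The sh sketch below reports a missing document with a 404; the not_found and serve functions are hypothetical, and the short HTML body is optional but gives the user something readable.

```shell
#!/bin/sh
# Sketch: report a missing document with a Status header.
# The server turns "Status: 404 Not Found" into the HTTP status line.
not_found() {
    echo 'Status: 404 Not Found'
    echo 'Content-type: text/html'
    echo ''
    echo '<h1>Not Found</h1>'
    echo '<p>The requested document could not be found.</p>'
}

# Hypothetical usage: return the file named in $1 as plain text, or a 404
# if it doesn't exist. A real script must also make sure the name can't
# reach files outside the directory it is meant to serve.
serve() {
    if [ -f "$1" ]; then
        echo 'Content-type: text/plain'
        echo ''
        cat "$1"
    else
        not_found
    fi
}
```

The requested name is deliberately not echoed back into the HTML, so user-supplied text never ends up in the page.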

NOTE

One of the most frustrating things to encounter when writing a CGI script in Perl is getting a 500 Internal Server Error instead of your expected output. A frequent culprit is a syntax error such as a missing or misplaced semicolon (;), which Perl uses to end each statement: the script dies before it sends any headers, so the server reports an error instead. Running perl -c on the script from the command line checks its syntax and usually points straight at the problem. Just one missing piece of punctuation can drive you crazy.

For a more complete list of status codes, see http://www.w3.org/hypertext/WWW/Protocols/HTTP/HTRESP.html.

Returning Output to the Users

After all the work you've done getting the data, interpreting it, processing it, and deciding what type of information you're going to send back, all that's left to do is send it. To do that, you'll need three things: a header, content, and a way to output it to the user.

You already know about headers, such as Content-type, for specifying what kind of information you're returning. The data that your program sends back can be anything; it simply follows the header (and the blank line that ends it), and the server takes care of the rest. The only remaining item is determining how the user will get the data back.

STDOUT

Just as you can read data sent to the standard input (STDIN) stream, you can send information back out to the waiting server through the standard output (STDOUT) stream. Your programming language of choice probably makes this easy: just pretend that you're printing something directly to the screen, which is normally where STDOUT goes, and the server takes care of the rest for you.

For instance, after a header telling the server and client to expect HTML, you send the HTML itself as ordinary text, as follows:

Perl:


print "Content-type: text/html\n";
print "\n";    # The blank line separates the header from the content.
print "<h1>Hello World.</h1>\n";

sh:


echo 'Content-type: text/html'
echo ''    # The blank line separates the header from the content.
echo '<h1>Hello World.</h1>'

Suppose that your program writes out records from mailing-list requests, or just a plain old log of who used the script. File output works much like the preceding examples, except that the output is directed to a file instead of STDOUT. Perl uses file handles (open(MY_FILE, ">>/home/file1.txt")), whereas sh scripts can use >> redirection to append to a file (or > to overwrite it):

Perl:


print MY_FILE "Hello World.\n";
close(MY_FILE);

sh:


echo 'Hello World' >> myfile
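To make the "log of who used the script" case a little more concrete, here is a small sh sketch that appends one line per visit and then answers the user. The LOG_FILE default path and the log_visit function are hypothetical; REMOTE_ADDR and HTTP_USER_AGENT are the standard variables the server sets for each request.

```shell
#!/bin/sh
# Sketch: append one line per visit to a log file, then answer the user.
# LOG_FILE is a hypothetical default path; the server's user account
# must be allowed to write to it.
LOG_FILE=${LOG_FILE:-/tmp/cgi-usage.log}

log_visit() {
    # >> appends, so earlier entries are kept.
    echo "$(date) ${REMOTE_ADDR:-unknown} ${HTTP_USER_AGENT:-unknown}" >> "$LOG_FILE"
}

log_visit
echo 'Content-type: text/html'
echo ''
echo '<h1>Thanks for visiting.</h1>'
```

The log line goes to the file while the response still goes to STDOUT, so the two kinds of output don't interfere.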

Whatever output method you choose, whether it's a pointer to data somewhere else or data you send back yourself, once you send it to STDOUT the rest of the work is done for you. The server and the client handle the connection and any translation needed to deliver what you sent to the right place, in the form you specified.

File-Based Output

In certain instances, the result of a CGI program's execution is just the location of a file or the creation of an output data file. The latter occurs when the server sets the OUTPUT_FILE environment variable, which means that the server (Win HTTPd, for example) expects to read the entire response from that file name rather than from STDOUT.

There's no real "trick" to dealing with these situations, unless you want to create the output file and then perform some further operation on it. As soon as the file exists, the server may read it as the response and send it on, so don't copy or write anything to the final OUTPUT_FILE name until the response is complete and ready for the server.
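One way to follow that advice is to build the response under a temporary name and rename it only when it's finished. The sh sketch below assumes OUTPUT_FILE is passed as described above and that the temporary file sits on the same file system, so mv is a simple rename; the make_response and respond names are hypothetical. On a Windows server the rename call differs by language, but the write-then-rename idea is the same.

```shell
#!/bin/sh
# Sketch: build the response in a temporary file, then rename it to the
# OUTPUT_FILE name only when it's complete, so the server never picks up
# a half-written response.
make_response() {
    echo 'Content-type: text/html'
    echo ''
    echo '<h1>Hello World.</h1>'
}

respond() {
    if [ -n "$OUTPUT_FILE" ]; then
        tmp="$OUTPUT_FILE.tmp.$$"
        # Move the file into place only if writing it succeeded.
        make_response > "$tmp" && mv "$tmp" "$OUTPUT_FILE"
    else
        # No OUTPUT_FILE: send the response to STDOUT as usual.
        make_response
    fi
}

respond
```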

Sample environment variable values:

Gateway Interface:	CGI/1.1
Server Protocol:	HTTP/1.0
Server Name:	bills.aimtech.com
Server Port:	80
Server Software:	Purveyor / v1.1 Windows NT

HTTP_ACCEPT:	*/*,image/gif,image/x-xbitmap
HTTP_REFERER:	http://server.host.com/previous.html
HTTP_USER_AGENT:	Mozilla/1.1N (Windows, I 32-bit)