Chapter 12 CGI and the Internet

by Bob Breedlove

CONTENTS

Pseudo Code
The CGI Program
Summary

A Common Gateway Interface (CGI) program interfaces with the HTTP server to receive requests from the client browser, perform specific processing, and return information to the user in the form of MIME-encoded documents, most typically Hypertext Markup Language (HTML) pages.

As discussed in the previous chapter, CGI programs can perform any type of processing either directly or indirectly by accessing servers such as database servers. Whether the CGI program performs all the processing itself or is simply the input/output part of a more complex system, the basic concepts are the same.

This chapter presents a basic CGI program to illustrate CGI processing. You can write CGI programs in any number of programming languages. I chose Perl for this illustration for several reasons: It runs on several platforms, it is relatively C-like, and it is easy to modify and run for your own experimentation.

Pseudo Code

Because the actual processing performed by a CGI program depends on your application, it is difficult to define how the bulk of your CGI application will look. This little program doesn't do much useful work (you'll have to decide how to do that), but it does illustrate the basics of CGI programming. Here is the basic pseudo code for the application:


Initialize application

if client requests form then

       Send the client form

if client returns form then

       Determine required processing

       Format appropriate reply

       Send Reply to server

Terminate application

I'll examine each part of this simple application to discuss elements of CGI programming. More detailed and complex programs are presented in the sections on specific programming languages in Part III, "Internet Scripting Languages."

The CGI Program

The Perl program is called showcgi.pl. The method used to execute this program depends on your environment and server software. However, the most typical way to execute the program includes the following basic steps:

Copy the script to the executable directory for your server (typically cgi-bin under the HTTP base subdirectory).
Indicate that the program is executable and set the location of your Perl interpreter on the first line of the script.
Some servers might require that you change the name of the program to meet some standard.

The entire program follows:


#!/usr/bin/perl

####################################################################

# Demonstration program for CGI

####################################################################



$http_action = 'http://www.myhost.com/cgi-bin/showcgi.pl';



if ($ENV{'REQUEST_METHOD'} eq "GET") {

    print "Content-type: text/html\n\n";

    print <<EOF;

<H1>Simple Test Form</H1>

<HR>

<FORM ACTION="$http_action" METHOD=POST>

Field1: <input name="field1"><br>

Field2: <input name="field2"><br>

Check : <input type=radio name="cbx1" value="1"> Yes

<input type=radio name="cbx1" value="2"> No <P>

<input type="submit"><P>

</form>

EOF

    ;

} else {

    &ReadData();



    print "Content-type: text/html\n\n";

    print "<H1>Results</H1>";

    print "<HR>";

    print "<H2>Variables</H2>";



    print "<B>GATEWAY_INTERFACE</B>: $ENV{'GATEWAY_INTERFACE'}<BR>\n";

    print "<B>SERVER_NAME</B>: $ENV{'SERVER_NAME'}<BR>\n";

    print "<B>SERVER_SOFTWARE</B>: $ENV{'SERVER_SOFTWARE'}<P>\n";

    print "<B>SERVER_PROTOCOL</B>: $ENV{'SERVER_PROTOCOL'}<BR>\n";

    print "<B>SERVER_PORT</B>: $ENV{'SERVER_PORT'}<BR>\n";

    print "<B>PATH_INFO</B>: $ENV{'PATH_INFO'}<BR>\n";

    print "<B>PATH_TRANSLATED</B>: $ENV{'PATH_TRANSLATED'}<BR>\n";

    print "<B>SCRIPT_NAME</B>: $ENV{'SCRIPT_NAME'}<BR>\n";

    print "<B>QUERY_STRING</B>: $ENV{'QUERY_STRING'}<BR>\n";

    print "<B>REMOTE_HOST</B>: $ENV{'REMOTE_HOST'}<BR>\n";

    print "<B>AUTH_TYPE</B>: $ENV{'AUTH_TYPE'}<BR>\n";

    print "<B>REMOTE_USER</B>: $ENV{'REMOTE_USER'}<BR>\n";

    print "<B>REMOTE_IDENT</B>: $ENV{'REMOTE_IDENT'}<BR>\n";

    print "<B>CONTENT_TYPE</B>: $ENV{'CONTENT_TYPE'}<BR>\n";

    print "<B>CONTENT_LENGTH</B>: $ENV{'CONTENT_LENGTH'}<P>\n";

    print "<B>HTTP_ACCEPT</B>: $ENV{'HTTP_ACCEPT'}<BR>\n";

    print "<B>HTTP_USER_AGENT</B>: $ENV{'HTTP_USER_AGENT'}<P>\n";



    print "<HR>\n";

    print "<H2>Variables Found</H2>\n";

    print "<B>Raw Input:</B> $in<p>";

    print &PrintVariables(%in);

}



sub ReadData {

    local (*in) = @_ if @_;

  local ($i, $loc, $key, $val);



  # Read in text

  if ($ENV{'REQUEST_METHOD'} eq "GET") {

    $in = $ENV{'QUERY_STRING'};

  } elsif ($ENV{'REQUEST_METHOD'} eq "POST") {

    read(STDIN,$in,$ENV{'CONTENT_LENGTH'});

  }



  @in = split(/&/,$in);



  foreach $i (0 .. $#in) {

    # Convert plus's to spaces

    $in[$i] =~ s/\+/ /g;



    # Split into key and value.

    ($key, $val) = split(/=/,$in[$i],2); # splits on the first =.



    # Convert %XX from hex numbers to alphanumeric

    $key =~ s/%(..)/pack("c",hex($1))/ge;

    $val =~ s/%(..)/pack("c",hex($1))/ge;



    # Associate key and value

    $in{$key} .= "\0" if (defined($in{$key})); # \0 is the multiple separator

    $in{$key} .= $val;



  }

}



sub PrintVariables {

  local (%in) = @_;

  local ($out, $output);

  $output .=  "<DL COMPACT>";

  foreach $key (sort keys(%in)) {

    foreach (split("\0", $in{$key})) {

      # $* (multiline matching) is omitted: s/\n/.../g does not need it,

      # and current Perl releases no longer support that variable.

      ($out = $_) =~ s/\n/<BR>/g;

      $output .=  "<DT><B>$key</B><DD><I>$out</I><BR>";

    }

  }

  $output .=  "</DL>";



  return $output;

}

Well, it's not much of a program. In fact, it is just slightly more complex than the classic "Hello World" program found in many C programming texts. Perl is especially suited to CGI programming because it performs excellent text processing, which is the heart of this type of programming.

NOTE

To keep with tradition, here is a Perl "Hello World" CGI program.

#!/usr/bin/perl

################################################################

# Demonstration program for CGI

################################################################

require '/www/cgi-bin/cgi-lib.pl';

print &PrintHeader;

print <<EOF;

<HTML>

<HEAD>

<TITLE>Hello World!</TITLE>

</HEAD>

<BODY>

Hello World!

</BODY>

</HTML>

EOF

;

The showcgi.pl program simply produces a form and displays some information in response to the user's input. An actual CGI application would have more complex processing, but this chapter isn't about the actual programming of the application-specific logic. Instead, it's about the CGI interface side, which is relatively simple.

The program shows much of the decoding logic to illustrate the techniques. In real applications, you will probably use a precoded library for this basic processing. See the specific programming language chapters in Part III for information on these libraries.

Initialization

A CGI program is started from scratch each time the server receives a request for its URL. This means that the program must establish its processing environment each time it is called. The CGI interface does not require any specific initialization. However, your application might need to perform application-specific initialization before processing the CGI request.

Before calling the CGI program, the server places labeled information into environment variables and places other specific information on the command line. The server also communicates with the CGI program through its standard input (stdin) and output (stdout) file handles. In your specific language implementation, you might need to initialize the standard input and output before you use them. Perl opens STDIN and STDOUT automatically, and print writes to STDOUT by default, so only program-specific variables must be initialized at this point.
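As a rough illustration of the environment-variable half of this interface, the following sketch formats whatever variables it is handed as plain text. The subroutine name format_env is hypothetical, and the sample values are made up for the demonstration; under a real server you would pass a reference to %ENV instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a hash of environment variables into
# "NAME=value" lines, sorted by name. Undefined values print as empty.
sub format_env {
    my ($env) = @_;
    my $text = '';
    foreach my $name (sort keys %$env) {
        my $value = defined $env->{$name} ? $env->{$name} : '';
        $text .= "$name=$value\n";
    }
    return $text;
}

# Sample values only; under a server you would pass \%ENV instead.
my %sample = (
    REQUEST_METHOD => 'GET',
    SERVER_PORT    => '80',
);

print "Content-type: text/plain\n\n";
print format_env(\%sample);
```

Sending the listing as text/plain keeps the sketch short because no HTML formatting is needed.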

In showcgi.pl, you need only to set the variable $http_action to the URL of the script. This practice allows the program to perform more complex processing by returning an action that includes a QUERY_STRING or PATH_INFO entry. For example, suppose your program needs to store information about the user between transactions. The program stores the information in a file whose name is a generated serial number and then passes that name back on the ACTION= parameter of the FORM tag. Here is the Perl code for this type of processing:


$HTTP_ACTION = "http://www.myhost.com/cgi-bin/myprog.pl";

# Processing occurs here . . .

if ($ENV{'REQUEST_METHOD'} eq "GET") {
    # $serialnumber is assumed to have been generated earlier
    $File = $serialnumber;
    print "Content-type: text/html\n\n";
    print <<EOF;
<HTML>
<!-- more of your form here -->
<FORM ACTION="$HTTP_ACTION?$File" METHOD=POST>
<!-- the remainder of the form -->
</FORM>
</HTML>
EOF
} else {
    $File = $ENV{'QUERY_STRING'};
    # The remainder of POST processing here
}

# The remainder of the program here

When this program is first called, the URL does not contain the filename. When the user fills out the form and presses the Submit button, the filename is returned in QUERY_STRING, and the program can use it to retrieve the stored information.
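Because the filename comes back from the browser, it is worth checking before the program opens anything with it. The following is a minimal sketch of one way to generate and check such a serial number. The subroutine names new_serial and valid_serial, the clock-plus-process-ID scheme, and the 20-digit limit are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a serial number from the clock and the
# process ID. This is only a sketch; it is not guaranteed to be unique
# on a busy server.
sub new_serial {
    return time() . sprintf('%05d', $$ % 100000);
}

# Hypothetical helper: accept the value only if it is 1 to 20 digits,
# so that nothing like "../" can reach the file system.
sub valid_serial {
    my ($candidate) = @_;
    return 0 unless defined $candidate;
    return $candidate =~ /\A[0-9]{1,20}\z/ ? 1 : 0;
}

my $serial = new_serial();
print "generated: $serial\n";
print "accepted\n" if valid_serial($serial);
print "rejected\n" unless valid_serial('../etc/passwd');
```

In the POST branch, a program along these lines would call valid_serial on the QUERY_STRING value and refuse the request if the check fails.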

Determining the Request Method

Next, the CGI program must determine what the user is requesting and perform that action. Typically, the first request to a CGI program causes the program to send a form to the user. Once the user has filled out the form, he clicks a Submit button, which returns the information from the form for the CGI program to process.

The CGI program can use several methods to identify the user's request. The most straightforward method is to check the REQUEST_METHOD environment variable. The default request method is GET. That is, if the user simply clicks a link to the URL, the server passes GET to the program in the REQUEST_METHOD variable. POST is passed when the user submits a form whose FORM tag is coded with the METHOD=POST parameter. The code to test these is a simple if statement:


if ($ENV{'REQUEST_METHOD'} eq "GET") {

    # Format and return the form

} else {

    # Process the user's information

}

NOTE

Perl provides the %ENV associative array as a method to access the environment variables passed to the program. Of course, for more complex processing, you can use combinations of request methods and other techniques to distinguish among multiple processing paths.
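One way to combine the request method with another piece of information is sketched below, using PATH_INFO as the second input. The subroutine name choose_action, the step names it returns, and the /help path are hypothetical; they only show the shape of such a test.

```perl
use strict;
use warnings;

# Hypothetical dispatcher: map the request method and any extra path
# information to the name of a processing step.
sub choose_action {
    my ($method, $path_info) = @_;
    $method    = 'GET' unless defined $method && length $method;
    $path_info = ''    unless defined $path_info;

    return 'send_form'    if $method eq 'GET'  && $path_info eq '';
    return 'send_help'    if $method eq 'GET'  && $path_info eq '/help';
    return 'process_form' if $method eq 'POST';
    return 'send_error';
}

print choose_action($ENV{'REQUEST_METHOD'}, $ENV{'PATH_INFO'}), "\n";
```

Keeping the decision in one subroutine makes it easy to add paths later without touching the code that builds each page.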

Sending the Form

Sending a form involves writing the HTML in the correct format to the CGI program's standard output. This information is processed by the server and passed more or less intact to the browser. Remember, the program must write a short header so that the server knows what type of document it is processing. The line in a Perl program to send an HTML page is


print "Content-type: text/html\n\n";

Other headers are determined by the server software. Remember to return a blank line formed by two end-of-line characters (\n\n). Depending on your operating system, these are either carriage return/line feed pairs or simple line feeds.
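The same header line works for other document types. The sketch below wraps it in a small subroutine and uses it to send plain text; the name content_header is an assumption made for this example and is not part of showcgi.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: build the header block for a given MIME type.
# The second "\n" produces the blank line that ends the headers.
sub content_header {
    my ($mime_type) = @_;
    return "Content-type: $mime_type\n\n";
}

print content_header('text/plain');
print "This reply is plain text rather than HTML.\n";
```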

After the program has sent the header, simply write the document to the standard output in its native format. For HTML, this format is plain ASCII text with HTML tags. The following code fragment illustrates the use of this statement:


# Program processing to this point . . .

     print "Content-type: text/html\n\n";

     print <<EOF;

<H1>Simple Test Form</H1>

<HR>

<FORM ACTION="$http_action" METHOD=POST>

Field1: <input name="field1"><br>

Field2: <input name="field2"><br>

Check : <input type=radio name="cbx1" value="1"> Yes

<input type=radio name="cbx1" value="2"> No <P>

<input type="submit"><P>

</form>

EOF

     ;

# Additional program processing below . . .

In this code fragment, the program prints the header and then prints all the information between the print <<EOF statement and the line containing only EOF. Note the FORM tag specifically:


<FORM ACTION="$http_action" METHOD=POST>

The METHOD= option specifies the method that is returned when the user clicks the Submit button. This particular form sends the POST method. Figure 12.1 shows the form produced by this simple script.

Figure 12.1 : The form produced by showcgi.pl.

Note that the Submit button doesn't have to be labeled "Submit."


<input type="submit">

It's the TYPE= parameter that counts, not the label on the button. You can change the label to anything you want by adding a VALUE= parameter to the tag. Note, also, that a more complex screen would probably also include a reset button, which allows the user to return the fields to their initial values:


<input type="reset">

This can be useful if the user returns to the form with the browser's Back button and wants to clear the previous entries.

Receiving Information from the User

The user fills out the form you send through the server and presses the Submit Query button. This causes the browser to extract the information and send it back to the server. This time, the server sends the POST action in the REQUEST_METHOD environment variable.

The CGI program needs to process the data received with this POST request. The key to receiving this information is to parse the input string and decode it correctly. The subroutine ReadData() does this for my little program.


sub ReadData {

    local (*in) = @_ if @_;

  local ($i, $loc, $key, $val);



  # Read in text

  if ($ENV{'REQUEST_METHOD'} eq "GET") {

    $in = $ENV{'QUERY_STRING'};

  } elsif ($ENV{'REQUEST_METHOD'} eq "POST") {

    read(STDIN,$in,$ENV{'CONTENT_LENGTH'});

  }



  @in = split(/&/,$in);



  foreach $i (0 .. $#in) {

    # Convert plus's to spaces

    $in[$i] =~ s/\+/ /g;



    # Split into key and value.

    ($key, $val) = split(/=/,$in[$i],2); # splits on the first =.



    # Convert %XX from hex numbers to alphanumeric

    $key =~ s/%(..)/pack("c",hex($1))/ge;

    $val =~ s/%(..)/pack("c",hex($1))/ge;



    # Associate key and value

    $in{$key} .= "\0" if (defined($in{$key})); # \0 is the multiple separator

    $in{$key} .= $val;



  }

}

It is not my intention to teach Perl in this chapter. That is left for Section 4, "Perl." You can use this same technique to interpret the input in whatever language you use, so examining the process has some merit.

The first thing to do is read the text. This task varies depending on the method (GET or POST). The following short if statement accomplishes the task:


if ($ENV{'REQUEST_METHOD'} eq "GET") {

    $in = $ENV{'QUERY_STRING'};

  } elsif ($ENV{'REQUEST_METHOD'} eq "POST") {

    read(STDIN,$in,$ENV{'CONTENT_LENGTH'});

  }

If the request method is GET, the information is contained in the QUERY_STRING environment variable. Simply get the information from this variable into the local variable ($in). If the request method is POST, the process is only slightly more complex. The information is read from standard input. However, the server is not required to place an end-of-field marker of any kind on the data stream. Instead, the length of the data to read from standard input is placed in the CONTENT_LENGTH environment variable. The following line accomplishes this read:


read(STDIN,$in,$ENV{'CONTENT_LENGTH'});

You can use a similar function in C to accomplish this.
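A single read call is not guaranteed to return every byte requested, so a more defensive version loops until the full length has arrived. The following sketch assumes a hypothetical subroutine named read_body and uses an in-memory file handle as a stand-in for STDIN so that it runs outside a server.

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $length bytes from the file handle,
# looping because a single read may return fewer bytes than requested.
sub read_body {
    my ($fh, $length) = @_;
    my $body = '';
    while (length($body) < $length) {
        my $got = read($fh, $body, $length - length($body), length($body));
        die "read failed: $!" unless defined $got;
        last if $got == 0;    # end of input before the full length arrived
    }
    return $body;
}

# Stand-in for STDIN so the sketch runs outside a server.
my $posted = 'field1=abc&field2=def';
open(my $fh, '<', \$posted) or die "cannot open in-memory handle: $!";
my $in = read_body($fh, length $posted);
close($fh);
print "$in\n";
```

In a CGI program, the call would look something like read_body(\*STDIN, $ENV{'CONTENT_LENGTH'}).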

After the program has the data, it must decode it. Say that the input from your form was the following data:


Field1: Field 1 input

Field2: Field 2 input

Check : Yes selected

The input stream would look like this:


field1=Field+1+input&field2=Field+2+input&cbx1=1

The CONTENT_LENGTH variable would contain 48.
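You can confirm the count with Perl's length function. The variable name $stream is used only for this illustration.

```perl
use strict;
use warnings;

# The encoded form data from the example above.
my $stream = 'field1=Field+1+input&field2=Field+2+input&cbx1=1';

# For a POST request, the server would report this count in CONTENT_LENGTH.
print length($stream), "\n";    # prints 48
```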

To process this information, the program has to parse the input into VARIABLE=VALUE pairs. It then has to decode the text of each variable name and value. Finally, it places the information into program variables for processing. The following Perl code accomplishes this for my program:


@in = split(/&/,$in);



  foreach $i (0 .. $#in) {

    # Convert plus's to spaces

    $in[$i] =~ s/\+/ /g;



    # Split into key and value.

    ($key, $val) = split(/=/,$in[$i],2); # splits on the first =.



    # Convert %XX from hex numbers to alphanumeric

    $key =~ s/%(..)/pack("c",hex($1))/ge;

    $val =~ s/%(..)/pack("c",hex($1))/ge;



    # Associate key and value

    $in{$key} .= "\0" if (defined($in{$key})); # \0 is the multiple separator

    $in{$key} .= $val;

  }

I won't spend time on the Perl code, which is explained in Section 4. However, notice that Perl's arrays and powerful text-handling features make it especially adept at this type of processing.
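For comparison, here is a sketch of the same decoding steps written with lexical variables so that it runs under use strict. The subroutine name parse_form is hypothetical, and returning a hash reference rather than filling the global %in is a design choice made for this example, not the way showcgi.pl works.

```perl
use strict;
use warnings;

# Hypothetical rewrite of the decoding loop using lexical variables.
# Returns a reference to a hash of decoded name/value pairs; repeated
# names are joined with "\0", as in ReadData.
sub parse_form {
    my ($raw) = @_;
    my %form;
    foreach my $pair (split /&/, $raw) {
        next unless length $pair;                   # skip empty pairs
        $pair =~ s/\+/ /g;                          # plus signs become spaces
        my ($key, $val) = split /=/, $pair, 2;      # split on the first "="
        $val = '' unless defined $val;
        foreach ($key, $val) {
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %XX becomes one character
        }
        $form{$key} .= "\0" if defined $form{$key};
        $form{$key} .= $val;
    }
    return \%form;
}

my $form = parse_form('field1=Field+1+input&field2=Field+2+input&cbx1=1');
foreach my $name (sort keys %$form) {
    print "$name: $form->{$name}\n";
}
```

The plus signs are converted before the %XX sequences are decoded, so a plus sign that the user actually typed (sent as %2B) survives as a plus sign.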

System Processing

At this point, your instructions have been retrieved and decoded so that the function of your program is determined. Your program performs the heart of its processing at this point. My "little program" does very little, but the processing that can be done in CGI scripts is limited only by the language and the APIs available to you.

One caution is that your program shouldn't spend too much time doing its work. Browsers have time-outs, and if your program takes too long, the browser assumes that the transaction has failed and reports a time-out to the user.
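One way to keep a program from running too long is to put a limit on its own work. The sketch below uses Perl's alarm function inside an eval block, which assumes a UNIX-like system where alarm is available. The subroutine name run_with_timeout and the 20-second limit are hypothetical; choose a limit that suits your server and users.

```perl
use strict;
use warnings;

# Hypothetical helper: run a block of work, but give up after $seconds.
# Returns 1 if the work finished and 0 if the alarm fired first.
sub run_with_timeout {
    my ($seconds, $work) = @_;
    my $finished = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        $work->();
        alarm(0);
        1;
    };
    alarm(0);    # make sure no alarm is left pending
    return 1 if $finished;
    die $@ unless $@ eq "timeout\n";    # pass along unrelated errors
    return 0;
}

print "Content-type: text/html\n\n";
if (run_with_timeout(20, sub { return 1 })) {
    print "<P>Processing finished.</P>\n";
} else {
    print "<P>The request took too long. Please try again.</P>\n";
}
```

With this arrangement the user gets a short page explaining the delay instead of waiting for the browser to give up.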

Sending a Reply to the User

After your program finishes processing the request, it can return the new document to the user. Again, your program can choose to send one of many document types. The process is essentially the same as sending out the original document. The following code fragment accomplishes this for my program:


print "Content-type: text/html\n\n";

print "<H1>Results</H1>";

print "<HR>";

print "<H2>Variables</H2>";



print "<B>GATEWAY_INTERFACE</B>: $ENV{'GATEWAY_INTERFACE'}<BR>\n";

print "<B>SERVER_NAME</B>: $ENV{'SERVER_NAME'}<BR>\n";

print "<B>SERVER_SOFTWARE</B>: $ENV{'SERVER_SOFTWARE'}<P>\n";

print "<B>SERVER_PROTOCOL</B>: $ENV{'SERVER_PROTOCOL'}<BR>\n";

print "<B>SERVER_PORT</B>: $ENV{'SERVER_PORT'}<BR>\n";

print "<B>PATH_INFO</B>: $ENV{'PATH_INFO'}<BR>\n";

print "<B>PATH_TRANSLATED</B>: $ENV{'PATH_TRANSLATED'}<BR>\n";

print "<B>SCRIPT_NAME</B>: $ENV{'SCRIPT_NAME'}<BR>\n";

print "<B>QUERY_STRING</B>: $ENV{'QUERY_STRING'}<BR>\n";

print "<B>REMOTE_HOST</B>: $ENV{'REMOTE_HOST'}<BR>\n";

print "<B>AUTH_TYPE</B>: $ENV{'AUTH_TYPE'}<BR>\n";

print "<B>REMOTE_USER</B>: $ENV{'REMOTE_USER'}<BR>\n";

print "<B>REMOTE_IDENT</B>: $ENV{'REMOTE_IDENT'}<BR>\n";

print "<B>CONTENT_TYPE</B>: $ENV{'CONTENT_TYPE'}<BR>\n";

print "<B>CONTENT_LENGTH</B>: $ENV{'CONTENT_LENGTH'}<P>\n";

print "<B>HTTP_ACCEPT</B>: $ENV{'HTTP_ACCEPT'}<BR>\n";

print "<B>HTTP_USER_AGENT</B>: $ENV{'HTTP_USER_AGENT'}<P>\n";

print "<HR>\n";

print "<H2>Variables Found</H2>\n";

print "<B>Raw Input:</B> $in<p>";

print &PrintVariables(%in);

As you can see, formatting an HTML page is nothing more than including a set of statements that write to standard output. The subroutine &PrintVariables(%in) formats the variables parsed and decoded by the ReadData subroutine to produce a "pretty" output. Figure 12.2 shows part of the page produced by these print statements.
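Because the reply echoes text that the user typed, any HTML in that text reaches the browser as markup. A common precaution, not part of showcgi.pl, is to escape the special characters first. The sketch below assumes a hypothetical subroutine named escape_html and a made-up input string.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the characters that HTML treats as markup.
# The ampersand must be handled first so later entities are not re-escaped.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $raw = 'field1=<B>bold</B>&field2="quoted"';
print "<B>Raw Input:</B> ", escape_html($raw), "<P>\n";
```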

Figure 12.2 : Print statements to reply to the user.

Note that you can use showcgi.pl to show the environment variables passed by the server to the CGI program. You might want to try it on your installation.

Summary

Those are the basics of CGI processing. Your programs will be more complex depending on the processing that they must do and the document types they return, but the basics will be the same straightforward processing presented in this chapter.