CGI Manual of Style: Ch03: Writing CGI Scripts

Chapter 3: Writing CGI Scripts

Choosing a programming language

Testing and debugging your script

Configuring the server

Calling your CGI script

Security

In the previous chapter, you learned how your CGI script receives data from the Web server and how to return your script's results to the Web browser. You now have all the knowledge you need to begin writing useful CGI scripts. So let's get started.
This chapter lays the foundation that you need to write, debug, and run your CGI scripts. The first piece of this foundation is choosing a programming language in which to write your scripts. You learn what to look for when selecting a language and which languages are the most popular. You also pick up a few pointers on testing and debugging and common mistakes that you can avoid. After you have finished coding and debugging, you learn how to configure your Web server and how to call your CGI script. Finally, you learn a bit about security for both the data that is being transmitted and the system on which your script is running.

Choosing a programming language
Before starting your script, you need to choose which programming language to use. For most projects, choosing a language is largely a matter of preference. While die-hard UNIX gurus will rely on one of the UNIX shells, C, or Perl, a Windows user might prefer a DOS batch file or Visual Basic. The choice is up to you, but you must choose a language that produces a program that is executable on the system hosting the Web server. This means that if your Web server is on an Apple Macintosh system, you cannot use a scripting language (such as a UNIX shell program) that does not run on that Apple Macintosh.
Although this is the only restriction, it should not be your sole consideration. You benefit from choosing a language that is familiar to you, that is commonly used by other CGI scripters, that is a good match for your specific system, and that can perform the operations needed to accomplish the objective for writing the script.

Common Languages
Most CGI scripts are written in AppleScript, C, C++, Perl, TCL, any UNIX shell, or Visual Basic. These are not the only languages in use, but they are by far the most common. Although you can use any language you wish, there are two good reasons to consider using one of the common ones.
First, it will be faster to program CGI scripts because of shared code. Many people on the World Wide Web have already written CGI scripts for common tasks, and several have made their scripts freely available for others to use. You can usually find these scripts by searching through the various script repositories on the Web. There is a list of many of these script archives in the "Script Archive" section of the Appendix. Form handlers, access counters, and shopping carts (which are explained with examples in Chapters 4 and 5) are just a few examples of the types of scripts to be found in these script archives. Most of the scripts even give you permission to alter the code to suit your particular needs. As you can imagine, most of these scripts are written in the common languages mentioned earlier, so you will be better equipped to use them if you know one of these languages. These script archives also contain many useful library routines for parsing form data and returning valid headers.
Second, if you use one of the common CGI scripting languages, it is easier to get help with debugging. On the Internet, there are a few ways of posing questions and getting knowledgeable advice without having to pay large consulting fees. You can post questions and read responses in USENET newsgroups or on mailing lists. If you are using one of the popular languages, more people will be able to help you with any problems that crop up.

A Fit for Your Platform
When choosing which programming language to use, you should take into consideration what platform it will run on. Most of the common languages are available for all of the platforms for which there are Web servers. The most obvious example of one that is not is AppleScript. Clearly, if your Web server does not run on an Apple machine, you shouldn't write your CGI scripts in AppleScript, even if that is your favorite language. However, if your Web server runs on an Apple Macintosh, AppleScript is a powerful choice because it allows easier interaction with other programs on the Macintosh than C or C++. So choose a language that works on your system and makes your job easiest.

Appropriate for the Problem
Finally, make sure that the language you choose is appropriate for the task at hand. For simple form handling, any of the common languages work well, but when you start developing more complicated scripts, such as those that access a database, Perl is clearly a better choice than a UNIX shell. However, C is usually better than Perl when you need to do a lot of sorting very quickly, because C is compiled and Perl is interpreted. Keep in mind what you need to accomplish when selecting the programming language, and choose appropriately.

Testing and debugging your script
Like other programs, CGI scripts should be thoroughly tested before you make them available for the whole Internet to use. The simplest way to do this is by splitting the debugging task into two areas: the logic of the program and the interface. Testing the logic of the program is no different from testing a normal program. Does it do what you intended it to do? To find out, imagine that your CGI script is a regular program. Test it by executing it from the command line (or however you run your programs) and make sure it works correctly. Then, after the logic is correct, insert the CGI-specific code that interfaces with the Web browser and server, such as the code that reads the environment variables and returns the correct header. To test the interface, you may want to use a combination of command-line execution (with simulated environment variables) and browser execution (running the script from within the browser). Once you are ready to test the script from within the browser, you need to configure the Web server to run CGI scripts. This may already be done for you. If not, see the section "Configuring the server" later in this chapter.

Command-Line Execution
After you have written your CGI script, you should try to run it from the command line. The command line is the old text-only terminal interface in which the user is prompted for text input and the computer responds to the commands the user types. Two examples of a command-line interface are the DOS shell and the UNIX shell. To run your script from the command line, just type the name of the script file at the command prompt and press Enter.
The main purpose of running your script from the command line is to catch any logical errors. At this point you can check for syntax errors and runtime errors as well as verify that the program does what you want. To verify that your program does what it should, you may need to assign values to the environment variables to simulate running from a browser.

Simulate Variables
When you are debugging your code from the command line, the environment variables discussed in Chapter 2 will not be set. To verify that your program performs correctly, you may need to simulate the environment variables that would be set if you were running it from a Web server. You can do this easily by assigning values to the necessary environment variables in a section of temporary code in your script. After you are done debugging your script, just comment out or remove these lines of code.
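The temporary code described above might look something like the following Perl sketch. The sub name `simulate_cgi_env` and every value assigned here are made up for illustration; substitute whatever variables your own script reads.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Temporary debugging aid: fake the CGI environment when the script is
# run from the command line. All values below are illustrative only.
sub simulate_cgi_env {
    my %fake = (
        REQUEST_METHOD => 'GET',
        QUERY_STRING   => 'name=Pat&color=blue',
        SCRIPT_NAME    => '/cgi-bin/test.pl',
        SERVER_NAME    => 'localhost',
    );
    # Only fill in variables that are not already set.
    foreach my $name (keys %fake) {
        $ENV{$name} = $fake{$name} unless defined $ENV{$name};
    }
}

simulate_cgi_env();    # remove or comment out after debugging

print "Content-type: text/plain\n\n";
print "Method: $ENV{REQUEST_METHOD}\n";
print "Query:  $ENV{QUERY_STRING}\n";
```

Because the sub never overwrites a variable that already has a value, leaving the call in place by accident would not disturb a real request, but it is still safer to remove it before the script goes live.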

Common Problems
One of the most common error messages received with CGI scripts is, "Server Error, document returned invalid header." This error is usually the result of not specifying a valid header to be returned to the Web server. Remember, all data returned to the server must have a valid header. The header has to be properly formatted with spaces and punctuation, as described in the previous chapter. Often the blank line that must separate the parsed header (for example, the Content-type line) from the actual start of the data is missing. If you have double-checked your code and the proper header is being returned but you are still receiving the Server Error, check for an abnormal termination of your script. If the script "crashes" during execution, the Web server receives the crash message instead of the header and interprets this message as an invalid header.
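As a minimal sketch of a correctly formed response, the following Perl code builds the header line, the blank line, and the data in one place. The sub name `build_response` is hypothetical, not part of any library.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Build a complete response: header line, blank line, then the data.
sub build_response {
    my ($content_type, $body) = @_;
    # The first "\n" ends the header line; the second is the blank line.
    return "Content-type: $content_type\n\n" . $body;
}

$| = 1;    # flush output immediately so the header is not held in a buffer
print build_response('text/html', "<HTML><BODY>It works.</BODY></HTML>\n");
```

Keeping the header in a single helper makes it harder to forget the blank line when the script grows.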
On UNIX systems, CGI scripts often do not execute as a result of incorrect file permissions on the script file. The script file must not only be executable on the command line, but must be executable by the Web server. If the user ID under which the Web server is running is not the owner or in the group of the script file, the script must be "world-executable," which is a file permission status for files on UNIX systems. Check your file system and Web server documentation for more information about user IDs and file permissions.
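One way to check the permission bits from Perl is sketched below. It assumes you pass the script file names as command-line arguments; the sub name `world_executable` is made up for this example.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Return 1 if the permission bits in $mode include world-execute.
sub world_executable {
    my ($mode) = @_;
    return ($mode & 0001) ? 1 : 0;
}

# Report on each file named on the command line.
foreach my $file (@ARGV) {
    my @info = stat($file);
    unless (@info) {
        warn "Cannot stat $file: $!\n";
        next;
    }
    my $mode = $info[2] & 07777;    # keep only the permission bits
    printf "%s has mode %04o; world-executable: %s\n",
        $file, $mode, world_executable($mode) ? 'yes' : 'no';
}

# To let everyone read and execute a file (the owner keeps write access):
# chmod(0755, $file) or die "Cannot chmod $file: $!\n";
```

Whether world-execute is actually required depends on which user ID your Web server runs under, as described above.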
For Windows systems using Perl, the most common problem is with the perl.exe executable. Unlike on UNIX systems, all Perl programs on a Windows system must be executed by running the Perl interpreter and passing the name of your Perl program file as a parameter. For example, to execute the file first.pl, you would type
c:\perl\perl.exe first.pl
at the command prompt of a DOS shell and press Enter. You could also type just
perl first.pl
if the path to the perl.exe file is in your PATH statement in your autoexec.bat file.
Knowing this, your first thought might be to execute your Perl CGI script by using a URL in the form
http://www.robertm.com/cgi-bin/perl.exe?first.pl
which will work. However, this can be a major security risk because your Perl interpreter is in a directory that is available to the entire World Wide Web. This security risk has been exploited on many machines.
There are two ways to overcome this security risk. One is to use a wrapper program, which is another program that will reside in the cgi-bin directory and call the Perl interpreter in a secure manner. A much easier solution, however, is to create an association. In Windows, an association is based upon the file name extension. For example, most Windows systems consider all files with the .txt extension to be text files that are opened with the Windows program Notepad. (In other words, .txt files are "associated" with Notepad.) Double-clicking on a file called readme.txt in a Windows environment will start the Notepad program and open the file readme.txt within Notepad. Similarly, you can create an association between the Perl interpreter and files with the .pl extension. This strategy allows you to leave your perl.exe program in another directory on your machine and call your CGI script with URLs in the form http://www.robertm.com/cgi-bin/first.pl. Consult your Windows documentation for instructions on how to create the association.
Finally, when executing the script from within your browser, always make sure that the script file is in the correct location for the Web server. The Web server must know that the file being requested is a CGI script. Otherwise, it will simply display the contents of the script file in the user's browser. On most systems, CGI scripts reside in a special directory, the cgi-bin directory. When the Web server receives a request for a file in the cgi-bin directory, it knows that the file is a script that it should execute. Many Web servers have an option that allows them to recognize files with the .cgi extension as script files. If this option is enabled, the CGI scripts can be in any directory under the document root, but must have a .cgi extension. For more information on the cgi-bin directory and the .cgi extension, see the next section.

Configuring the server
Web server software makes use of configuration files to store specifics about the Web site, such as the location of the documents (known as the document root), the port number on which to listen, and the location of CGI scripts. (There are many other settings in the server configuration files. Check the documentation for your server for a complete list of the settings.) If you have Web space on someone else's server (your Internet service provider's server, for instance), you need to consult that system's webmaster for details on where to place your CGI scripts. If you run your own server, check the documentation for details on how to specify a cgi-bin directory or enable the .cgi extension.

cgi-bin
Most Web servers store all CGI scripts in a single directory. This directory is called cgi-bin (sometimes it's called cgi-win for Windows machines). The cgi-bin directory can be anywhere on the system, as long as the Web server has access to it. Your Web server's configuration file must specify the full path name to the cgi-bin directory so the server can find it. Once the Web server is configured to recognize the cgi-bin directory, you execute your scripts by referencing the cgi-bin directory as if it were at the document root level. For example, suppose your domain is inter.net and your Web site is at http://www.inter.net/. To run the funtimes.pl script in the cgi-bin directory, you would type
http://www.inter.net/cgi-bin/funtimes.pl

The .cgi Extension
Most Web servers let you specify a file name extension (typically .cgi) to designate which files are CGI scripts. If you enable this feature on your Web server, you can store your CGI scripts in any directory on your Web site, not just in the cgi-bin directory. This allows you to keep your script files with the HTML files they work with. If the .cgi designation is recognized, you could call the funtimes.cgi script in the /football directory by typing
http://www.inter.net/football/funtimes.cgi
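
The text does not name a particular server, so treat the following as a sketch only. It assumes an NCSA httpd or Apache-style configuration file, and the file-system path is a placeholder. Check your own server's documentation for the exact directives.

```
# Map the URL path /cgi-bin/ to the directory that holds the scripts.
# The directory on the right is a placeholder path.
ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/

# Treat files ending in .cgi as CGI scripts wherever they appear.
AddType application/x-httpd-cgi .cgi
```

On Apache, directories outside the ScriptAlias directory also need `Options ExecCGI` before .cgi files in them will run.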

Calling your CGI script
By now you have learned how to test your script from the command line. You know that you need to configure your server, and you know where to put your CGI script files. Now you need to think about how you want to call your script from your HTML file. In most cases, what you want to accomplish dictates how you call the script. When you need to handle form input, for instance, you have to specify the name of the CGI script within the <FORM> tag. The other methods for calling your CGI script also require that you specify the name of the CGI script within the HTML tag: in the anchor tag (<A>), in the image tag (<IMG>), and with Server Side Includes (more on these in the section "Calling a CGI Script in Server Side Includes" later in this chapter).

Calling a CGI Script in the <FORM> Tag
If you are writing a script to handle form input, you place the name of the CGI script in the ACTION attribute of the <FORM> tag, as shown here:
<FORM METHOD=POST ACTION="/cgi-bin/formhandle.pl">
Note: It is assumed that you already know how to write the HTML code for a form. If not, consult the forms section of an HTML book. If you are already somewhat familiar with this topic, but need to refresh your memory, the HTML is presented briefly in Chapter 4.

Intuitively, the value of the ACTION attribute is the action the browser performs when the form is submitted. You can use any valid URL as the ACTION for a form. When the form is submitted, the browser sends the Web server a request for that item. If the item is a CGI script, the script is executed and all of the data that was entered in the fields of the form is sent to it, via the QUERY_STRING environment variable when the method is GET or via standard input when the method is POST. For other URLs, such as an HTML or GIF file, the document is returned to the browser. In the preceding example, the browser is requesting a CGI script, formhandle.pl, which the Web server will execute. Here is the HTML code you would use if you were using the .cgi extension for your CGI scripts:
<FORM METHOD=POST ACTION="formhandle.cgi">
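
On the receiving side, a script such as formhandle.pl has to pick up the form data from whichever place the request method dictates. The following Perl sketch shows one way to do that; the sub names are hypothetical, and a production script would more likely use one of the library routines from the script archives.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Decode one URL-encoded string: "+" means space, %XX is a hex byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split "name=value&name=value" into a hash of decoded pairs.
# If a name repeats, the last value wins in this simple version.
sub parse_form_data {
    my ($raw) = @_;
    my %form;
    foreach my $pair (split /&/, $raw) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        $form{ url_decode($name) } = url_decode($value);
    }
    return %form;
}

# GET data arrives in QUERY_STRING; POST data arrives on standard input,
# with its length given by CONTENT_LENGTH.
sub read_form_data {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    my $raw    = '';
    if ($method eq 'POST') {
        read(STDIN, $raw, $ENV{CONTENT_LENGTH} || 0);
    } else {
        $raw = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
    }
    return parse_form_data($raw);
}

my %form = read_form_data();
print "Content-type: text/plain\n\n";
print "$_ = $form{$_}\n" foreach sort keys %form;
```

The names are decoded only after the string has been split on "&" and "=", so an encoded "&" or "=" inside a value does not break the parsing.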

Calling a CGI Script in the <A> Tag
You can also call CGI scripts by assigning them to the HREF attribute of the <A> tag, like this
<A HREF="/cgi-bin/clicked.pl">Click Here</A>
or
<A HREF="clicked.cgi">Click Here</A>
Note: The rest of the examples in this book reflect scripts stored in the cgi-bin directory of the Web server. Keep in mind that you can implement the same scripts by using the .cgi extension if that option is enabled on your server.

When clicked, the links defined by the preceding lines of HTML code cause the Web server to execute the referenced CGI script (in this example, the clicked.pl or clicked.cgi script).
Recall from Chapter 2 that the value for QUERY_STRING is the user-provided information when the request method is GET. One way the user can pass this information via the GET method is to append a question mark followed by the data to the URL requesting the CGI program. Here is an example of appending the information to a CGI script request in the HREF attribute of the <A> tag.
<A HREF="/cgi-bin/clicked.pl?file=clicked.html">Click Here</A>
As the programmer, you can use this method to pass parameters to your program by appending a URL-encoded string of name/value pairs to the script's URL. In the preceding example, the name of an HTML file is being passed to the clicked.pl script via the QUERY_STRING environment variable.
In most cases, this approach will not be very useful. Remember, the line of HTML is hard-coded in the HTML page, unless the page is generated dynamically (there will be some examples of this in Chapter 5). It will always be the same value, no matter who is clicking on the page. However, this approach can be useful in some applications, such as the shopping cart examples in Chapter 5.
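When a page is generated dynamically, the script that writes the page can build such links itself. Here is a small Perl sketch under that assumption; `url_encode` and `link_with_params` are hypothetical helper names.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Encode a value for use in a query string: anything other than letters,
# digits, and a few safe characters becomes %XX.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build an <A> tag whose HREF carries name/value pairs after the "?".
sub link_with_params {
    my ($script, $label, @pairs) = @_;
    my @encoded;
    while (@pairs) {
        my ($name, $value) = splice(@pairs, 0, 2);
        push @encoded, url_encode($name) . '=' . url_encode($value);
    }
    # Inside HTML, the "&" between pairs is written as &amp;.
    return qq{<A HREF="$script?} . join('&amp;', @encoded) . qq{">$label</A>};
}

print "Content-type: text/html\n\n";
print link_with_params('/cgi-bin/clicked.pl', 'Click Here',
                       file => 'clicked.html'), "\n";
```

With the arguments shown, the script writes the same link as the hard-coded example above; passing different values for each visitor is what makes the approach useful.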

Calling a CGI Script in the <IMG> Tag
You can also call CGI scripts from within the <IMG> HTML tag. If you place the path to a CGI script as the value to the SRC attribute, the Web server will execute the CGI script and return the output as the source for the image. Your CGI script must return output that is in a graphic format, either by directly returning the ASCII description of an image or by redirecting the Web server to the location of the graphics file. For example, here is a line of HTML code that calls the CGI script cgi-image.pl:
<IMG SRC="/cgi-bin/cgi-image.pl">
Here are the contents of cgi-image.pl, which simply redirects the Web server to a graphic file:
#!/usr/local/bin/perl

print "Location: /graphics/image1.gif\n\n";
As you probably know, this is not a practical use of the feature. You could achieve the exact same effect simply by referencing the image1.gif file in the SRC attribute of the <IMG> tag. However, being able to dynamically return an image file lets you create graphic images at runtime or vary the image file that is displayed.
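As a sketch of varying the image that is displayed, the following version of the script picks a different file depending on the hour of the day. Only image1.gif comes from the example above; the other file names and the sub name `image_for_hour` are assumptions.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical image files; substitute the ones on your own server.
my @images = ('/graphics/image1.gif',
              '/graphics/image2.gif',
              '/graphics/image3.gif');

# Pick an image by hour of day so the choice changes over time.
sub image_for_hour {
    my ($hour) = @_;
    return $images[ $hour % @images ];
}

my $hour = (localtime)[2];    # 0 through 23
print "Location: ", image_for_hour($hour), "\n\n";
```

The same structure works for any selection rule, such as choosing at random or by the value of an environment variable.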

Calling a CGI Script in Server Side Includes
Most Web servers can enable a feature known as Server Side Includes. Server Side Includes are Web server commands that reside within the HTML of an HTML document. When the Web server parses the HTML file, it executes all the Server Side Include commands and places the results in place of the commands. Server Side Includes allow you to include another item within your Web page simply by adding an HTML tag. These included items can be many different things, such as another HTML document, an image file, or a CGI script. All Server Side Include statements are in the form
<!--#command tag="value"-->

To execute a CGI script from within a Server Side Include, you use the exec command. The exec command has two valid tags, cmd and cgi. The cmd tag executes the associated value with /bin/sh (the UNIX Bourne shell). The cgi tag calls the CGI script whose virtual path is the associated value. Here is the line of HTML code that executes and includes the output from the CGI script include-me.pl:
<!--#exec cgi="/cgi-bin/include-me.pl"-->

Security
Security is an important and sometimes overlooked issue with CGI scripts. As the author and user of CGI scripts, you must understand that both the data being transmitted and the machine the Web server is running on are vulnerable to unauthorized access.

Vulnerability of Data
Many CGI scripts are used in conjunction with form input. These scripts gather data from the person using the Web browser, interpret that data, and pass information back to the browser. The data en route to either the script or the browser passes through many machines as it travels across the Internet. Under most circumstances, someone with access to these other machines can view this information.
Most of the time, the data will pass between the browser and your script unnoticed and untouched. However, keep in mind that the information is vulnerable. For this reason, it is best not to send and receive sensitive data that you do not want someone other than the intended recipient to view. Several companies are working on providing greater security for data being transmitted on the Web. The Netscape Commerce Server and the Secure Mosaic Server are the only ones readily available at the moment. These are two versions of Web servers that enhance the security of the data while it is in transit.

Vulnerability of Your Server
When a Web browser requests a CGI script from your Web server, that script is run on your machine. Anytime someone else runs a CGI script on your machine, there is the potential for abuse. If you do not take precautions when writing your scripts, your machine may be open to invasion by unauthorized individuals.
For example, suppose you wrote a simple HTML page with a form for the user to enter his or her name. When the user submits that form, your CGI script receives the data the user entered and outputs it within an HTML page. On the surface, that seems harmless enough, and it usually is. If you have Server Side Includes enabled, however, this could be a major security risk. Suppose the user entered something like
<!--#exec cmd="/bin/cat /etc/passwd"-->

Remember that the value of the cmd attribute is executed by /bin/sh on UNIX systems with the results being sent back to the browser. In this case, the browser could receive the contents of the password file for the system.
As long as you know that your server is vulnerable, you can protect your machine and still safely run your CGI scripts. In the previous example, you could place a line of code in your CGI script that checked for Server Side Include statements, alleviating that risk.
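One possible form of such a check is sketched below in Perl. The sub name `sanitize_input` and the sample input are made up for illustration, and this is a starting point, not a complete defense.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Remove anything that looks like a Server Side Include directive,
# then escape HTML special characters so leftovers display as text.
sub sanitize_input {
    my ($text) = @_;
    $text =~ s/<!--#.*?-->//gs;    # strip <!--#command ...--> directives
    $text =~ s/&/&amp;/g;          # escape "&" first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# A hypothetical hostile value for the name field.
my $name = sanitize_input('Pat<!--#exec cmd="/bin/cat /etc/passwd"-->');
print "Content-type: text/html\n\n";
print "<HTML><BODY>Hello, $name!</BODY></HTML>\n";
```

Escaping the angle brackets matters as much as the stripping step: a directive that slips past the pattern, for example one assembled from nested pieces, is still rendered harmless because it no longer begins with a literal "<".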