by Michael Erwin
This chapter covers the tools of CGI programming and the options you have as a CGI programmer. Today you have more options than ever before. As you'll see, you need to choose your programming tools wisely. The tools you have available are just like hardware tools: some are better suited to specific tasks than others (for example, you wouldn't use a hammer to cut a piece of wood). The same concept applies to CGI tools. Certain tools work extraordinarily well for specific tasks.
One of the first things you'll realize is that a huge amount of CGI code is already out there, and you'll want to use some of these scripts. Why build a CGI application when one that will work for you already exists? For that reason, before you write a complex CGI application, you should look at the contents of the accompanying CD-ROM, which contains most of the CGI scripts used in this book. If you don't find a script on the CD that you can use for your application, visit the following URL:
http://www.yahoo.com/Computers_and_internet/Internet/World_Wide_Web/Programming
This area of Yahoo contains numerous links to a wide variety of CGI tools and script libraries for just about every platform known, and then some.
When choosing your tools, think about what you already know. For example, if you already know how to use a power saw, why would you want to use a hand saw to cut a piece of plywood? However, you would have to use the hand saw if you didn't have electricity for the power saw.
The same thing applies to CGI. If you use your service provider's Web server, you'll need to look at what CGI tools and languages the service provider will allow you to use. For example, if your service provider is using a Windows NT-based server, you can't use UNIX shell CGI scripts.
This chapter introduces you to the following topics:
Interpreted languages are programming languages for which you don't have to create a compiled binary file in order to execute the program. Interpreted scripts are written as simple ASCII text files. To be executed on a computer, these scripts require a program called an interpreter, and they rely entirely on that interpreter to perform their programmed tasks. An interpreted script can be as simple as a list of operating system commands, also known as a batch program. Some interpreted languages, such as Perl, may require you to install a separate, compiled interpreter program on your system.
Although interpreted CGI scripts are just plain ASCII text files, they're totally different from an HTML file, which is also plain ASCII text. An interpreted script tells the interpreter what task to have the computer perform, whether it's as complex as searching a database or as simple as clearing the display screen.
An HTML file, on the other hand, tells the client's browser how to display a document's content. Think of the browser as the interpreter of the HTML file.
For the most part, interpreted languages are easy to learn and use, widely available, and portable, because they can be run, more or less unchanged, on different operating systems. This also makes them great tools for CGI programming.
AppleScript is a scripting language for Apple Computer's Macintosh System 7. An English-like language, AppleScript is much more than just a batch programming language; it lets you write programs that automate and interconnect existing Mac applications, such as the Finder. AppleScript is very similar to HyperTalk.
One of the best features of AppleScript is its natural language syntax, which makes CGI applications easy to build, understand, and maintain. Compared with the complexity of programming in a compiled language such as C or C++, AppleScript programming is probably the fastest and easiest way to write CGI scripts for the Macintosh.
AppleScript is great for handling small to medium CGI projects and works well for searching text files and manipulating data. These scripts can be very efficient and effective for producing CGI applications on Macintosh-based Web servers (see fig. 2.1). Because AppleScript programs are normally small, efficient, and fairly easy to learn, you might want to consider learning AppleScript programming if your Web server is running on a Macintosh. AppleScript is one of the most widely used CGI languages on the Mac; however, you also have the choice of C, HyperTalk, and Perl.
Figure 2.1 : This HTML interface was written in AppleScript. Notice the use of multiple languages.
In addition to working only on the Mac platform, AppleScript's System 7 (or higher) requirement puts this scripting solution out of reach for many. But for Mac users, AppleScript is readily available and inexpensive to add to System 7, and it's included with System 7 Pro and System 7.5. Perhaps most important, because AppleScript is widely used by Macintosh Webmasters, you'll find a number of solid CGI applications written in AppleScript available on the Web and on the CD-ROM that accompanies this book. You can also look at Chapter 24, "Tips and Techniques for AppleScript," which deals entirely with AppleScript programming.
Shell scripts are great for small and simple projects and for searching text files. They also can be very efficient and effective for CGI. All UNIX-based servers will have some type of shell scripting language available. Because shell scripts are small, efficient, and fairly easy to learn, you might want to consider learning shell programming.
Since the early days of the Internet, many of the computers connected to it have run some flavor of UNIX. Because UNIX has been part of the Internet for so long, you'll find solid Web-server software for UNIX, as well as a huge variety of well-documented and solid UNIX shell-based CGI scripts on the Web and on the CD-ROM that comes with this book.
All flavors of UNIX share an important common user interface called the UNIX shell. The UNIX shell is just another program that runs on a UNIX-based computer; on most UNIX systems, the standard shell program is named simply sh. Because the shell is just a program, it has been modified and updated over the years.
There are several different flavors of shells. The standard shell on most UNIX systems is the Bourne shell, named after its creator, S.R. Bourne. A popular Bourne shell derivative is the Bourne Again shell, or bash. Another popular flavor is the C shell, or csh, whose syntax resembles the compiled programming language C.
The best way to think of UNIX shell programming is to liken it to batch programming, in which the programmer creates a text file of shell commands in the order they are to be processed. The programmer can also pass data to and from the script through standard input (STDIN), standard output (STDOUT), and environment variables. For a CGI script, the Web server supplies the request data through STDIN and environment variables, and the script sends its results back through STDOUT.
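The following minimal sketch shows that data flow. It is written in Python purely for illustration; the same logic in a Bourne shell script would read $QUERY_STRING and echo the header line. The function names read_form_data, write_response, handle, and run_offline are hypothetical.

For GET requests, the script reads the raw form data from the QUERY_STRING environment variable. For POST requests, it reads from STDIN, up to CONTENT_LENGTH bytes. In both cases it writes a header line, a blank line, and the HTML document to STDOUT.

```python
import html
import io
import os
import sys
from typing import Mapping, TextIO


def read_form_data(environ: Mapping[str, str], stdin: TextIO) -> str:
    """Return the raw (still URL-encoded) form data for one CGI request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST data arrives on STDIN; CONTENT_LENGTH says how much to read.
        # Simplification: a text-mode read, fine for URL-encoded ASCII data.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    # GET data arrives in the QUERY_STRING environment variable.
    return environ.get("QUERY_STRING", "")


def write_response(stdout: TextIO, body: str) -> None:
    """Write a CGI response: a header line, a blank line, then the document."""
    stdout.write("Content-Type: text/html\n\n")
    stdout.write(body)


def handle(environ: Mapping[str, str], stdin: TextIO, stdout: TextIO) -> None:
    """Echo the received form data back as an HTML paragraph."""
    data = read_form_data(environ, stdin)
    write_response(stdout, "<P>You sent: " + html.escape(data) + "</P>\n")


def run_offline(environ: Mapping[str, str], posted: str = "") -> str:
    """Exercise the script without a Web server by faking STDIN and STDOUT."""
    out = io.StringIO()
    handle(environ, io.StringIO(posted), out)
    return out.getvalue()


if __name__ == "__main__":
    # Under a real Web server, the server supplies the environment and STDIN.
    handle(os.environ, sys.stdin, sys.stdout)
```

run_offline() lets you try a script from the command line by faking the environment and STDIN. That is handy before you install the script on a server.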
Using UNIX shell scripts for CGI programming allows you to take advantage of other programs already on UNIX-based systems. For example, a small Bourne shell script can take the output of an existing UNIX command named cal and generate a simple HTML page that contains a calendar (see fig. 2.2).
Figure 2.2 : A shell script created this output of the UNIX cal command.
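The script behind figure 2.2 isn't reproduced in this chapter. The following is only a sketch of the same idea, again in Python, with hypothetical function names.

- run_cal() captures the output of the existing cal command.
- calendar_page() wraps that text in a <PRE> block so the calendar's columns stay aligned.

It assumes cal is on the server's search path.

```python
import html
import subprocess


def run_cal() -> str:
    """Run the existing UNIX cal command and capture its output as text."""
    result = subprocess.run(["cal"], capture_output=True, text=True, check=True)
    return result.stdout


def calendar_page(cal_output: str) -> str:
    """Wrap plain-text command output in a minimal HTML page.

    <PRE> preserves cal's column alignment, and html.escape() guards
    against any '<' or '&' characters in the output.
    """
    return (
        "Content-Type: text/html\n\n"
        "<HTML><HEAD><TITLE>Calendar</TITLE></HEAD>\n"
        "<BODY><H1>Calendar</H1>\n"
        "<PRE>\n" + html.escape(cal_output) + "</PRE>\n"
        "</BODY></HTML>\n"
    )
```

Under a Web server, the script body would be just print(calendar_page(run_cal()), end="").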
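UNIX shell scripts pose a security risk because they are basically a list of UNIX shell commands and calls to other programs. If a script passes form data to those commands unchecked, a visitor can type shell metacharacters (such as ; or |) into the form and make the shell run commands you never intended. This risk grows when many different people can place shell-based CGI scripts on your Web server.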
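One common defense has two parts. First, accept only input that matches what you expect, rather than trying to strip out dangerous characters. Second, never let a shell parse form data. The hypothetical cal_argv() below sketches this approach in Python for a form that asks for a month and year to pass to cal.

```python
import re
from typing import List

# Accept 1 to 4 digits and nothing else, so shell metacharacters such as
# ; | ` and $ can never get through.
_DIGITS = re.compile(r"[0-9]{1,4}")


def cal_argv(month: str, year: str) -> List[str]:
    """Build an argument list for cal from untrusted form fields."""
    if not (_DIGITS.fullmatch(month) and _DIGITS.fullmatch(year)):
        raise ValueError("month and year must be numbers")
    if not (1 <= int(month) <= 12 and 1 <= int(year) <= 9999):
        raise ValueError("month or year out of range")
    # Dangerous alternative, for contrast: os.system("cal " + month + " " + year)
    # would let the shell interpret whatever the visitor typed.
    return ["cal", month, year]
```

The returned list can be handed to subprocess.run(), which starts cal directly without a shell. Even a value that slipped past validation would then reach cal as a single argument rather than as shell syntax.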
If you're using a UNIX-based Web server, UNIX shell programming is the fastest and currently one of the easiest ways to get started with simple CGI programming. One of the best places for support on UNIX shell scripts is the comp.unix hierarchy of Usenet newsgroups, such as comp.unix.shell.
Perl is one of the greatest utility languages to come along in years. Perl was created by Larry Wall, and its name is an acronym for Practical Extraction and Report Language. As the name implies, Perl was originally intended for handling data and creating reports from that data. Over the past few years, Perl has evolved into a complete programming language. Originally made available for UNIX systems, Perl has since been ported to Amiga, MS-DOS, OS/2 Warp, VMS, Windows NT, Windows 95, and Macintosh. One nice thing about Perl is that it's available for free. It's also included on the CD-ROM for various platforms.
Because you probably already have Internet access, you can find support for Perl in various online locations. One of the best places is the Usenet newsgroup comp.lang.perl, where you can even see messages from Perl's author himself.
Perl overcomes some of the deficiencies of C and UNIX shell programming. Perl can perform most of the tasks that a C program can, but with less effort, and its syntax makes it easier to understand what a Perl program is doing. Compared with UNIX shell programming, Perl is much more capable and complete, and it can still run UNIX system commands.
With Perl, you can create additional tools that you can use within other Perl programs. This is somewhat like using libraries in C. Unlike C, however, you don't need to compile a Perl program. On UNIX systems, you run a Perl program just as you do a shell program. On other operating systems, you run the Perl interpreter and tell it the name of the Perl script, all in one command line: for example, perl wwwstat.pl (see fig. 2.3).
NOTE |
You can identify most Perl programs by their .pl extension, although you don't have to use it. Many of the more experienced Net users don't use standard extensions on their CGI applications, because an extension that reveals the language can help others exploit security holes in that language. |
TIP |
If you're already familiar with Perl, you may want to consider using MacPerl on Macintosh-based Web servers. Also, some programmers find it easier to test Perl scripts on the Mac before moving them to UNIX systems. |
Programs written in Perl allow you to address certain security issues. Perl can check variables that are about to be passed on to other programs and refuse to pass along data that could cause a security breach. This feature, handy for UNIX systems, allows you to prevent some dangerous program execution. With Perl's taint checking (turned on with the -T switch), you can even trace the data flow to determine whether the data came from an insecure source.
When you're learning Perl, you can usually look at a Perl script and tell what it does. Because Perl is so easy to understand, you can find many excellent CGI scripts written in it. You can also take an existing Perl script that's close to what you want and modify it to do exactly what you need.
With Perl, you can handle very complex data structures and emulate various data formats. You can even use hash tables in the form of associative arrays. Variable names can be as long as you want, as can the lines in a Perl program, whereas UNIX shells and C/C++ compilers typically impose line-length limits.
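Here is a toy sketch in Python that makes the idea of tracing insecure data concrete. It is an analogy, not how Perl implements taint checking. All three names are hypothetical:

- Tainted marks strings that came from outside the program.
- run_command() refuses to pass marked strings to another program.
- untaint() returns a trusted copy only when the whole value matches an expected pattern, similar in spirit to how Perl untaints data through a regular-expression match.

```python
import re
from typing import List


class Tainted(str):
    """Marks a string that came from outside the program (form data, environment)."""


def run_command(argv: List[str]) -> List[str]:
    """Refuse to hand tainted data to another program."""
    for arg in argv:
        if isinstance(arg, Tainted):
            raise PermissionError("insecure dependency: %r" % (arg,))
    # A real script would now call subprocess.run(argv); the sketch just returns it.
    return argv


def untaint(value: str, pattern: str) -> str:
    """Return a trusted plain str only if the whole value matches pattern."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError("value does not match %r" % (pattern,))
    return str(match.group(0))
```

Unlike Perl, this toy loses the mark as soon as a tainted string is concatenated or sliced, because those operations return a plain str. That weakness is exactly why Perl builds the tracking into the interpreter itself.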
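Associative arrays are a natural fit for CGI form data, where each field name maps to the value the user entered. The following Python sketch (the function name form_fields is hypothetical) decodes a URL-encoded query string into a dictionary, Python's equivalent of a Perl associative array. Each name maps to a list because a form can send the same field name more than once, for example from several checked checkboxes.

```python
from typing import Dict, List
from urllib.parse import parse_qsl


def form_fields(query: str) -> Dict[str, List[str]]:
    """Decode URL-encoded form data into an associative array (a dict)."""
    fields: Dict[str, List[str]] = {}
    # parse_qsl splits on '&' and '=' and decodes '+' and %XX sequences.
    for name, value in parse_qsl(query, keep_blank_values=True):
        fields.setdefault(name, []).append(value)
    return fields
```

The standard call urllib.parse.parse_qs(query, keep_blank_values=True) produces the same result. The loop is written out here to show the associative array being filled.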
TCL (pronounced "tickle") is another simple interpreted language that needs to be addressed here because it's used on several UNIX systems for CGI applications. It's a good choice if you plan to stay with a specific system for a while. TCL is slowly gaining popularity as a CGI programming language but is far from being the top contender.
TCL requires that you know C because TCL is basically a library of C programming language procedures. You'll be surprised how easy TCL is to learn; however, it's not as easy to learn as Perl or AppleScript. But you'll get more speed out of compiled TCL programs, and you'll gain some security because the compiled program's internal workings aren't readable.
TCL isn't as fast as compiled native C applications, but you can create applications fairly quickly. You can even create graphical applications more quickly than with C or C++. To create most GUI TCL applications, however, you'll want to use an extension to TCL called TK (Tool Kit); the combination is sometimes referred to as TCL/TK. Using TCL and TK together, you can create X Window programs quite easily, which makes them good for prototyping applications quickly. See figure 2.4 for an example of a TCL CGI application.
Figure 2.4 : This HTML document shows the interface to an underlying TCL CGI script.
TIP |
Because TCL is much deeper than just a companion to C, you may want to check out TCL's FAQ at http://www.sco.com/Technology/tcl/tclFAQ/. |
Portability of TCL applications is a double-edged sword. TCL applications can use direct system calls, which tie the TCL application to a specific system, so to make an application portable, the programmer needs to avoid native system calls. TCL isn't available for as many operating systems as Perl or C/C++, but it is available for Macintosh, MS-DOS, and most UNIX platforms.
Two great things about using a compiled language for writing CGI applications are the speed and small size of the finished product. Compiled languages achieve this through a process called compiling. After you write your program, you process the finished source code through a program called a compiler, which generates a stand-alone native binary executable.
This leads to one of the pitfalls of compiled languages: you have to compile your source code with a compiler written specifically for each operating system and hardware platform on which you plan to run the program. If you write a compiled program for an Intel-based computer running some flavor of UNIX, for the most part you can run that compiled program only on that kind of system. If you decide to run the same program on an Intel-based computer running another operating system, such as Windows NT, you'll have to recompile it for the new system. Java-based applications are a current exception, because Java compiles to platform-independent bytecode. Even then, Java programs still need a native interpreter for each platform, which may or may not exist.
Compiled languages also offer you a sense of security for your CGI applications. If you use a compiled language and other system users or hackers manage to obtain your programs, they can't easily see the programs' internal workings or modify them. This is very important to programmers who want to keep prying eyes out of their CGI application's source code.
The compiled language known as C has been around since the early 1970s. Like UNIX, C was developed at Bell Labs. In fact, the C language was developed to write the UNIX operating system. C's predecessor was an earlier programming language called, you guessed it, B. Even after all these years, C is one of the most popular procedural languages.
The original, informal standard for C was the book The C Programming Language by Brian Kernighan and Dennis Ritchie. To give the language a formal standard, the American National Standards Institute (ANSI) developed ANSI C, which was later adopted internationally as an ISO standard.
Over the years, C has become widely used by professional programmers. C includes high-level constructs, it produces efficient programs, and virtually every computer platform has a C compiler available.
One problem with writing CGI applications in C is that the language doesn't handle strings very well. C has no built-in string type; a string is just an array of characters that you must size, copy, and terminate yourself, so you normally have to get creative in handling and manipulating long strings. If your CGI applications will be handling character and string data, you'll need to juggle that data around to convert it from one form to another, but it can be done. If your CGI application is going to handle a large amount of string data, you might want to consider another programming language. However, if you become accomplished with C programming, you'll have access to a powerful tool. Figure 2.5 shows an example HTML interface to a C-based CGI script.
Figure 2.5 : This HTML interface is for a C-based guestbook CGI script.
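To see the kind of juggling involved, here is the decoding loop a CGI program needs for URL-encoded form data: '+' becomes a space, and %XX becomes the byte with hexadecimal value XX. It is sketched in Python (the function url_decode is hypothetical) so the logic is easy to follow.

A C version follows the same steps but must also manage the output buffer and the terminating null character itself. Because decoding never makes the data longer, C programs commonly decode in place. The sketch assumes the form was submitted as UTF-8, and it passes malformed % sequences through unchanged.

```python
_HEX = "0123456789abcdefABCDEF"


def url_decode(encoded: str) -> str:
    """Decode one URL-encoded name or value, one character at a time."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        pair = encoded[i + 1:i + 3]
        if ch == "+":
            out.append(0x20)  # '+' stands for a space
            i += 1
        elif ch == "%" and len(pair) == 2 and all(c in _HEX for c in pair):
            out.append(int(pair, 16))  # %XX is one byte, given in hex
            i += 3
        else:
            out.extend(ch.encode("utf-8"))  # ordinary character or malformed %
            i += 1
    # Assumption: the form was submitted as UTF-8.
    return out.decode("utf-8", errors="replace")
```

Apply it to each name and value after splitting the data on & and =, not to the whole string, because an encoded %26 decodes to a literal &.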
One other drawback of C is its lack of decent error detection and debugging support. It's so poor, in fact, that many beginning C programmers give up learning the language. If you can get through this part of C programming, however, it can produce big payoffs. Why? Because after you learn the rules of C programming, you can bend them, which you can't do with many programming languages. If you do this properly and carefully, you can write some really powerful C programs.
Another popular compiled programming language is C++, which is based on the C language. C++ adds object-oriented features to C, and object-oriented programming is an entirely different approach to writing programs.
The advantage of programming in C++ is that parts of the source code are reusable in other C++ programs, which speeds up program development. The reusable parts of C++ programs are known as classes. You can link several classes together with additional source code to create a totally different program. This capability to reuse source code and classes is central to object-oriented programming (OOP): by using OOP, you assemble various objects, pieces of source code, or classes to build other pieces of source code. In fact, you'll find several CGI-related C++ libraries and C++ CGI scripts on the accompanying CD-ROM.
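Reuse through classes isn't specific to C++. The short Python sketch below uses hypothetical class names to show the pattern. HTMLPage is a reusable piece that any CGI program could use to emit a page. GuestbookPage builds a new program's behavior by extending HTMLPage rather than rewriting it.

```python
import html
from typing import List


class HTMLPage:
    """Reusable piece: builds a complete CGI response for any program."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.parts: List[str] = []

    def add_paragraph(self, text: str) -> None:
        # Escape user-supplied text so it can't inject HTML tags.
        self.parts.append("<P>" + html.escape(text) + "</P>")

    def render(self) -> str:
        return (
            "Content-Type: text/html\n\n"
            "<HTML><HEAD><TITLE>" + html.escape(self.title) + "</TITLE></HEAD>\n"
            "<BODY>\n" + "\n".join(self.parts) + "\n</BODY></HTML>\n"
        )


class GuestbookPage(HTMLPage):
    """A new program built by extending HTMLPage instead of rewriting it."""

    def add_entry(self, name: str, comment: str) -> None:
        self.add_paragraph(name + " wrote: " + comment)
```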
NOTE |
Object-oriented programming was created as a reaction to problems encountered with large programs. It's much easier to write new programs by assembling existing pieces of other programs. |
Object-oriented programming does pose one problem. If you're used to programming in procedural languages such as Pascal or even COBOL, you'll need to learn a new way of thinking. Being able to create reusable classes is an art form in itself.
Many operating systems and hardware platforms have C++ compilers available for them. Therefore, you could probably use some C++ CGI scripts written for UNIX-based systems on OS/2 or Windows NT Web servers with little or no modification to the C++ source code. As an example, figure 2.6 shows the HTML interface to a C++-based CGI script that was converted from C to C++ and then ported from UNIX to OS/2 Warp. Also, differences exist between compilers for the same operating system, and some commercial compilers are better than others. Having these various compiler options gives you flexibility in writing your CGI applications. As with all programming languages, however, there's always a tradeoff. In this case, the tradeoff is that, currently, not many C++ CGI scripts are publicly available, but this is changing.
Figure 2.6 : This is an HTML interface to the C++-based User Site CGI script.
Another advantage of C++ is that most of the programs you've written in C will also work in C++, and C++ may handle the job better because it offers you alternative ways to do it.
C++ handles strings better than the C language, but you'll still need to get creative in handling and manipulating long strings. If your CGI applications will be handling large amounts of character and string data, you still might want to consider another CGI programming language, such as Perl.
What's in a Name? |
You may be wondering where C++ got its name. In C, you can use the ++ operator to increment a variable. For example, I++ means increment the variable I by one after its current value is used. The designers of C++ thought of the new language as simply "one better than C" and named it accordingly. |
Visual Basic, also known as VB, is a programming language system for Windows 3.x, Windows 95, and Windows NT. Like Perl, Visual Basic grows with your needs and experience. You can create everything from simple CGI program applications such as Web page-hit counters (as shown in fig. 2.7) to advanced, enterprise-wide client/server-based SQL CGI applications, many examples of which can be found on the CD-ROM that comes with this book.
Figure 2.7 : Visual Basic was used to produce the page-hit counter in this example.
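The counter in figure 2.7 was written in Visual Basic, and its code isn't shown here. As a language-neutral sketch of what such a counter does, the following Python function (bump_counter, hypothetical) keeps the count in a small text file and adds one on each request.

It assumes one request at a time. Two simultaneous hits could both read the same number, so a busy site would need file locking or a database.

```python
from pathlib import Path


def bump_counter(path: Path) -> int:
    """Add one to the hit count stored in path and return the new total."""
    try:
        text = path.read_text().strip()
    except FileNotFoundError:
        text = ""  # first visit: no counter file yet
    count = (int(text) if text else 0) + 1
    # No locking: two simultaneous requests can lose an update.
    path.write_text(str(count))
    return count


def counter_html(count: int) -> str:
    """Format the count for inclusion in a page."""
    return "<P>This page has been visited %d times.</P>" % count
```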
Another benefit of using Visual Basic is that it takes advantage of the latest three-tier client/server capabilities. The foundation of this programming system is object linking and embedding (OLE), Microsoft's open object model. VB offers you one of the world's largest and fastest-growing object libraries to use and reuse in your programs, which has helped produce a large number of good CGI scripts already written in VB.
Another flavor of Visual Basic is Visual Basic for Applications. VBA includes an integrated database engine and data controls for easily developing links to other database programs. This is a nice feature, although it comes with a high price: VBA programs can be very CPU intensive, and, on an underpowered server, performance can be devastatingly slow.
Because Visual Basic can handle fairly complex links to database programs, you can juggle strings and perform text manipulation easily inside databases. VB is, in my humble opinion, the second strongest programming language for text and data juggling. Only Perl is stronger at this.
You can use VBA to create a complex client/server CGI application that supports data access to local and remote databases. You could create a secure sales-marketing tracking system, for example, to be accessed by your sales team scattered around the world. This could be an alternative to implementing a proprietary system using something such as Lotus Notes.
Writing VB-based CGI applications requires a couple of considerations. First, the CGI program can be executed, or run, only on a Windows-based system running on an Intel-based hardware platform. However, as Windows NT becomes widely used on other CPU-based hardware platforms, such as the DEC Alpha RISC processor, expect this to change. If you're using a Windows-based Web server, you'll find a wide variety of CGI applications available for VB on the Web.
The other consideration in writing your CGI applications in Visual Basic is that not many commercial Web space service providers currently use Windows-based Web servers. That is likely to change, however, because many of the new and fairly powerful Web servers require you to run the software on a Windows NT platform. So even if your Web service provider is running only a UNIX-based server, look for it to add at least a development Web server running Windows NT.
Because many Webmasters are exploring Windows NT as an alternative to UNIX for their Web servers, Visual Basic is gaining fast on Perl as the #1 programming language for CGI applications. As experienced Webmasters become comfortable with Windows as a viable server platform, they'll port many of the existing Perl scripts to VB or VBA. This will dramatically increase the existing base of VB and VBA CGI scripts.
Many of you may have heard or read about Sun Microsystems' Java and Netscape's JavaScript. These are the new golden children, or "killer apps," of the industry. Because they're so new, many things about them are still to be decided, such as their syntax, features, options, and even their long-term survival. CGI programs and applications written in these new languages are the rage of the Net and are still being developed.
NOTE |
Several of the commercially available C++ compiler companies are also working on variations of their software tools to work with Java. You should see new commercial Java compilers becoming available and being refined over the next few years. |
The big thing with Java, JavaScript, and Microsoft's newly announced Visual Basic Script (VBScript) is that programs written in these languages run on the client's side. Java applets actually run "in" the client's browser: the browser provides a platform-specific Java virtual machine that executes the applet's compiled bytecode. In the cases of JavaScript and VBScript, the browser becomes the program's interpreter.
By "on the client's side," I mean that these applets are actually downloaded to the client's computer and executed after the browser receives all the code sent from the Web server. This makes Java applets very different from compiled CGI applications, which run entirely on the Web server.
This "client-side" execution has payoffs for you as a CGI developer. One payoff is that you rely on the computing power of the platform at the other end to actually run the program, which frees your Web server to process additional requests. Another payoff is that the client can preprocess forms and data and then send just the results back to your server. For example, the client could validate the form data, perhaps ensuring that an e-mail address has a valid format, before sending the information back to the server. This creates a true client/server relationship by spreading the computing or processing load across the various computers.
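In a browser, this kind of check would be written in JavaScript. The logic is sketched here in Python because a CGI program should repeat the same check on the server anyway: a visitor can turn scripting off or submit the form without the page's script running.

The pattern is deliberately loose and is an assumption, not a complete definition of e-mail address syntax. It only requires a single @, no spaces, and a dot somewhere after the @. The function name looks_like_email is hypothetical.

```python
import re

# Loose on purpose: one "@", no whitespace, and a dot somewhere after the "@".
# It catches typing mistakes; it does not implement the full address grammar.
_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(address: str) -> bool:
    """Return True if address has a plausible user@host.domain shape."""
    return _EMAIL.fullmatch(address.strip()) is not None
```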
Java, a new programming language developed by Sun Microsystems, allows you to create self-contained programs, known as applets, that aren't tied to any specific hardware platform or operating system. You'll find several Java applets on the CD-ROM that comes with this book.
The language was originally developed in the early 1990s by a team of programmers at Sun Microsystems as a user-interface programming language called Oak. Oak was supposed to revolutionize how everyday consumers interacted with ordinary electronic devices. Then an amazing thing happened: no one bought or used Oak. It floundered as a user-interface language. In 1994, Sun started to adapt Oak to be used for the Internet. By the first part of April 1995, Oak was renamed Java.
To run Java applets, you need another program that actually runs, or interprets, the Java programs for your specific hardware platform or operating system. This interpreter, first delivered in Sun's HotJava browser, allowed everyone, especially the people at Netscape Communications, to see the potential power of Java-based applications. On May 23, 1995, Netscape licensed Java from Sun, which started the whole Net community buzzing about Java.
Java is now aimed at changing the way a user interacts with HTML and Web servers. Figure 2.8 shows Netscape Navigator 2.x, a Java-compatible browser, displaying an interactive spreadsheet created with Java. Another great example on Sun Microsystems' Java Web site uses several Java applets to create a real-time scrolling stock market ticker marquee and real-time graphs in an HTML document (see fig. 2.9). Sun Microsystems' Java language Web site is located at http://java.sun.com.
Figure 2.8 : Here, Java is being used to create an actual interactive spreadsheet.
Figure 2.9 : Sun used several Java applets to create this scrolling stock market ticker marquee.
Because Java is object-oriented, you can create class libraries of Java code that can be used by the entire Net, if you want. Think of Java as a slightly different flavor of C++, in that you can create and use various class libraries, modules, objects, and routines. However, Java differs from C++ in that Java applications don't depend on the operating system or hardware platform you created and compiled them on. This is what gives Java its true power: its hardware independence makes Java inherently stronger than C++ and Visual Basic.
NOTE |
Java isn't finished as a programming language and probably won't be for quite some time. Java, like HTML, is still evolving rapidly, which has caused a couple of implementation problems. At the time of this writing, one of the biggest problems with the rapid development of Java is that not all hardware platforms have a Java-compatible browser available; Apple's Macintosh is one example. I have a feeling this problem will be resolved soon. |
One item of concern with Java has more to do with Java programmers than with the language itself. Because the browser has to wait for a Java applet to download before it can run it, users can experience considerable delays; depending on how big the Java binary is, it could take a long time to see the applet's first results. This problem could and should be handled by writing Java applets that use Java's multithreading capability, so that, for example, one thread can display something useful while others continue working. You can find tips and techniques for Java in the Usenet newsgroup comp.lang.java.
JavaScript, by Netscape, is another newcomer to the area of CGI programming. A small, lightweight, cross-platform scripting language, JavaScript is loosely based on Java and can be considered a partner scripting language to Java. JavaScript basically fills the void between HTML extensions, Java applications, and true CGI applications.
JavaScript allows you to embed a standard ASCII text script directly into your HTML documents. When a JavaScript-enabled browser encounters the script, it interprets and executes the commands.
JavaScript can't be considered an actual CGI language because it runs entirely within the client's browser. However, JavaScript does have the potential to help CGI applications by preprocessing information entered into a form. In fact, it's possible to create a CGI application that takes form data and generates a custom JavaScript program to send back to the user.
Visual Basic Script, or VBScript, is another new and exciting scripting language that compares favorably with Netscape's JavaScript. Written by Microsoft to compete with JavaScript, VBScript is another lightweight scripting language that allows inline scripting within HTML pages.
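Here is a minimal sketch of that idea in Python. The hypothetical page_with_script() takes a name from a form and returns an HTML page containing a small JavaScript program built around it. Three details keep the page safe:

- json.dumps() turns the form value into a valid JavaScript string literal.
- Escaping </ keeps a hostile value from closing the <SCRIPT> element early.
- The <NOSCRIPT> fallback covers browsers without JavaScript, and its copy of the value is HTML-escaped.

```python
import html
import json


def page_with_script(name: str) -> str:
    """Return a CGI response whose embedded JavaScript greets the visitor."""
    # json.dumps() yields a quoted, escaped string that is also valid JavaScript.
    # Replacing "</" with "<\/" stops the value from ending the <SCRIPT> element.
    js_name = json.dumps(name).replace("</", "<\\/")
    return (
        "Content-Type: text/html\n\n"
        "<HTML><BODY>\n"
        '<SCRIPT LANGUAGE="JavaScript">\n'
        "var visitor = " + js_name + ";\n"
        'alert("Thanks for visiting, " + visitor + "!");\n'
        "</SCRIPT>\n"
        "<NOSCRIPT>Thanks for visiting, " + html.escape(name) + "!</NOSCRIPT>\n"
        "</BODY></HTML>\n"
    )
```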
VBScript will provide scripting, automation, and customization capabilities for enabled Web browsers. VBScript is a simple subset of Visual Basic for Applications (VBA) but is fully compatible with VB and VBA. This compatibility gives VBScript a powerful and experienced programmer base to build on.
OLE Automation is another benefit of VBScript. VBScript can be used to manipulate the browser and other OLE-enabled applications on the desktop through an API (application programming interface). Perhaps most importantly, it can be used to set properties and call methods on OLE controls (OCX files), and even to help control Java applets contained within an HTML page. This would open up a wide area for CGI programs to link directly into existing Windows applications on the client's computer. You could write code to start the user's spreadsheet software, insert data into a sheet, and then create a custom graph for it.
You also could turn this scenario around. Suppose that an expense report is on the user's computer. The VBScript code could launch the corresponding application (for example, Lotus 1-2-3), and then 1-2-3 could load the expense report worksheet and export certain fields back to the Web server running another CGI application. The CGI application then could generate the sales/marketing department expense report totals for management. This could be made invisible to the user and be invoked simply by having the user request a specific page.
VBScript enables developers to write Visual Basic code that lives within the HTML document. You already know that HTML documents have tags that define such things as heading levels, font attributes, basic text controls, inline images, and other features. Web browsers can also use helper applications to handle additional file formats, such as video and sound. Currently, it's not known how much VBScript will really degrade performance; however, all indications suggest the hit will be smaller than the one encountered with Java. Microsoft will be implementing VBScript as a DLL, so you should see some nice speed resulting from that decision.
When a VBScript-enabled browser encounters the <SCRIPT> tag, it calls VBScript to compile and run the code. Unlike Java, VBScript and JavaScript code is represented as regular ASCII text within the HTML document. The VBScript code is interpreted and compiled while the browser is downloading it from a Web server.
NOTE |
At the time of this writing, Netscape had not licensed VBScript and had not given any indication of including VBScript into the company's product lines. On the other hand, Microsoft and others have licensed Java from Sun Microsystems. |
Now that most of your CGI programming tools and language options have been covered, you may be asking yourself, "Which language is best for me and for my environment?" The following table gives you an overview of your options, covering some of the operating systems for which the more popular CGI tools are written.
Language | |||||
AppleScript | |||||
UNIX Shell | |||||
C/C++ | |||||
Visual Basic | |||||
Perl | |||||
TCL | |||||
Java | |||||
JavaScript | |||||
VBScript |