by Bob Breedlove
The Practical Extraction and Report Language (Perl) might be one of the best choices among the interpreted languages for Common Gateway Interface (CGI) scripting. It is certainly the most popular, for a number of reasons. In this chapter, I examine Perl as a language for Internet applications.
Perl is an interpreted language optimized for scanning arbitrary text files, extracting information from these files, and printing reports based on that information. It is also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, and complete) rather than beautiful (tiny, elegant, and minimal). Perl was written by Larry Wall (lwall@sems.com), with the help of many other folks.
Perl combines some of the best features of C, sed, awk, and sh. People familiar with these languages should have little difficulty being productive in Perl. Perl's expression syntax is very C-like. Perl uses sophisticated pattern-matching techniques to scan large amounts of data very quickly. Although optimized for scanning text, Perl also can deal with binary data. If you have a program that would ordinarily use sed or awk or sh, but it exceeds their capabilities and you don't want to write the program in a compiled language such as C, then Perl may be for you.
Perl has many advantages as a CGI scripting language, as described in the following sections.
First, it is generally available on most server platforms, including most UNIX variants, MS-DOS, Windows NT, OS/2, and Macintosh. It also has the distinct advantage of low cost. It is often distributed free or for a small copying fee depending on the source from which you receive the package. Perl is also starting to be distributed with many operating systems or utility packages. For example, the Windows NT Resource Kit includes a copy of Perl.
Perl is distributed under either the GNU General Public License (the "CopyLeft") or the "Artistic" license. These licenses differ somewhat in the requirements they impose and the rights they assign. Basically, though, they enable you to execute Perl on your system(s), create and distribute Perl applications (scripts) and, if you want, gain access to the source code for Perl itself.
Perl is readily available from many sources, including any comp.sources.unix archive or Comprehensive Perl Archive Network (CPAN) site. If you don't have it on your server or development machine, it is easy to obtain either as source code or as precompiled binaries for many platforms. For those not on the Internet, Perl is available via anonymous uucp from both uunet and osu-cis. Also, it is often distributed with CD collections of utilities for UNIX platforms.
Perl is interpreted. This can be either an advantage or disadvantage, depending on your point of view. I discuss the disadvantages of interpreted languages in the next section. There are some definite advantages I should go over first.
One advantage of an interpreted language for script development is that you can develop and test incrementally, without a separate compile step between each modification and the next test run. This can shorten the development cycle drastically. An interpreted language also can be helpful if you are evolving your application by implementing it with minimal capabilities and adding advanced capabilities later.
The heart of programming CGI applications that produce Web pages is text processing. Perl is optimized for text processing, and therefore it is very efficient at producing Web pages. For example, fields submitted with the POST method are passed to a CGI application as a string of URL-encoded text (URL stands for Uniform Resource Locator, or address). This means that given three variables (A, B, and C) with text values, for example, you would get something on standard input that looks like this:
A=value+of+variable+1&B=value+of+variable+2&C=value+of+variable+3
The CONTENT_LENGTH environment variable is set to the length of this string (65). To read and decode this into variables that the program can use, the CGI application must read in the string for a length of 65, split the variable=value pairs at the & signs, change the + signs to spaces, and convert the %XX hexadecimal escapes back to characters. Here is some Perl code to do just that:
read(STDIN,$in,$ENV{'CONTENT_LENGTH'});
@in = split(/&/,$in);
foreach $i (0 .. $#in) {
    # Convert plus's to spaces
    $in[$i] =~ s/\+/ /g;
    # Split into key and value.
    ($key, $val) = split(/=/,$in[$i],2); # splits on the first =.
    # Convert %XX from hex numbers to characters ("C" packs an unsigned byte)
    $key =~ s/%(..)/pack("C",hex($1))/ge;
    $val =~ s/%(..)/pack("C",hex($1))/ge;
    # Associate key and value
    $in{$key} .= "\0" if (defined($in{$key})); # \0 is the multiple separator
    $in{$key} .= $val;
}
This code ends up with an associative array that can be referenced by variable name. Thus, when the CGI program needs the value of variable A, it can refer to
$workfield = $in{'A'};
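The decoding steps described above can also be packaged as subroutines that work on any string, which makes them easy to try from the command line without a Web server. The following is only a sketch under that assumption: the names url_decode and parse_query and the sample query string are illustrative, not part of any standard library.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" becomes a space, %XX becomes a byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Parse "key=value&key=value" into a hash; repeated keys are joined with "\0".
sub parse_query {
    my ($query) = @_;
    my %fields;
    foreach my $pair (split /&/, $query) {
        next unless length $pair;             # ignore empty pairs such as "&&"
        my ($key, $val) = split /=/, $pair, 2;
        $key = url_decode($key);
        $val = url_decode(defined $val ? $val : '');
        $fields{$key} .= "\0" if defined $fields{$key};
        $fields{$key} .= $val;
    }
    return %fields;
}

# Hypothetical input: B contains a %XX escape, and C appears twice.
my %in = parse_query('A=value+of+variable+1&B=caf%65&C=x&C=y');
print "A is '$in{A}'\n";
```

A field that appears more than once, such as C here, can be recovered as a list with split(/\0/, $in{C}).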
Perl is also very good at handling text files that can be used for small databases. Because Perl uses as much memory as is available, it often can hold entire small files in memory in arrays. Because Perl variables are not strongly typed, arrays can hold combinations of numeric and alphabetic information. Here is an example of Perl code to open a comma-delimited file of the type typically produced by databases or spreadsheets and read it into a set of variables:
open(IN, "$inFile") || die "Cannot open $inFile: $!";
while (<IN>) {
    chomp;    # remove the trailing newline so the last field is clean
    ($name, $age, $city, $state, $telephone) = split(/,/);
    ...
}
close(IN);
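To illustrate holding a small file entirely in memory, here is a minimal sketch that loads comma-delimited records into an array and then queries it. The sample records are invented, and the data is read from an in-memory filehandle (available in Perl 5.8 and later) only so that the example needs no external file.

```perl
use strict;
use warnings;

# Sample data standing in for a comma-delimited export (hypothetical records).
my $csv = <<'END';
Ann Lee,34,Austin,TX,555-0101
Raj Patel,41,Denver,CO,555-0102
END

# Read from an in-memory filehandle so the sketch needs no external file.
open(my $fh, '<', \$csv) or die "Cannot open in-memory data: $!";

my @people;
while (my $line = <$fh>) {
    chomp $line;
    next unless length $line;                 # skip blank lines
    my ($name, $age, $city, $state, $telephone) = split /,/, $line;
    push @people, {
        name => $name, age => $age, city => $city,
        state => $state, telephone => $telephone,
    };
}
close($fh);

# The whole "table" is now in memory and can be filtered like any list.
my @over_40 = grep { $_->{age} > 40 } @people;
print "$_->{name} lives in $_->{city}, $_->{state}\n" for @over_40;
```

A plain split on commas does not handle quoted fields that themselves contain commas; files of that kind need a more careful parser.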
Perl also can bind dbm databases to associative arrays. A database bound in this way looks and behaves like an ordinary associative array, so it is just as easy to manipulate.
The code can create a dbm file and add, update, and delete records from this file with the same commands used to manipulate simple associative arrays. Here is code to open a dbm database and manipulate the records:
dbmopen(%mydb, "$filename", 0644) || die "Cannot open $filename: $!";
# Add a record to the array with a key of "A"
$mydb{'A'} = 25;
# Get the value of a record with a key of "B"
$value = $mydb{'B'};
# Delete the record with a key of "C"
delete $mydb{'C'};
dbmclose(%mydb);
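In Perl 5, the same binding can be made with tie, which lets you name the dbm implementation explicitly. The sketch below assumes the SDBM_File module that is bundled with Perl and a throwaway temporary directory; the file name visits and the keys are hypothetical. It closes the database and reopens it to show that the values really are stored on disk.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # dbm implementation bundled with Perl
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical location: a temporary directory that is removed on exit.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'visits');

# First session: create the database and store two counters.
my %db;
tie(%db, 'SDBM_File', $path, O_RDWR | O_CREAT, 0644)
    or die "Cannot open $path: $!";
$db{home}  = 3;
$db{about} = 1;
untie %db;

# Second session: reopen the same files; the values persisted on disk.
my %again;
tie(%again, 'SDBM_File', $path, O_RDWR, 0644)
    or die "Cannot reopen $path: $!";
$again{home}++;                 # update a record in place
delete $again{about};           # remove a record
my $home_count = $again{home};
my @keys       = sort keys %again;
untie %again;

print "home has $home_count visits; keys: @keys\n";
```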
Perl can access C libraries to take advantage of much of the code written for this popular language. Utilities included with Perl distributions enable you to convert the headers for these C libraries into their Perl equivalents.
Perl has many specialized extensions, primarily for handling specific databases such as Oracle, Ingres, and Informix. These combine the strengths of the Perl language with the access to the host database.
ftp.demon.co.uk (158.152.1.44) is the official repository for the database <foo>perls in the following list, which can be found in /pub/perl/db/perl4/. The site is mirrored at ftp.cis.ufl.edu (198.17.47.33) in /pub/perl/scripts/db/.
Perl has the capability to read and write TCP/IP sockets. This gives it the capability to communicate with many servers of all types that rely on socket communication. It also enables you to write utility and "robot" programs in the Perl language. For example, you can use Perl's socket capability to write a robot program to automate site checking to verify the validity of links on your pages. This can be especially useful in keeping a site up-to-date, given the volatility of the Internet in its relative infancy.
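As a rough sketch of the socket capability, the following uses the IO::Socket::INET module distributed with Perl 5 to send an HTTP HEAD request and report the status code the server returns. The subroutine names are illustrative, the script handles only plain HTTP, and it contacts the network only when a host and path are given on the command line.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Build a minimal HTTP/1.0 HEAD request for a host and path.
sub build_head_request {
    my ($host, $path) = @_;
    return "HEAD $path HTTP/1.0\r\nHost: $host\r\n\r\n";
}

# Extract the numeric status code from a response status line, or undef.
sub parse_status {
    my ($status_line) = @_;
    return $status_line =~ m{^HTTP/\d+\.\d+\s+(\d{3})} ? $1 : undef;
}

# Connect, send the request, and return the status code (undef on failure).
sub check_link {
    my ($host, $path, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port || 80,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or return;
    print $sock build_head_request($host, $path);
    my $status_line = <$sock>;
    close($sock);
    return defined $status_line ? parse_status($status_line) : undef;
}

# Usage (makes a real network connection, so it runs only when asked):
#   perl checklink.pl www.example.com /
if (@ARGV == 2) {
    my $code = check_link(@ARGV);
    print defined $code ? "Status: $code\n" : "No response\n";
}
```

A status in the 200 range indicates that the link is still valid; a 404 is the usual sign that a page has moved or disappeared.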
Unlike many programming languages, Perl is designed to be practical rather than beautiful. Programming in Perl is relatively easy, especially if you have experience in C or another C-like language. Like many scripting languages, Perl executes its programs from the first line to the last, and it doesn't require complex structures to create a program. It does, however, support subroutines (functions), and version 5 supports object-oriented programming.
As an example, the "Hello World" program in C is
#include <stdio.h>

int main(void)
{
    printf("Hello World!\n");
    return 0;
}
In Perl, it is
print "Hello World!\n";
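The subroutine and object-oriented support mentioned earlier adds little ceremony. Here is a minimal sketch for Perl 5; the names greeting and Counter are invented for illustration.

```perl
use strict;
use warnings;

# A plain subroutine: arguments arrive in @_.
sub greeting {
    my ($name) = @_;
    return "Hello, $name!";
}

# A minimal class (hypothetical name): a package whose objects are blessed hashes.
package Counter;

sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} || 0 };
    return bless $self, $class;
}

sub increment {
    my ($self) = @_;
    return ++$self->{count};
}

package main;

my $hits = Counter->new(start => 10);
$hits->increment for 1 .. 3;
print greeting('World'), " Hits so far: $hits->{count}\n";
```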
The Perl interpreter has a built-in debugger that can help reduce the time it takes to debug applications. Because of the nature of CGI programs, however, you might not be able to use this debugger as extensively as you would with other applications.
Because Perl is very popular as a CGI programming language, there is a lot of help out there. Newsgroup discussions are a good place to start when you require help on Perl programming. There are newsgroups devoted entirely to Perl and newsgroups devoted to Web page creation in which the majority of the discussion is about Perl. Here are some of them:
Newsgroup | Comment |
comp.infosystems.www.authoring... | Information about Web page authoring in general. |
comp.infosystems.www.authoring.cgi | Information about general CGI programming. Because of the popularity of Perl for CGI, most questions are about the use of Perl on various platforms and with various servers. |
comp.infosystems.www.authoring.html | Information about the use of HTML. Some questions require programming in order to implement. |
comp.infosystems.www.authoring.misc | Miscellaneous questions about Web authoring. Not as valuable as the specific newsgroups because it carries cross postings from other groups and duplicates much of the information. |
comp.infosystems.www... | Information about the specific platforms. The groups covering servers can be valuable. |
comp.lang.perl... | Information about Perl in general. Much of the discussion in the specific groups covers using Perl for utility purposes and also as a CGI scripting language. |
comp.lang.perl.announce | Information about new modules for Perl programming. |
comp.lang.perl | The main newsgroup about Perl. |
comp.lang.perl.modules | Discussions of Perl modules. |
comp.lang.perl.tk | Discussions of tk use with Perl. |
There are, of course, Web pages related to Perl. Check the newsgroups for announcements about these pages. Here are just a couple I have found:
URL | Comment |
http://www.perl.com/ | The Perl language home page. Links to Perl resources. |
http://www.eecs.nwu.edu/perl/perl.html | Northwestern University's Perl page. |
http://www.yahoo.com/Computers/Languages/Perl/ | Yahoo's Perl index. |
http://www.virtualschool.edu/mon/Perl.html | The "middle of nowhere" Perl archive (Netscape 2.0 pages). |
http://www.teleport.com/~rootbeer/perl.html | References with a special emphasis on using Perl for Web-related programming and learning Perl. |
Several Frequently Asked Questions (FAQ) lists are posted to the Perl newsgroups. One of the best to start with is the Perl Meta-FAQ produced by Neil Bowers (neilb@khoros.unm.edu). As you would expect, this is a FAQ about FAQs. It's available at this writing from the following sources:
HTML | http://www.khoros.unm.edu/staff/neilb/perl/metaFAQ/metaFAQ.html |
PostScript | ftp://ftp.khoros.unm.edu/pub/perl/metaFAQ.ps |
ASCII | ftp://ftp.khoros.unm.edu/pub/perl/metaFAQ.txt |
There are also several excellent books on programming in the Perl language. Most of these give you a good background in the language. Here are a couple of excellent titles from Sams Publishing:
Till, David. Teach Yourself Perl in 21 Days. Sams Publishing. ISBN: 0-672-30586-0, $29.99.
Teach Yourself CGI Programming with Perl in a Week. Sams.net Publishing. ISBN: 1-57521-009-6, $39.99.
Again, because Perl is so popular as a utility language, there are lots of examples of Perl modules out there. One of the best sources is available via file transfer protocol (FTP) from one of the CPAN sites around the world.
Following are the sites available at the time of this writing. The master CPAN site is ftp://ftp.funet.fi/ (Finland, Europe). Select the site nearest to you from the following list to get the best response time and bandwidth:
Africa | |
South Africa | ftp://ftp.is.co.za/programming/perl/CPAN/ |
Asia | |
Japan | ftp://ftp.lab.kdd.co.jp/lang/perl/CPAN/ |
Taiwan | ftp://dongpo.math.ncu.edu.tw/perl/CPAN/ |
Australasia | |
Australia | ftp://coombs.anu.edu.au/pub/perl/ ftp://ftp.mame.mu.oz.au/pub/perl/CPAN/ |
New Zealand | ftp://ftp.tekotago.ac.nz/pub/perl/CPAN/ |
If you program for Perl 5, you might want to get a copy of the Perl 5 Module list maintained by Tim Bunce (Tim.Bunce@ig.co.uk) and Andreas Koenig (modules@franz.ww.tu-berlin.de). Here's a bit about the list from its introduction:
"This document is a semi-formal list of Perl 5 Modules. The Perl 4 concept of packages has been extended in Perl 5 and a new standardized form of reusable software component has been defined: the Module. Perl 5 Modules typically conform to certain guidelines which make them easier to use, reuse, integrate, and extend. The list is posted to comp.lang.perl.announce and comp.answers on a semi-regular basis. It has two key aims:
This list includes the Perl 5 standard modules, other completed modules, work-in-progress modules, and would-be-nice-to-have ideas for modules. It also includes guidelines for those wishing to create new modules including how to name them."
Perl has few negatives as a programming language for producing Web pages, but there are some you need to know.
Perl is interpreted. Therefore, it is not as fast as compiled languages such as C or C++. Given the speed of modern CPUs, this makes a significant difference only in very large or time-critical applications. In fact, the interpreted nature of the language can reduce development time significantly by eliminating the time needed to compile and debug versions of the program.
The GNU license under which Perl is distributed is really pretty innocuous, but it might be a problem depending upon the type of application you are developing. If you intend to do either of the following, Perl is probably not the best language to choose:
Perl is used to develop many Internet applications and their supporting utility applications. I present some examples here, but Chapters 14, "The Perl Language," and 15, "Perl in Internet Applications," give you a broader understanding of the types of Internet programming that you can perform with the language.
As mentioned throughout this chapter, Perl is one of the most popular languages for creating CGI applications. There are literally thousands of examples of dynamic CGI programming in Perl. You can use Perl to create dynamic Web pages that can change depending on different factors, including which visitor is viewing them.
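As a small illustration of a page that changes with the visitor, the sketch below builds its response from the standard CGI environment variables REMOTE_HOST, REMOTE_ADDR, and HTTP_USER_AGENT. The subroutine name build_page and the wording of the page are assumptions made for the example.

```perl
use strict;
use warnings;

# Build a complete CGI response (header plus HTML) from an environment hash.
sub build_page {
    my (%env) = @_;
    my $visitor = $env{REMOTE_HOST} || $env{REMOTE_ADDR} || 'unknown host';
    my $browser = $env{HTTP_USER_AGENT} || 'an unidentified browser';
    # Escape characters that are special in HTML before echoing them back.
    for ($visitor, $browser) {
        s/&/&amp;/g; s/</&lt;/g; s/>/&gt;/g;
    }
    my $time = localtime;
    return "Content-type: text/html\n\n"
         . "<html><head><title>Welcome</title></head><body>\n"
         . "<p>Hello, visitor from $visitor.</p>\n"
         . "<p>You are using $browser.</p>\n"
         . "<p>This page was generated at $time.</p>\n"
         . "</body></html>\n";
}

# Under a Web server, %ENV carries the CGI variables for the current request.
print build_page(%ENV);
```

Run from the command line, where the CGI variables are not set, the script falls back to the placeholder text; run under a Web server, each visitor sees a different page.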
One of the most common uses of Perl on the Internet is processing form input. Perl is especially adept at this chore because most of this input is textual, Perl's strength.
Another popular use of Perl is the automated processing of Internet e-mail. Perl scripts have been used to filter mail based on address or content. Perl scripts also have been written to automate mailing lists. One of the most popular of these programs is Majordomo.
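Filtering on address or content comes down to splitting a message into its headers and body and then matching patterns against each. The following is a simplified sketch: the folder names, addresses, and sample message are hypothetical, and real mail has more edge cases than this handles.

```perl
use strict;
use warnings;

# Split a raw message into a header hash and a body string.
sub parse_message {
    my ($raw) = @_;
    my ($head, $body) = split /\n\n/, $raw, 2;
    $head =~ s/\n[ \t]+/ /g;                  # unfold continuation lines
    my %header;
    foreach my $line (split /\n/, $head) {
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        $header{lc $name} = $value;
    }
    return (\%header, defined $body ? $body : '');
}

# Decide which folder (hypothetical names) a message belongs in.
sub classify {
    my ($header, $body) = @_;
    return 'list' if ($header->{to} || '') =~ /perl-users\@example\.org/i;
    return 'junk' if $body =~ /\bmake money fast\b/i;
    return 'inbox';
}

my $raw = <<'END';
From: alice@example.com
To: perl-users@example.org
Subject: Question about
 associative arrays

How do I sort by value?
END

my ($header, $body) = parse_message($raw);
my $folder = classify($header, $body);
print "Subject: $header->{subject}\nFolder: $folder\n";
```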
I have written a Perl script to automate my "What's New?" Web page. This script processes mail messages and adds them to my "What's New?" page. It also removes the entries from the page after they have been there for a specified length of time.
You can use Perl to automate the maintenance of Web sites. Because Web pages are little more than text files in a specific format, Perl is particularly adept at processing them. You also can use Perl's socket capability to contact other sites and request information using HTTP. There has even been a Web server written in Perl.
To check the links on a site, a Perl program must parse the site's pages starting with the main page, extract the URLs, and determine that these URLs are still active.
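The extraction step can be sketched with a single pattern match. This example assumes that href values are quoted and uses an invented sample page; a dedicated HTML parser would be more robust against unusual markup.

```perl
use strict;
use warnings;

# Pull the href targets out of anchor tags with a simple pattern match.
# Assumption: attribute values are quoted with single or double quotes.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ($html =~ /<a\s[^>]*?href\s*=\s*(["'])(.*?)\1/gis) {
        push @links, $2;
    }
    return @links;
}

# Hypothetical page: two links and one named anchor that is not a link.
my $page = <<'END';
<html><body>
<a href="http://www.perl.com/">Perl home</a>
<A HREF='/docs/index.html'>Docs</A>
<a name="top">Not a link</a>
</body></html>
END

my @found = extract_links($page);
print "$_\n" for @found;
```

Each extracted URL would then be requested in turn, and any that fail to answer would be reported as a dead link.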
Several FTP clients are written in Perl. You can use Perl to automate file retrieval via FTP. Again, this combines the socket capability of Perl with its text-processing capability.
Whether Perl is the right language for your Internet programming is a question only you can answer. The next few chapters give you a good foundation in the Perl language, which might help you decide if you want to use Perl for Internet programming. If you don't make it your main Web programming language, you might find that it becomes your utility language for the Web because of its versatility, ease of use, and popularity.