Chapter 13

Custom Database Query Scripts

by Matthew D. Healy


CONTENTS

By using CGI scripts to access a d, you can provide worldwide, cross-platform access to your data via the Web while your database helps protect your data from corruption and provides efficient querying. It's not easy to build a good Web/database application, but the results can be well worth the effort. As you'll see in this chapter, a Web gateway to a database can be a powerful way to circumvent some major disadvantages of conventional client/server technology.

In this chapter, you'll learn

Fundamental Design and Security Issues

Figures 13.1 and 13.2 show how information flows between the various programs that together constitute a complete Web/database application. Each layer hides many internal details from the other layers and presents the layers above and below it with standardized protocols. This information hiding is the great advantage a Web front end has over conventional client/server systems: The numerous, widely distributed users need nothing but standard Web browsers on their computers. The Web server and the database server could reside on one machine, or they could reside on different machines, in which case your CGI program talks over a network to the database server.

Figure 13.1 : This schematic of Web/database interaction shows how information flows between the programs that make up the application when the database server and the Web server are on the same machine.

More complex possibilities exist. One application running at my site talks to several database servers, integrating information from a local database with several remote databases. Indeed, users may not even realize how many different computers are cooperating to answer their queries; with a well-written database application, all a user needs to know is what information to request. Your job as a Web/database developer is to build tools that allow users to find information without their knowing or caring where the information is actually stored.

Why Build a Web/RDBMS Interface?

Building a Web interface to a relational database management system, or RDBMS, isn't simple. You need to become familiar with relational database technology and CGI scripting, and you also need to understand how these two technologies can be integrated with each other.

Figure 13.2 : This schematic shows how information flows between the pro-grams that make up the application when the database server and the Web server are on different machines.

You may also need to do much more "roll-your-own" work than you would with a conventional client/server design because you'll need to polish the rough edges off the available tools for Web gateways to relational databases. Many of the most powerful products are freeware or shareware, and the commercial products are all very new. Just about every database vendor now has some kind of Web offering, but most are either in beta or just barely out of beta. In short, if you do Web/database development, you'll be on the cutting edge.

So why do it? For one thing, the Web is now a "hot" field, so you may feel a compelling desire to be in the vanguard. You may also see the neat tricks others make their Web sites do and wonder how you can duplicate such effects.

I believe there are also some very good technical reasons for putting databases on the Web. You can combine the strengths of RDBMS technology for building robust data repositories with the strengths of the Web for providing distributed, cross-platform remote access to your database. In my view, the main advantages of a Web/RDBMS interface are as follows:

Of course, most of these advantages are shared by any Web application, whether it uses the simplest of flat-file structures or a high-end relational database engine.

What's Meant by Relational?
A relational database management system, or RDBMS, stores all data in the form of tables, each of which contains one or more rows, which each contain one or more columns. A search of a relational database can, and often does, retrieve data from more than one table.
Much of the art of designing a relational database application lies in assigning fields to tables in such a way as to minimize the number of places where a given data value is stored within the database. There is a very powerful mathematical theory of relational database design, based on set theory, to guide the designer in avoiding the most common pitfalls.
The essential principles of relational database design were set by Dr. E.F. Codd in 1969-1970. Familiarity with these principles is no substitute for careful planning, of course, but if you're familiar with them, you're much more likely to create a robust database that can be modified to meet future needs.
Formally, what I call a table in this chapter is known as a relation; what I call a row is known as a tuple; what I call a column is known as an attribute; and what I call a primary key is known as a unique identifier. But the precise technical definitions of these terms are well beyond the scope of this chapter.

Let's talk a little about the reasons for using a relational database. I'll admit at the start that a relational database forces you to do a lot of work before you have anything to show for it. With a simple flat-file database, you can get up and running much faster. The extra work involved in setting up a relational database does bring significant benefits, especially for large and/or complex databases.

One major advantage of a relational database over a flat-file approach is that, because a relational database has dramatically less redundancy than a flat-file database, it will have far fewer inconsistencies.

Suppose that Jane Smith's address appears 14 times in a flat-file list of orders from your customers. Some records might list Jane's street address as 123 Elm Street, Apt. 507; others as 123 Elm St. #507; and still others as 123 Elm #507. (Take a look at a week's worth of your own snail mail!) Now suppose that Jane moves. How can you be totally certain you'll update every copy of her address-without changing the 17 copies of Janet Smith's address in your database?

If your database isn't very large, you can check for such inconsistencies manually, with the help of the searching and sorting features of your flat-file database. However, if you have hundreds or thousands of customers, it'll be nearly impossible to keep a flat-file database consistent. With a relational database you store a fact, such as a customer's address, in just one place; you then use the capability to search multiple tables with one query to cross-reference the table of customer addresses with other tables whenever an address is needed.

I won't try to teach relational design in this chapter, because that's beyond the scope of this book. Many excellent books on relational database design are available; I'd recommend practically anything by C.J. Date, David McGoveran, Joe Celko, George Tillman, and Andrew Warden (pseudonym for Hugh Darwen; some of his works have appeared under both names at different times). Any large technical library or bookstore should have works by most of these authors. At a more advanced level, the writings of E.F. Codd, who invented the relational model, will reward prolonged study. If more RDBMS programmers read Codd's works with care, the RDBMS implementations now on the market would be far better.

After reading what these people have to say, I hope you'll share my passionate conviction that the relational model is both an elegant mathematical formalism and a powerful tool for solving real-world database problems in a manner that no other approach can match. In the long run, a flat-file database that gets heavy use will likely create more trouble than you avoided by not doing a good relational design from the start.

SQL and its Dialects
SQL has become the lingua franca of relational database work. Starting as a research project called SEQUEL at IBM in the mid-1970s, it has grown in popularity over the intervening decades. Today it's by far the most important database query language with probably hundreds of implementations in use. ANSI has published several official standards-in 1986, 1989, and 1992-and is now working on a new version informally known as SQL3.
As is typical with a "standard," every implementation that derives from one or another ANSI standard dialect of SQL has its own extra features deemed useful by those who built it, which is something of a mess. Fortunately, in a typical CGI application, you'll mostly be using a rather small subset of SQL that's found-in very nearly the same form-in all commonly used dialects of SQL.

Limitations of HTTP in a Database Context

With a CGI front end to an RDBMS, you have multiple programs cooperating to accomplish a task: The remote browser is a client to your HTTP server, but the CGI script launched by the HTTP server is itself a client to the RDBMS server. Furthermore, the HTTP protocols are stateless-the browser connects to the server, sends one query, and waits for the reply. There's no concept of a current connection, so there's no simple mechanism for keeping client state information around on the server.

For most searches, you can get around the statelessness of the server by keeping all client state information on the client. Suppose that a user has just done a search that returns the list of employee records shown in figure 13.3.

Figure 13.3 : In this sample query results screen, each item in the results list is a hot link that performs another search.

Encoded in the URL behind each hot link in a list like that in figure 13.3 is sufficient information for the CGI script to perform a database lookup that will return more details to the user. This technique-sending program-generated HTML that contains enough state information to perform the next search-is known as dynamic HTML. The URLs listed in a search results screen, such as the one in figure 13.3, will typically look something like the following:


<a href=/cgi-bin/EmpSrch?id=503">Tom     Anderson</a>
<a href=/cgi-bin/EmpSrch?id=229">Mike    Johnson</a>
<a href=/cgi-bin/EmpSrch?id=507">Steve   Jones</a>
<a href=/cgi-bin/EmpSrch?id=917">Sarah   King</a>
<a href=/cgi-bin/EmpSrch?id=467">Susan   Moore</a>
<a href=/cgi-bin/EmpSrch?id=327">John    Wang</a>

id is the primary key on the employees table that's to be searched. (By primary key, I mean the database designer has set things up so that a given key value uniquely identifies a single record.)

How does this allow the database server to display a detail view without any information beyond that in the URL? Well, recall from Chapter 4 "Understanding Basic CGI Elements," that when an URL contains a question mark, the part following the question mark is passed to the CGI program as the environment variable QUERY_STRING. The SQL code generated in response to a QUERY_STRING containing a primary key value of 503, like the first sample URL above, might look something like listing 13.1.


Listing 13.1  Sample SQL Query Generated by Clicking a Hot Link
select
     employees.first_name,
     employees.last_name
     employees.salary,
     employees.startdate
depts.name,
     depts.locid
     emp_dept.empid,
     emp_dept.deptid
from
     employees,depts,emp_dept
where
     employees.id = 503
     and depts.id = emp_dept.deptid
     and employees.id = emp_dept.empid

Chapter 17, "Person-to-Person Interaction," shows other examples of how HTML generated in response to each request is changed so that the next request will have the correct state information. Database applications will need to make particularly heavy use of such techniques.

TIP
Look at the bottom of any Yahoo (http://www.yahoo.com) search results screen for an elegant example of how URLs can pass information from query to query. The links under the heading Other Search Engines will automatically search other databases.

When you're updating a database, the use of a CGI interface presents some serious difficulties if more than one user is allowed to make changes to the database. If you aren't careful, you can have the following sequence of events:

With a conventional client/server database system, each user maintains an active connection with session-specific state information. If a user has downloaded a record for editing, the server can keep that record locked until the user who downloaded that record for editing has either submitted the changes or canceled the editing. Any other user who tries to change the locked record will be told that it's now locked by the user who's editing it. With a stateless HTTP server, there's no concept of a current connection and thus no simple mechanism for locking a record.

TIP
One way to handle updates in a CGI program is with a timestamp field in each record-you must update it every time the record changes. In your editing forms, include hidden timestamp fields so that the CGI program can detect conflicting updates.

Another significant limitation of HTML forms for database work is the lack of field-level validation. A conventional client/server database typically allows the designer to specify various constraints on each field: this one must be an integer between 5 and 99, that one must be in a telephone number format, and so on. When a user tries to type a name into a phone number field, the program immediately beeps and displays an error message. But an HTML form is sent to the server all at once when the user clicks the submit button, so your script must handle the errors all at once.

Security Issues

Remember that any CGI script is being executed on the same machine as your HTTP server to fulfill a request from an untrusted client browser. Although this is of course true of any CGI script, with a CGI RDBMS application, your script is itself a trusted client to your RDBMS server. That means you must be even more careful about what you allow users to do through your CGI interface than you would when writing other types of CGI programs.

I said that your CGI program is a trusted client to your RDBMS server. How does the RDBMS server know your script is to be trusted? Two mechanisms are commonly used by database servers to authenticate client programs, both of which have important security implications.

One approach is for the database server to implement its own user name and password system, independent of your operating system's or Web server's user names and passwords. In this case, your program source code must contain a database password, which it transmits every time it tries to connect. This, of course, means you must be careful to prevent strangers from being able to see your actual program code.

Also, to limit the damage in case someone does manage to see the password contained in your program, you should create a user account on your database server with only the access rights needed for your CGI program to function, and use that in all your CGI programs. If most of your CGI programs do only searching, and only one does updating of the database, just that one should have update rights in the database.

COUTION
A particular trap in the CGI-database context is the use of file-name extensions to tell your HTTP server which files are CGI executables and which are documents. If your text editor makes a backup copy in the same directory with a different extension (such as name.bak or name.cgi~), a wily cracker might be able to download your actual code.
For this reason, I strongly advise anyone using CGI scripts with hard-coded passwords to configure the HTTP server so that certain directories are defined as CGI directories; files in those directories can only be executed, never displayed. In my experience, any time you mix documents and programs in the same directory, you're asking for trouble.

The other common mechanism for the database server to decide whether a client can be trusted is to define database access rights for specified operating system user names. This, of course, means the database server must trust the operating system to authenticate users. In the CGI context, the use of operating system user names for authentication presents an especially tricky issue because most HTTP servers run all CGI scripts under a special, low-privilege user name (for example, most UNIX HTTP servers run all CGI scripts under the name "nobody" or the name "www"). This means you must trust every person who writes CGI scripts on your HTTP server!

One alternative provided in some operating systems is the capability to have a CGI program run as the user name of its owner with all rights that person would have. This eliminates the need to have your database server trust every CGI script, but creates its own security problems because now your CGI program has significantly more access rights to the rest of your system than does a CGI running under a special, low-privilege ID. Of course, if your HTTP server is running under a single-user operating system that lacks the concept of user names, you must trust every CGI program in any case.

TIP
To handle multiple classes of users with one script, put symbolic links (aliases) to your script in multiple directories with different access policies. Your script checks the SCRIPT_NAME environment variable to see which version was called.

A Simple Working Example in Perl

I often find vague generalities and isolated code snippets frustrating because they don't tell you how the various pieces fit together to form a complete application. To show you how a Web/database gateway works, in the following sections I'll build a small working application that maintains a hotlist of Web sites, which remote users can search.

First, let's see how a Perl script talks to the database. I'll show the specific calling conventions of two database engines; then I'll show how to make the code much more portable. The sample application is written using this portable interface.

DBMS Access from Perl

As I noted earlier in the sidebar titled "SQL and its Dialects," the subset of SQL needed for most CGI/database programming is very nearly universal; the same SQL code can often be ported with little or no change from one database engine to another. However, the details of sending that SQL to the database server and getting the returned results back from the server are much more varied. Most SQL database servers provide some form of C API for this purpose-typically, a set of functions that can be linked into a C client program. Because Perl, with its very strong string manipulation and I/O facilities lends itself so well to database manipulation, Perl wrappers have been written for most of the common database server APIs. This enables database access from within Perl programs, but limits portability because each database server API is unique.

Two DBMS APIs

To illustrate the variation in database APIs, listings 13.2 and 13.3 show two short sample programs written in the Perl wrappers for Sybase and mSQL servers. Each program connects to the server, asks for a database called test, sends a single SQL statement, and prints out the results.


Listing 13.2  An Example of a Simple SQL Query with Sybperl
#!/usr/local/bin/perl
require sybperl;
#
#This code tested with Sybase 4.9.1 and Sybase 10.0.1 under SunOS 4.1.2
#
#NOTE: for Perl4, or for Statically loaded Perl5 versions
#of sybperl, you must edit the first line to replace
#the name 'perl' with the name of your sybperl version

#raw_syb_perl_demo.p
#A simple demonstration of Sybperl in action
#
#Must define $USER,$PWD,$SERVER here!
    $dbproc = &dblogin( $USER,$PWD,$SERVER);
    $dbproc != -1 || die "Can't connect to $server ...\n";
    &dbuse( "test" ) || die "Can't use $database ...\n";

#Create the SQL statement & send to the server
$SQL = "select last_name,salary,id from employees";
&dbcmd( $SQL ) || die "Error in dbcmd.\n" ;
&dbsqlexec || die "Error in dbsqlexec.\n" ;
$result = &dbresults($dbproc);

#and get the resulting rows
%row = &dbnextrow($dbproc, 1); #get first row
while (%row = &dbnextrow($dbproc, 1))
     {
          print "last_name:$row{'last_name'}\t";
          print "salary:$row{'salary'}\t";
          print "id:$row{'id'}\t";
          print "\n";
     }


Listing 13.3  The Same Query as Listing 13.2, but Using mSQL
#!/usr/bin/perl
#raw_msql_perl_demo.p
#
#This code has been tested with mSQl 1.0.6 under SunOS 4.1.4
#
#A simple demonstration of Msqlperl in action
require "Msql.pm";$host = shift || "";
package main;
#Connect in two steps: (1) Connect and (2) SelectDB...
if ($dbh = Msql->Connect($host))
    {print "Connected\n";} else {die "failed to connect\n";}
if ($dbh->SelectDB("test"))
    {print("Test db\n");} else {die "Select db failed\n";}

$SQL = "select last_name,salary,id from employees";
$sth = $dbh->Query($SQL) or die $Msql::db_errstr;
#get the hash associating fieldnames and numbers:
@fieldnum{@{$sth->name}} = 0..@{$sth->name}-1;
# %fieldnum is now a list of fieldnums, indexed on names
#and get the rows
while (@row = $sth->FetchRow())
     {
          print "last_name:$row[$fieldnum{'last_name'}]\t";
          print "salary:$row[$fieldnum{'salary'}]\t";
          print "id:$row[$fieldnum{'id'}]\t";
     print "\n";
     }

The output from either program would look something like the following:


last_name:Smith     salary:21000     id:123
last_name:Huskins     salary:19500     id:124
last_name:Williams     salary:51075     id:125
last_name:Jones     salary:27000     id:126
last_name:Hill     salary:17500     id:127

Notice in listings 13.2 and 13.3 that the SQL code string is exactly the same for either database server, and the output is also the same (assuming identical data in the table, of course). Also, the structure of the two programs is very similar-connect to the database, send the query, get the rows. But the details of how the client communicates with the server are different. Obviously, what you need if your code is to be portable is some kind of abstraction layer that insulates the programmer from most database-specific details. Fortunately, such a layer has been written.

A Simple DBMS Abstraction Layer

The author of the excellent WDB database forms-generation package, discussed in more detail later in the section "WDB," has written a simple database interface, or dbi layer. (The dbi libraries are included in the WDB distribution on the CD-ROM accompanying this book.) By isolating most of the database-specific details to one Perl function library, he made the whole package quite easy to port-and various database programmers have written versions of the dbi library. At the time of this writing, versions of WDB-and therefore of the dbi library-are available for Sybase, Informix, mSQL, and Postgres95. The most recent release of WDB includes an alpha version of the Oracle port. Listing 13.4 is a dbi version of the simple Sybase and mSQL clients in listings 13.2 and 13.3.


Listing 13.4  The Same Query as Listings 13.2 and 13.3, Using the dbi Layer
#!/usr/local/bin/perl
#Either_dbi_demo.p
#
#This works with either Sybperl or MsqlPerl

#AS SHOWN HERE, this works with MsqlPerl.
#To make it work with Sybperl, change the
#    $dbFlag line below.
#
#Also, if you are using the Perl4 version of sybperl
#then you must change the first line of this program

$dBFlag = 'MSQL';  ## OR $DbFlag = 'SYBASE'
#this is the msql version!

if ($DbFlag eq 'MSQL') {require 'msql_dbi.pl';}
elsif ($DbFlag eq 'SYBASE') {require 'syb_dbi.pl';}
else {die "unsupported database\n";}

$database = "test"; #define $User, etc here!
&dbi_connect( $user, $pswd, $server, $database );

$Query = "select last_name,salary,id from employees";
&dbi_dosql($Query);

if ($DbFlag eq 'MSQL') #one extra thing needed for Msql
{&dbi_fieldnames( 'last_name', 'salary','id');}

while( %row = &dbi_nextrow  ) {
          print "last_name:$row{'last_name'}\t";
          print "salary:$row{'salary'}\t";
          print "id:$row{'id'}\t";
          print "\n";
       }

If you have either Sybase and Sybperl or mSQL and MsqlPerl installed on your system, you can run the code in listing 13.4 on either platform by editing it as indicated by the comments in the program. And revising it to work with the other versions of the dbi library shouldn't be much more difficult. All Perl examples in the rest of this chapter use the msql_dbi.pl interface, so they could easily be ported to any other database for which WDB has been ported.

Schema for the Working Example

This example, as I said earlier in the section "A Simple Working Example in Perl," is a simple interactive hotlist of Web sites, with the URL and a description for each site. Remote users can search the hotlist and submit new entries for potential inclusion. The administrator (who knows the appropriate password) can review submissions, adding approved submissions to the hotlist for public viewing. Think of it as a very rudimentary equivalent of Yahoo-just as the Wright brothers' flying machine of 1903 was a very rudimentary equivalent of an airliner.

Database Tables and Fields

This database has three tables. The UIDs table in table 13.1 is used for generating UIDs so that each record in the other tables will have a unique identifier that can be used as a primary key.

Table 13.1  The UIDs Table

ColumnType
Len
Not Null
Key
TableNamechar
40
Y
Y
MaxUIDlongint
4
Y
N

This is a very common technique used by database designers: One creates one row in the UIDs table for each table that needs UIDs generated. MaxUID then records the highest UID yet assigned; each time you create a new row for a data table, you increment the MaxUID value for that table and use this for the new row of data.

The Hotlist table, shown in table 13.2, contains data for all approved submissions to the database.

Table 13.2  The Hotlist Table

ColumnType Len
Not Null
Key
UIDlongint8
Y
Y
URLchar100
Y
N
SHORTDESCchar50
N
N
DESCRIPTIONchar200
N
N

New submissions are stored in the Submissions table until they have been approved by the database administrator, as shown in table 13.3. The Hotlist table and the Submissions table are otherwise identical.

Table 13.3  The Submissions Table

ColumnType Len
Not Null
Key
UIDlongint8
Y
Y
URLchar100
Y
N
SHORTDESCchar50
N
N
DESCRIPTIONchar200
N
N

Directory Layout

The data tables are stored by the database server-mSQL was used for this example. Although any database server almost certainly stores the actual data as disk files somewhere, the database server manages those internally; indeed, that's fundamentally why one uses a database server. In addition to the database tables described in the preceding section, my sample application consists of three HTML documents and three Perl scripts placed in three directories, as follows:

~healy/public_html: (documents)
   DemoHome.html
   Search.html
   Submission.html

The three files are the top-level main screen and the two forms for searching and submitting data. Because they reside in my public HTML directory, they can be viewed by any user on the Web.

The following two programs reside in an unprotected directory within the cgi-bin hierarchy on this Web server, so they can be run as CGI scripts by anyone on the Web.

…/cgi-bin/healy/public: (public CGI programs)
   SearchHotlist.p
   ShowDetails.p

The following directory is password protected by means of the .htpasswd and .htaccess files, so you must type a name and password to run the program in this directory as a CGI script.

…/cgi-bin/healy/private: (private program)
   .htpasswd
   .htaccess
   ListSubmissions.p

Searching the Hotlist

The user of this application typically will begin with a simple opening screen listing the available options. I've intentionally kept the screen in figure 13.4 as simple as possible. Listing 13.5 shows the HTML for it. Most of the hot links in this opening screen point to scripts that perform the actual work of providing database access.

Figure 13.4 : The opening screen for the Hotlist database has hot links to the available programs for database access.



Listing 13.5  The Opening Screen
<HTML>
<HEAD><TITLE>Hotlist Demo HomePage</TITLE></HEAD><BODY>
<H1>Hotlist Demo HomePage</H1>

<A HREF="Search.html">Search the Hotlist</A><p>

<A HREF="Submission.html">Submit an Item for the Hotlist</A><p>

<a HREF="/cgi-bin/healy/SearchHotlist.p">See All Records in Hotlist
</a><p>

<a HREF="/cgi-bin/healy/ListSubmissions.p">Transfer Submitted Data</a>
to the Public portion of the database (password required).<p>
</PRE></BODY>
</HTML>

Clicking Search the Hotlist calls up the search form, which I've likewise kept as simple as possible (see fig. 13.5). Listing 13.6 is the HTML for the search form in figure 13.5.]

Figure 13.5 : The user enters search criteria into this form, which then posts the criteria to the searching script.



Listing 13.6  The Search Form
<HTML>
<HEAD>
<TITLE>Hotlist Searching Form</TITLE>
</HEAD>
<BODY>
<H1>Hotlist Searching Form</h1>
Please enter a keyword and click the <b>Send</b> button at the bottom.
The <b>Reset</b> button will clear everything you've typed.<P>
<FORM ACTION="http://server.wherever/cgi-bin/healy/SearchHotlist.p" METHOD="POST"><hr>
<p>
<INPUT name="SHORTDESC" size=20 value=""><b>SHORT description</b><BR>
<hr>
<INPUT TYPE="submit" VALUE="Send"> <INPUT TYPE="reset" VALUE="Reset"><P>
<hr>
</FORM>
</BODY>
</HTML>

Submitting a search request-by entering a key to search and clicking the Submit button-will POST the search key to a simple searching script. This script generates a SQL query and submits it to the server. It works just like the three sample database query scripts in listings 13.2 through 13.4, except that the SQL string is built up from the form data. This is known as dynamic SQL.

NOTE
I do no error checking on input; I just wrap it in an SQL like clause in the program, which is shown later in listing 13.7.

To keep this example as simple as possible, I provide only one search field, and I canonicalize that field to uppercase in the database. If your search form provides multiple lookup fields, you must generate a complex where clause based on which fields contain search strings. The WDB package, which I discuss later in the section "WDB," can build up such a where clause based on form contents.

In the program in listing 13.7 and all my other form-handling CGI programs, to avoid having all the messy details of reading and parsing the form information, my programs call Steven Brenner's cgi-lib.pl routines. This library can be found at many FTP sites or on the CD-ROM included with this book. One particular advantage of cgi-lib.pl for database work is that it handles GET or POST identically: Name value pairs can be appended to the URL as ?name1=value1&name2=value2... or sent as a POST data block.


Listing 13.7  Perl Code to Perform the Search
#!/usr/local/bin/perl
#
#This program tested with Msql 1.0.6 under SunOS 4.1.4 and
#NCSA httpd 1.5 with Perl 5.001m

#do this as soon as possible!
print "Content-type:text/html\n\n";

#Define two little subroutines for urlencode/decode
#
#replace funny characters with %xx hex for urls
sub escape
{
    ($_)=@_;
    s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x",ord($1))/eg;
    $_;
}

#replace + with space and %xx with that ASCII character
sub unescape {
    ($_)=@_;
    tr/+/ /;
    s/%(..)/pack("c",hex($1))/ge;
    $_;
}

#load the cgi library
require "cgi-lib.pl";
#load the Msql database interface library
require 'msql_dbi.pl';

# Start output

#read in the form contents:

&ReadParse(); #handles GET or POST forms w/identical results
#now @in has key=value pairs, and %in{key} = value
#Main Program Begins Here

$SHORTDESC = $in{'SHORTDESC'};
$SHORTDESC =~ tr/a-z/A-Z/;  #convert to uppercase
$SCRIPT_NAME = $ENV{'SCRIPT_NAME'};

#connect to database server
$user = "healy";
$server = "server.wherever";
$passwd = "dummy";  #not used, for msql goes by Unix UID;
$database = "test";
&dbi_connect( $user, $pswd, $server, $database );

$Query = "select UID,URL,SHORTDESC from HOTLIST";
$Query = $Query . " where SHORTDESC like '%";
$Query = $Query . $SHORTDESC . "%'";

&dbi_dosql($Query);
#the next line is msql-specific; comment-out for other ver
&dbi_fieldnames('UID','URL','SHORTDESC','DESCRIPTION');

print "<h1>Search Results</h1>\n";

while( %row = &dbi_nextrow  )
     {
      print '<a href="';
      print "$row{'URL'}";
      print '">';
      print &unescape($row{'SHORTDESC'});
      print "</a> ";
      print '<a href="';
      print '/cgi-bin/healy/ShowDetails.p?';
      print 'UID=';
      print $row{'UID'};
      print '">';
      print "Details</a><p>\n";
      }

print "Click on a link to go there, or click on
  1. <b>details</b> for a more-detailed description of the link\n";

After a search is performed, the output looks something like the sample screen shown in figure 13.6. Listing 13.8 is the HTML generated for the search results in figure 13.6.

Figure 13.6 : After searching the hotlist database, the user sees a list of hot links like this one. Each hot link calls a Perl script that shows the details of that record.



Listing 13.8  Typical Output from Search of Hotlist Database, with URLs to Detailed Views
<h1>Search Results</h1>

<a href="http://arch-http.hq.eso.org/bfrasmus/wdb/install.html">
WDB INSTALLATION INSTRUCTIONS</a>
<a href="/cgi-bin/healy/public/ShowDetails.p?UID=2">Details</a><p>
<a href="http://arch-http.hq.eso.org/bfrasmus/wdb/">WDB HOMEPAGE</a>
<a href="/cgi-bin/healy/public/ShowDetails.p?UID=3">Details</a><p>
<a href="http://cscsun1.larc.nasa.gov/~beowulf/db/
existing_products.html">LIST OF WWW DBMS GATEWAYS</a>
<a href="/cgi-bin/healy/public/ShowDetails.p?UID=7">Details</a><p>
<a href="http://server.wherever/~healy/Submission.html">DB GATEWAY
</a>
<a href="/cgi-bin/healy/public/ShowDetails.p?UID=13">Details</a><p>
Click on a link to go there, or click on <b>details</b>
for a more-detailed description of the link

Viewing the Detail Record

Notice that the Details links in the search results screen shown in figure 13.6 (with the HTML given in listing 13.8) point to a second cgi script and that each URL has ?UID=nn appended. This simple example shows how state is maintained on the client browser side-no history is maintained on the stateless server. Listing 13.9 shows the code for ShowDetails.p, the CGI program that generates the detail record; it's quite similar to the previous example.


Listing 13.9  Perl Code to Return the Detail View
#Up to here code is identical with SearchHotlist.p above
#
#now @in has key=value pairs, and %in{key} = value
#Main Program Begins Here
#
$UID = $in{'UID'};

#connect to database server
$user = "healy";
$server = "server.wherever";
$passwd = "dummy";  #not used, for msql goes by Unix UID;
$database = "test";
&dbi_connect( $user, $pswd, $server, $database );

$Query = "select UID,URL,SHORTDESC,DESCRIPTION from HOTLIST where UID = $UID";

&dbi_dosql($Query);
#the next line is msql-specific; comment-out for other ver
&dbi_fieldnames('UID','URL','SHORTDESC','DESCRIPTION');

print "<h1>Detail View</h1>\n";

while( %row = &dbi_nextrow  )
     {
      print "Hot link to this item: ";
      print '<a href="';
      print "$row{'URL'}";
      print '">';
      print &unescape($row{'SHORTDESC'});
      print "</a><br>";
      print "Detailed description: ";
      print &unescape($row{'DESCRIPTION'});
      print "<p>\n";
      }

Figure 13.7 shows an example of the detail screen, and listing 13.10 shows the HTML generated for this screen.

Figure 13.7 : Clicking an URL in the search results screen of figure 13.6 generates a detail view such as this one.



Listing 13.10  The Detail View of Figure 13.7
<h1>Detail View</h1>
Hot link to this item:
<a href="http://cscsun1.larc.nasa.gov/~beowulf/db/existing_products.html">
LIST OF WWW DBMS GATEWAYS</a><br>
Detailed description: Comprehensive List of Tools for Building RDBMS CGI Gateways<p>

In this simple example, I have only one hot link-to the URL being described. In a real application, you can (and should) have multiple hot links in your detail screens-hot links that perform lookups on this or other databases. The HTML snippet in listing 13.11, from a hypothetical Employee Detail screen, shows what I mean.


Listing 13.11  A Hypothetical Employee Detail Screen
<h1>Tom Anderson</h1>
Department:
<a href="http://server.wherever/cgi-bin/DeptSrch?Deptid=17">Engineering</a><p>
Location:
<a href="http://server.wherever/cgi-bin/LocSrch?Locid=29">Podunk
</a><p>
Position:
<a href="http://server.wherever/cgi-bin/PosSrch?Posid=17">CAD 
Technician</a><p>
Mail Stop:
<a href="http://server.wherever/cgi-bin/EmpSrch?Mailid=97">POD-43
</a><p>

Clicking any field in the detail record will perform a lookup of related records in that category. The list would contain the names of employees, and the URL behind each name would do a lookup on the employee ID. This effectively converts a relational database into a giant hypertext document.

Such hyperlinks can just as easily link one database with another. For example, several large international databases that are widely used in molecular biology have Web interfaces. A local database used by one research group for its own internal data can include hot links to related information from one or more of these international databases.

TIP
If you want to check them out, the following are some URLs to major biological and chemical databases: Most of these URLs have links to other sites with related information as well.

Consider a hypothetical business-related example. The marketing people in your company have created a Web-accessible database of product information, including which company locations build given parts of each product. Now suppose that you're a programmer in the personnel department given the job of putting your company directory onto the Web. You could use hyperlinks in your directory database to link product information with directory information. For instance, the location field in a directory detail screen could show the list of products made at that location.

With the cooperation of the folks maintaining the product information database, you could also work the other way: When seeing a record listing the various locations making each part of the product, you could click links that locate people who worked at each location. The possibilities for using the Web to integrate multiple databases in different locations are limited only by the programmer's imagination-and the quality of available databases, of course.

Submitting Data to the Hotlist

Of course, a database is scarcely complete without a means for entering data. For some databases, the Web front end allows only searching. For this example, however, I'll also include a simple data submission form so that any user can submit proposed records for possible inclusion in the publicly searchable database. Figure 13.8 shows the submission screen. Listing 13.12 is the HTML for the submission screen.

Figure 13.8 : This form is used for the remote submission, via the Web, of records to be added to the database.



Listing 13.12  A Simple HTML Form to Submit New Data via the Web
<HTML>
<HEAD>
<TITLE>Url Submission Form</TITLE>
</HEAD>
<BODY>
<H1>Url Submission Form</h1>
Please enter data in the fields below and click the <b>Send</b> button at the bottom.
The <b>Reset</b> button will clear everything you've typed.<P>
<FORM ACTION="http://server.wherever/cgi-bin/healy/public/Submit.p"
METHOD="POST">
<hr>
<p>
<INPUT name="URL" size=60 value="" ><b>The URL</b><BR>
<INPUT name="SHORTDESC" size=20 value=""><b>SHORT description</b><BR>
<TEXTAREA name="DESCRIPTION" ROWS=2 COLS=40></TEXTAREA>
Longer Description (up to 3 lines)<BR>
<hr>
<INPUT TYPE="submit" VALUE="Send"> <INPUT TYPE="reset" VALUE="Reset"><P>
<hr>
</FORM>
Note:<p>
The data entered will be reviewed by the database administrators before being added to
the database; submitted records will usually be available for viewing in one or two
working days. Please check back in a few days to confirm that your submission has been
added.<p>
</BODY>
</HTML>

Submitted data will be posted to the script in listing 13.13.


Listing 13.13  Perl Script to Handle Data Submission Form
#Up to here code is identical with SearchHotlist.p above
#
#now @in has key=value pairs, and %in{key} = value
#Main Program Begins Here
#connect to database server
$user = "healy";
$server = "server.wherever";
$passwd = "dummy";  #not used, for msql goes by Unix UID;
$database = "test";
&dbi_connect( $user, $pswd, $server, $database );

$UID = $in{'UID'};
$URL = $in{'URL'};
$SHORTDESC = &escape($in{'SHORTDESC'});
$SHORTDESC =~ tr/a-z/A-Z/;  #convert to uppercase
$DESCRIPTION = &escape($in{'DESCRIPTION'});
$Query = "select MaxUID from UIDs where TableName = 'SUBMISSIONS'";
&dbi_dosql($Query);
#the next line is msql-specific; comment-out for other ver
&dbi_fieldnames('MaxUID');
%row = &dbi_nextrow; 
$MaxUID = $row{'MaxUID'} + 1;
$Query = "Update UIDs Set MaxUID = $MaxUID where TableName =
   'SUBMISSIONS'";

&dbi_dosql($Query);
$Query = "Insert into SUBMISSIONS values(";

$Query = $Query . $MaxUID . ",'";

$Query = $Query . $URL . "','";
$Query = $Query . $SHORTDESC . "','";
$Query = $Query . $DESCRIPTION . "')";

&dbi_dosql($Query);

print "<h1>Submission Accepted</h1>\n";
print "Thank you for your submission. \n";
print "It will be reviewed by the database administrator \n";
print "for possible inclusion in our hotlist \n";

A couple of interesting "wrinkles" to this script don't appear in the other programs:

The theoretical possibility exists that two people might submit new data at precisely the same moment, so that between the instant of getting the current value of MaxUID and the instant of updating UIDs, another user could get the same UID value. Although that's not very likely with a simple application like this one, it's very much a concern for very active databases. Most high-end database engines have a feature called transactions (which mSQL doesn't support); one declares the actions of getting the UID and updating the UID to be one transaction that must be run as a unit or not run at all.

NOTE
For simplicity, the script in listing 13.13 doesn't validate user input-it just sticks whatever the user entered into a table. Your real applications should, of course, perform appropriate validation on the input data. Your script also could try to GET the URL entered by the user, to verify that it's a valid URL.

Generating SQL Code to Transfer Submitted Data

The last piece of this package is a mechanism by which an administrator can transfer data from the Submissions table to the Hotlist table. To sidestep the complexities of updating via the stateless Web server, I use a different approach-a CGI script that doesn't perform any updating itself, but generates an SQL script to perform the required actions. Listing 13.14 is the Perl code.


Listing 13.14  Generating Transfer SQL to Move Data to Public Table
#Up to here code is identical with SearchHotlist.p above
#
#now @in has key=value pairs, and %in{key} = value
#Main Program Begins Here
#connect to database server
$user = "healy";
$server = "server.wherever";
$passwd = "dummy";  #not used, for msql goes by Unix UID;
$database = "test";
&dbi_connect( $user, $pswd, $server, $database );

$Query = "select UID,URL,SHORTDESC,DESCRIPTION from SUBMISSIONS";
    print "#SQL Query: $Query\n\n";
    print "#\n#\n#\n";
    print "#Review this SQL script with care, then ";
    print "pipe it through msql\n#\n#\n";

&dbi_dosql($Query);
#the next line is msql-specific; comment-out for other ver
&dbi_fieldnames('UID','URL','SHORTDESC','DESCRIPTION');

print "#Inserting into HOTLIST\n\n";
while( %row = &dbi_nextrow  )
     {
      print "Insert into HOTLIST values(\n";

      print "$row{'UID'}'\n,";

      print "$row{'URL'}'\n,'";
      print "$row{'SHORTDESC'}'\n,'";
      print "$row{'DESCRIPTION'}'";
      print ')\g';
      print "\n";
      }

$Query = "select MaxUID from UIDs where TableName = 'SUBMISSIONS'";
&dbi_dosql($Query);
#the next line is msql-specific; comment-out for other ver
&dbi_fieldnames('MaxUID');
$MaxUID=0;
$Query = "select MaxUID from UIDs where TableName = 'SUBMISSIONS'";
&dbi_dosql($Query);
#the next line is msql-specific; comment-out for other ver
&dbi_fieldnames('MaxUID');
$MaxUID=0;
%row = &dbi_nextrow;
$MaxUID = $row{'MaxUID'};
print "\n\n#Updating UIDs\n\n";
print "Update UIDs Set MaxUID = $MaxUID where"
print " TableName = 'HOTLIST'" . '\g' . "\n\n";

print "\n\n#Deleting from SUBMISSIONS\n\n";
  1. print 'delete from SUBMISSIONS where UID <= . $MaxUID \g';

Running this script via the Web will generate SQL similar to that in listing 13.15.


Listing 13.15  Typical Transfer SQL Generated by the Program in Listing 13.14
#SQL Query: select UID,URL,SHORTDESC,DESCRIPTION from SUBMISSIONS
#
#
#Review this SQL script with care, then pipe it through msql
#
#
#Inserting into HOTLIST

Insert into HOTLIST values(
18
,'http://gasnet.med.yale.edu/'
,'GASNET'
,'The%20Gasnet%20server%20has%20various%20resources%0D%0Afor
%20Anesthesiology...')\g
Insert into HOTLIST values(
17
,'http://www.ncbi.nlm.nih.gov/BLAST/'
,'BLAST'
,'BLAST%20Homepage%20at%20the%20National%20Center%0D%0Afor
%20Biotechnology%20Information')\g
Insert into HOTLIST values(
16
,'http://www.eol.ists.ca/~dunlop/wdb-p95/'
,'WDB%20POSTGRES'
,'WDB%20Port%20to%20Postgres')\g
Insert into HOTLIST values(
15
,'http://www.comvista.com/net/www/cgidata.html'
,'MAC%2FWWW%2FDB'
,'List%20of%20URLs%20with%20info%20on%20Mac%20WWW%2FDBMS%0D
%0Ascripting')\g

#Deleting from SUBMISSIONS

delete from SUBMISSIONS where UID <= 18\g
#Updating UIDs

Update UIDs Set MaxUID = 18 where TableName = 'HOTLIST'\g

The database administrator edits the SQL generated to delete records that shouldn't be added to the Hotlist table, and then feeds the script through the mSQL command line. Alternatively, if you want a purely Web solution, you can modify this script to generate a form containing all the SQL in a scroll- able text area. That form would submit the edited SQL to another script that pipes the SQL through the mSQL command line.

Probably the cleanest approach is to generate updating forms that contain database fields instead of SQL code. However, the issue of conflicting updates would then need to be addressed, probably by the use of timestamps. The sequence would be something like the following:

  1. Generate an updating form with all data in editable fields and the timestamp included as a hidden field in the form.
  2. On submission of the edited form, first check the timestamp in the database record against the timestamp in the form. If they vary, emit an error message and quit without updating the record.
  3. When updating the record, also update the timestamp value.

With mSQL, you need to update the timestamps in the CGI script and in every program that updates the database. With many higher-end servers, you can define a timestamp that the database engine will maintain automatically with no possibility that a programmer will forget to change timestamps whenever any data changes.

NOTE
In Sybase, the term timestamp is a misnomer; the timestamp value has no relationship to wall-clock time at all. Sybase simply takes responsibility for guaranteeing that the value of this field will change every time any other field in that record changes. Thus, the sole value of Sybase timestamp fields is to check for conflicting updates.

Web/Database Tools

Database access via the Web is such a "hot area" now that it would be impossible to list every product available or promised. Every major database vendor either has a Web-related product available now or promises to have one soon, and many individual programmers are working on their own database gateways, running with a wide range of database engines. For a large database, one probably needs a high-end engine-such as Sybase, Oracle, or Illustra-but for simple projects, a microcomputer-based DBMS such as Access or FileMaker Pro can also be used. For intermediate-scale projects, you can also use middle-of-the-road databases, such as MiniSQL, which are capable servers for UNIX but lack the power (and expense) of high-end commercial databases.

You can safely assume that a mechanism exists for linking any reasonable combination of database server and HTTP server, but the quality of the implementations varies wildly. If you're thinking of putting a database on the Web, I'd suggest you start by reading Usenet postings in newsgroups related to your HTTP server, your favorite programming languages, and your favorite database packages. You should also look at the Web site of your database vendor. In the following sections I list a few tools available on the Web and some Web sites containing pointers to information about Web/database tools.

Installing Web/Database Tools
Some of the tools I mention are commercial programs, which generally come with installation instructions. Others are shareware or freeware. As with all shareware or freeware programs, part of the price you pay for low-cost software is less hand-holding when it comes to installation. Generally, you install these in the same manner as you would install any shareware program for your OS.
Many of the tools I list are for UNIX servers and typically come as source code in tar form. If you've never installed such a package, you should probably seek out your local UNIX guru for assistance. UNIX comes in many different flavors, and every system I've ever used has a number of local customizations added to the standard system, so it's not possible to give detailed directions here. Most of the UNIX packages I mention do come with makefiles or other installation scripts, but you will probably need to check these over and edit them as required by your local configuration; pay particular attention to where various directories are located on your system.
Most of these tools do have installation and configuration information on their authors' Web sites, so you should look there first. Also, the directions for many database interfaces assume that you have the database and its API library already installed, so you should do that part first.
For microcomputer operating systems such as Windows or Mac, most tools are distributed as executable binaries, rather than as source code, so installation will typically be somewhat simpler than for UNIX.

WDB

WDB is a freeware package written entirely in Perl that greatly simplifies the task of building a Web front end to a relational database under UNIX. (Source code and installation directions for WDB, in UNIX tar format, are included on the CD-ROM that comes with this book.) Because all the database-specific aspects of WDB are confined to one module of the program, and the entire package is written in Perl, porting WDB to various databases has proven relatively easy. WDB has been successfully ported to a number of database servers and to many versions of UNIX, including Linux. If you're UNIX and Perl-oriented, as I am, WDB should be very high on your short list of preferred free solutions, especially because the excellent shareware database mSQL is one of the supported SQL engines.

The heart of WDB is what its author, Bo Frese Rasmussen, calls a form definition file (FDF). For each searching form, you write a description in a relatively high-level form definition language to list the tables and fields to be used for this form, the search constraints, how you want the output formatted, and so on. WDB comes with a utility program that generates an FDF template from a database table or view by querying the metadata provided. You then edit its output. Although the program can't do your thinking for you, many details that would otherwise need tedious hand-hacking are automatically filled in by the FDF-making utility.

Given an FDF, WDB will generate an HTML search form on the fly. As with the program-generated FDFs, you may want to edit the output. When the user submits a completed search form, WDB generates the SQL query, performs the search, and returns a list of matching records, each item of which has a hot link to the full screen view of that record.

The upshot is that WDB does much of the donkey work-especially for ad hoc searches-for you. Its feature set is somewhat limited, but if you know Perl, you can probably modify it to your heart's content-the code is well-structured and well-commented. Simpler customizations can be handled by adding snippets of Perl code to the FDF because several "hooks" are provided to call your own code at various stages in the query process.

A particularly nice feature is the capability to define input and output conversion functions that apply to specified fields; WDB will automatically invoke these at the correct time in its query-generation and formatting cycle. WDB also has several features designed to help you generate URLs from database lookups, which can perform other lookups of related data when clicked. Clever use of this elegant feature of WDB can effectively turn your relational database into a gigantic hypertext document.

Information about WDB, installation instructions, and the downloadable source code can be found at the following URLs:

http://www.eol.ists.ca/~dunlop/wdb-p95/
http://www.dtv.dk/~bfr/wdb/

Web/Genera

Web/Genera, by Stanley Letovsky, Mary B. Berlyn, and others, is another public-domain software tool set that simplifies the integration of Sybase databases into the Web. It can be used to retrofit a Web front end to an existing Sybase database, or to create a new database. Similar to WDB, Web/Genera requires you to write a specification of the Sybase database and of the desired appearance of its contents on the Web by using a high-level schema notation. The Web/Genera programs process this description to generate SQL commands and formatting instructions that together extract objects from a database and format them into HTML.

Web/Genera also supports form-based relational querying and whole database formatting into text and HTML formats. The notation of Web/Genera seems to be richer than that of WDB because the former includes notions, such as set and group, for field values. This makes it possible to build pick lists on the fly. However, unlike WDB, Web/Genera doesn't provide hooks for users to write customized extensions. To make any extensions to Web/Genera, the user must change its source code, which is written mainly in C and Perl.

Information about Web/Genera, installation instructions, and the down-loadable source code can be found at the following URL:

http://gdbdoc.gdb.org/letovsky/genera/genera.html

GSQL

GSQL, written by Jason Ng at NCSA, covers much the same territory as WDB and Web/Genera, but is written in C instead of Perl. As with WDB, GSQL uses a higher-level language to write what its authors call PROC files. Unlike WDB, GSQL doesn't provide a utility to generate an initial version of the PROC file for you. A similarity with WDB is that GSQL has also been designed with porting in mind and has been ported to several different databases on various flavors of UNIX. The PROC language used by GSQL is more procedural in flavor than the FDF language used by WDB, which I would describe as more descriptive.

Overall, I consider GSQL somewhat more powerful than WDB but significantly more difficult to install and maintain. Also, customization must be done in C rather than in Perl. The more procedural nature of its schema language gives it more flexibility than WDB but also means you have to do more work.

Information about GSQL, installation instructions, and the downloadable source code can be found at the following URL:

http://www.ncsa.uiuc.edu/SDG/People/jason/pub/gsql/starthere.html

MORE

MORE is an acronym for Multimedia Oriented Repository Environment, which is part of the Repository Based Software Engineering Project (RBSE) led by Dave Eichmann, Terry McGregor, and Dann Danley. As you may have guessed from the acronyms, MORE is designed for very large-scale database repositories. It comprises a number of large C programs-totaling more than 30,000 lines of code-on top of a large Oracle database. It's also designed to allow a single database schema to be distributed across a number of servers, presenting a consistent user interface to a larger amount of data than any single server could handle.

In addition to the demonstration of MORE itself, its Web site also contains links to some very interesting technical papers discussing how MORE works and how the inherent limitations of current Web technology constrained the design of MORE. These papers are well worth reading-even if you never intend to build anything as complex as MORE-because they have some very lucid discussions of general Web/database design issues.

Information about MORE can be found at the following URL:

http://rbse.jsc.nasa.gov:81/DEMO/

MiniSQL (mSQL) and W3-mSQL

mSQL is a shareware SQL engine (free to academic users) that has been ported to most versions of UNIX. In combination with WDB, it provides an excellent, low-cost way to build a simple relational database with a Web front end. And the combination of WDB with mSQL is known to work well under Linux.

mSQL's author, David Hughes, has also written a new program called W3-mSQL, which was in alpha at the time of writing. Although I've included both mSQL and W3-mSQL on the CD-ROM included with this book, I haven't yet tried W3-mSQL myself, so I don't know how well it works. According to its author, W3-mSQL works via HTML documents with embedded tags that perform database lookups, so you don't need to write any CGI scripts to use it.

mSQL supports only a relatively small subset of SQL, but what it does support is in strict conformance to ANSI standards. According to its author's Web site, he's working on a major upgrade of mSQL that will support a much larger set of features, but doesn't yet know when this will be released.

Information about mSQL and W3-mSQL, installation instructions, and the downloadable source code can be found at the following URLs:

http://Hughes.com.au/
http://Hughes.com.au/product/msql/
ftp://Bond.edu.au/pub/Minerva/msql/
ftp://Bond.edu.au/pub/Minerva/msql/Contrib/
http://Hughes.com.au/product/w3-msql/

DBI

The long-term future of Perl-database interfaces in many applications, including Web interfaces, may well lie with the new DBI project. The long-term goal of this project is to create a generic Perl interface with versions for every SQL database in the hope of providing greater portability for Perl database applications. It also takes advantage of the Perl 5 object-oriented extensions.

NOTE
The DBI (uppercase) interface defined by this project isn't the same as the dbi (lowercase) interface used in WDB. I used the WDB version in my examples for three reasons:
  • The WDB interface is simpler.
  • mSQL doesn't yet support the new DBI (uppercase) standard.
  • At the time of this writing, the new DBI (uppercase) standard is something of a moving target. DBI's authors have publicly halted all development on the current version of DBI (0.65) and will do a complete rewrite based on ODBC standards.

Information about the evolving DBI standard can be found at this URL:

http://www.hermetica.com/technologia/DBI/index.html

DB Gateway System

This low-end database gateway written in Visual Basic is intended for use with Windows/Windows NT Web servers and either FoxPro or Microsoft Access 95. Unlike most Windows-based Web/database products, it doesn't use ODBC drivers. Instead, it uses the database interface built into Visual Basic, which means it can access only a database that's local to the machine on which the Web server is running. According to its authors, Philip Jones and Larry Perkins, this is because they're targeting users who don't have the money or the computing resources for a client/server database system.

Information about the DB Gateway can be found at

http://fcim1.csdc.com/dbgate.htm

ODBC Tools

In the Windows/Windows NT environment, ODBC has long been one of the most popular ways to access databases from other programs. As a result, most popular Windows databases and most programming languages targeted for the Windows environment have ODBC drivers included or available. If you come from a UNIX-oriented background, you may want to use one of the ODBC drivers for NTPerl that can be found various places on the Net. If you come from a Windows background, you'll probably be happier using Visual Basic or Visual C with any of the numerous ODBC drivers available for those languages.

NOTE
If you use ODBC under Windows or Windows NT for your database CGI programs, you have a choice of query languages: the SQL interface to ODBC, or such ODBC-specific features as dynasets. On one hand, dynasets can improve performance significantly because the SQL parsers in many ODBC drivers are rather slow. And dynasets can be easier to code than SQL for simple lookups. On the other hand, the use of SQL is more portable to non-ODBC environments.

Information about ODBC access from Perl under NT can be found at the following URLs:

http://info.hip.com/ntperl/PerlFaq.htm
ftp://ftp.digex.net/pub/access/psii/

Macintosh Tools

Just as ODBC has become the lingua franca of Windows/NT Web/database integration, AppleEvents have become the most popular mechanism for CGI work on the Macintosh platform. A number of freeware, shareware, and commercial tools link various Macintosh databases to the WebSTAR server (by far the most popular Web server on Macs) via AppleEvents. However, Apple-Events provide standardization only at the level of interprocess communications; the most popular Macintosh databases don't support SQL, which means you must learn the proprietary language of your database. One exception is an excellent Macintosh SQL engine called Butler, from Everywhere, with good interface products (Tango and Bolero) for WebSTAR.

In most cases, the database will be either FileMaker Pro at the low end or 4th Dimension at the high end. FileMaker Pro is very easy to use and sufficiently powerful to handle simpler designs quite well, but it has limited functionality for more advanced work. Claris recently added some relational features to FileMaker Pro and very recently put some tools on its Web site for CGI work with FileMaker Pro.

ACI's 4th Dimension is a very powerful product with a rather steep learning curve. Its programming language has a depth and richness few other database languages can match, and its object-oriented tools for drawing the relational schema of your database and the input/output screen layouts are superb. It takes time to learn, but the results can be well worth it. For simpler applications it's probably overkill. And, from the programmer's point of view, because the language of 4D is nothing like SQL, the transferability of skills is limited between it and other database systems.

Information about Web/database integration on the Macintosh can be found at the following URLs:

http://www.starnine.com/
http://www.fsti.com/
http://macweb.com/webfm/
http://www.everyware.com
http://www.claris.com
http://www.comvista.com/net/www/cgidata.html

Some Useful Hotlists

Many sites on the Web have pointers to various Web/database tools and projects. The following URLs are the most comprehensive listings of Web/database resources I have found in an extensive search for such information:

Problem Solving

Because a Web/database gateway involves multiple programs, possibly running on multiple machines, various things can go wrong. Your application may not work at all-or it may run, albeit far too slowly. For database-specific debugging and performance tuning, you'll obviously need to consult the documentation for your database package. Here I give a few general hints for debugging and tuning a Web/database gateway-hints that should apply to any platform.

NOTE
Bear in mind that no amount of after-the-fact tweaking can fully compensate for careful planning at the outset of your project.

Debugging

Debugging a Web/database gateway isn't simple because problems can occur at multiple levels. In addition to the hints for debugging any CGI script found in Chapter 25, "Testing and Debugging CGI Scripts," you need to examine the SQL generated by your CGI scripts with a hand lens. I suggest the first version of any database CGI script not try to access the database, but instead display the generated SQL code in a document that begins with the header Content-type: text/plain, followed by a blank line. When you think the generated SQL looks correct, pipe it through the command-line interface of your database. After this confirms that you can generate correct SQL, the next step is to write your own client program that sends the SQL to the server and executes it from the command line.

If you can't even get your CGI script to generate any SQL, try sending your form input to a CGI script that does nothing more than list all the information it got from the Web server. Given that list, add hard-coded assignment statements to your code so you can run it from the command line and watch how it behaves when given a simulated CGI environment. Only then should you try to combine the CGI and database interfaces into one program and run that program as a CGI script.

If it doesn't work, look at the error log of your server for any messages your program may have sent to standard error. If that tells you little or nothing, you need to think about the differences between a command-line environment and the CGI environment. Are you assuming something about the PATH environment variable, which has far fewer directories in a CGI context? Do you need other environment variables for database work? For example, most Sybase installations define an environment variable, DSQUERY, that tells Sybase clients which server to use if none is specified. In the CGI environment, this variable isn't defined, so your script will fail unless you specify the server in your script.

Finally, remember that CGI scripts usually run as a lower-privilege user ID than regular programs. If your database server uses OS-level user IDs for authentication, you may have client programs that work just fine from the command line but not as CGI scripts.

The general strategy is to divide and conquer, so you can isolate the problem to a particular part of your system. Start simply and build up to the full functionality you want one step at a time.

Performance Tuning

From the hardware perspective, database and HTTP servers tend to be I/O bound, not CPU intensive. This means the performance of your Web/database application will depend mainly on disk access and process-launching overhead. A fast hard drive and lots of memory are more important than a fast processor. However, a badly designed Web/database application can bring even the most powerful server platform to its knees.

Remember that your application is, in fact, a client/client/server application in which your CGI script connects as a client to the database engine, sending SQL to the engine and accepting data returned from the engine. Considerable overhead is associated with setting up and tearing down the connection between the CGI script and the database server. Even a well-designed CGI application will have to incur this overhead every time a request comes to the HTTP server; a single session from the viewpoint of the remote user involves multiple database logons. This is unavoidable, but a badly designed CGI program can make matters worse if it opens and closes multiple connections to the database engine in a single HTTP request.

Even if you avoid opening an excessive number of connections to the database server, you can still hurt performance by sending too much data back and forth between the CGI script and the database server. It's a nearly universal rule in client/server database programming to do as much work as possible inside the database server to minimize the overhead associated with the transferring of data between database client and database server. In CGI work where you're already incurring some extra overhead because each HTTP request requires a new connection to the database, this principle applies with particular force.

For server efficiency, observe the following rules:

These rules apply even if your database engine and your HTTP server reside on different machines. This is because all data transfer between your CGI program and the database server incurs network overhead as well, as the overhead inherent in any connection to the database server.

Recently, a colleague asked me for help with a CGI interface to an Illustra database that was taking more than 3 minutes to respond to some queries. By applying these principles and taking advantage of Illustra's unusually powerful version of SQL, he could get the response time to under 10 seconds. Most database servers can perform only internal manipulation of fixed-size data types; large text fields can be copied in or out, but must be manipulated outside the database program. Illustra's string-manipulation commands can be applied to any amount of text. We used this feature of Illustra to build very complex SQL queries that would perform extensive string manipulation inside Illustra and return the results. Most Web queries can now be handled by one or two monster SQL statements instead of by many small SQL statements.

The Future of Web/Database Interfaces

In this chapter, I have presented the fundamentals of building a Web gateway to a relational database engine, explaining the unique capabilities this combination makes possible and pointing out some of the limitations inherent in doing this kind of thing over the Web.

As you've seen, current Web technology has some serious limitations for database work-no client-side validation, no facility for extending the user interface toolkit provided by HTML, and no mechanism for the graceful handling of concurrent updates. Equally serious for multimedia database work-something for which the Web by its nature seems so well suited-are the limitations of conventional database technology, which supports only a limited set of data types. All other types of data must be stored as text or binary fields that can't be manipulated within the database itself.

On the other hand, object-oriented databases are extensible, but lack the data integrity and flexible querying features of relational databases. In my view, the new object-relational paradigm exemplified by Illustra has enormous promise because it addresses the limitations of relational and object-oriented databases. Illustra has a full SQL implementation and has facilities for defining and enforcing relational integrity constraints as with any relational database. It also can extend its built-in types in numerous and powerful ways.

In my view, object-oriented database technology-and especially object-relational database technology-may also be an excellent server-side counterpart to such client-side extensions as Java. For database work, client-side scripting promises three major advantages:

In my view, these two new technologies have the potential to revolutionize the Web-after the bugs are worked out and robust implementations are available for every common platform. And I don't say this because of all the media hype about Java-I say this because I'm all too familiar with the limitations of the current Web and database technologies and with the various kludges developers use to circumvent them. There's some real substance behind the media excitement; in the fullness of time we'll all find out whether the implementations of these new ideas live up to their promise.

This work was supported in part by the Human Brain Project (NIMH, NASA, and NIDCD) grant R01 DC02307 and by NIH Grant G08LM05583 from the National Library of Medicine's IAIMS Program. The section on Web/GENERA was written by Kei Cheung, one of the authors of that program.