Chapter 14

The Perl Language

by Bob Breedlove


The goal of this chapter is to explain the Perl language so that you can use it to create Web applications. I do not attempt in this short space to cover all the capabilities of Perl. Several good books on programming Perl are available, including Teach Yourself Perl in 21 Days and Perl 5 Unleashed, both from Sams Publishing. This chapter assumes that you have at least a basic understanding of programming and programming terminology.

This chapter relies heavily on the Perl manual pages (man pages). The UNIX man facility provides online documentation from specially formatted files and is the standard for UNIX-based documentation. Implementations of Perl for other operating systems might also supply versions of this authoritative documentation. For ease of access, the Perl manual has been split up into several sections. References are made throughout this chapter using the standard naming convention for these pages as shown in the following table.

Man page     Description
perl         Perl overview (this section)
perldata     Perl data structures
perlsyn      Perl syntax
perlop       Perl operators and precedence
perlre       Perl regular expressions
perlrun      Perl execution and options
perlfunc     Perl built-in functions
perlvar      Perl predefined variables
perlsub      Perl subroutines
perlmod      Perl modules
perlref      Perl references and nested data structures
perlobj      Perl objects
perlbot      Perl object-oriented tricks and examples
perldebug    Perl debugging
perldiag     Perl diagnostic messages
perlform     Perl formats
perlipc      Perl interprocess communication
perlsec      Perl security
perltrap     Perl traps for the unwary
perlstyle    Perl style guide
perlapi      Perl application programming interface
perlguts     Perl internal functions for those doing extensions
perlcall     Perl calling conventions from C
perlovl      Perl overloading semantics
perlembed    How to embed Perl in your C or C++ app
perlpod      Perl plain old documentation

One excellent version of the documentation is supplied as Adobe Acrobat files. It is available on the Web at


http://www.perl.com/CPAN/authors/id/BMIDD/perlpdf-5.002.tar.gz

Adobe readers are available free for many operating systems; check http://www.adobe.com for current availability.

The man pages are also available in HTML format as Web pages. They are available on the Internet at


http://www.perl.com/CPAN/doc/manual/html/index.html

or packaged in two formats at


http://www.perl.com/CPAN/doc/manual/html/PerlDoc-beta1g-html.tar.gz

http://www.perl.com/CPAN/authors/id/BMIDD/perlhtml-5.002.tar.gz

If you do not have easy access to this complete reference set with your version of Perl, you should take the time to download a copy.

At this writing, two versions of Perl are in common use: version 4+ and version 5+ (version numbers for version 5 might vary). I cover aspects of the language common to both and only touch on the more advanced aspects of version 5+ that are directly applicable to Web applications.

After you have completed this and the next chapter, "Perl in Internet Applications," you should have a solid understanding of Perl and be able to create Web applications using this language.

About the Perl Chapters

This and the following chapters form a reference and tutorial for the Perl language in Internet programming. This chapter follows the organization of the Perl manual. I hope that the information here clarifies some aspects of Perl programming that are especially important in Internet programming.

Perl provides a number of equally good ways to accomplish a task. The best way to learn to program in Perl is to program in Perl. Because it is an interpreted language, you can develop and test small portions of a program with relative ease in a small amount of time. In the following chapter, a small application acts as a tutorial in Perl programming. You are encouraged to enter the program code along with the chapter, to try it, and to experiment with alternative programming techniques by either modifying the samples or creating your own code based on the demonstrated techniques.

Perl varies some depending on the platform on which it is implemented and the version of the language that runs on your installation. I'll point out differences where applicable.

Writing Perl Scripts

To reiterate an important point, the best way to learn to write Perl scripts is to simply write Perl scripts. That statement isn't quite as silly as it sounds. Perl is intended to be a language in which you can get things done, and it usually gives you more than one way to accomplish a task. The scripts can be simple, straightforward, and quick-and-dirty or elegant and organized. (This ability to write either quick-and-dirty code or elegant and organized code is especially true if you are using the object-oriented aspects of Perl version 5. These aspects of the language are beyond the scope of this chapter, however.)

In keeping with the tradition of most programming texts (at least for C-like languages), here is the popular Hello World script (program) for Perl:


print "Hello World\n";

Not much of a program, is it? It is fully functional, however. Compare the Perl script with its C counterpart:


#include <stdio.h>

int main(void)
{
       printf("Hello World\n");
       return 0;
}

You'll notice differences right away. First, because Perl is a scripting language, it starts at the top of the script file and works its way to the bottom. It can take some branches, perform some loops, or execute some functions, but top-to-bottom execution is the basic rule of script programming. Perl has no main() function. Perl starts with the first executable instruction it finds and executes instructions until it executes the entire script. Note that subroutines (functions), methods, packages, and so on are not considered executable instructions for this purpose.

The Perl print statement is also less complex than its C counterpart. Perl actually supports the printf() function, but the print statement prints to the standard output just fine.
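As a rough sketch of the difference between the two (assuming Perl 5, with variable names and values invented for illustration), print writes its arguments as they are, whereas printf first applies a C-style format string:

```perl
use strict;
use warnings;

# Hypothetical values used only for this illustration.
my $item  = 'widgets';
my $count = 3;
my $price = 2.5;

# print writes its list of arguments as-is.
print "Ordered $count $item\n";

# printf formats its arguments according to a C-style format string.
printf("%d %s at %.2f each\n", $count, $item, $price);

# sprintf returns the formatted string instead of printing it.
my $line = sprintf("%d %s at %.2f each", $count, $item, $price);
print "$line\n";
```

For simple output, print is usually all you need; printf earns its keep when you must control field widths or decimal places.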

Executing Perl Scripts

The techniques used to execute Perl will vary somewhat by operating system. Generally, Perl is supplied as an executable program (file). Refer to your specific operating system manuals for instructions on how to execute programs, and refer to the documentation with Perl for your specific platform for specifics on executing Perl scripts. This section takes the simplest example of command-line operating systems such as UNIX and MS-DOS. In general, you create and execute the Hello World script as follows: enter the script in a text editor, save it to a file (hello.pl in the examples that follow), and pass the file name to the Perl interpreter at the command prompt (typically perl hello.pl).

You should see the Hello World phrase on your screen followed by a newline.

On most UNIX systems and Windows NT, the Perl interpreter can be associated with a particular file naming pattern to allow a safer execution of Perl for purposes of writing CGI programs. For example, if you had associated Perl with files ending in .pl, the command hello.pl would execute the Hello World script when entered on the command line.

Note that in Web programming the following construct is very dangerous, and you should avoid using it at all costs:


http://{host}/{library}/Perl?hello.pl

If you allow this, any Perl script can be substituted, with possibly disastrous effects for your host site. Instead, if you are going to do Web programming in Perl, you should be able to execute your Perl scripts by entering only the name of the script. The exact method you use to accomplish this task depends upon your operating system.

On UNIX platforms, Perl scripts can be executed in the same way that shell scripts are executed; that is, you provide the full location of the interpreter (Perl, in this case) on your system by making the first line of the script a comment in a special format:


#!{interpreter location}

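On my installation, the location is /usr/bin/perl. (UNIX file names are case sensitive, and the interpreter is normally installed under the lowercase name perl.) Thus, you can modify the "Hello World" program to be


#!/usr/bin/perl

print "Hello World\n";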

Then use the chmod command to set the resulting script file to be executable using some variation of the command:


chmod +x hello.pl

You run the executable by simply entering its name at the command prompt (hello.pl<enter>).

On other systems, you might have to register the extension (.pl) with the operating system to run the interpreter when a file with this extension is selected. Note also that some HTTP daemons or installations require scripts with specific extensions. The installation on which my home page is located (http://www.channel1.com/users/rbreed01/), for example, requires that all executable scripts have an extension of .cgi.

Perl Style

Everyone who writes in Perl develops a personal style. Style is important when you want to change something in your script (and you will want to change things), sometimes months after you have implemented the script. Style and comments can help you make improvements in your script at a later date with a minimum of fuss.

Programmers can argue style until the cows come home. Larry Wall has some definite feelings about Perl style. If you're interested, check out the perlstyle man page. The important point is readability and maintainability. You have to be able to figure out what is going on and be able to make changes to your code quickly.

The perlstyle man page outlines some more substantive style issues you might want to consider, along with examples.

NOTE
Using "here documents" can actually detract from the readability that indentation gives a program. You might want to include some comments and whitespace to delineate the document.

Perl Data Types

Perl has three data types: scalars, arrays of scalars, and associative arrays of scalars (also known as hashes).

Perl is not strongly typed. In fact, all data in Perl is either a scalar, an array of scalars, or a hash of scalars. You do not have to declare variables as a particular type (integer, character, or Boolean) before you use them. Variables can contain either numeric or alphanumeric data and can vary throughout the execution of the program. The following code is valid in a Perl script:


$a = 'some string';

...

$a = 25;

Because arrays are arrays of scalars, different elements of an array can contain either numeric or alphanumeric data. The following code


@a = (1, 2, 'buckle my shoe,', 3, 4, 'shut the door.');

print join(' ',@a), "\n";

works just fine and results in the following line:


1 2 buckle my shoe, 3 4 shut the door.

A scalar value is interpreted as TRUE in the Boolean sense if it is not the null string or the number 0 (or its string equivalent, "0"). The Boolean context is simply a special kind of scalar context.
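The rule is easiest to see by testing a few values. The following sketch assumes Perl 5; the helper subroutine truth() is a hypothetical name introduced only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: reports how a scalar behaves in Boolean context.
sub truth {
    my ($value) = @_;
    return $value ? 'TRUE' : 'FALSE';
}

my %result = (
    'empty string' => truth(''),      # FALSE: the null string
    'number 0'     => truth(0),       # FALSE: the number zero
    'string "0"'   => truth('0'),     # FALSE: the string equivalent of zero
    'string "0.0"' => truth('0.0'),   # TRUE: neither the null string nor "0"
    'string "00"'  => truth('00'),    # TRUE
    'one space'    => truth(' '),     # TRUE
);

foreach my $case (sort keys %result) {
    print "$case is $result{$case}\n";
}
```

Note that only the exact string "0" counts as false; other strings that happen to equal zero numerically, such as "0.0", are true.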

The two varieties of null scalars are defined and undefined. Undefined null scalars are returned when something doesn't have a real value, such as when an error occurs, at end of file, or when you refer to an uninitialized variable or element of an array. In Perl, variables do not have to be predefined. Therefore, an undefined null scalar may become defined the first time you use it as if it were defined (such as in an assignment statement). However, before that, you can use the defined() operator to determine whether the value is defined.
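A minimal sketch of the distinction, assuming Perl 5 (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $count;                 # declared but never assigned: undefined
my @list = (10, 20);

my $before = defined($count)   ? 'defined' : 'undefined';
my $beyond = defined($list[5]) ? 'defined' : 'undefined';   # no such element

$count = 0;                # 0 is false, but it is a defined value
my $after = defined($count) ? 'defined' : 'undefined';

print "before: $before, beyond: $beyond, after: $after\n";
```

This matters in Web programming because a form field that was submitted empty (defined but null) is not the same as a field that was never submitted at all (undefined).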

Normal arrays are indexed by number with the first element indexed at zero. Negative subscripts count from the end. Hash arrays are indexed by string.

Scalar values are always named with $, even when referring to a scalar that is part of an array:


$month # a simple value holding the month of the year



@month = ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec');

$month[0] # the first element of array @month, 'Jan'

%month = ('Jan',31,'Feb',28,'Mar',31);

$month{'Feb'} # the 'Feb' value from the associative array %month or 28

Entire arrays or array slices are denoted by @:


@month # The entire array 

@value[3,4,5] # the 4th through 6th elements of the array

@things{'abc','def'} 

# the elements indexed by 'abc' and 'def' from the associative array
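The indexing rules described so far can be pulled together in one short sketch. It assumes Perl 5, and the data is shortened to keep the example small:

```perl
use strict;
use warnings;

my @month = ('Jan', 'Feb', 'Mar', 'Apr');
my %days  = ('Jan', 31, 'Feb', 28, 'Mar', 31);

my $first = $month[0];      # 'Jan': the first element is at index 0
my $last  = $month[-1];     # 'Apr': -1 is the last element
my $prev  = $month[-2];     # 'Mar': -2 is the one before it
my $feb   = $days{'Feb'};   # 28: hashes are indexed by string

my @pair   = @month[1, 2];          # array slice: ('Feb', 'Mar')
my @counts = @days{'Jan', 'Mar'};   # hash slice: (31, 31)

print "$first $last $prev $feb\n";
print "@pair / @counts\n";
```

The leading character tells you what you get back: $ for a single value, @ for a list of values, regardless of whether the underlying variable is an array or a hash.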

Entire hashes are denoted by %:


%days # (key1, val1, key2, val2 ...)

Subroutines are named with an initial &:


if (&getValue() < 10) {

       ...

}

sub getValue {

       ...

       return $value

}

Perl Variable Naming Conventions

Every variable type has its own namespace. You can use the same name for a scalar variable, an array, or a hash. Therefore, $foo and @foo are two different variables, and $foo[1] is a part of @foo, not a part of $foo.
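The following sketch (assuming Perl 5) shows three unrelated variables that happen to share the name foo:

```perl
use strict;
use warnings;

my $foo = 'scalar';
my @foo = ('first', 'second');
my %foo = ('key', 'value');

# $foo[1] indexes the array @foo; it has nothing to do with the scalar $foo.
my $element = $foo[1];

# $foo{'key'} looks up the hash %foo.
my $lookup = $foo{'key'};

print "$foo / $element / $lookup\n";
```

Although Perl allows this, reusing one name for several variables makes a script harder to read, so it is probably best kept for cases where the variables are closely related.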

In general, Perl variable names can contain any combination of letters, digits, underscores, and special characters. I personally prefer to use only letters and digits to avoid some of the special cases described in the following paragraphs.

Because variable and array references always start with $, @, or %, the Perl "reserved" words, which define the language constructs, aren't in fact reserved with respect to variable names. They are reserved with respect to language elements, such as labels and filehandles, however, which don't have an initial special character.

As in C, case is significant in Perl: "FOO", "Foo", and "foo" are all different names. Names that start with a letter or underscore can also contain digits and underscores.

In place of an alphanumeric name, you can use an expression that returns a reference to an object of that type.

Names that start with a digit can only contain more digits. Names that do not start with a letter, underscore, or digit are limited to one character, for example, $% or $$. (Most one-character names have a predefined significance to Perl. For instance, $$ is the current process ID.)

Scalar Values

Numeric literals are specified in any of the customary floating-point or integer formats:


12345 

12345.67 

.23E-10 

0xffff # hex 

0377 # octal 

4_294_967_296 # underline for legibility
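To see what these literals evaluate to, you can simply print them. The sketch below assumes Perl 5. It also uses the built-in hex() and oct() functions, because the 0x and leading-zero notations apply only to literals in the script, not to strings read in at run time:

```perl
use strict;
use warnings;

my $hex   = 0xffff;            # hexadecimal literal
my $octal = 0377;              # octal literal (leading zero)
my $big   = 4_294_967_296;     # underscores are ignored

print "$hex $octal $big\n";

# Strings are not scanned for 0x or a leading zero; convert them explicitly.
my $from_hex = hex('ffff');
my $from_oct = oct('377');

print "$from_hex $from_oct\n";
```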

String literals are usually delimited by either single or double quotes. Double-quoted string literals are subject to backslash and variable substitution. Single-quoted strings are not subject to these substitutions except for \' and \\. The usual UNIX backslash rules apply for making characters such as newline and tab, as well as some more exotic forms.

You can also embed newlines directly in your strings; that is, strings can end on a different line from where they begin. This feature is nice, but if you forget your trailing quote, the error is not reported until Perl finds another line containing the quote character, which might be much farther on in the script. Variable substitution inside strings is limited to scalar variables, arrays, and array slices. The following example prints the name in the line:


$name = 'Fred';

print "Hello, $name!\n";
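To contrast the two quoting styles side by side, here is a small sketch (assuming Perl 5; the strings are invented for illustration):

```perl
use strict;
use warnings;

my $name = 'Fred';

my $double  = "Hello, $name!\tBye";      # $name is substituted; \t becomes a tab
my $single  = 'Hello, $name!\tBye';      # taken literally, character for character
my $escaped = 'It\'s a \\ backslash';    # \' and \\ are the only escapes here

print "$double\n";
print "$single\n";
print "$escaped\n";
```

A reasonable habit is to use single quotes unless you actually need substitution, so that a stray $ or @ in the text is never interpreted by accident.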

As in some shells, you can put curly brackets around the identifier to delimit it from following alphanumerics. In fact, an identifier within such curlies is forced to be a string, as is any single identifier within a hash subscript.


$days{'Feb'}

can be written as


$days{Feb}

and the quotes are assumed automatically. Anything more complicated in the subscript is interpreted as an expression.
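A brief sketch of both uses of curly brackets, assuming Perl 5 (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $fruit = 'apple';
my %days  = ('Feb', 28);

# Without the brackets, "$fruits" would refer to a different variable, $fruits.
my $plural = "${fruit}s";

# A single identifier in a hash subscript is quoted automatically.
my $quoted = $days{'Feb'};
my $bare   = $days{Feb};

print "$plural $quoted $bare\n";
```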

Note that a single-quoted string must be separated from a preceding word by a space because a single quote is a valid character in an identifier.

A word that has no other interpretation in Perl is treated as a quoted string. These words are known as barewords. A bareword that consists entirely of lowercase letters risks conflict with future reserved words. You might want to avoid barewords entirely or always code them in uppercase.

Perl supports a line-oriented form of quoting, known as a "here document." Following a <<, you specify a string to terminate the quoted material, and all lines following the current line down to the terminating string are the value of the item. The terminating string can be either an identifier (a word) or some quoted text. If quoted text, the type of quotes you use determines the treatment of the text, just as in regular quoting. An unquoted identifier works like double quotes. You cannot leave a space between the << and the identifier. The terminating string must appear by itself (unquoted and with no surrounding whitespace) on the terminating line.

This line-oriented format can be especially helpful in producing HTML pages. The following script prints the template for an HTML page:


print <<EOF;

<HTML>

<HEAD>

<TITLE>...</TITLE>

</HEAD>

<BODY>

...

</BODY>

</HTML>

EOF

Note that the terminating EOF is on a line by itself. The semicolon that ends the print statement follows <<EOF on the first line, so nothing else is needed after the terminator.
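Because the quoting style of the terminator controls substitution, you can choose whether variables are filled into the page. The following sketch assumes Perl 5; the page title is a made-up value:

```perl
use strict;
use warnings;

my $title = 'Order Status';   # hypothetical page title

# Unquoted terminator: works like double quotes, so $title is substituted.
my $page = <<EOF;
<HTML>
<HEAD><TITLE>$title</TITLE></HEAD>
<BODY>
<H1>$title</H1>
</BODY>
</HTML>
EOF

# Single-quoted terminator: the text is taken literally.
my $literal = <<'EOF';
The variable $title is not expanded here.
EOF

print $page, $literal;
```

The single-quoted form is handy when the HTML itself contains $ or @ characters (an e-mail address, for example) that you do not want Perl to interpret.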

List values are separated by commas and enclosed in parentheses. The following code


@myList = (1,2,3,4,5);

assigns the values 1-5 to the array variable @myList. Arrays assigned to other arrays lose their identity. Given the assignment above,


@myList2 = (@myList,6,7,8,9,10);

is equivalent to


@myList2 = (1,2,3,4,5,6,7,8,9,10);

You cannot identify @myList within @myList2.

The null list is represented by ().

A list can be assigned to when every element of the list is itself valid to assign to. This feature can be useful when splitting comma-delimited files.


while(<IN>) {

       chop;

       ($name, $addr, $city, @junk) = split(/,/);

       ...

}

The preceding script reads lines from the filehandle IN and splits them by commas, placing the first three results from the split into $name, $addr, and $city, respectively, and the remainder of the line, if any, into the array @junk. (If you aren't going to use the remainder of the line, you can leave off @junk, and the remainder of the line is not assigned.)

You can actually place an array or hash anywhere in the list on the left side, but the first array or hash absorbs all the remaining values; any items that follow it in the list are left unassigned.
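The following sketch (assuming Perl 5, with an invented input line in place of a file) shows both the useful case and the pitfall:

```perl
use strict;
use warnings;

# Hypothetical record standing in for a line read from a file.
my $line = 'Fred Flintstone,301 Cobblestone Way,Bedrock,extra1,extra2';

# The array comes last, so it collects whatever is left over.
my ($name, $addr, $city, @junk) = split(/,/, $line);
print "$name lives in $city; ", scalar(@junk), " extra fields\n";

# The array comes first, so it takes everything and $left_over gets nothing.
my (@all, $left_over) = split(/,/, $line);
print scalar(@all), " fields in \@all; \$left_over is ",
      (defined($left_over) ? 'defined' : 'undefined'), "\n";
```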

When assigning values to a hash, you can use the => operator. Using the operator is just a more visible way of showing the assignment. For example, the following assignments are equivalent:


%stuff = ( thing1 => 'abcde',

           thing2 => 'defgh',

           thing3 => 'ijklm');



%stuff = ( thing1, 'abcde',

           thing2, 'defgh',

           thing3, 'ijklm');

One thing to note about hashes: The order in which a hash is initialized is not necessarily the order in which the elements are retrieved from the hash (see the description of the sort function).
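If you need the elements in a predictable order, one common approach is to sort the keys before using them. A minimal sketch, assuming Perl 5:

```perl
use strict;
use warnings;

my %stuff = ( thing1 => 'abcde',
              thing2 => 'defgh',
              thing3 => 'ijklm');

# keys returns the keys in no particular order; sort puts them in string order.
my @ordered = sort keys %stuff;

foreach my $key (@ordered) {
    print "$key = $stuff{$key}\n";
}
```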

Assigning an array to a scalar variable returns the number of items in the array. If the number returned is zero, then the array is empty.
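A short sketch of this behavior, assuming Perl 5. It also shows a nearby pitfall: putting parentheses around the scalar turns the statement into a list assignment, which copies the first element instead of the count.

```perl
use strict;
use warnings;

my @myList = (1, 2, 3, 4, 5);
my @empty  = ();

my $count = @myList;     # scalar assignment: the number of items, 5
my $none  = @empty;      # 0, so the array is empty

my ($first) = @myList;   # list assignment: the first element, 1

print "count=$count none=$none first=$first\n";
print "the list is empty\n" unless @empty;   # an array in a test works the same way
```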

Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a typeglob is a * because it represents all types. Typeglobs used to be the preferred way to pass arrays and hashes by reference into a function, but now that there are real references, you rarely use this technique.

One place to still use typeglobs is for passing or storing filehandles. To save a filehandle, do this:


$fh = *STDOUT;

Use the same method to create a local filehandle.
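As an illustration of why you might save a filehandle, the sketch below passes one to a subroutine. It assumes Perl 5, and the subroutine name report() is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: writes one line to whatever filehandle it is given.
# The braces around $fh make it unambiguous that $fh is the filehandle.
sub report {
    my ($fh, $message) = @_;
    return print {$fh} "$message\n";
}

my $fh = *STDOUT;                                     # save the filehandle
my $ok_saved  = report($fh, 'written through a saved filehandle');
my $ok_direct = report(*STDOUT, 'typeglob passed directly');
```

Written this way, the same subroutine can send its output to the browser (standard output) or to a log file, depending on which filehandle the caller supplies.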

Predefined Variables

Predefined variables are names that have special meaning to Perl. Most of the punctuation names (such as $$ and $#) have reasonable mnemonics, or analogues, in one of the shells. Nevertheless, if you wish to use the long variable names, you just need to say


use English;

at the top of your program. This statement will alias all the short names to the long names in the current package (that is, the module) so that you can use the longer English names instead of the more cryptic special character names. Some of them even have medium names, generally borrowed from the UNIX pattern matching language awk.
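The following sketch, which assumes Perl 5 and its standard English module, shows that the long and short names really are the same variables:

```perl
use strict;
use warnings;
use English;

# $PROCESS_ID is another name for $$.
my $same_pid = ($PROCESS_ID == $$) ? 'yes' : 'no';

# Assigning to the long name changes the short name as well.
my @parts = ('a', 'b', 'c');
$LIST_SEPARATOR = '-';               # the same variable as $"
my $joined = "@parts";

print "same pid: $same_pid, joined: $joined\n";
```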

A few of these variables are considered read-only. If you try to assign to a read-only variable, either directly or indirectly through a reference, you raise a runtime exception.

The following table describes the Perl predefined variables.

Variable
    Description

$ARG $_
    The default input and pattern-searching space. Here are the places where Perl assumes $_ even if you don't use it:
      • Various unary functions, including functions like ord() and int(), as well as all the filetests (-f, -d) except for -t, which defaults to STDIN.
      • Various list functions like print() and unlink().
      • The pattern matching operations m//, s///, and tr/// when used without an =~ operator.
      • The default iterator variable in a foreach loop if no other variable is supplied.
      • The implicit iterator variable in the grep() and map() functions.
      • The default place to put an input record when a <FH> operation's result is tested by itself as the sole criterion of a while test. Note that outside a while test, this condition does not occur.

$<digit>
    Contains the subpattern from the corresponding set of parentheses in the last pattern matched, not counting patterns matched in nested blocks that have been exited already. [read-only]

$MATCH $&
    The string matched by the last successful pattern match excluding any matches hidden within a BLOCK or eval() enclosed by the current BLOCK. [read-only]

$PREMATCH $`
    The string preceding whatever was matched by the last successful pattern match excluding any matches hidden within a BLOCK or eval() enclosed by the current BLOCK. [read-only]

$POSTMATCH $'
    The string following whatever was matched by the last successful pattern match excluding matches hidden within a BLOCK or eval() enclosed by the current BLOCK. [read-only]

$LAST_PAREN_MATCH $+
    The last bracket matched by the last search pattern. This variable is useful if you don't know which of a set of alternative patterns matched. [read-only]

$MULTILINE_MATCHING $*
    Set to 1 to do multiline matching within a string, or 0 to tell Perl that it can assume that strings contain a single line for the purpose of optimizing pattern matches. Pattern matches on strings containing multiple newlines can produce confusing results when $* is 0. Default is 0. Note that this variable influences only the interpretation of ^ and $. You can search for a literal newline even when $* is 0.

input_line_number HANDLE EXPR $INPUT_LINE_NUMBER $NR $.
    The current input line number of the last filehandle that was read. An explicit close on the filehandle resets the line number. Line numbers increase across ARGV files.

input_record_separator HANDLE EXPR $INPUT_RECORD_SEPARATOR $RS $/
    The input record separator. Set to newline by default. Treats blank lines as delimiters if set to the null string. You can set the separator to a multicharacter string to match a multicharacter delimiter. Note that setting the separator to "\n\n" means something slightly different than setting it to "" if the file contains consecutive blank lines. Setting it to "" treats two or more consecutive blank lines as a single blank line. Setting it to "\n\n" blindly assumes that the next input character belongs to the next paragraph, even if it's a newline.

autoflush HANDLE EXPR $OUTPUT_AUTOFLUSH $|
    A setting of nonzero forces a flush after every write or print on the currently selected output channel. The default is 0.

output_field_separator HANDLE EXPR $OUTPUT_FIELD_SEPARATOR $OFS $,
    The output field separator for the print operator. Ordinarily, the print operator simply prints the comma-separated fields you specify; set this variable to specify what is printed between fields.

output_record_separator HANDLE EXPR $OUTPUT_RECORD_SEPARATOR $ORS $\
    The output record separator for the print operator. Ordinarily, the print operator simply prints the comma-separated fields you specify with no trailing newline or record separator assumed; set this variable to specify what is printed at the end of each print.

$LIST_SEPARATOR $"
    This separator is like $, except that it applies to array values interpolated into a double-quoted string or similar interpreted string. Default is a space.

$SUBSCRIPT_SEPARATOR $SUBSEP $;
    The subscript separator for multidimensional array emulation. Default is \034. Note that if the keys contain binary data, $; might not have a safe value.

$OFMT $#
    The output format for printed numbers. The initial value is %.20g.

format_page_number HANDLE EXPR $FORMAT_PAGE_NUMBER $%
    The current page number of the currently selected output channel.

format_lines_per_page HANDLE EXPR $FORMAT_LINES_PER_PAGE $=
    The current page length of the currently selected output channel. Default is 60.

format_lines_left HANDLE EXPR $FORMAT_LINES_LEFT $-
    The number of lines left on the page of the currently selected output channel.

format_name HANDLE EXPR $FORMAT_NAME $~
    The name of the current report format for the currently selected output channel. Default is the name of the filehandle.

format_top_name HANDLE EXPR $FORMAT_TOP_NAME $^
    The name of the current top-of-page format for the currently selected output channel. Default is the name of the filehandle with _TOP appended.

format_line_break_characters HANDLE EXPR $FORMAT_LINE_BREAK_CHARACTERS $:
    The current set of characters after which a string can be broken to fill continuation fields (starting with ^) in a format. Default is " \n-" to break on whitespace or hyphens.

format_formfeed HANDLE EXPR $FORMAT_FORMFEED $^L
    What the program outputs to perform a form feed. Default is \f.

$ACCUMULATOR $^A
    The current value of the write() accumulator for format() lines. A format contains formline() commands that put their result into $^A. After calling its format, write() prints the contents of $^A and empties it. You never actually see the contents of $^A unless you call formline() yourself and then look at it.

$CHILD_ERROR $?
    The status returned by the last pipe close, backtick (``) command, or system() operator. Note that this status word is returned by the wait() system call, so the exit value of the subprocess is actually ($? >> 8). Thus, on many systems, $? & 255 specifies which signal, if any, the process died from and whether a core dump occurred.

$OS_ERROR $ERRNO $!
    If used in a numeric context, $! yields the current value of errno with all the usual caveats. (That is, you shouldn't depend on the value of $! to be anything in particular unless you've gotten a specific error return indicating a system error.) If used in a string context, $! yields the corresponding system error string. You can assign a value to $! in order to set errno if, for example, you want $! to return the string for error n or you want to set the exit value for the die() operator.

$EVAL_ERROR $@
    The Perl syntax error message from the last eval() command. If null, the last eval() parsed and executed correctly (although the operations you invoked might have failed in the normal fashion). Note that warning messages are not collected in this variable. You can, however, set up a routine to process warnings by setting $SIG{__WARN__} (see %SIG below).

$PROCESS_ID $PID $$
    The process number of the Perl running this script.

$REAL_USER_ID $UID $<
    The real user ID (uid) of this process.

$EFFECTIVE_USER_ID $EUID $>
    The effective user ID of this process.

$REAL_GROUP_ID $GID $(
    The real group ID (gid) of this process. If you are on a machine that supports membership in multiple groups simultaneously, the variable gives a space-separated list of groups you are in. The first number is the one returned by getgid(), and the subsequent numbers are returned by getgroups(), one of which may be the same as the first number.

$EFFECTIVE_GROUP_ID $EGID $)
    The effective gid of this process. If you are on a machine that supports membership in multiple groups simultaneously, it gives a space-separated list of groups you are in. The first number is the one returned by getegid(), and the subsequent numbers are returned by getgroups(), one of which may be the same as the first number.

$PROGRAM_NAME $0
    Contains the name of the file containing the Perl script being executed. Assigning to $0 modifies the argument area that the ps(1) program sees.

$[
    The index of the first element in an array and of the first character in a substring. The default is 0.

$PERL_VERSION $]
    The version string of this Perl installation (the information printed by the command line perl -v).

$DEBUGGING $^D
    The current value of the debugging flags.

$SYSTEM_FD_MAX $^F
    The maximum system file descriptor, ordinarily 2. System file descriptors are passed to exec()ed processes, whereas higher file descriptors are not. Also, during an open(), system file descriptors are preserved even if the open() fails. (Ordinary file descriptors are closed before the open() is attempted.) Note that the close-on-exec status of a file descriptor is decided according to the value of $^F at the time of the open, not at the time of the exec.

$INPLACE_EDIT $^I
    The current value of the inplace-edit extension. Use undef to disable inplace editing.

$PERLDB $^P
    The internal flag that the debugger clears so that it doesn't debug itself. You could conceivably disable debugging yourself by clearing it.

$BASETIME $^T
    The time at which the script began running in seconds since the epoch (beginning of 1970). The values returned by the -M, -A, and -C filetests are based on this value.

$WARNING $^W
    The current value of the warning switch, either TRUE or FALSE.

$EXECUTABLE_NAME $^X
    The name that the Perl binary itself was executed as, from C's argv[0].

$ARGV
    Contains the name of the current file when reading from <>.

@ARGV
    The array @ARGV contains the command-line arguments. Note that $#ARGV is the number of arguments minus one because $ARGV[0] is the first argument, not the command name. See $0 for the command name.

@INC
    The array @INC contains the list of places to look for Perl scripts to be evaluated by the do EXPR, require, or use constructs. @INC initially consists of the arguments to any -I command-line switches, followed by the default Perl library, followed by ., to represent the current directory.

%INC
    The hash %INC contains entries for each filename that has been included via do or require. The key is the filename you specified, and the value is the location of the file actually found. The require command uses this hash to determine whether a given file has already been included.

%ENV $ENV{expr}
    The hash %ENV contains your current environment. Setting a value in %ENV changes the environment for child processes.

%SIG $SIG{expr}
    The hash %SIG is used to set signal handlers for various signals.
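A few of these variables turn up in almost every script. The following sketch, which assumes Perl 5 and uses invented sample data, exercises $_, the pattern-match variables, and the two output separators:

```perl
use strict;
use warnings;

# $_ is the implicit loop variable, and m// tests $_ when there is no =~.
my @words = ('alpha', 'beta', 'gamma');
my (@upper, @matched);
foreach (@words) {
    push(@upper, uc($_));
    push(@matched, $_) if /^b/;
}

# The match variables are set by the last successful pattern match.
my $text = 'Released 1996-03-15';
my ($year, $month, $whole, $before, $after);
if ($text =~ /(\d+)-(\d+)/) {
    ($year, $month) = ($1, $2);   # the first and second sets of parentheses
    $whole  = $&;                 # everything that matched
    $before = $`;                 # the text before the match
    $after  = $';                 # the text after the match
}

# local restores the separators when the enclosing block ends.
{
    local $, = '|';     # printed between the fields
    local $\ = "\n";    # printed after the last field
    print 'a', 'b', 'c';
}

print "@upper / @matched / $year $month\n";
```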

Perl Syntax

Perl is generally a free-form language. The only elements that you need to declare are report formats and subroutines. To create a variable or other object, simply use it. All uninitialized user-created objects are assumed to start with a null or 0 value until they are defined by some explicit operation such as assignment.

The sequence of statements is executed just once. The interpreter first "compiles" the script, checking for syntax errors. With the exception of subroutines (functions), Perl executes statements from the first line of the script to the last. Of course, like any programming language, Perl supports looping and branching statements, which affect the flow of the program but generally continue execution at the next sequential statement after the statements complete their operation.

Comments-Documenting the Script

Comments are indicated by the # character and extend to the end of the line. Here are examples of comments:


# This is a comment which extends over the entire line

# Comments do not span lines, the "#" character must be used

# at the start of the comment on every line.

$a = 1; # Comments can be added to lines

$b = 2; # to explain usage of a particular instruction.

Declarations

Declarations all take effect at compile time when the script is first executed. You can put a declaration anywhere you can put a statement, but a declaration has no effect on the execution of the primary sequence of statements.

Declaring a subroutine allows a subroutine name to be used as if it were a list operator from that point forward in the program.
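A minimal sketch of this effect, assuming Perl 5 (the subroutine name total is hypothetical). The forward declaration at the top lets the later call omit both the & and the parentheses:

```perl
use strict;
use warnings;

sub total;                    # declaration only; the body appears below

my $sum = total 1, 2, 3;      # parsed as a list operator, like print

print "sum is $sum\n";

# Adds up all of its arguments.
sub total {
    my $result = 0;
    foreach my $number (@_) {
        $result += $number;
    }
    return $result;
}
```

Without the declaration, Perl would not yet know that total is a subroutine when it reaches the call, and you would have to write total(1, 2, 3) or &total(1, 2, 3) instead.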

Simple Statements

A simple statement is an expression that is evaluated and executed by the interpreter. You must terminate every simple statement with a semicolon unless it is the final statement in a block.

Optionally, you can place a SINGLE modifier after any simple statement; the modifier goes just before the terminating semicolon (or block ending). Here are the four valid modifiers:


if {expression} 

unless {expression} 

while {expression} 

until {expression}

The if and unless modifiers work as you might expect. The statement is executed if or unless the {expression} is true.


$a = 2 if $b = $c;

The variable $a is initialized to 2 if the variable $b is equal to the variable $c. An equivalent statement in a more traditional format is


if ($b == $c) {

       $a = 2;

}

The unless modifier executes the statement only if the {expression} is not true. For example,


$a = 2 unless $b == $c;

sets $a equal to 2 only if $b is not equal to $c. The equivalent statement is


if ($b != $c) {

       $a = 2;

}

The while and until modifiers first evaluate the conditional except when they follow a do block.


$c = 0;

$c += 2 until $a + $c > 10;

adds 2 to $c until $a plus $c exceeds 10. If $a is 0 before this statement, the addition executes several times. If $a is already greater than 10 when the statement is first evaluated, the addition never executes.

In the case of a do{} block, the block executes once before the conditional is evaluated. Therefore, you can write loops such as


do { $in = <STDIN>; ... } while $in ne ".\n";

The statements within the do{} block read and process at least one line from standard input. If that line consists of a period followed by a newline character, the loop terminates.
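The modifiers described above can be sketched as follows. The variable names and starting values are assumptions chosen only for illustration.

```perl
use strict;
use warnings;

my ($x, $y, $z) = (0, 5, 5);

$x = 2 if $y == $z;        # runs, because 5 == 5
$x = 9 unless $y == $z;    # skipped, because the condition is true

my $start = 3;
my $count = 0;
$count += 2 until $start + $count > 10;   # condition is tested first; stops at 8

my $runs = 0;
do { $runs++ } until 1;    # the do block runs once before the test

print "x=$x count=$count runs=$runs\n";   # prints x=2 count=8 runs=1
```

Note that the last statement runs its block once even though the until condition is true from the start.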

Compound Statements

A series of statements that defines a scope is called a block (referenced as BLOCK throughout this chapter). Generally, a block is delimited by braces ({}).

You can use the following compound statements to control flow:


if (expression) {...} 

if (expression) {...} else {...} 

if (expression) {...} elsif (expression) {...} else {...} 

label while (expression) {...}

label while (expression) {...} continue {...} 

label for (expression; expression; expression) {...} 

label foreach variable (list) {...} 

label {...} continue {...}

Note that, unlike C and Pascal, which execute only the next statement after the conditional unless braces or begin/end pairs are used, Perl compound statements are defined in terms of BLOCKs, not statements. Consequently, the braces are required.

The if statement in Perl works much as it does in most other languages. If you use unless in place of if, the sense of the test is reversed.

The while statement executes the block as long as the expression is true (that is, it does not evaluate to the null string "", the number 0, or the string "0"). The label is optional. If a label is included, it consists of an identifier followed by a colon. The label identifies the loop for the loop control statements next, last, and redo. If the label is omitted, the loop control statement refers to the innermost enclosing loop.

A continue block is always executed immediately before the conditional is about to be evaluated again, just like the third part of a for loop in C. Therefore, you can use a continue block to increment a loop variable, even when the loop has been continued via the next statement.
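As a small sketch of this behavior (the counter and array names are assumptions made for the example), the continue block below still runs on passes that are cut short by next:

```perl
use strict;
use warnings;

my $i      = 0;
my $passes = 0;
my @odd;

while ($i < 6) {
    next if $i % 2 == 0;   # skip even values; control jumps to the continue block
    push @odd, $i;
} continue {
    $i++;                  # runs after every pass, including those ended by next
    $passes++;
}

print "odd values: @odd, passes: $passes\n";   # prints odd values: 1 3 5, passes: 6
```

Without the continue block, the next statement would skip the increment and the loop would never finish.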

Loop Control

The following statements control looping in Perl scripts. They are generally equivalent to their C language counterparts.

next

Starts the next iteration of the loop. In the following example, the code in the while loop will not be executed if the line begins with a #, indicating a comment in a Perl script. (This is a convenient construct to strip comments from Perl code.)


LINE: while (<IN>) { 

next LINE if /^#/; # strip comments 

... # additional code

} #end of while loop

last

Immediately exits the loop. The continue block, if any, is not executed.


GET1: while (<IN>) { 

last GET1 if /^EOF/; # exit when a line starting with EOF is encountered;

 ... 

}

redo

Restarts the loop block without evaluating the conditional again. The continue block, if any, is not executed.

For example, when processing a file, input lines might end in a continuation character, such as the plus sign. You can use redo to skip ahead and get the next record.


while (<>) { 

       chop; #remove trailing linefeed

       if (s/\+$//) {  # record ends in a plus sign indicating a continued line

              $_ .= <>; #append the next line to $_

              redo unless eof(); # go back and check again

       } # end of if statement

... # process the record after gathering continuation lines

} # end of while statement

For Loops

Perl's for loops are exactly like their C equivalents.


for ($i = 1; $i < 100; $i++) { ... }

You could write the same thing using a while loop:


$i = 1;

while ($i < 100) { 

       ... 

} continue {

       $i++;

}

Foreach Loops

The foreach loop iterates over a normal list value and sets the variable to be each element of the list in turn. The variable is implicitly local to the loop and regains its former value upon exiting the loop.

The foreach keyword is actually a synonym for the for keyword, so you can use foreach for readability or for for brevity. If the variable is omitted, $_ is set to each value. If LIST is an actual array (as opposed to an expression returning a list value), you can modify each element of the array by modifying the variable inside the loop.

Here are some examples. The first reads the array @things, returning each value into $_ and changing the name smith to jones.


foreach (@things) { 

       s/smith/jones/ 

} 

The next example reads each element of @numbs into $numb and then multiplies it by 2. Because @numbs is an actual array, the elements themselves are doubled.


foreach $numb (@numbs) { 

$numb *= 2; 

}

The last example prints the keys and values in the associative array %items.


foreach $item (keys %items) {

       print "$item = $items{$item}\n"

}
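The two properties described above, that the loop variable aliases the array elements and that it regains its former value afterward, can be checked with a short sketch. The names @prices and $price are assumptions for this example.

```perl
use strict;
use warnings;

my @prices = (10, 20, 30);
my $price  = 'unchanged';

foreach $price (@prices) {
    $price *= 2;          # modifies the array element itself
}

print "@prices\n";        # prints 20 40 60
print "$price\n";         # prints unchanged
```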

Blocks

A labeled or unlabeled block is equivalent to a loop that executes once. Therefore, you can use any of the loop control statements to leave or restart the block. The continue block is optional.

Unlike C, Perl does not have a switch statement. Perl does have several alternative ways to write equivalent statements. The block construct is particularly nice for doing case structures. Here's an example using a block. Note that SWITCH: is not a statement, but merely a label. Any other label would work as well. This statement tests the first character of $_ and performs the logic related to it.


SWITCH: { 

       if (/^a/) { 

              $value = 1; 

              last SWITCH; 

       } 

       if (/^b/) { 

              $value = 2; 

              last SWITCH; 

       } 

       if (/^c/) { 

              $value = 3; 

              last SWITCH; 

       } 

       $value = 0; 

}

Goto

Perl does support three forms of the goto statement: goto-LABEL, goto-EXPR, and goto-&NAME. A loop's LABEL is not a valid target for a goto; it's just the name of the loop. However, these statements could be considered bad programming form, so I advise against using them unless absolutely necessary and will not spend time on them here. These functions are described in the "Alphabetical Listing of Perl Functions."

Perl Operators

The following list outlines Perl operator associativity and precedence, from highest precedence to lowest. With very few exceptions, Perl operators operate on scalar values only, not on array values.

left      terms and list operators (leftward)
left      ->
nonassoc  ++ --
right     **
right     ! ~ \ and unary + and -
left      =~ !~
left      * / % x
left      + - .
left      << >>
nonassoc  named unary operators
nonassoc  < > <= >= lt gt le ge
nonassoc  == != <=> eq ne cmp
left      &
left      | ^
left      &&
left      ||
nonassoc  ..
right     ?:
right     = += -= *= and so on
left      , =>
nonassoc  list operators (rightward)
right     not
left      and
left      or xor

The following sections present the operators in precedence order.

Terms and List Operators (Leftward)

Any term (shown throughout this chapter as "TERM") has the highest precedence in Perl. Terms include variables, quote and quote-like operators, expressions in parentheses, and functions whose arguments are parenthesized.

If any list operator (print(), for example) or any unary operator (chdir(), for example) is followed by a left parenthesis as the next token, the operator and arguments within parentheses are taken to be of highest precedence, just like a normal function call.

In the absence of parentheses, the precedence of list operators such as print, sort, or chmod is either very high or very low depending on whether you look at the left side of the operator or the right side of it. In the following example, the elements (commas) on the right of the sort are evaluated before the sort, but the commas on the left are evaluated after.


@ary = (1, 3, sort 4, 2); 

print @ary; # prints 1324

List operators tend to "gobble up" all the arguments that follow them and then act like a simple TERM with regard to the preceding expression. Note that you have to be careful with parentheses. This is illustrated in the following examples.


# These evaluate exit before doing the print and, thus never print: 

print($foo, exit); # Obviously not what you want. 

print $foo, exit; # Nor is this. 



# These do the print before evaluating exit: 

(print $foo), exit; # This is what you want. 

print($foo), exit; # Or this. 

print ($foo), exit; # Or even this.

Also note that


print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. A complete discussion of parentheses is beyond the scope of this chapter. See "Named Unary Operators" in the perlop man page for a more complete discussion of parentheses.
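To see what that statement actually does, the following sketch captures the output of print in a string. It assumes Perl 5.8 or later for the in-memory filehandle, and the value 300 is chosen only for illustration.

```perl
use strict;

my $foo = 300;    # 300 & 255 is 44
my $buf = '';

# Redirect print's default output into $buf (in-memory filehandle, Perl 5.8+).
open(my $fh, '>', \$buf) or die "cannot open in-memory file: $!";
my $old = select($fh);

print ($foo & 255) + 1, "\n";     # parsed as (print($foo & 255)) + 1, "\n"
print "|";
print(($foo & 255) + 1, "\n");    # extra parentheses group the whole argument list

select($old);
close($fh);

print $buf;       # prints 44|45 followed by a newline
```

The first print receives only ($foo & 255); the addition and the newline are evaluated and then discarded. Running the script with the -w switch should report the first statement, because Perl warns when print (...) is interpreted as a function call.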

Also parsed as terms are the do{} and eval{} constructs, as well as subroutine and method calls and the anonymous constructors [] and {}.

The Arrow Operator

Just as in C and C++, -> is an infix dereference operator. If the right side is either a [...] or {...} subscript, then the left side must be either a hard or symbolic reference to an array or hash (or a location capable of holding a hard reference, if it's an lvalue). See the perlref man page for a more complete explanation of its use.

Otherwise, the right side is a method name or a simple scalar variable containing the method name, and the left side must either be an object or a class name. See the perlobj man page for a more complete discussion.

Autoincrement and Autodecrement

++ and -- placed before a variable increment or decrement the variable before returning the value. Placed after, they increment or decrement the variable after returning the value.

The autoincrement operator has an extra functionality built into it. If you increment a variable that is numeric, or that has ever been used in a numeric context, you get a normal increment. If, however, the variable has been used only in string contexts since it was set and has a value that has any number of alpha characters followed by any number of numeric characters (/^[a-zA-Z]*[0-9]*$/), the increment is done as a string, preserving each character within its range with carry. Here are some examples:


print ++($foo = '99'); # prints '100' 

print ++($foo = 'a0'); # prints 'a1' 

print ++($foo = 'Az'); # prints 'Ba' 

print ++($foo = 'zz'); # prints 'aaa'

The autodecrement operator does not perform this little trick in reverse.

Exponentiation

** is the exponentiation operator. Note that it binds even more tightly than unary minus, so -2**4 is -(2**4), not (-2)**4.
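A brief sketch of these binding rules, including the right associativity listed in the precedence table (the variable names are assumptions for the example):

```perl
use strict;
use warnings;

my $tight  = -2**4;      # parsed as -(2**4)
my $forced = (-2)**4;    # parentheses apply the minus first
my $chain  = 2**3**2;    # right associative: 2**(3**2)

print "$tight $forced $chain\n";   # prints -16 16 512
```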

Symbolic Unary Operators

These are operators represented by single character symbols. Many are equivalent to their counterparts in languages like C, COBOL, or Pascal. Others have unique meaning or usage in Perl.

!
Logical negation, that is, "not."
-
Arithmetic negation if the operand is numeric. If the operand is an identifier, a string consisting of a minus sign concatenated with the identifier is returned. Otherwise, if the string starts with a plus or minus, a string starting with the opposite sign is returned.
~
Bitwise negation (1's complement).
+
Has no effect whatsoever, even on strings.
\
Creates a reference to whatever follows it. See the perlref man page. Do not confuse this behavior with the behavior of a backslash within a string, although both forms do convey the notion of protecting the next thing from interpretation.

Binding Operators

Binding operators bind an expression to a pattern match.

=~
Binds a scalar expression to a pattern match. Certain operations search or modify the string $_ by default. This operator makes that kind of operation work on some other string. The right argument is a search pattern, substitution, or translation. The left argument is what is supposed to be searched, substituted, or translated instead of the default $_. The return value indicates the success of the operation.
!~
Performs just like =~ except the return value is logically negated.
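The following sketch shows both binding operators applied to a variable other than $_. The string and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $line = "name=smith";

my $matched = ($line =~ /^name=/) ? 1 : 0;   # match against $line instead of $_
my $no_hash = ($line !~ /^#/)     ? 1 : 0;   # true when the pattern does NOT match

(my $copy  = $line) =~ s/smith/jones/;       # substitution bound to $copy
(my $upper = $line) =~ tr/a-z/A-Z/;          # translation bound to $upper

print "$matched $no_hash $copy $upper\n";    # prints 1 1 name=jones NAME=SMITH
```

Assigning inside parentheses before binding, as in the last two statements, leaves the original $line unchanged.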

Perl Built-In Functions

Perl supports a rich set of built-in functions. These functions can be used as terms in an expression. The two categories of functions are list operators and named unary operators.

The difference between the categories is their precedence relationship with a following comma. List operators take more than one argument, whereas unary operators can never take more than one argument.

In the syntax descriptions in Table 14.1, list operators that expect a list are shown with LIST as an argument. Such a list can consist of any combination of scalar arguments or list values; the list values are included in the list as if each individual element were entered at that point in the list, forming a longer single-dimensional list value. Elements of the LIST should be separated by commas.

You can use any function in Table 14.1 with or without parentheses around its arguments.

Perl Functions By Category

Table 14.1 shows the Perl functions by category. Some functions appear in more than one place. Not all of these functions are covered in detail in this chapter. I skip the functions that have no value in most CGI programs or that are more complex. Refer to the perlfunc man page or the Sams books mentioned earlier in the chapter for information about these functions.

Table 14.1. Perl functions by category.

Category: Perl functions
Functions for scalars or strings: chomp, chop, chr, crypt, hex, index, lc, lcfirst, length, oct, ord, pack, q/STRING/, qq/STRING/, reverse, rindex, sprintf, substr, tr///, uc, ucfirst, y///
Regular expressions and pattern matching: m//, pos, quotemeta, s///, split, study
Numeric functions: abs, atan2, cos, exp, hex, int, log, oct, rand, sin, sqrt, srand
Real @ARRAY functions: pop, push, shift, splice, unshift
List data functions: grep, join, map, qw/STRING/, reverse, sort, unpack
Real %HASH functions: delete, each, exists, keys, values
Input and output functions: binmode, close, closedir, dbmclose, dbmopen, die, eof, fileno, flock, format, getc, print, printf, read, readdir, rewinddir, seek, seekdir, select, syscall, sysread, syswrite, tell, telldir, truncate, warn, write
Functions for fixed length data or records: pack, read, syscall, sysread, syswrite, unpack, vec
Functions for filehandles, files, or directories: -X, chdir, chmod, chown, chroot, fcntl, glob, ioctl, link, lstat, mkdir, open, opendir, readlink, rename, rmdir, stat, symlink, umask, unlink, utime
Keywords related to the control of program flow: caller, continue, die, do, dump, eval, exit, goto, last, next, redo, return, sub, wantarray
Scoping keywords: caller, import, local, my, package, use
Miscellaneous functions: defined, dump, eval, formline, local, my, reset, scalar, undef, wantarray
Functions for processes and process groups: alarm, exec, fork, getpgrp, getppid, getpriority, kill, pipe, qx/STRING/, setpgrp, setpriority, sleep, system, times, wait, waitpid
Keywords related to Perl modules: do, import, no, package, require, use
Keywords related to classes and object orientation: bless, dbmclose, dbmopen, package, ref, tie, tied, untie, use
Low-level socket functions: accept, bind, connect, getpeername, getsockname, getsockopt, listen, recv, send, setsockopt, shutdown, socket, socketpair
System V interprocess communication functions: msgctl, msgget, msgrcv, msgsnd, semctl, semget, semop, shmctl, shmget, shmread, shmwrite
Fetching user and group information: endgrent, endhostent, endnetent, endpwent, getgrent, getgrgid, getgrnam, getlogin, getpwent, getpwnam, getpwuid, setgrent, setpwent
Fetching network information: endprotoent, endservent, gethostbyaddr, gethostbyname, gethostent, getnetbyaddr, getnetbyname, getnetent, getprotobyname, getprotobynumber, getprotoent, getservbyname, getservbyport, getservent, sethostent, setnetent, setprotoent, setservent
Time functions: gmtime, localtime, time, times

Alphabetical Listing of Perl Functions

This section presents the basic Perl functions in alphabetical order as a reference. Not all functions in the language are included in detail here.

-X [FILEHANDLE|EXPR]

A file test, where X is one of the letters in the list that follows. This unary operator takes one argument, either a filename or a filehandle, and tests the associated file to see if something is true about it. If the argument is omitted, the expression tests $_, except for -t, which tests STDIN. Unless otherwise documented, this test returns 1 for TRUE, "" for FALSE, or the undefined value if the file doesn't exist. Precedence is the same as any other named unary operator, and the argument may be parenthesized like any other unary operator. The operator may be any of the following:

-r
File is readable by effective uid/gid.
-w
File is writable by effective uid/gid.
-x
File is executable by effective uid/gid.
-o
File is owned by effective uid.
-R
File is readable by real uid/gid.
-W
File is writable by real uid/gid.
-X
File is executable by real uid/gid.
-O
File is owned by real uid.
-e
File exists.
-z
File has zero size.
-s
File has nonzero size (returns size).
-f
File is a plain file.
-d
File is a directory.
-l
File is a symbolic link.
-p
File is a named pipe (FIFO).
-S
File is a socket.
-b
File is a block special file.
-c
File is a character special file.
-t
Filehandle is opened to a tty.
-u
File has setuid bit set.
-g
File has setgid bit set.
-k
File has sticky bit set.
-T
File is a text file.
-B
File is a binary file (opposite of -T).
-M
Age of file in days when script started.
-A
Same for access time.
-C
Same for inode change time.

Note that not all of the preceding operators have meaning in all operating systems. See the Perl man pages for details of using these switches.
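As a hedged illustration of a few of these tests, the sketch below creates a scratch file and examines it. It assumes the File::Temp module is available (it ships with modern Perl distributions), and the variable names are chosen for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file to test; it is removed when the script exits.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "hello\n";
close($fh);

my $exists  = -e $path ? 1 : 0;    # file exists
my $is_file = -f $path ? 1 : 0;    # it is a plain file
my $is_dir  = -d $path ? 1 : 0;    # it is not a directory
my $size    = -s $path;            # nonzero size, returned in bytes
my $missing = -e "$path.nope";     # false: no such file

print "exists=$exists file=$is_file dir=$is_dir size=$size\n";
print "no such file\n" unless $missing;
```

Because file tests have the precedence of named unary operators, an expression such as -e $path ? 1 : 0 tests the file first and then applies the conditional.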

abs VALUE

Returns the absolute value of its argument.

accept NEWSOCKET,GENERICSOCKET

Accepts an incoming socket connection, just as the UNIX accept(2) system call does. Returns the packed address if it succeeded, FALSE otherwise.

atan2 Y,X

Returns the arctangent of Y/X in the range -PI to PI.

bind SOCKET,NAME

Binds a network address to a socket, just as the bind system call does. Returns TRUE if it succeeded, FALSE otherwise. NAME should be a packed address of the appropriate type for the socket.

binmode FILEHANDLE

The file identified by FILEHANDLE is read or written in binary mode in operating systems that distinguish between binary and text files. Files that are not in binary mode have CR LF sequences translated to LF on input and LF translated to CR LF on output.

caller [EXPR]

caller returns the context of the current subroutine call. In a scalar context, it returns TRUE if a caller exists, that is, in a subroutine or eval() or require(); otherwise, it returns FALSE. In a list context, returns


($package, $filename, $line) = caller;

With EXPR, caller returns some extra information that the debugger uses to print a stack trace. The value of EXPR indicates how many call frames to go back before the current one.


($package, $filename, $line, $subroutine, $hasargs, $wantarray) = caller($i);

chdir [EXPR]

Changes the working directory to EXPR, if possible. If EXPR is omitted, it changes the working directory to the home directory. Returns TRUE upon success; otherwise, returns FALSE.

chmod LIST

Changes the permissions of a list of files. The first element of the list must be the numerical mode, which should probably be an octal number. Returns the number of files successfully changed.


$cnt = chmod 0755, 'foo', 'bar'; 

chmod 0755, @myfiles;

chomp [VARIABLE|LIST]

chomp is a slightly safer version of chop (see next entry). chomp removes any line ending that corresponds to the current value of $/ and returns the number of characters removed. It's often used to remove the newline from the end of an input record when you're worried that the final record may be missing its newline. When in paragraph mode ($/ = ""), chomp removes all trailing newlines from the string. If VARIABLE is omitted, it chomps $_.


while (<>) { 

       chomp; # avoid \n on last field 

       @array = split(/:/); 

       ... 

}

You can actually chomp anything that's an lvalue, including an assignment:


chomp($cwd = `pwd`); 

chomp($answer = <STDIN>);

If you chomp a list, each element is chomped and the total number of characters removed is returned.

chop [VARIABLE|LIST]

Chops off the last character of a string and returns the character chopped. The primary use of chop is to remove the newline from the end of an input record. It neither scans nor copies the string. If VARIABLE is omitted, it chops $_.

If you chop a list, each element is chopped. Only the value of the last chop is returned.

Note that chop returns the last character. To return all but the last character, use


substr($string, 0, -1)
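The difference between chomp and chop shows up on a final record that lacks its newline. This sketch assumes the default value of $/ (a newline), and its variable names are illustrative.

```perl
use strict;
use warnings;

my $with_nl    = "last record\n";
my $without_nl = "last record";

my $removed = chomp(my $c1 = $with_nl);       # strips the newline; returns 1
my $none    = chomp(my $c2 = $without_nl);    # nothing to strip; returns 0
my $lost    = chop(my $c3 = $without_nl);     # removes and returns "d"

my $all_but_last = substr($without_nl, 0, -1);   # "last recor"; original unchanged

print "chomp: $removed, $none; chop took '$lost' leaving '$c3'\n";
# prints chomp: 1, 0; chop took 'd' leaving 'last recor'
```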

chown LIST

Changes the owner (and group) of a list of files. The first two elements of the list must be the numerical uid and gid, in that order. Returns the number of files successfully changed.

chr NUMBER

Returns the character represented by that NUMBER in the character set. For example, chr(65) is A in ASCII.

close FILEHANDLE

Closes the file or pipe associated with the filehandle, returning TRUE only if stdio successfully flushes buffers and closes the system file descriptor.

FILEHANDLE may be an expression whose value gives the real filehandle name.

closedir DIRHANDLE

Closes a directory opened by opendir().

connect SOCKET,NAME

Attempts to connect to a remote socket, just as the connect system call does. Returns TRUE if successful, FALSE otherwise. NAME should be a packed address of the appropriate type for the socket.

continue BLOCK

Actually a flow control statement rather than a function. If a continue BLOCK is attached to a BLOCK (typically in a while or foreach), the continue BLOCK is always executed just before the conditional is about to be evaluated again, just like the third part of a for loop in C.

cos EXPR

Returns the cosine of EXPR (expressed in radians). If EXPR is omitted, the function takes the cosine of $_.

crypt PLAINTEXT,SALT

Encrypts a string exactly like the crypt(3) function in the C library.

defined EXPR

Returns a Boolean value saying whether EXPR has a real value or not. Many operations return the undefined value under exceptional conditions. This function allows you to distinguish between an undefined null scalar and a defined null scalar with operations that might return a real null string, such as referencing elements of an array.

See also undef.

delete EXPR

Deletes the specified value from its hash array. Returns the deleted value or the undefined value if nothing was deleted. Deleting from $ENV{} modifies the environment. Deleting from an array tied to a DBM file deletes the entry from the DBM file.

The following deletes all the values of an associative array:


foreach $key (keys %ARRAY) { 

       delete $ARRAY{$key}; 

}

die LIST

Outside of an eval(), prints the value of LIST to STDERR and exits with the current value of $! (errno). If $! is 0, exits with the value of ($? >> 8) (the status of the last backtick `command`). If ($? >> 8) is 0, exits with 255. Inside an eval(), the error message is stuffed into $@, and the eval() is terminated with the undefined value; this functionality makes die() the way to raise an exception in a script.

do BLOCK

Not really a function. Returns the value of the last command in the sequence of commands indicated by BLOCK. When modified by a loop modifier, executes the BLOCK once before testing the loop condition.

do SUBROUTINE(LIST)

A deprecated form of subroutine call. See the perlsub man page for more information on subroutines.

do EXPR

Uses the value of EXPR as a filename and executes the contents of the file as a Perl script. Its primary use is to include subroutines from a Perl subroutine library.


do 'stat.pl';

is just like


eval `cat stat.pl`;

except that it's more efficient, more concise, keeps track of the current filename for error messages, and searches all the -I libraries if the file isn't in the current directory. Both statements parse the file every time they are called.

A better way to include library modules is to use the use() and require() operators, which also do error checking and raise an exception if a problem occurs.

dump LABEL

This function causes an immediate core dump.

each ASSOC_ARRAY

Returns a two-element array consisting of the key and value for the next value of an associative array so that you can iterate over it. Entries are returned in an apparently random order. When the array is entirely read, a null array is returned. The following call to each() starts iterating again. The iterator can be reset only by reading all the elements from the array. You should not add elements to an array while you're iterating over it. Each associative array has a single iterator that all each(), keys(), and values() function calls in the program share.
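A minimal sketch of iterating with each follows; the hash %stock and its contents are assumptions for the example. The loop ends when each returns the empty list, so a value of 0 does not stop it early.

```perl
use strict;
use warnings;

my %stock = (apples => 3, pears => 0, plums => 7);

my %seen;
my $total = 0;
while (my ($name, $qty) = each %stock) {   # ends when each returns an empty list
    $seen{$name} = $qty;
    $total += $qty;
}

# Sort the keys for printing, because each returns entries in no useful order.
print join(", ", map { "$_=$seen{$_}" } sort keys %seen), "\n";
print "total: $total\n";
```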

eof [FILEHANDLE|()]

Returns 1 if the next read on FILEHANDLE returns end of file or if FILEHANDLE is not open. FILEHANDLE may be an expression whose value gives the real filehandle name.

An eof without an argument uses the last file read as an argument. Empty parentheses may be used to indicate the pseudofile formed of the files listed on the command line. Use eof(ARGV) or eof without the parentheses to test each file in a while (<>) loop.

eval [EXPR|BLOCK]

EXPR is parsed and executed as if it were a little Perl program. It is executed in the context of the current Perl program so that any variable settings, subroutines, or format definitions remain afterwards. The value returned is the value of the last expression evaluated; alternatively, a return statement may be used, just as with subroutines.

Note
eval can be very dangerous in CGI programming. Do not automatically eval anything sent to you by a Web browser.

If a syntax error or runtime error occurs or a die() statement is executed, eval() returns an undefined value and $@ is set to the error message. If no error occurs, $@ is guaranteed to be a null string. If EXPR is omitted, eval evaluates $_. You may omit the final semicolon, if any, from the expression.

Note that because eval() traps otherwise fatal errors, it is useful for determining whether a particular feature (such as socket() or symlink()) is implemented. It is also Perl's exception-trapping mechanism, when the die operator is used to raise exceptions.
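The exception-trapping use of eval can be sketched with the BLOCK form. The safe_divide subroutine is a hypothetical helper written only for this illustration.

```perl
use strict;
use warnings;

# safe_divide is a hypothetical helper used only for this illustration.
sub safe_divide {
    my ($num, $den) = @_;
    die "division by zero\n" if $den == 0;   # raise an exception
    return $num / $den;
}

my $ok     = eval { safe_divide(10, 2) };    # 5; $@ is the empty string
my $ok_err = $@;

my $bad     = eval { safe_divide(1, 0) };    # undef; $@ holds the message
my $bad_err = $@;

print "ok=$ok\n";                                # prints ok=5
print "trapped: $bad_err" unless defined $bad;   # prints trapped: division by zero
```

The BLOCK form is compiled along with the rest of the script, so it traps errors without executing a string as code. That makes it the safer choice when the data being handled came from a Web browser.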

exec LIST

The exec() function executes a system command and never returns. Use the system() function if you want it to return.

exists EXPR

Returns TRUE if the specified hash key exists in its hash array even if the corresponding value is undefined.


print "Exists\n" if exists $array{$key}; 

print "Defined\n" if defined $array{$key}; 

print "True\n" if $array{$key};

A hash element can be TRUE only if it's defined, and it can be defined only if it exists, but the reverse doesn't necessarily hold true.
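The three levels can be compared side by side in a short sketch. The hash contents and the describe subroutine are assumptions made for the example.

```perl
use strict;
use warnings;

my %h = (zero => 0, undef_val => undef, name => 'perl');

# describe is a hypothetical helper that lists which tests a key passes.
sub describe {
    my ($key) = @_;
    return join ',',
        (exists($h{$key})  ? 'exists'  : ()),
        (defined($h{$key}) ? 'defined' : ()),
        ($h{$key}          ? 'true'    : ());
}

print "name:      ", describe('name'),      "\n";   # exists,defined,true
print "zero:      ", describe('zero'),      "\n";   # exists,defined
print "undef_val: ", describe('undef_val'), "\n";   # exists
print "missing:   ", describe('missing'),   "\n";   # (nothing)
```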

exit [EXPR]

Evaluates EXPR and exits immediately with that value. See also die(). If EXPR is omitted, exits with 0 status.

exp [EXPR]

Returns e (the natural logarithm base) to the power of EXPR. If EXPR is omitted, gives exp($_).

fcntl FILEHANDLE,FUNCTION,SCALAR

Implements the fcntl(2) function.

fileno FILEHANDLE

Returns the file descriptor for a filehandle. This function is useful for constructing bitmaps for select(). If FILEHANDLE is an expression, the value is taken as the name of the filehandle.

flock FILEHANDLE,OPERATION

Calls flock(2) on FILEHANDLE. See the flock(2) man page for definition of OPERATION. Returns TRUE for success, FALSE for failure. This function produces a fatal error if it is used on a machine that doesn't implement either flock(2) or fcntl(2).

fork

Does a fork(2) system call. Returns the child process ID (PID) to the parent process and 0 to the child process, or returns undef if the fork is unsuccessful.

Note
Unflushed buffers remain unflushed in both processes, which means you may need to set $| ($AUTOFLUSH in English) or call the autoflush() filehandle method to avoid duplicate output.

getc FILEHANDLE

getc returns the next character from the input file attached to FILEHANDLE or a null string at end of file. If FILEHANDLE is omitted, reads from STDIN. This function is not particularly efficient, and it cannot be used to get unbuffered single characters.

getlogin

Returns the current login from /etc/utmp, if any. If null, use getpwuid().

getpeername SOCKET

Returns the packed sockaddr address of the other end of the SOCKET connection.

getpgrp PID

Returns the current process group for the specified PID; use a PID of 0 for the current process. Raises an exception if used on a machine that doesn't implement getpgrp(2). If PID is omitted, the function returns the process group of the current process.

getppid

Returns the process ID of the parent process.

getpriority WHICH,WHO

Returns the current priority for a process, a process group, or a user. (See the getpriority(2) man page.) Raises a fatal exception if used on a machine that doesn't implement getpriority(2).

getpwnam NAME
getgrnam NAME
gethostbyname NAME
getnetbyname NAME
getprotobyname NAME
getpwuid UID
getgrgid GID
getservbyname NAME,PROTO
gethostbyaddr ADDR,ADDRTYPE
getnetbyaddr ADDR,ADDRTYPE
getprotobynumber NUMBER
getservbyport PORT,PROTO
getpwent
getgrent
gethostent
getnetent
getprotoent
getservent
setpwent
setgrent
sethostent STAYOPEN
setnetent STAYOPEN
setprotoent STAYOPEN
setservent STAYOPEN
endpwent
endgrent
endhostent
endnetent
endprotoent
endservent

All these routines perform the same functions as their counterparts in the system library.

getsockname SOCKET

Returns the packed sockaddr address of this end of the SOCKET connection.


use Socket; 

$mysockaddr = getsockname(SOCK); 

($port, $myaddr) = unpack_sockaddr_in($mysockaddr);

getsockopt SOCKET,LEVEL,OPTNAME

Returns the socket option requested or returns undefined if there is an error.

glob EXPR

Returns the value of EXPR with filename expansions such as a shell would do. This routine is the internal function implementing the <*.*> operator.

gmtime EXPR

Converts a time as returned by the time function to a nine-element array with the time localized for the standard Greenwich time zone. Typically used as follows:


($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime(time);

All array elements are numeric and come straight out of a struct tm. Specifically, $mon has the range 0..11, and $wday has the range 0..6. If EXPR is omitted, gmtime performs the equivalent of gmtime(time()).
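Because the values come straight from a struct tm, $mon and $year need adjusting before display. The sketch below formats time 0, assuming a system whose epoch is 1 January 1970 (as on UNIX); the @days array is an assumption for the example.

```perl
use strict;
use warnings;

my ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst) = gmtime(0);

# $mon is 0-based and $year counts years since 1900, so adjust both.
my $stamp = sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                    $year + 1900, $mon + 1, $mday, $hour, $min, $sec);

my @days = qw(Sun Mon Tue Wed Thu Fri Sat);   # $wday 0 is Sunday
print "$days[$wday] $stamp GMT\n";            # prints Thu 1970-01-01 00:00:00 GMT
```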

goto [LABEL|EXPR|&NAME]

The goto-LABEL form finds the statement labeled with LABEL and resumes execution there. It may not be used to go into any construct that requires initialization, such as a subroutine or a foreach loop, or to go into a construct that is optimized away. Although the goto-LABEL form can be used to go almost anywhere else within the dynamic scope, including out of subroutines, a better method is to use some other construct such as last or die.

The goto-EXPR form expects a label name, whose scope is resolved dynamically.

The goto-&NAME form substitutes a call to the named subroutine for the currently running subroutine. This form is used by AUTOLOAD subroutines that want to load another subroutine and then pretend that the newly loaded subroutine had been called in the first place. After the goto, not even caller() is able to tell which routine was called first.

grep [BLOCK|EXPR], LIST

Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value consisting of those elements for which the expression evaluated to TRUE. In a scalar context, grep returns the number of times the expression was TRUE.
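A short sketch of both forms follows, using a made-up list of configuration lines. Note that the BLOCK form takes no comma before the list, whereas the EXPR form does.

```perl
use strict;
use warnings;

my @lines = ("# comment", "name=smith", "", "# another", "age=42");

my @settings = grep { !/^#/ && length($_) } @lines;   # BLOCK form, list context
my $comments = grep(/^#/, @lines);                    # EXPR form, scalar context

print "settings: @settings\n";    # prints settings: name=smith age=42
print "comments: $comments\n";    # prints comments: 2
```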

hex EXPR

Interprets EXPR as a hex string and returns the corresponding decimal value. If EXPR is omitted, the function uses $_.

index STR,SUBSTR[,POSITION]

Returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If POSITION is omitted, starts searching from the beginning of the string. The return value is based at 0 (or whatever you've set the $[ variable to). If the substring is not found, returns one less than the base, ordinarily -1.
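
The POSITION argument makes it possible to walk through every occurrence of a substring. A minimal sketch, assuming the default base of 0 and using made-up sample text:

```perl
use strict;
use warnings;

my $text = "the cat sat on the mat near the door";
my @found;

# Restart each search one character past the previous hit;
# index returns -1 when there are no more occurrences.
my $pos = index($text, "the");
while ($pos != -1) {
    push @found, $pos;
    $pos = index($text, "the", $pos + 1);
}

print "found at: @found\n";
```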

int EXPR

Returns the integer portion of EXPR. If EXPR is omitted, uses $_.

ioctl FILEHANDLE,FUNCTION,SCALAR

Implements the ioctl(2) function.

join EXPR,LIST

Joins the separate strings of LIST into a single string, with fields separated by the value of EXPR, and returns the string. This routine can be used to create delimited records for inclusion in databases. For example, given a form that returns three variables that have been parsed into the hash %in, the following code


$in{'var1'} = 'Last Name';

$in{'var2'} = 'First Name';

$in{'var3'} = 'Middle Name';

$dbrec = join(',', $in{'var1'}, $in{'var2'}, $in{'var3'});

results in


Last Name,First Name,Middle Name

See split.

keys ASSOC_ARRAY

Returns a normal array consisting of all the keys of the named associative array. (In a scalar context, returns the number of keys.) The keys are returned in an apparently random order, but it is the same order as either the values() or each() function produces.

This routine can be very useful in processing lists of key/value pairs. For example, given an associative array called %stuff, the following code prints the keys and their values:


foreach $key (keys %stuff) {

       print "$key = $stuff{$key}\n";

}

kill LIST

Sends a signal to a list of processes. The first element of the list must be the signal to send. Returns the number of processes successfully signaled.

last [LABEL]

The last command is like the break statement in C. It immediately exits the loop in question. If the LABEL is omitted, the command refers to the innermost enclosing loop. The continue block, if any, is not executed:


LINE: while (<STDIN>) { 

       last LINE if /^$/; # exit when done with header 

       ...

}

lc EXPR

Returns a lowercased version of EXPR.

lcfirst EXPR

Returns the value of EXPR with the first character lowercased.

length EXPR

Returns the length in characters of the value of EXPR. If EXPR is omitted, returns length of $_. Remember that unless you have reset $[, strings are zero based. Thus, the offset length(STRING) actually points one character beyond the end of the string.

link OLDFILE,NEWFILE

Creates a new filename linked to the old filename. Returns 1 for success, 0 otherwise. (Note, link might not be implemented on all operating systems.)

listen SOCKET,QUEUESIZE

Does the same thing that the listen system call does. Returns TRUE if it succeeded, FALSE otherwise.

local EXPR

A local modifies the listed variables to be local to the enclosing block, subroutine, eval{}, or do. If more than one value is listed, the list must be placed in parentheses.

localtime EXPR

Converts a time as returned by the time function to a nine-element array with the time analyzed for the local time zone. Typically used as follows:


($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);

All array elements are numeric and come straight out of a struct tm. In particular, $mon has the range 0..11 and $wday has the range 0..6. If EXPR is omitted, does localtime(time).

In a scalar context, returns the ctime(3) value:


$now_string = localtime; # e.g. "Thu Oct 13 04:54:34 1994"

log EXPR

Returns logarithm (base e) of EXPR. If EXPR is omitted, returns log of $_.

lstat [FILEHANDLE|EXPR]

Does the same thing as the stat() function, but performs the stat function on a symbolic link instead of the file to which the symbolic link points. If symbolic links are not implemented on your system, a normal stat() occurs.

m// or //

The match operator. See the section, "Perl Regular Expressions," for more details on the match operator and its available options.

map [BLOCK LIST|EXPR,LIST]

Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value composed of the results of each such evaluation. Evaluates BLOCK or EXPR in a list context, so each element of LIST may produce zero, one, or more elements in the returned value.


@chars = map(chr, @nums);

translates a list of numbers to the corresponding characters.
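
Because the block is evaluated in a list context, each input element can contribute one, several, or no values to the result. The following sketch, using made-up sample data, shows all three cases:

```perl
use strict;
use warnings;

my @words = ("perl", "cgi");

# One result per element.
my @upper = map { uc($_) } @words;

# Two results per element: builds a word => length hash.
my %length = map { $_ => length($_) } @words;

# No result for some elements: the empty list drops them.
my @long = map { length($_) > 3 ? $_ : () } @words;

print "@upper\n";
print "perl has $length{perl} characters\n";
print "long words: @long\n";
```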

mkdir FILENAME,MODE

Creates the directory specified by FILENAME with permissions specified by MODE (as modified by umask). If it succeeds, mkdir returns 1; otherwise, it returns 0 and sets $! (errno). MODE varies depending on the operating system implementation.

msgctl ID,CMD,ARG

Calls the System V IPC function msgctl(2). If CMD is &IPC_STAT, then ARG must be a variable that holds the returned msqid_ds structure. Returns values like ioctl: the undefined value for error, "0 but true" for zero, or the actual return value.

msgget KEY,FLAGS

Calls the System V IPC function msgget(2). Either returns the message queue ID or returns the undefined value if an error occurs.

msgsnd ID,MSG,FLAGS

Calls the System V IPC function msgsnd to send the message MSG to the message queue ID. MSG must begin with the long integer message type, which may be created with pack("l", $type). Returns TRUE if successful; returns FALSE if an error occurs.

msgrcv ID,VAR,SIZE,TYPE,FLAGS

Calls the System V IPC function msgrcv to receive a message from message queue ID into variable VAR with a maximum message size of SIZE. If a message is received, the message type is the first thing in VAR; the maximum length of VAR is SIZE plus the size of the message type. Returns TRUE if successful or returns FALSE if an error occurs.

my EXPR

A my declares the listed variables to be local (lexically) to the enclosing block, subroutine, eval, or do/require/use file. If more than one value is listed, the list must be placed in parentheses.
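
The difference between my and local shows up when one subroutine calls another. The following is a minimal sketch; the names $level, show, with_local, and with_my are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

our $level = "global";           # a package variable

sub show { return $level }       # always reads the package variable

sub with_local {
    local $level = "local";      # temporary value, visible to called subroutines
    return show();
}

sub with_my {
    my $level = "lexical";       # a new variable, invisible outside this block
    return show();
}

print with_local(), "\n";        # local
print with_my(), "\n";           # global
print show(), "\n";              # global (the local value has been restored)
```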

next [LABEL]

The next command is like the continue statement in C; it starts the next iteration of the loop:


LINE: while (<STDIN>) { 

       next LINE if /^#/; # discard comments 

       ... 

}

Note that if the preceding code contained a continue block, the block would be executed even on discarded lines. If the LABEL is omitted, the command refers to the innermost enclosing loop.
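
A short sketch of that interaction, using an in-memory list in place of STDIN (the variable names are illustrative only):

```perl
use strict;
use warnings;

my @input = ("# header", "alpha", "# note", "beta");
my $seen = 0;
my @kept;

LINE: foreach my $line (@input) {
    next LINE if $line =~ /^#/;   # skip comment lines
    push @kept, $line;
}
continue {
    $seen++;                      # runs for every line, including skipped ones
}

print "kept @kept out of $seen lines\n";
```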

no Module LIST

This function is the opposite of the use function. See the use function.

oct EXPR

Interprets EXPR as an octal string and returns the corresponding decimal value. (If EXPR happens to begin with 0x, this function interprets it as a hex string instead.)

If EXPR is omitted, uses $_.
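
A brief sketch of oct alongside hex, for example to turn a permission string from a form or configuration file into a number (the variable names are illustrative):

```perl
use strict;
use warnings;

my $perms = oct("644");     # octal string to number
my $mask  = oct("0x1f");    # a leading 0x is treated as hex
my $byte  = hex("ff");      # hex string to number

printf "%d %d %d\n", $perms, $mask, $byte;
```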

open FILEHANDLE[,EXPR]

Opens the file whose filename is given by EXPR and associates it with FILEHANDLE. If FILEHANDLE is an expression, its value is used as the name of the real filehandle wanted. If EXPR is omitted, the scalar variable of the same name as the FILEHANDLE contains the filename. The following characters have special meaning if they begin the filename:

< or nothing    Opened for input
>               Opened for output
>>              Opened for appending

You can put a + in front of the > or < to indicate that you want both read and write access to the file. Thus, +< is usually preferred for read/write updates; the +> mode would clobber the file first. These indicators correspond to the fopen(3) modes of r, r+, w, w+, a, and a+.

If the filename begins with a vertical bar (|), the filename is interpreted as a command to which output is to be piped; and if the filename ends with a |, the filename is interpreted as a command from which input will be piped.

Opening - opens STDIN, and opening >- opens STDOUT. Open returns nonzero upon success and returns the undefined value otherwise. If the open involved a pipe, the return value is the process id (PID) of the subprocess.
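
The following sketch exercises the three basic prefixes against a scratch file. It assumes the File::Temp module is available to supply a temporary filename; the filehandle names OUT and IN are arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Obtain a scratch filename for the demonstration.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
close($tmp_fh);

# ">" truncates the file and opens it for output.
open(OUT, ">$file") or die "cannot write $file: $!";
print OUT "first\n";
close(OUT);

# ">>" opens the file for appending.
open(OUT, ">>$file") or die "cannot append to $file: $!";
print OUT "second\n";
close(OUT);

# "<" (or no prefix) opens the file for input.
open(IN, "<$file") or die "cannot read $file: $!";
my @lines = <IN>;
close(IN);

print "read ", scalar(@lines), " lines\n";
```

Checking the return value with "or die" and including $! in the message is a common way to report why an open failed.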

opendir DIRHANDLE,EXPR

Opens a directory named EXPR for processing by readdir(), telldir(), seekdir(), rewinddir(), and closedir(). Returns TRUE if successful. DIRHANDLEs have their own namespaces separate from FILEHANDLEs.

ord EXPR

Returns the numeric ASCII value of the first character of EXPR. If EXPR is omitted, uses $_.

pack TEMPLATE,LIST

Takes an array or list of values and packs it into a binary structure, returning the string containing the structure. The TEMPLATE is a sequence of characters that give the order and type of values, as follows:

A    ASCII string, space padded
a    ASCII string, null padded
b    Bit string, ascending bit order
B    Bit string, descending bit order
h    Hex string, low nybble first
H    Hex string, high nybble first
c    Signed char value
C    Unsigned char value
s    Signed short value
S    Unsigned short value
i    Signed integer value
I    Unsigned integer value
l    Signed long value
L    Unsigned long value
n    Short in "network" order
N    Long in "network" order
v    Short, little-endian order
V    Long, little-endian order
f    Single-precision float, native format
d    Double-precision float, native format
p    Pointer to null-terminated string
P    Pointer to a structure (fixed-length string)
u    Uuencoded string
x    Null byte
X    Back up a byte

Each letter may optionally be followed by a number that gives a repeat count. With all types except a, A, b, B, h, H, and P, the pack function gobbles up that many values from the list. A * for the repeat count means to use however many items are left. The a and A types gobble just one value but pack it as a string of length count, padding with nulls or spaces as necessary. (When unpacking, A strips trailing spaces and nulls, but a does not.) Likewise, the b and B fields pack a string that many bits long. The h and H fields pack a string that many nybbles long. The P packs a pointer to a structure of the size indicated by the length.

Real numbers (floats and doubles) are in the native machine format only; because of the large number of floating formats and the lack of a standard network representation, no facility for interchange has been made. Therefore, packed floating-point data written on one machine may not be readable on another, even if both use IEEE floating-point arithmetic (as the "endian-ness" of the memory representation is not part of the IEEE specification). Note that Perl uses doubles internally for all numeric calculations, and converting from double to float and back to double again inevitably loses precision (for example, unpack("f", pack("f", $foo)) does not in general equal $foo).

You can generally use the same template in the unpack function.
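
As a small sketch of a round trip, the following packs a fixed-width record and unpacks it with the same template. The record layout (an 8-byte name, a 16-bit port, and a one-byte flag) is invented for illustration.

```perl
use strict;
use warnings;

# A8 = 8-byte space-padded string, n = 16-bit "network" order short,
# C  = unsigned char
my $template = "A8nC";

my $record = pack($template, "perl", 8080, 5);
print length($record), " bytes\n";          # 8 + 2 + 1

# Unpacking with A strips the trailing padding again.
my ($name, $port, $flag) = unpack($template, $record);
print "$name $port $flag\n";
```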

package NAMESPACE

Declares the compilation unit as being in the given NAMESPACE. The scope of the package declaration is from the declaration itself through the end of the enclosing block (the same scope as the local() operator).

pipe READHANDLE,WRITEHANDLE

Opens a pair of connected pipes like the corresponding system call. Note that if you set up a loop of piped processes, deadlock can occur unless you are very careful. In addition, note that Perl's pipes use stdio buffering, so you may need to set $| to flush your WRITEHANDLE after each command, depending on the application.

pop ARRAY

Pops and returns the last value of the array, shortening the array by 1. If the array is empty, returns the undefined value. If ARRAY is omitted, pops the @ARGV array in the main program and the @_ array in subroutines, just like shift().

pos SCALAR

Returns the offset of where the last m//g search left off for the variable in question. (m//g searches for all occurrences of the regular expression in a line.) Can be modified to change that offset.
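
A minimal sketch, with made-up sample text, that records where each m//g match left off:

```perl
use strict;
use warnings;

my $text = "aXbXc";
my @offsets;

# Each successful match advances the position to just past the match.
while ($text =~ /X/g) {
    push @offsets, pos($text);
}

print "matches ended at: @offsets\n";
```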

print [FILEHANDLE LIST|LIST]

Prints a string or a comma-separated list of strings. Returns TRUE if successful. FILEHANDLE may be a scalar variable name, in which case the variable contains the name of or a reference to the filehandle. If FILEHANDLE is omitted, prints to standard output or to the last selected output channel. If LIST is also omitted, prints $_ to STDOUT. To set the default output channel to something other than STDOUT, use the select operation. Note that, because print takes a LIST, anything in the LIST is evaluated in a list context, and any subroutine that you call has one or more of its expressions evaluated in a list context. Also, be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print.

push ARRAY,LIST

Treats ARRAY as a stack and pushes the values of LIST onto the end of ARRAY. The length of ARRAY increases by the length of LIST. Returns the new number of elements in the array.

q[q|x|w]/STRING/

Generalized quotes.

quotemeta EXPR

Returns the value of EXPR with all regular expression metacharacters backslashed.

rand [EXPR]

Returns a random fractional number between 0 and the value of EXPR. (EXPR should be positive.) If EXPR is omitted, returns a value between 0 and 1. This function produces repeatable sequences unless srand() is invoked. See also srand.

read FILEHANDLE,SCALAR,LENGTH[,OFFSET]

Attempts to read LENGTH bytes of data into variable SCALAR from the specified FILEHANDLE. Returns the number of bytes actually read or returns undef if an error occurs. SCALAR is grown or shrunk to the length actually read. An OFFSET may be specified to place the read data at some place other than the beginning of the string. This call is actually implemented in terms of stdio's fread call. To get a true read system call, see sysread.

readdir DIRHANDLE

Returns the next directory entry for a directory opened by opendir(). If used in a list context, returns all the rest of the entries in the directory. If there are no more entries, returns an undefined value in a scalar context or a null list in a list context.

readlink EXPR

If symbolic links are implemented, readlink returns the value of a symbolic link. If not, it gives a fatal error. If a system error occurs, readlink returns the undefined value and sets $! (errno). If EXPR is omitted, uses $_.

recv SOCKET,SCALAR,LEN,FLAGS

Receives a message on a socket. Attempts to receive LEN bytes of data into the variable SCALAR from the specified SOCKET filehandle. Returns the address of the sender. Returns the undefined value if an error occurs. SCALAR is grown or shrunk to the length actually read. Takes the same flags as the system call of the same name.

redo [LABEL]

The redo command restarts the loop block without evaluating the conditional again. The continue block, if any, is not executed. If the LABEL is omitted, the command refers to the innermost enclosing loop.

ref EXPR

Returns a TRUE value if EXPR is a reference, FALSE otherwise. The value returned depends on what EXPR is a reference to. The built-in types that EXPR can reference include REF, SCALAR, ARRAY, HASH, CODE, and GLOB.
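
In practice, the value returned is the name of the type referred to, or an empty string for a non-reference. The following sketch illustrates this; the helper name describe is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: report what kind of thing was passed in.
sub describe {
    my ($thing) = @_;
    my $type = ref($thing);
    return $type ? "reference to $type" : "not a reference";
}

my @list  = (1, 2, 3);
my %table = (a => 1);
my $text  = "plain";

print describe(\@list), "\n";
print describe(\%table), "\n";
print describe(\$text), "\n";
print describe(sub { 1 }), "\n";
print describe($text), "\n";
```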

rename OLDNAME,NEWNAME

Changes the name of a file. Returns 1 for success, 0 otherwise. Does not work across file system boundaries.

require [EXPR]

Demands some semantics specified by EXPR or by $_ if EXPR is not supplied. If EXPR is numeric, demands that the current version of Perl ($] or $PERL_VERSION) be equal to or greater than EXPR.

Otherwise, demands that a library file be included if it hasn't already been included.

Note that the file is not included twice under the same specified name. The file must return TRUE as the last statement to indicate successful execution of any initialization code, so it's customary to end such a file with 1;.

If EXPR is a bare word, require assumes a .pm extension to enable you to load standard modules without altering your namespace.

reset [EXPR]

Generally used in a continue block at the end of a loop to clear variables and reset ?? searches so that they work again. The expression is interpreted as a list of single characters (hyphens are allowed for ranges). All variables and arrays beginning with one of those letters are reset to their pristine state. If the expression is omitted, one-match searches (?pattern?) are reset to match again. Only resets variables or searches in the current package and always returns 1.

return LIST

Returns from a subroutine or eval with the value specified. If LIST is omitted, a subroutine or eval() automatically returns the value of the last expression evaluated.

reverse LIST

In a list context, returns a list value consisting of the elements of LIST in the opposite order. In a scalar context, returns a string value consisting of the bytes of the first element of LIST in the opposite order.

rewinddir DIRHANDLE

Sets the current position to the beginning of the directory for the readdir() routine on DIRHANDLE.

rindex STR,SUBSTR[,POSITION]

Works just like index except that it returns the position of the last occurrence of SUBSTR in STR. If POSITION is specified, returns the last occurrence at or before that position.

rmdir [FILENAME]

Deletes the directory specified by FILENAME if it is empty. If it succeeds, rmdir returns 1; otherwise, rmdir returns 0 and sets $! (errno). If FILENAME is omitted, uses $_.

s///

The substitution operator. See the section "Perl Regular Expressions" for more detail on the substitution operator and its available options.

scalar EXPR

Forces EXPR to be interpreted in a scalar context and returns the value of EXPR.

seek FILEHANDLE,POSITION,WHENCE

Randomly positions the file pointer for FILEHANDLE, just like the C fseek() call of stdio. FILEHANDLE may be an expression whose value gives the name of the filehandle. The values for WHENCE are

0    Set file pointer to POSITION
1    Set file pointer to current plus POSITION
2    Set file pointer to EOF plus POSITION

You can use the values SEEK_SET, SEEK_CUR, and SEEK_END for this from a POSIX module. Returns 1 on success and 0 otherwise.
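
The three WHENCE values can be tried out on a scratch file. This sketch assumes the File::Temp module is available; the numeric WHENCE values are used directly.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
print $fh "0123456789";

my $buf;
my @pieces;

seek($fh, 0, 0) or die "seek failed: $!";    # absolute: start of file
read($fh, $buf, 3);
push @pieces, $buf;

seek($fh, 2, 1) or die "seek failed: $!";    # relative: skip two bytes
read($fh, $buf, 2);
push @pieces, $buf;

seek($fh, -3, 2) or die "seek failed: $!";   # from EOF: last three bytes
read($fh, $buf, 3);
push @pieces, $buf;

print "@pieces\n";
print "now at offset ", tell($fh), "\n";
close($fh);
```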

seekdir DIRHANDLE,POS

Sets the current position for the readdir() routine on DIRHANDLE. POS must be a value returned by telldir().

select [FILEHANDLE]

Returns the currently selected filehandle. If FILEHANDLE is supplied, select sets the current default filehandle for output. This action has two effects. First, a write or a print without a filehandle defaults to this FILEHANDLE. Second, references to variables related to output refer to this output channel.

select RBITS,WBITS,EBITS,TIMEOUT

Calls the select(2) system call with the bit masks specified, which can be constructed using fileno() and vec().

semctl ID,SEMNUM,CMD,ARG

Calls the System V IPC function semctl. If CMD is &IPC_STAT or &GETALL, then ARG must be a variable that holds the returned semid_ds structure or semaphore value array. semctl is similar to ioctl in that both return the undefined value for error, "0 but true" for zero, or the actual return value otherwise.

semget KEY,NSEMS,FLAGS

Calls the System V IPC function semget. Returns the semaphore ID; if an error occurs, returns the undefined value.

semop KEY,OPSTRING

Calls the System V IPC function semop to perform semaphore operations such as signaling and waiting. OPSTRING must be a packed array of semop structures. Each semop structure can be generated with pack("sss", $semnum, $semop, $semflag). The number of semaphore operations is implied by the length of OPSTRING. Returns TRUE if successful; returns FALSE if an error occurs.

send SOCKET,MSG,FLAGS[,TO]

Sends a message on a socket. Takes the same flags as the system call of the same name. On unconnected sockets, you must specify a destination to send TO, in which case send does a C sendto(). Returns the number of characters sent or the undefined value if an error occurs.

setpgrp PID,PGRP

Sets the current process group for the specified PID, 0 for the current process. Produces a fatal error if used on a machine that doesn't implement setpgrp(2).

setpriority WHICH,WHO,PRIORITY

Sets the current priority for a process, a process group, or a user. Produces a fatal error if used on a machine that doesn't implement setpriority(2).

setsockopt SOCKET,LEVEL,OPTNAME,OPTVAL

Sets the socket option requested. Returns undefined if an error occurs. OPTVAL may be specified as undef if you don't want to pass an argument.

shift [ARRAY]

Shifts the first value of the array off and returns it, shortening the array by 1 and moving everything down. If the array is empty, returns the undefined value. If ARRAY is omitted, shifts the @ARGV array in the main program and the @_ array in subroutines. The shift and unshift functions do the same thing to the left end of an array that pop and push do to the right end.

shmctl ID,CMD,ARG

Calls the System V IPC function shmctl. If CMD is &IPC_STAT, then ARG must be a variable that holds the returned shmid_ds structure. shmctl is like ioctl in that both return the undefined value for error, "0 but true" for zero, or the actual return value otherwise.

shmget KEY,SIZE,FLAGS

Calls the System V IPC function shmget. Returns the shared memory segment ID or returns the undefined value if an error occurs.

shmread ID,VAR,POS,SIZE

shmwrite ID,STRING,POS,SIZE

Reads or writes the System V shared memory segment ID starting at position POS for size SIZE by attaching to it, copying in/out, and detaching from it. When reading, VAR must be a variable that holds the data read. When writing, if STRING is too long, only SIZE bytes are used; if STRING is too short, nulls are written to fill out SIZE bytes. Returns TRUE if successful or FALSE if an error occurs.

shutdown SOCKET,HOW

Shuts down a socket connection in the manner indicated by HOW, which has the same interpretation as in the system call of the same name.

sin EXPR

Returns the sine of EXPR (expressed in radians). If EXPR is omitted, returns sine of $_.

sleep EXPR

sleep causes the script to sleep for EXPR seconds or forever if no EXPR is set. May be interrupted by sending the process a SIGALRM. Returns the number of seconds actually slept. sleep is not suitable for CGI programming.

socket SOCKET,DOMAIN,TYPE,PROTOCOL

Opens a socket of the specified kind and attaches it to filehandle SOCKET. DOMAIN, TYPE, and PROTOCOL are specified the same as they are for the system call of the same name. You should place a use Socket; statement first to import the proper definitions.

socketpair SOCKET1,SOCKET2,DOMAIN,TYPE,PROTOCOL

Creates an unnamed pair of sockets in the specified domain, of the specified type. DOMAIN, TYPE, and PROTOCOL are specified the same as they are for the system call of the same name. If this function is not implemented on your computer, socketpair yields a fatal error. Returns TRUE if successful.

sort [SUBNAME|BLOCK] LIST

Sorts the LIST and returns the sorted list value. Nonexistent values of arrays are stripped out. If SUBNAME or BLOCK is omitted, sort sorts in standard string comparison order. If SUBNAME is specified, it gives the name of a subroutine that returns an integer less than, equal to, or greater than 0, depending on how the elements of the array are to be ordered. (The <=> and cmp operators are extremely useful in such routines.) SUBNAME may be a scalar variable name, in which case the value provides the name of the subroutine to use. In place of a SUBNAME, you can provide a BLOCK as an anonymous, in-line sort subroutine.
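
The following sketch contrasts the default string ordering with a numeric BLOCK and a named comparison subroutine. The subroutine name by_length and the sample data are hypothetical.

```perl
use strict;
use warnings;

my @nums = (10, 9, 100, 1);

# Default: string comparison, so "10" sorts before "9".
my @as_strings = sort @nums;

# In-line BLOCK: numeric comparison with <=>.
my @numeric = sort { $a <=> $b } @nums;

# Named subroutine: shortest first, ties broken alphabetically.
sub by_length { length($a) <=> length($b) or $a cmp $b }
my @words = sort by_length qw(pear fig banana kiwi);

print "@as_strings\n";
print "@numeric\n";
print "@words\n";
```

Inside the comparison code, sort supplies the two elements being compared as $a and $b.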

splice ARRAY,OFFSET[,LENGTH[,LIST]]

Removes the elements designated by OFFSET and LENGTH from an array, and replaces them with the elements of LIST, if any. Returns the elements removed from the array. The array grows or shrinks as necessary. If LENGTH is omitted, splice removes everything from OFFSET onward.
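
A brief sketch of both forms, using made-up data:

```perl
use strict;
use warnings;

my @items = qw(a b c d e);

# Replace two elements starting at offset 1 with three new ones.
my @removed = splice(@items, 1, 2, qw(X Y Z));
print "removed: @removed, now: @items\n";

# With LENGTH omitted, everything from OFFSET onward is removed.
my @tail = splice(@items, 4);
print "tail: @tail, now: @items\n";
```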

split [/PATTERN/[,EXPR[,LIMIT]]]

Splits a string into an array of strings and returns it. If not in a list context, split returns the number of fields found and splits into the @_ array. (In a list context, you can force the split into @_ by using ?? as the pattern delimiters, but it still returns the array value.)

If EXPR is omitted, split splits the $_ string. If PATTERN is also omitted, split splits on whitespace (after skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter separating the fields. (Note that the delimiter may be longer than one character.) If LIMIT is specified and is not negative, split splits into no more than that many fields (though it may split into fewer). If LIMIT is unspecified, trailing null fields are stripped (which potential users of pop would do well to remember). If LIMIT is negative, the LIMIT is treated as if an arbitrarily large LIMIT had been specified.

A pattern matching the null string (not to be confused with a null pattern //, which is just one member of the set of patterns matching a null string) splits the value of EXPR into separate characters at each point it matches.
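
The effect of LIMIT on trailing empty fields is easy to overlook when parsing delimited records. A minimal sketch with an invented record:

```perl
use strict;
use warnings;

my $record = "Smith,Jane,,";

# Without LIMIT, trailing empty fields are dropped.
my @short = split(/,/, $record);

# A negative LIMIT keeps them.
my @full = split(/,/, $record, -1);

# A positive LIMIT caps the number of fields; the last field keeps the rest.
my @two = split(/,/, $record, 2);

print scalar(@short), " fields without LIMIT\n";
print scalar(@full), " fields with LIMIT -1\n";
print "second of two: $two[1]\n";
```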

sprintf FORMAT,LIST

Returns a string formatted by the usual printf conventions of the C language.
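
For instance, the following sketch builds a fixed-width report line and a hexadecimal color string (the values are arbitrary):

```perl
use strict;
use warnings;

# Left-justified string, right-justified integer, two decimal places.
my $line = sprintf("%-8s|%5d|%7.2f", "total", 42, 3.14159);

# Zero-padded two-digit hexadecimal values.
my $color = sprintf("%02x%02x%02x", 255, 128, 0);

print "$line\n";
print "$color\n";
```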

sqrt EXPR

Returns the square root of EXPR. If EXPR is omitted, returns the square root of $_.

srand EXPR

Sets the random number seed for the rand operator. If EXPR is omitted, sets a random number seed based on time, for example, srand(time).

stat [FILEHANDLE|EXPR]

Returns a 13-element array giving the status information for a file, either the file opened via FILEHANDLE or the file named by EXPR. Returns a null list if the stat fails.


($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,

$atime,$mtime,$ctime,$blksize,$blocks) = stat($filename);

Not all fields are supported on all file system types. Here are the meanings of the fields:

dev        Device number of file system
ino        Inode number
mode       File mode (type and permissions)
nlink      Number of (hard) links to the file
uid        Numeric user ID of file's owner
gid        Numeric group ID of file's owner
rdev       The device identifier (special files only)
size       Total size of file in bytes
atime      Last access time since the epoch
mtime      Last modify time since the epoch
ctime      Inode change time (not creation time!) since the epoch
blksize    Size of each block
blocks     Number of blocks

If stat is passed the special filehandle consisting of an underscore (_), no stat is done, but the current contents of the stat structure from the last stat or file test are returned.
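
When only a few fields are needed, an array slice is a convenient alternative to naming all 13 variables. This sketch assumes the File::Temp module is available to create a scratch file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
print $fh "hello";
close($fh);

my @info = stat($file) or die "cannot stat $file: $!";

# Fields 2, 7, and 9 are mode, size, and mtime.
my ($mode, $size, $mtime) = @info[2, 7, 9];
printf "%d bytes, permissions %04o\n", $size, $mode & 07777;

# The special filehandle _ reuses the result of the stat above.
print "readable\n" if -r _;
```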

sub [BLOCK|NAME|NAME BLOCK]

This "function" is actually a subroutine definition. With just a NAME (and possibly prototypes), sub is just a forward declaration. Without a NAME, it's an anonymous function declaration and actually returns a value: the CODE reference of the closure you just created.
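
The anonymous form is often used to build closures. A minimal sketch, in which the names make_counter and $next_id are hypothetical:

```perl
use strict;
use warnings;

# Returns a CODE reference that remembers its own private $count.
sub make_counter {
    my ($count) = @_;
    return sub { return $count++ };
}

my $next_id = make_counter(100);
my @ids = ($next_id->(), $next_id->(), $next_id->());

# A second counter keeps separate state.
my $other = make_counter(1);
my $first = $other->();

print "@ids\n";
print "$first\n";
```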

substr EXPR,OFFSET[,LEN]

Extracts a substring out of EXPR and returns it. The first character is at offset 0 or at the current setting of $[. If OFFSET is negative, substr starts the offset distance from the end of the string. If LEN is omitted, substr returns everything to the end of the string. If LEN is negative, substr leaves that many characters off the end of the string.

You can use the substr function as an lvalue, in which case EXPR must be an lvalue. If you assign something shorter than LEN, the string shrinks; and if you assign something longer than LEN, the string grows to accommodate it. To keep the string the same length, you may need to pad or chop your value using sprintf.
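
The following sketch, using a made-up string, shows negative offsets and lengths as well as assignment through substr:

```perl
use strict;
use warnings;

my $s = "Hello, world";

my $head   = substr($s, 0, 5);    # first five characters
my $tail   = substr($s, -5);      # last five characters
my $middle = substr($s, 7, -2);   # from offset 7, leaving two off the end

# Used as an lvalue: the string grows to fit the longer replacement.
substr($s, 0, 5) = "Goodbye";

print "$head / $tail / $middle\n";
print "$s\n";
```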

symlink OLDFILE,NEWFILE

Creates a new filename symbolically linked to the old filename. Returns 1 for success, 0 otherwise. On systems that don't support symbolic links, produces a fatal error at runtime.

syscall LIST

Calls the system call specified as the first element of the list, passing the remaining elements as arguments to the system call. If syscall is unimplemented, using the statement produces a fatal error. The arguments are interpreted as follows: If a given argument is numeric, the argument is passed as an int. If not, the pointer to the string value is passed. You are responsible for making sure a string is preextended long enough to receive any result that might be written into that string. If your integer arguments are not literals and have never been interpreted in a numeric context, you may need to add 0 to them to force them to look like numbers.

Note
The maximum number of arguments to your system call that Perl supports is 14, which in practice should usually suffice.

sysopen FILEHANDLE,FILENAME,MODE[,PERMS]

Opens the file whose filename is given by FILENAME and associates it with FILEHANDLE. If FILEHANDLE is an expression, its value is used as the name of the real filehandle wanted. This function calls the underlying operating system's open function with the parameters FILENAME, MODE, PERMS.

The possible values and flag bits of the MODE parameter are system dependent.

sysread FILEHANDLE,SCALAR,LENGTH[,OFFSET]

Attempts to read LENGTH bytes of data into a variable SCALAR from the specified FILEHANDLE, using the system call read(2). sysread bypasses stdio, so mixing this read with other kinds of reads may cause confusion. Returns the number of bytes actually read or returns undef if an error occurs. SCALAR is grown or shrunk to the length actually read. An OFFSET may be specified to place the read data at some place other than the beginning of the string.

system LIST

system LIST does exactly the same thing as exec LIST except that system does a fork first, and the parent process waits for the child process to complete. Note that argument processing varies depending on the number of arguments. The return value is the exit status of the program as returned by the wait() call. To get the actual exit value, divide by 256. See also exec. If you want to capture the output from a command, you should merely use backticks rather than system LIST.
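
As a sketch of extracting the exit value, the following runs a second copy of the Perl interpreter, located through the $^X variable, and asks it to exit with status 3. Shifting right by eight bits is equivalent to the division by 256 described above.

```perl
use strict;
use warnings;

# List form: the program and its arguments are passed separately.
my $status = system($^X, "-e", "exit 3");
die "could not run $^X: $!" if $status == -1;

my $exit_value = $status >> 8;    # same as int($status / 256)
print "child exited with $exit_value\n";
```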

syswrite FILEHANDLE,SCALAR,LENGTH[,OFFSET]

Attempts to write LENGTH bytes of data from the variable SCALAR to the specified FILEHANDLE, using the system call write(2). syswrite bypasses stdio, so mixing this function with prints may cause confusion. Returns the number of bytes actually written or undef if an error occurs. An OFFSET may be specified to get the write data from some place other than the beginning of the string.

tell FILEHANDLE

tell returns the current file position for FILEHANDLE. FILEHANDLE may be an expression whose value gives the name of the actual filehandle. If FILEHANDLE is omitted, assumes the file last read.

telldir DIRHANDLE

telldir returns the current position of the readdir routines on DIRHANDLE. A value may be given to seekdir to access a particular location in a directory. telldir has the same caveats about possible directory compaction as the corresponding system library routine.

tie VARIABLE,CLASSNAME,LIST

This function binds a variable to a package class that provides the implementation for the variable. VARIABLE is the name of the variable to be bound. CLASSNAME is the name of a class implementing objects of the correct type. Any additional arguments are passed to the "new" method of the class (meaning TIESCALAR, TIEARRAY, or TIEHASH). Typically, these arguments resemble arguments that might be passed to the dbm_open function of C. The object returned by the new method is also returned by the tie function, which is useful if you want to access other methods in CLASSNAME.

tied VARIABLE

Returns a reference to the object underlying VARIABLE (the same value that was originally returned by the tie call that bound the variable to a package). Returns the undefined value if VARIABLE isn't tied to a package.
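
To make the mechanism concrete, here is a minimal sketch of a tied scalar. The class name UpperCase is hypothetical; it implements TIESCALAR plus the FETCH and STORE methods that Perl calls when the tied variable is read or assigned.

```perl
use strict;
use warnings;

package UpperCase;   # hypothetical class: stores everything in uppercase

sub TIESCALAR {
    my ($class, $initial) = @_;
    my $value = uc(defined $initial ? $initial : '');
    return bless \$value, $class;
}

sub FETCH {          # called whenever the tied variable is read
    my ($self) = @_;
    return $$self;
}

sub STORE {          # called whenever the tied variable is assigned
    my ($self, $new) = @_;
    $$self = uc($new);
}

package main;

tie my $shout, 'UpperCase', 'hello';
print "$shout\n";

$shout = "perl";
print "$shout\n";

print ref(tied($shout)), "\n";   # class of the underlying object
```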

time

Returns the number of non-leap seconds since 00:00:00 UTC, January 1, 1970. Suitable for feeding to gmtime() and localtime().

times

Returns a four-element array giving the user and system times, in seconds, for this process and the children of this process.

tr///

The translation operator. See the section, "Perl Regular Expressions," for more detail on the translation operator and its available options.

truncate [FILEHANDLE|EXPR],LENGTH

Truncates the file opened on FILEHANDLE or named by EXPR to the specified length. Produces a fatal error if truncate isn't implemented on your system.

uc EXPR

Returns an uppercased version of EXPR. uc is the internal function implementing the \U escape in double-quoted strings. Should respect any POSIX setlocale() settings.

ucfirst EXPR

ucfirst returns the value of EXPR with the first character in uppercase. This function is the internal function implementing the \u escape in double-quoted strings. Should respect any POSIX setlocale() settings.

umask [EXPR]

Sets the umask for the process and returns the old one. If EXPR is omitted, it merely returns the current umask.

undef [EXPR]

Undefines the value of EXPR, which must be an lvalue. Use only on a scalar value, an entire array, or a subroutine name (using &). (Using undef() will probably not do what you expect on most predefined variables or DBM list values.) Always returns the undefined value. You can omit the EXPR, in which case nothing is undefined, but you still get an undefined value that you could, for instance, return from a subroutine.

unlink LIST

Deletes a list of files and returns the number of files successfully deleted.

unpack TEMPLATE,EXPR

unpack does the reverse of pack: It takes a string representing a structure and expands it into a list value, returning the array value. (In a scalar context, unpack merely returns the first value produced.) TEMPLATE has the same format as it has in the pack function.
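The round trip between pack and unpack can be sketched as follows. The record layout (a 16-bit big-endian integer, a four-byte space-padded string, and an unsigned byte) is an arbitrary choice made for this example.

```perl
use strict;
use warnings;

# n  = 16-bit unsigned integer in big-endian order
# A4 = ASCII string, space-padded to 4 bytes
# C  = unsigned char
my $record = pack('nA4C', 513, 'ab', 7);

my ($number, $text, $byte) = unpack('nA4C', $record);  # list context: every field
my $first_only             = unpack('nA4C', $record);  # scalar context: first field only

print length($record), " $number '$text' $byte $first_only\n";   # prints "7 513 'ab' 7 513"
```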

untie VARIABLE

Breaks the binding between a variable and a package. (See tie.)

unshift ARRAY,LIST

Does the opposite of a shift (or the opposite of a push, depending on how you look at it). Prepends the list to the front of the array and returns the new number of elements in the array.

use Module [LIST]

Imports some semantics into the current package from the named module, generally by aliasing certain subroutine or variable names into your package.

utime LIST

Changes the access and modification times on each file of a list of files. The first two elements of the list must be the numerical access and modification times, in that order. Returns the number of files successfully changed.

values ASSOC_ARRAY

Returns a normal array consisting of all the values of the named associative array. (In a scalar context, returns the number of values.) The values are returned in an apparently random order, but it is the same order that the keys or each function produces for the same associative array.
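A small check of the ordering guarantee, using a made-up %ports hash: whatever order keys produces, values produces its results in the matching order, provided the hash is not modified in between.

```perl
use strict;
use warnings;

my %ports = (http => 80, ftp => 21, smtp => 25);   # sample data for illustration

my @names   = keys %ports;
my @numbers = values %ports;
my $count   = values %ports;      # scalar context: number of values

# Position $i of @numbers belongs to position $i of @names.
my $consistent = 1;
for my $i (0 .. $#names) {
    $consistent = 0 unless $ports{ $names[$i] } == $numbers[$i];
}
print "count=$count consistent=$consistent\n";     # prints "count=3 consistent=1"
```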

vec EXPR,OFFSET,BITS

Treats the string in EXPR as a vector of unsigned integers and returns the value of the bit field specified by OFFSET. BITS specifies the number of bits that are reserved for each entry in the bit vector and must be a power of two from 1 to 32. vec may also be assigned to by an assignment statement, in which case you must use parentheses to give the expression the correct precedence.
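As a sketch of both reading and assigning through vec, the following stores several 4-bit entries in one string. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $flags = '';
vec($flags, 0, 4) = 9;     # entry 0, four bits wide
vec($flags, 1, 4) = 3;     # entry 1
vec($flags, 3, 4) = 15;    # entry 3; the string grows as needed, entry 2 stays 0

my @entries = map { vec($flags, $_, 4) } 0 .. 3;
my $bytes   = length $flags;   # four 4-bit entries fit in two bytes

print "@entries ($bytes bytes)\n";   # prints "9 3 0 15 (2 bytes)"
```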

waitpid PID,FLAGS

Waits for a particular child process to terminate and returns the pid of the deceased process or -1 if there is no such child process. The status is returned in $?.

warn LIST

warn produces a message on the standard error stream (STDERR) just like die but doesn't exit or raise an exception.

write [FILEHANDLE|EXPR]

Writes a formatted record (possibly multiline) to the specified file, using the format associated with that file. By default, the format for a file is the one with the same name as the filehandle, but the format for the current output channel (see the select function) may be set explicitly by assigning the name of the format to the $~ variable.

If FILEHANDLE is unspecified, output goes to the current default output channel, which starts out as STDOUT but may be changed by the select operator. If the FILEHANDLE is an EXPR, then the expression is evaluated and the resulting string is used to look up the name of the FILEHANDLE at runtime. For more on formats, see the perlform man page.

y///

The translation operator. See the following section for more detail on the translation operator and its available options.

Perl Regular Expressions

Perl supports powerful regular expression parsing, which can be used with the following matching operators.

m// (//)
Match operator
s///
Substitution operator
tr///, y///
Translation operators

The matching operators can have various modifiers, some of which relate to the interpretation of the regular expression inside. These are

i
Perform case-insensitive pattern matching.
m
Treat the string as multiple lines.
s
Treat the string as a single line.
x
Extend your pattern's legibility with whitespace and comments.

These modifiers are usually written as "the /x modifier," even though the delimiter in question might not actually be a slash. In fact, any of these modifiers may also be embedded within the regular expression using the new (?...) construct described later in this section.

The /x modifier itself needs a little more explanation. It tells the regular expression parser to ignore whitespace that is not backslashed or within a character class. You can use the /x modifier to break your regular expression into (slightly) more readable parts. The # character is also treated by the expression as a metacharacter introducing a comment, just as in ordinary Perl code. Taken together, these features go a long way toward making Perl 5 a readable language. (Note that this feature is not available in Perl 4.)
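For instance, a time-matching pattern can be spread over several commented lines with /x. The sample string and variable names here are made up for illustration.

```perl
use strict;
use warnings;

my $line = 'Time: 09:41:07';
my ($hours, $minutes, $seconds) = ('', '', '');

if ($line =~ /
        Time: \s*     # literal label, then optional whitespace
        (\d\d) :      # hours
        (\d\d) :      # minutes
        (\d\d)        # seconds
    /x) {
    ($hours, $minutes, $seconds) = ($1, $2, $3);
}
print "$hours $minutes $seconds\n";   # prints "09 41 07"
```

Note that a comment inside such a pattern must not contain the pattern delimiter, because Perl finds the end of the pattern before it looks at the comments.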

Regular Expressions

The patterns used in pattern matching are regular expressions such as those supplied in the version 8 regexp routines.

In particular, the following metacharacters have their standard meanings from the UNIX egrep:

\
Quote the next metacharacter.
^
Match the beginning of the line.
.
Match any character (except newline).
$
Match the end of the line (or before newline at the end).
|
Alternation.
()
Grouping.
[]
Character class.

By default, the ^ character is guaranteed to match only at the beginning of the string, and the $ character matches only at the end (or before the newline at the end). Perl does certain optimizations with the assumption that the string contains only one line. Embedded newlines are not matched by ^ or $. However, you might want to treat a string as a multiline buffer, so that the ^ matches after any newline within the string and $ matches before any newline. At the cost of a little more overhead, you can do this matching by using the /m modifier on the pattern match operator. (Older programs did this matching by setting $*, but this practice is not encouraged in Perl 5.)

To facilitate multiline substitutions, the . character never matches a newline unless you use the /s modifier, which tells Perl to pretend the string is a single line, even if it isn't. The /s modifier also overrides the setting of $*, in case you have some (badly behaved) older code (such as that written for versions of Perl before version 5) that sets $* in another module.
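The effect of /m and /s can be seen on a two-line string. This is only a sketch; the sample text is invented.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

my @plain = ($text =~ /^(\w+)/g);    # ^ matches only at the start of the string
my @multi = ($text =~ /^(\w+)/mg);   # with m, ^ also matches after each newline

my $dot_default = ($text =~ /first.*second/)  ? 1 : 0;   # . stops at the newline
my $dot_single  = ($text =~ /first.*second/s) ? 1 : 0;   # with s, . crosses it

print "@plain | @multi | $dot_default $dot_single\n";    # prints "first | first second | 0 1"
```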

The following standard quantifiers are recognized by regular expressions:

*
Match 0 or more times.
+
Match 1 or more times.
?
Match 1 or 0 times.
{n}
Match exactly n times.
{n,}
Match at least n times.
{n,m}
Match at least n but not more than m times.

If a curly bracket occurs in any other context, it is treated as a regular character. The * quantifier is equivalent to {0,}, the + quantifier to {1,}, and the ? quantifier to {0,1}. n and m are limited to integral values less than 65,536.

By default, a quantified subpattern is "greedy"; that is, it matches as many times as possible (given a particular starting location) without causing the rest of the pattern to fail. If you want it to match the minimum number of times possible, follow the quantifier with a ?. Note that the meanings don't change, only the "gravity":

*?
Match 0 or more times.
+?
Match 1 or more times.
??
Match 0 or 1 time.
{n}?
Match exactly n times.
{n,}?
Match at least n times.
{n,m}?
Match at least n but not more than m times.
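The difference between a greedy and a minimal quantifier shows up as soon as the closing delimiter appears more than once. A short sketch with an invented sample string:

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

my ($greedy)  = $html =~ /<(.+)>/;    # .+ runs to the LAST > in the string
my ($minimal) = $html =~ /<(.+?)>/;   # .+? stops at the FIRST > it can

print "greedy:  $greedy\n";    # prints "greedy:  b>bold</b> and <i>italic</i"
print "minimal: $minimal\n";   # prints "minimal: b"
```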

Because patterns are processed as double-quoted strings, the following metacharacters are also expanded:

\t
Tab
\n
Newline
\r
Return
\f
Form feed
\v
Vertical tab
\a
Alarm (bell)
\e
Escape
\0nn
Octal character
\xnn
Hex character
\c[
Control character
\l
Lowercase next character
\u
Uppercase next character
\L
Lowercase until \E
\U
Uppercase until \E
\E
End case modification
\Q
Quote regexp metacharacters until \E

In addition, Perl defines the following metacharacters:

\w
Match a word character (alphanumeric plus "_").
\W
Match a non-word character.
\s
Match a whitespace character.
\S
Match a non-whitespace character.
\d
Match a digit character.
\D
Match a non-digit character.

Note that \w matches a single alphanumeric character, not a whole word. To match a word, you say \w+. You may use \w, \W, \s, \S, \d, and \D within character classes (although not at either end of a range).

Perl defines the following zero-width assertions:

\b
Match a word boundary.
\B
Match a non-word boundary.
\A
Match only at beginning of a string.
\Z
Match only at end of a string (or before newline at the end).
\G
Match only where previous m//g left off.

A word boundary is defined as a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W. (Within character classes, \b represents backspace rather than a word boundary.) The \A and \Z are just like ^ and $ except that they won't match multiple times when the /m modifier is used, whereas ^ and $ match at every internal line boundary. To match the actual end of the string, not ignoring newline, you can use \Z(?!\n).
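The following sketch contrasts these assertions on a string with an embedded newline. The sample text is made up; the counting idiom (assigning the match list to an empty list in scalar context) is simply one convenient way to count matches.

```perl
use strict;
use warnings;

my $text = "cat concat\ncat\n";

my $whole_words   = () = $text =~ /\bcat\b/g;   # 2: the "cat" inside "concat" is skipped
my $line_starts   = () = $text =~ /^cat/mg;     # 2: ^ matches at each line start under m
my $string_starts = () = $text =~ /\Acat/mg;    # 1: \A matches only at the very beginning

my $before_final_newline = ($text =~ /cat\Z/)       ? 1 : 0;   # 1: \Z allows a trailing newline
my $at_true_end          = ($text =~ /cat\Z(?!\n)/) ? 1 : 0;   # 0: the string really ends in "\n"

print "$whole_words $line_starts $string_starts $before_final_newline $at_true_end\n";
```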

When the bracketing construct ( ... ) is used, \<digit> matches the <digit>th substring. Outside of the pattern, always use $ instead of \ in front of the digit. (Although the \<digit> notation can on rare occasions work outside the current pattern, you should not depend on it. See the warning that follows.) The scope of $<digit> (and $`, $&, and $') extends to the end of the enclosing BLOCK or eval string or to the next successful pattern match, whichever comes first. If you want to use parentheses to delimit a subpattern (for example, a set of alternatives) without saving it as a subpattern, follow the ( with ?:, as in (?:...).

You can have as many parentheses as you want. If you have more than nine substrings, the variables $10, $11, and so on refer to the corresponding substring. Within the pattern, \10, \11, and so on refer to substrings if at least that many left parentheses occurred before the back reference. Otherwise (for backward compatibility), \10 is the same as \010 (a backspace), and \11 the same as \011 (a tab) and so on. (\1 through \9 are always back references.)

$+ returns whatever the last bracket match matched. $& returns the entire matched string. (In Perl versions prior to version 5, $0 used to return the same thing, but not any more.) $` returns everything before the matched string. $' returns everything after the matched string.


s/^([^ ]*) *([^ ]*)/$2 $1/; # swap first two words

if (/Time: (..):(..):(..)/) {
       $hours = $1;
       $minutes = $2;
       $seconds = $3;
}
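To see a back reference and the match variables side by side, consider this sketch, which looks for an accidentally doubled word. The sample sentence is invented.

```perl
use strict;
use warnings;

my $text = 'this is is a test';
my ($word, $before, $matched, $after) = ('', '', '', '');

if ($text =~ /\b(\w+) \1\b/) {   # inside the pattern, \1 repeats the first group
    $word    = $1;               # outside the pattern, use $1
    $before  = $`;               # everything before the match
    $matched = $&;               # the whole match
    $after   = $';               # everything after the match
}
print "[$word] [$before] [$matched] [$after]\n";   # prints "[is] [this ] [is is] [ a test]"
```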

You will note that all backslashed metacharacters in Perl are alphanumeric, such as \b, \w, and \n, unlike some other regular expression languages. Anything that looks like \\, \(, \), \<, \>, \{, or \} is always interpreted as a literal character, not as a metacharacter. This convention makes it simple to quote a string that you want to use for a pattern but that you are afraid might contain metacharacters. Simply quote all the nonalphanumeric characters:


$pattern =~ s/(\W)/\\$1/g;

You can also use the built-in quotemeta function to quote a string. An even easier way to quote metacharacters within the match operator is to say


/$unquoted\Q$quoted\E$unquoted/
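A brief sketch of why quoting matters: the same string behaves very differently as a pattern depending on whether its metacharacters are quoted. The strings are made up for the example.

```perl
use strict;
use warnings;

my $literal = 'a.b*c';
my $quoted  = quotemeta $literal;     # every nonalphanumeric character gets a backslash

my $exact_hit   = ('xxa.b*cxx'  =~ /\Q$literal\E/) ? 1 : 0;   # 1: the literal text is present
my $similar_hit = ('xxaXbbbcxx' =~ /\Q$literal\E/) ? 1 : 0;   # 0: quoted, so . and * are literal
my $unquoted    = ('xxaXbbbcxx' =~ /$literal/)     ? 1 : 0;   # 1: unquoted, . and * are metacharacters

print "$quoted $exact_hit $similar_hit $unquoted\n";          # prints "a\.b\*c 1 0 1"
```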

Perl 5 defines a consistent extension syntax for regular expressions. The syntax is a pair of parentheses with a question mark as the first character within the parentheses (this construct produced a syntax error in Perl 4). The character after the question mark gives the function of the extension. Several extensions are already supported:

(?#text)
A comment. The text is ignored. If the /x switch is used to enable whitespace formatting, a simple # will suffice.
(?:regexp)
Groups things like "()" but doesn't make back references as "()" does.
(?=regexp)
A zero-width positive look-ahead assertion.
(?!regexp)
A zero-width negative look-ahead assertion.
(?imsx)
One or more embedded pattern-match modifiers. This modifier is particularly useful for patterns that are specified in a table somewhere, some of which want to be case-sensitive and some of which don't. The case-insensitive patterns merely need to include (?i) at the front of the pattern.
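The extensions can be tried out on a short sample string, as in the sketch below. The price line is invented for illustration.

```perl
use strict;
use warnings;

my $line = 'Price: 100 USD, 250 EUR';

my @usd      = $line =~ /(\d+)(?= USD)/g;        # look-ahead: digits followed by " USD"
my @not_usd  = $line =~ /\b(\d+)\b(?! USD)/g;    # negative look-ahead: whole numbers NOT followed by " USD"
my ($amount) = $line =~ /(?:Price|Cost): (\d+)/; # (?:...) groups without capturing, so $1 is the number
my $embedded = ('PERL' =~ /(?i)perl/) ? 1 : 0;   # (?i) turns on case-insensitivity inside the pattern

print "@usd | @not_usd | $amount | $embedded\n"; # prints "100 | 250 | 100 | 1"
```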

Perl Subroutines

Like many languages, Perl provides for user-defined subroutines. These subroutines may be located anywhere in the main program; loaded in from other files via the do, require, or use keywords; or even generated on-the-fly using eval or anonymous subroutines (closures). You can even call a function indirectly using a variable containing its name or a code reference to it, as in $var = \&function.

The Perl model for function calls and return values is simple: All functions are passed as parameters a single flat list of scalars, and all functions likewise return to their caller a single flat list of scalars. Any arrays or hashes in these call and return lists collapse, losing their identities, but you may always use pass-by-reference instead to avoid this situation. Both call and return lists may contain as many or as few scalar elements as you'd like.

Any arguments passed to the routine come in as the array @_. If you call a function with two arguments, those are stored in $_[0] and $_[1]. The array @_ is a local array, but its values are implicit references to the actual scalar parameters. The return value of the subroutine is the value of the last expression evaluated. Alternatively, you can use a return statement to specify the returned value and exit the subroutine. If you return one or more arrays or hashes, they are flattened together into one large indistinguishable list.
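These rules can be observed directly. In the following sketch, the subroutine names count_args, min_max, and bump are made up for the example.

```perl
use strict;
use warnings;

# Arrays and hashes are flattened into one list of scalars on the way in.
sub count_args { return scalar @_; }

my @days  = ('mon', 'tue');
my %hours = (mon => 8);
my $count = count_args(@days, %hours, 'extra');   # 2 + 2 + 1 = 5 scalars

# Without a return statement, the last expression evaluated is the return value.
sub min_max {
    my @sorted = sort { $a <=> $b } @_;
    ($sorted[0], $sorted[-1]);
}
my ($min, $max) = min_max(3, 9, 1);

# The elements of @_ are aliases for the caller's scalars.
sub bump { $_[0]++ }
my $n = 1;
bump($n);

print "$count $min $max $n\n";   # prints "5 1 9 2"
```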

Perl does not have named formal parameters, but in practice, all you do is assign the parameter array @_ to a my list. Any variables you use in the function that aren't declared private are global variables.

In the following example, $max is local to the subroutine max because it is declared in a my list:


sub max {
       my $max = shift(@_);
       foreach $foo (@_) {
              $max = $foo if $max < $foo;
       }
       return $max;
}

$bestday = max($mon,$tue,$wed,$thu,$fri);

In the following example, $lookahead is a global variable that is set both in the mainline code and in the subroutine get_line:


# get a line, combining continuation lines
# that start with whitespace
sub get_line {
       $thisline = $lookahead; # GLOBAL VARIABLES!!
       LINE: while ($lookahead = <STDIN>) {
              if ($lookahead =~ /^[ \t]/) {
                     $thisline .= $lookahead;
              } else {
                     last LINE;
              }
       }
       $thisline;
}

$lookahead = <STDIN>; # get first line
while ($_ = get_line()){
       ...
}

Use array assignment to a private (my) list to name your formal arguments:


sub maybeset {
       my($key, $value) = @_;
       $Foo{$key} = $value unless $Foo{$key};
}

The technique of assigning variables using a my list also has the effect of turning call-by-reference into call-by-value because the assignment copies the values. Otherwise, a function is free to do in-place modifications of @_ and change its callers' values.


upcase_in($v1, $v2); # this changes $v1 and $v2

sub upcase_in {
       for (@_) {
              tr/a-z/A-Z/;
       }
}

You aren't allowed to modify constants in this way, of course. If an argument were actually literal and you tried to change it, you'd take an exception.

You will, of course, be safer if you write the upcase_in() function to return a copy of its parameters instead of changing them in place.
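One way to write such a copying version is sketched below. The name upcase and the use of wantarray to handle a scalar-context call are choices made for this example.

```perl
use strict;
use warnings;

sub upcase {
    my @copies = @_;                  # the assignment copies the values
    tr/a-z/A-Z/ for @copies;          # only the copies are changed
    return wantarray ? @copies : $copies[0];
}

my ($v1, $v2) = ('abc', 'def');
my ($u1, $u2) = upcase($v1, $v2);

print "$v1 $v2 -> $u1 $u2\n";         # prints "abc def -> ABC DEF"
```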

You can call a subroutine using the & prefix. The & is optional in Perl 5 and so are the parentheses if the subroutine has been predeclared. Note, however, that the & is not optional when you're just naming the subroutine, such as when it's used as an argument to defined() or undef(). Nor is the & optional when you want to do an indirect subroutine call with a subroutine name or reference using the &$subref() or &{$subref}() constructs. (See the perlref man page for more on subroutine naming.)

Subroutines can be called recursively. If a subroutine is called using the & form, the argument list is optional, and if omitted, no @_ array is set up for the subroutine. Instead, the @_ array at the time of the call is visible to the subroutine. This sharing of the @_ array is an efficiency mechanism that new users might want to avoid.
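The difference between the call forms is easy to miss, so here is a small sketch. The subroutine names show_count and outer are invented.

```perl
use strict;
use warnings;

sub show_count { return scalar @_; }

sub outer {
    my $shared = &show_count;      # & with no argument list: outer's @_ is visible
    my $fresh  = &show_count();    # & with an empty list: a new, empty @_
    my $plain  = show_count();     # ordinary call: also a new, empty @_
    return ($shared, $fresh, $plain);
}

my @counts = outer('a', 'b', 'c');
print "@counts\n";                 # prints "3 0 0"
```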

Private Variables via my()

A my declares the listed variables to be confined (lexically) to the enclosing block, subroutine, eval, or do/require/use file. If more than one value is listed, the list must be placed in parentheses. All listed elements must be legal lvalues. Only alphanumeric identifiers may be lexically scoped; magical built-ins like $/ must currently be localized with local instead.

Unlike dynamic variables created by the local statement, lexical variables declared with my are totally hidden from the outside world, including any called subroutines (even if it's the same subroutine called from itself or elsewhere; every call gets its own copy).

Temporary Values via local()

A local() modifies its listed variables to be local to the enclosing block (or subroutine, eval{}, or do) and to any subroutines called from within that block. A local() gives temporary values to global variables. This technique is known as dynamic scoping. Lexical scoping is done with my, which works more like C's auto declarations.
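The contrast between dynamic and lexical scoping can be sketched with one global variable and two subroutines that try to override it. All names here are made up; our is used only to declare the global under use strict.

```perl
use strict;
use warnings;

our $level = 'global';             # package (global) variable

sub report { return $level; }      # reads the global, whatever its current value

sub with_local {
    local $level = 'temporary';    # temporary value for the global, restored on exit
    return report();               # the called subroutine sees 'temporary'
}

sub with_my {
    my $level = 'private';         # a separate lexical; invisible to report()
    return report();               # the called subroutine still sees 'global'
}

my @seen = (with_local(), with_my(), report());
print "@seen\n";                   # prints "temporary global global"
```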

Note
In general, you should be using my instead of local because it's faster and safer. Exceptions to this include the global punctuation variables, filehandles, formats, and direct manipulation of the Perl symbol table itself. Format variables often use local though, as do other variables whose current value must be visible to called subroutines.

If more than one variable is given to local(), they must be placed in parentheses. All listed elements must be legal lvalues. This operator works by saving the current values of those variables in its argument list on a hidden stack and restoring them upon exiting the block, subroutine, or eval. Consequently, called subroutines can also reference the local variable, but not the global one. The argument list may be assigned if desired, which allows you to initialize your local variables. (If no initializer is given for a particular variable, it is created with an undefined value.) This technique is commonly used to name the parameters to a subroutine.

Because local() is a runtime command, it gets executed every time through a loop. In releases of Perl previous to 5.0, local() used more stack storage each time until the loop was exited. Perl now reclaims the space each time through, but declaring your variables outside the loop is still more efficient than using local().

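A local is simply a modifier on an lvalue expression. When you assign to a localized variable, the local does not change whether its list is viewed as a scalar or an array: local($foo) = ... and local @foo = ... supply a list context to the right-hand side, whereas local $foo = ... supplies a scalar context.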
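To observe which context each form of assignment supplies, the right-hand side in this sketch is a helper, here called context (a name invented for the example), that reports how it was called.

```perl
use strict;
use warnings;

our ($foo, @bar);

sub context { return wantarray ? 'list' : 'scalar'; }

my @results;
{
    local $foo = context();      # no parentheses: scalar assignment
    push @results, $foo;

    local ($foo) = context();    # parentheses: list assignment
    push @results, $foo;

    local @bar = context();      # array: list assignment
    push @results, $bar[0];
}
print "@results\n";              # prints "scalar list list"
```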

Passing Symbol Table Entries (typeglobs)

Sometimes you want to pass the name of an array, rather than its value, to a subroutine so that the subroutine can modify the global copy of it rather than work with a local copy. In Perl, you can refer to all objects of a particular name by prefixing the name with * (for example, *foo). This mechanism is often known as a typeglob because the * on the front can be considered a wildcard match for all the funny prefix characters on variables and subroutines and such.

When evaluated, the typeglob produces a scalar value that represents all the objects of that name, including any filehandle, format, or subroutine. When assigned, a typeglob causes the name mentioned to refer to whatever * value was assigned to it.

Note that scalars are already passed by reference, so you can modify scalar arguments without using this mechanism by referring explicitly to $_[0], $_[1], and so on. You can modify all the elements of an array by passing all the elements as scalars, but you have to use the * mechanism (or the equivalent reference mechanism) to push, pop, or change the size of an array. It is certainly faster to pass the typeglob (or reference).
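A minimal sketch of passing a typeglob so that a subroutine can resize the caller's array. The names add_bonus, @scores, and @someary are invented; note that typeglobs refer to package variables, so the arrays are declared with our rather than my.

```perl
use strict;
use warnings;

our (@scores, @someary);

sub add_bonus {
    local *someary = shift;      # @someary is now another name for the caller's array
    $_ += 10 for @someary;       # change every element in place
    push @someary, 100;          # changing the size of the array works too
}

@scores = (1, 2, 3);
add_bonus(*scores);

print "@scores\n";               # prints "11 12 13 100"
```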

Pass By Reference

If you want to pass more than one array or hash into a function, or return them from a function, and have them maintain their integrity, you must use an explicit pass by reference. First, you need to understand references as detailed in the perlref man page.
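As a brief sketch, the following subroutine receives two arrays as references, so each keeps its identity, and returns one of those references. The name longer_of and the sample data are made up.

```perl
use strict;
use warnings;

sub longer_of {
    my ($first, $second) = @_;   # two array references, not two flattened lists
    return @$first >= @$second ? $first : $second;
}

my @week    = ('mon', 'tue', 'wed');
my @weekend = ('sat', 'sun');

my $winner = longer_of(\@week, \@weekend);
push @$winner, 'thu';            # modifies @week through the reference

print scalar(@week), " ", scalar(@weekend), "\n";   # prints "4 2"
```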

Prototypes

As of the 5.002 release of Perl, if you declare


sub mypush (\@@)

then mypush() takes the same arguments that push() does. The declaration of the function to be called must be visible at compile time. The prototype only affects the interpretation of new-style calls to the function, where new-style is defined as not using the & character. In other words, if you call mypush like a built-in function, then it behaves like a built-in function. If you call mypush like an old-fashioned subroutine, then it behaves like an old-fashioned subroutine. From this rule, you can see that prototypes have no influence on subroutine references like \&foo or on indirect subroutine calls like &{$subref}.
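A possible body for such a mypush is sketched below, together with both calling styles. The implementation is an assumption made for illustration; only the prototype comes from the declaration above.

```perl
use strict;
use warnings;

sub mypush (\@@) {
    my ($array_ref, @new) = @_;    # the \@ in the prototype delivers the array as a reference
    push @$array_ref, @new;
    return scalar @$array_ref;     # like push, return the new number of elements
}

my @stack = (1, 2);

my $new_style = mypush(@stack, 3, 4);     # called like the built-in: @stack itself is passed
my $old_style = &mypush(\@stack, 5);      # & ignores the prototype, so pass the reference yourself

print "@stack ($new_style, $old_style)\n";   # prints "1 2 3 4 5 (4, 5)"
```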

Overriding Built-In Functions

Although you can override many built-in functions, you should only do so occasionally and for good reason. For example, a package attempting to emulate missing built-in functionality on a non-UNIX system might override a built-in function. Discussion of this capability is beyond the scope of this chapter.

What's Next?

Chapter 15, "Perl in Internet Applications," presents a tutorial of Perl programming in the context of developing a CGI application. The programs show basic Perl programming and allow you to modify them to gain practice in coding in Perl.

Summary

You have learned a great deal of information about the Perl language in this chapter. Perl is a complex, full-featured scripting language, and a complete treatment of the language requires thousands of pages. This chapter is intended as a reference chapter to the Perl language to help you use it for Internet programming. It provides references to most of the common language elements that you would use in programming Web pages. The next chapter rounds out the explanation of Perl for Internet programming by presenting techniques and examples using many of the functions and features detailed in this chapter.