Wednesday, August 6, 2008

Logical Operators

Logical operators are things such as OR, NOT and AND. They all evaluate expressions, and each expression evaluates to either true or false. Exactly what criteria are used for the evaluation depends on the operator.

or
The or operator works as follows:
open STUFF, $stuff or die "Cannot open $stuff for read :$!";

This line means: if the attempt to open the STUFF filehandle fails, then do something else -- in this case, die. Another example:
$_=shift;

/^R/ or print "Doesn't start with R\n";

If the regular expression does not match, then whatever is on the right-hand side of the or is executed -- in this case, the print. As you know, shift works on @ARGV if no target is given, or on @_ inside a subroutine.

Perl has two OR operators. One is the now familiar or and the other is || .




Precedence: What comes First
To understand the difference between the two we need to talk about precedence. Precedence means priority, order, importance. A good example is:
perl -e"print 2+8

which we know equals 10. But if we add:
perl -e"print 2+8/2

Now, will this be 2+8 == 10, divided by 2 == 5? Or maybe 8/2 == 4, plus 2 == 6?

Precedence is about what is done first. In the example above, you can see that the division is done first, then the addition. Therefore, division has a higher precedence than addition.

You can force the issue with parens:

perl -e"print ((2+8)/2)

which forces Perl, kicking and screaming, to evaluate 2+8 then divide the result by 2.

So what has this to do with logical operators? Well, the main difference between or and || is precedence.

In the example below, we attempt to assign two variables from a non-existent element of an array, with a fallback value for when the element is missing:

@list=qw(a b c);

$name1 = $list[4] or "1-Unknown";

$name2 = $list[4] || "2-Unknown";

print "Name1 is $name1, Name2 is $name2\n";

print "Name1 exists\n" if defined $name1;
print "Name2 exists\n" if defined $name2;


The output is interesting. The variable $name2 has been given the fallback value "2-Unknown", but $name1 is left undefined. The reason is all about precedence. The or operator has a lower precedence than || .

This means or takes the entire expression on its left-hand side as its first operand. In this case, that is $name1 = $list[4] . The assignment happens first: $list[4] doesn't exist, so $name1 is assigned undef. Because that result is false, the right-hand side is then evaluated -- but all it evaluates is the bare string "1-Unknown", whose value is simply thrown away. Nothing ever assigns it to $name1.

In the case of || , which has a higher precedence, only the code immediately to the left of the operator is taken as the first operand. In this case, that is $list[4]. It is false, so the code immediately to the right is evaluated instead. The assignment $name2 = , which sits outside the || expression, is not forgotten. The whole statement therefore evaluates as $name2 = ($list[4] || "2-Unknown"), and $name2 ends up holding "2-Unknown".
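If it helps, here are the same two statements with the grouping Perl actually uses written out by hand. You can also ask Perl itself, via the core B::Deparse module -- perl -MO=Deparse,-p yourscript.pl prints your code back with the parentheses it inferred:

($name1 = $list[4]) or "1-Unknown"; # low precedence: the whole assignment is the left operand
$name2 = ($list[4] || "2-Unknown"); # high precedence: the fallback is chosen first, then assigned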

The example below should help clarify things:

@list=qw(a b c);

$ele1 = $list[4] or print "1 Failed\n";
$ele2 = $list[4] || print "2 Failed\n";

print <<PRT;
ele1 :$ele1:
ele2 :$ele2:
PRT

The two failure codes are both printed, but for different reasons. The first is printed because we are assigning $ele1 a false value, so the result of the operation is false. Therefore, the right hand side is evaluated.

The second is printed because $list[4] is itself false. Yet, as you can see, $ele2 exists. Any idea why?

The reason is that the result of print "2 Failed\n" has been assigned to $ele2. The print is successful, and therefore returns 1, so $ele2 ends up holding 1.

Another example:

$file='not-there.txt';

open FILE, $file || print "1: Can't open file:$!\n";

open FILE, $file or print "2: Can't open file:$!\n";


In the first example, the error message is not printed. This is because || binds only to what is immediately to its left, $file , which evaluates to true, so the print is never reached (and the open quietly fails). In the second example, or takes the result of the entire open expression as its left-hand side; the open fails, so the message is printed.

You can fix things with parens:

$file='not-there.txt';

open FILE, $file || print "1: Can't open file:$!\n";

open FILE, $file or print "2: Can't open file:$!\n";

open (FILE, $file) || print "3: Can't open file:$!\n";


like so, but why bother when you have a perfectly good operator in or ? You could apply parens elsewhere:
@list=qw(a b c);

$name1 = $list[4] or "1-Unknown";

($name2 = $list[4]) || "2-Unknown";

print "Name1 is $name1, Name2 is $name2\n";

print "Name1 exists\n" if defined $name1;
print "Name2 exists\n" if defined $name2;

Now, ($name2 = $list[4]) is evaluated as a complete expression, not just $list[4], so we get exactly the same result as if we had used or .

And
now for something similar. And. Logical AND operators evaluate two expressions, and return true only if both are true. Contrast this with OR, which returns true if one or more of the two expressions are true. Perl has a few AND operators.

The first type of AND we will look at is && :

@list=qw(a b c);

print "List is:@list\n";

if ($list[0] eq 'x' && $list[2]++ eq 'd') {
print "True\n";
} else {
print "False\n";
}

print "List is:@list\n";

The output here is False. It is clear that $list[0] does not equal x . As an AND can only return true when both expressions are true, the first one being false makes the whole thing an obvious non-starter, so perl decides it need not bother evaluating the second expression at all. Entirely sensible.

The second type of AND is & . Strictly speaking, & is Perl's bitwise AND operator rather than a logical one, but applied to true/false values it behaves much like && , with one important difference. See if you can work out what that difference is using this example:

@list=qw(a b c);

print "List is:@list\n";

if ($list[0] eq 'x' & $list[2]++ eq 'd') {
print "True\n";
} else {
print "False\n";
}

print "List is:@list\n";

The difference is that with & the second part of the expression is evaluated no matter what the result of the first part is. Despite the fact that the expression cannot possibly be true, perl goes ahead and evaluates the second part anyway, so $list[2]++ runs and $list[2] ends up as d (string increment in action).

The third AND which we will look at is and . This behaves in the same way as && but is lower precedence. Therefore, all the guidelines about || and or apply.
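The low precedence of and bites hardest in assignments. A quick sketch of the classic trap -- the variable names are just for illustration:

$x=5;
$y=0;

$result1 = $x && $y; # parsed as $result1 = ($x && $y) -- $result1 is 0
$result2 = $x and $y; # parsed as ($result2 = $x) and $y -- $result2 is 5!

print "result1 :$result1: result2 :$result2:\n";

So use && and || when you want the value of the operation, and keep and and or for flow control, such as open ... or die.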


Other Logical Operators
Perl has not , which works like ! except for low precedence. If you are wondering where you have seen ! before, what about:
$x !~/match/;

if ($t != 5) {

as two examples. There is also Exclusive OR, or XOR. This means:
If exactly one expression is true, XOR returns true
If both expressions are false, XOR returns false
If both expressions are true, XOR returns false (the crucial difference from OR)
This needs an example. Jane and Sonia are two known troublemakers, with a reputation for throwing good beer around, going topless at inappropriate moments and singing out of tune to the karaoke machine. You only want to let one of them into your party, and instead of a big muscle-bound bouncer you have this perl script on the door:
($name1,$name2)=@ARGV;

if ($name1 eq 'Jane' xor $name2 eq 'Sonia') {
print "OK, allowed\n";
} else {
print "Sorry, not allowed\n";
}

I would suggest running it thus:
perl script.pl Jane Karen
(one true, one false)

perl script.pl Jim Sonia
(one true, one false)

perl script.pl Jane Sonia
(both true)

perl script.pl Jim Sam
(both false)

Well, the script is not perfect as a doorman, as all Jane and Sonia have to do is type their names in lowercase, but hopefully it demonstrated xor .

One thing to beware of is:

$_=shift;

print "OK\n" unless not(!/r/i || /o/i & /p/ or /q/);

over-complication, and believe me the above is not as complicated as it could be. Take the time to understand what you want to do. Perl provides a plethora of logical operators so you really don't have any excuse for not writing legible code. The above can be written a lot more concisely and clearly. As well as a lot more obscurely :-)
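For what it's worth, if I have unpicked the precedence correctly (do check me), the condition above boils down to something like this:

print "OK\n" if !/r/i or (/o/i and /p/) or /q/;

which at least reads as 'no r anywhere, or both an o and a p, or a q'.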


Bondage and Discipline

Perl is a very flexible language. It is designed as a hacking tool, for quick sysadmin magic. It can do quite a bit more besides, but being small and powerful is a core Perl feature. Earlier on I said Perl is not a bondage and discipline language -- to qualify that, it doesn't have to be. However, there is a time and place for everything.

For tiny scripts you don't want to be declaring variables, typecasting and generally spending more time obeying rules than you do getting the job done. So, Perl doesn't force you to do all of these good programming practices. However, not all your programs are going to be five-minute hacks. Some will be pretty large. Therefore, some Discipline is in order.

Perl has two primary methods of enforcing discipline. They are:

-w for Warnings
use strict;


-w
Consider for a moment this little program:
@input=@ARGV;

$outfile='outfile.txt';
open OUT, ">$outfile" or die "Can't open $outfile for write:$!\n";

$input2++;
$delay=2 if $input[0] eq 'sleep';

sleep $delay;

print "The first element of \@input is $input[0]\n";
print OUY "Slept $delay!\n";

It doesn't do much. Just prints out the first argument supplied, and demonstrates the uninspiring sleep function. The program itself is full of holes, and it is only a few lines. How many errors can you spot? Try and count them. When you are finished, execute the program with error-checks enabled:
perl -w script.pl hello

Perl finds quite a few errors. The -w switch finds, among other heinous sins:
Variables used only once. In the example, $input2 is used only once. It is a useless variable.
Filehandles used incorrectly. With print OUY I'm trying to print to a non-existent filehandle. With -w an alarm is raised, as it would be if I tried to write to a filehandle which was read-only.
Use of uninitialised variables. The variable $delay is uninitialised if 'sleep' is not the first parameter. Making variables spring into existence on demand is not good programming practice. They should be defined carefully first.
So, generally, -w is a Good Thing. It forces you to write cleaner code. So use it, but don't be afraid not to for very short programs.
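For reference, here is one way the little program could be cleaned up so that -w has nothing to complain about. This is only a sketch -- the default of 0 for $delay is my choice, not something Perl demands:

@input=@ARGV;

$outfile='outfile.txt';
open OUT, ">$outfile" or die "Can't open $outfile for write:$!\n";

$delay=0; # initialise it, so it is never undefined
$delay=2 if defined $input[0] and $input[0] eq 'sleep';

sleep $delay;

print "The first element of \@input is $input[0]\n";
print OUT "Slept $delay!\n"; # OUT, not the mistyped OUY

The pointless $input2 has simply been deleted.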



Shebang
You know that you can turn warnings on with -w on the command line. You can also turn them on within the script itself. For that matter, you can give perl any command line option within the script itself. For example:
perl script.pl hello

to execute this:
#!perl -w

@input=@ARGV;

$outfile='outfile.txt';
open OUT, ">$outfile" or die "Can't open $outfile for write:$!\n";

$input2++;
$delay=2 if $input[0] eq 'sleep';

sleep $delay;

print "The first element of \@input is $input[0]\n";
print OUY "Slept $delay!\n";

has the same effect as:
perl -w script.pl hello

It may be more convenient for you to put the flag inside the script. It doesn't have to be just -w , it can be any argument Perl supports. Run
perl -h
for a full list.

The first line, #!perl -w is the shebang line. This is derived from UNIX, where Perl was first developed. UNIX systems make a script executable by changing an attribute. The operating system then loads the file and works out how to execute it -- in this case by looking at the first line, then loading the perl interpreter. Windows systems know that all files with a certain extension must be passed to a certain program for execution, eg all .bat files are passed to command.com, and all .xls files are passed to Excel. The point of all this being that you don't need a shebang line, but it doesn't hurt.



use strict;
So what's strict and how do you use it? The module strict restricts 'unsafe constructs', according to the perldocs. The strict module is a pragma, which is a hint that must be obeyed. Like when your girlfriend says 'oh, that ring is *far* too expensive'.

There is no need to be frightened about unsafe code if you don't mind endless hours of debugging unstructured programs. When you enable the strict module, the three things that Perl becomes strict about are:

Variables 'vars'
References 'refs'
Subroutines 'subs'
This tutorial doesn't presently cover references (and let's hope I remember to remove this sentence if I do cover it in later versions) so we won't worry about refs.

Strict variables are useful. Essentially, this means that all variables must be declared, that is, introduced before use rather than springing into existence as required. Furthermore, each variable must either be declared with my or referred to by its fully qualified name. This is an example of a program that is not strict, and should be executed something like this:

perl script.pl "Alain James Smith";

where the "" enclose the string as a single parameter as opposed to three separate space-delimited parameters.
#use strict; # uncomment after running a couple of times

$name=shift; # shifts @ARGV if no arguments supplied

print "The name is $name\n";
$inis=&initials($name);

$luck=int(rand(10)) if $inis=~/^(?:[a-d]|[n-p]|[x-z])/i;

print "The initials are $inis, lucky number: $luck\n";

sub initials {
my $name=shift;
$initials.=$1 while $name=~/(\w)\w+\s?/g;
return $initials;
}

By now you should be able to work out what the above does. When you uncomment the use strict; pragma, and re-run the program, you will get output something like this:
Global symbol "$name" requires explicit package name at n1.pl line 3.
Global symbol "$inis" requires explicit package name at n1.pl line 6.
Global symbol "$luck" requires explicit package name at n1.pl line 8.
Global symbol "$initials" requires explicit package name at n1.pl line 14.
Execution of n1.pl aborted due to compilation errors.

These errors mean Perl is not exactly clear about what the scope of your variables is. If Perl is not clear, you might not be either. So you need to be explicit about your variables, which means either declaring them with my so they are restricted to the current block, or referring to them with their fully qualified name. An example, using both methods:
use strict;

$main::name=shift; # shifts @ARGV if no arguments supplied

print "The name is ",$main::name,"\n";
my $inis='';
my $luck='';

$inis=&initials($main::name);

$luck=int(rand(10)) if $inis=~/^(?:[a-d]|[n-p]|[x-z])/i;

print "The initials are $inis, lucky number: $luck\n";

sub initials {
my $name=shift;
my $initials;
$initials.=$1 while $name=~/(\w)\w+\s?/g;
return $initials;
}


The my variables in the subroutine are nothing new. The my variables outside the subroutine are. If you think about it, the main program itself is also a kind of block, and therefore variables can be lexically scoped to be visible only within the block.

The other interesting bit is the $main::name business. This, as you might expect, is the fully qualified name of the variable. The first part is the package name -- main is the default package your code runs in. The second part is the actual variable name. Personally, I've never needed to refer to a variable this way. I'm not saying you'll never use the syntax, but I would suggest that knowing this is not on a Perl student's Top 10 list of Things to Master.

The important thing about use strict is that it does enforce more discipline than you have been used to, and for all but the smallest of programs, that is most definitely a Good Thing.

Debugging

Sooner or later you'll need to do some fairly hairy debugging. It will be later if you are using strict , -w and writing your subroutines properly, but the moment will come.

When it does you'll be poring over code, probably late at night, wondering where the hell the problem is. Some techniques I find useful are:

Print your variables and other information out at frequent intervals (there is a tiny sketch of this after the list).
Split difficult components of the program out into small, throwaway scripts. Get these working, then copy the results back into the main program.
# Comment frequently.
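The first tip can be as crude as a guarded print that you switch off when you have finished. A tiny sketch -- the $DEBUG flag and the variables shown are placeholders, not anything from a real program:

$DEBUG=1; # set to 0 to silence the extra output

warn "DEBUG: ratio=:$ratio: distance=:$distance:\n" if $DEBUG; # warn goes to STDERR, so it doesn't mix with normal output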
Eventually, you'll be stuck. Such is the price of progress. In this case, Perl's own debugger can be invaluable. Run this code as normal first:
$name=shift;

print "Logon name creation program\n:Converting '$name'\n";

print &logname($name),"\n\n";

print "Program ended at", scalar(localtime),"\n";

sub logname {
my $name=shift;
my @namebits;
my ($logon,$inital);
@namebits=split /\s+/,$name;
($inital)=$name=~/(\w)/;
$logon=$inital.$namebits[$#namebits];
$logon=lc $logon;
return $logon;
}

We'll run it with the debugger so you can watch perl at work while it runs:
perl -d script.pl "Peter Dakin"

and you are into the debugger, which should look something like this:
c:\scripts>perl -d db.pl "Peter Dakin"

Loading DB routines from perl5db.pl version 1.0401
Emacs support available.

Enter h or `h h' for help.

main::(db.pl:1): $name=shift;
DB<1>

db.pl -- the name of the script being executed
1 -- the line number that is just about to be executed
$name=shift; -- the code that is just about to be executed


Type s for a single step and press enter. The code $name=shift; will be executed, and perl waits for your next command. Keep inputting s until the program terminates.

This by itself is useful as you see the subroutine flow, but if you enter h for help you'll see a bewildering range of debug options. I won't detail them all here, but some of the ones I find most useful are:
n -- Executes the next line of the main program, but skips over subroutine calls. The subroutine is still executed, but you aren't stepped through it. Try using n instead of s .
/xx/ -- Searches through the program for xx
p -- Prints a variable, for example p @namebits or p $name
Enter -- Pressing the Enter key (inputting a carriage return) repeats the last n or s command.
perlcode -- You can type any perl code in and it will be evaluated, and have an effect on your program. In the example below I remove spaces from $name (the lines typed at the DB prompts are the inputs):
main::(db.pl:1): $name=shift;
DB<1> s
main::(db.pl:3): print "Logon name creation program\n:Converting '$name'\n";
DB<1> $name=~s/\s//g;

DB<2> print $name
PeterDakin
DB<3>




There are many, many more debugger options which are worth becoming familiar with. Type h for a full list.

Modules

An introduction
Subroutines are oft-used pieces of code. They exist so you can re-use the code and not have to constantly rewrite it.

A module is, in principle, similar to a subroutine. It is also an oft-used piece of code. The difference is that modules don't live in your program, they are their own separate script outside your code. For example, you might write a routine to send email. You could then use this code in ten, a hundred, a thousand different programs just by referencing the original program.

As you would expect, the basic Perl package includes a large number of modules. These have been written by people who had a need for the code, made it a module and released it into the big wide world. Many of these modules have been debugged, improved and documented by yet more people. To quote the OpenSource mantra, all bugs are shallow under the scrutiny of every programmer.

Aside from the many modules included with Perl there are hundreds more available on CPAN, the Comprehensive Perl Archive Network. Refer to your documentation for details.




File::Find -- using a module
An example of a module included with Perl is File::Find. There are several other modules in the File:: namespace, such as File::Basename, File::Compare and File::stat.

This is an example of how File::Find can be used:

use File::Find;

$dir1='/some/dir/with/lots/of/files';
$dir2='/another/directory/';

find(\&wanted, $dir1,$dir2);

sub wanted {
print "Found it $File::Find::dir/$_\n" if /^[a-d]/i;

}

The first line is the most important. The use function loads the File::Find module. Now, all the power and functionality of File::Find is available for use. Such as the find function. This accepts two basic parameters:
A reference to a subroutine, usually called wanted , which defines what you want to do with each file found. The filename will be in $_ .
A list of directories to be searched. Subdirectories will also be searched.
The subroutine wanted simply prints the directory the file was found in if the filename begins with a, b, c or d. Make your own regex to suit. The expression $File::Find::dir means the $dir variable in the File::Find package. This is explained further in the next section.

Note -- the \&wanted parameter is a reference to a subroutine. Essentially, this means that the code in File::Find knows where to find the &wanted subroutine. It is basically like shortcuts under Windows 9x and NT4, instead of actual files (but the UNIX Perl people would slaughter me for that, so be quiet).
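Incidentally, find will also accept an anonymous subroutine as its first parameter, which saves writing a named wanted routine for small jobs, and $File::Find::name gives you the full path rather than just the directory. A quick sketch (the directory is only an example):

use File::Find;

find( sub { print "Found $File::Find::name\n" if /^[a-d]/i }, 'c:/temp' );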




ChangeNotify
Another example is Win32::ChangeNotify. As you might expect there are a number of Win32-specific modules, and ChangeNotify is one of them. It waits until something changes in a directory, then acts. What it waits for and what it does are up to you, for example:
use Win32::ChangeNotify;

$Path='/downloads';
$WatchSubTree=0;
$Events='FILE_NAME';
$browser='E:/progs/netscape/Communicator/program/netscape.exe';
$changes=0;

$notify = Win32::ChangeNotify->new($Path,$WatchSubTree,$Events);

while (1) {
print "- ",scalar(localtime)," $changes so far to $Path.\n";
$notify->wait;
++$changes;
print "- ",scalar(localtime), " Launching $browser...\n";
system("$browser $Path");
$notify->reset;
}


Again, the module is incorporated into the program with use . An object referred to by the variable $notify is created. The parameters passed are the path to be watched, whether we want to watch subtrees, and what sort of events we want to be notified about, in this case only filename changes.

Then, we enter a loop which continues while 1 is true -- which will be forever.

The program pauses when the wait method of the $notify object is called. Only when there is a change to the directory does the rest of the loop run, launching the browser. We then have to reset the $notify object before waiting again.

There is some pretty frightening stuff about objects in the explanation. But you don't actually need to understand anything about objects. Just read the documentation, and experiment.

You can use as many modules as you like in one program. As they are all written with carefully scoped variables you need not worry about programmers using the same variable names in different modules. Now you *really* appreciate scoping!



Your Very Own Module
You too can write your own modules. It is easy. First, we will create the fantastic bit of code that we want to re-use everywhere. First, we'll write a normal Perl program:
$name=shift;

print &logname($name);

sub logname {
my $name=shift;
my @namebits;
my ($logon,$inital);
@namebits=split /\s+/,$name;
($inital)=$name=~/(\w)/;
$logon=$inital.$namebits[$#namebits];
$logon=lc $logon;
return $logon;
}

Execute like so: perl script.pl "Nick Bladon"

The script itself is nothing amazing. The lc function stands for LowerCase, or probably lOWERcASE -- you can see what it does.

In order to turn it into a module carry out the following steps:

Find out where your copy of Perl is installed, for example c:\progs\perl.
Within that directory there should be a lib directory.
Make a directory within lib, for example c:\progs\perl\lib\RMP\
Now we'll make the module. Remember, a module is just code you are going to reuse. So we don't need all of the above example. Just this bit:
sub logname {
my $name=shift;
my @namebits;
my ($logon,$inital);
@namebits=split /\s+/,$name;
($inital)=$name=~/(\w)/;
$logon=$inital.$namebits[$#namebits];
$logon=lc $logon;
return $logon;
}

1;

The bit that has been added is the 1 at the bottom. Why? Perl requires that all modules return true. Just as a subroutine returns the value of the last expression evaluated, so does a module file, and as 1 evaluates to true, that'll do.

You need to save this as logon.pm in your newly created directory under lib. The pm stands for Perl Module.

That's it. A module created. To use, just make a normal Perl script such as:

use RMP::logon;

$name=shift;

print logname($name);

and hey presto! Module power is yours!
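For the record, a module intended for other people would normally declare its own package and hand its subroutines out through the standard Exporter module, rather than quietly dropping logname into whoever loads it. Here is a hedged sketch of what RMP/logon.pm could look like written that way (the package name is just the one from this example):

package RMP::logon;

use strict;
use Exporter;
our @ISA=('Exporter');
our @EXPORT_OK=('logname'); # the caller must ask for logname explicitly

sub logname {
my $name=shift;
my @namebits;
my ($logon,$inital);
@namebits=split /\s+/,$name;
($inital)=$name=~/(\w)/;
$logon=$inital.$namebits[$#namebits];
$logon=lc $logon;
return $logon;
}

1;

The calling script then says use RMP::logon qw(logname); and everything else stays the same. The simple version above works perfectly well too -- this is just the shape you will see in modules written for public consumption.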

You don't have to create your own subdirectory within lib but I would advise it for the sake of neatness. And as you might expect, there is a lot more to learn about modules but this is supposed to be a basic tutorial, so that's enough for the time being.

Subroutines and Parameters

In Perl, subroutines are functions are subroutines. If you like, a subroutine is a user defined function. It's a bit like calling a script a program, or a program a script. For the purposes of this tutorial we'll refer to functions as subroutines, except when we call them functions. Hope that's made the point.

For the purposes of this section we will develop a small program which, by the end, will demonstrate how subroutines work. It also serves to demonstrate how many programs are built, namely a little at a time, in manageable sections. At least, that method works for me.

The chosen theme is gliding. That's aeroplanes without engines. A subject close to every glider pilot's heart is how far they can fly from the altitude they are at. Our program will calculate this. To make it easy we'll assume the air is perfectly calm. Wind would be a complication we don't need, especially when in a crowded lift.

What we need in order to calculate the distance we can fly is:

How high we are (in feet)
How many metres we travel forward for every metre we drop. This is the glide ratio, for example 24:1 would mean travelling 24 metres forward for every 1 metre of height lost.

Obviously input is needed. We can either prompt the user or grab the input from the command line. The latter is easier so we'll just look at @ARGV for the command line parameters. Like so:

($height,$angle)=@ARGV; # @ARGV is the command line parameters

$distance=$height*$angle; # an easy calculation

print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

The above should be executed thus:
perl yourscript.pl 5000 24

or whatever your script is called, with whatever parameters you choose to use. I'm a poet and I don't even know it.

That works. What about a slight variation? The pilot does have some control over the glide ratio, for example he can fly faster but at a penalty of a lesser glide ratio. So we should perhaps give a couple of options either side of the given parameters:

($height,$angle)=@ARGV;

$distance=$height*$angle;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle++; # add 1 to $angle
$distance=$height*$angle;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle-=2; # subtract 2 from $angle so it is 1 less than the original
$distance=$height*$angle;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

That's cumbersome code. We repeat exactly the same statement. This wastes space, and if we want to change it there are three changes to be made. A better option is to put it into a subroutine:
($height,$angle)=@ARGV;

&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle++;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle-=2;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

sub howfar { # sub subroutinename
$distance=$height*$angle;
}

This is a basic subroutine, and you could stop here and have learnt a very useful technique for programming. Now, when changes are made they are made in one place. Less work, less chances of errors. Improvements can always be made. For example, pilots outside Eastern Europe generally measure height in feet, and glider pilots are usually concerned with how many kilometres they travel over the ground. So we can adapt our program to accept height in feet and output the distance in kilometres:
($height,$angle)=@ARGV;

$height/=3.2; # divide feet by 3.2 to get metres

&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle++;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle-=2;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

sub howfar {
$distance=$height*$angle;
}

When you run this you'll probably get a result which involves a fair few digits after the decimal point. This is messy, and we can fix it with the int function, which in Perl and most other languages returns a number as an integer, ie without those irritating digits after the decimal point.
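A word of warning before we reach for it: int truncates, it does not round. A few lines to show the behaviour:

print int(3.7), "\n"; # 3 -- the fractional part is simply chopped off
print int(-3.7), "\n"; # -3 -- truncation is towards zero, not rounding down
printf "%.0f\n", 3.7; # 4 -- printf/sprintf will round, if rounding is what you actually want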

You might have also noticed a small bit of Bad Programming Practice slipped into the last example. It was the evil Constant, the '3.2' used to convert feet to metres. Why, I don't hear you ask, is this bad? Surely the conversion will never change?

It won't change, but our use of it might. We may decide that it should be 3.28 instead of 3.2. We may decide to convert from feet to nautical miles instead. You don't know what could happen. Therefore, code with flexibility in mind, and that means avoiding constants.

The new improved version with int and constant removed:

($height,$ratio)=@ARGV;
$cnv1=3.2; # now it is a variable. Could easily be a cmd line
# parameter too. We have the flexibility.
$height =int($height/$cnv1); # divide feet by 3.2 to get metres

&howfar;
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

$ratio++;
&howfar;
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

$ratio-=2;
&howfar;
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

sub howfar {
$distance=int($height*$ratio);
}

We could of course build the print statement into the subroutine, but I usually separate output presentation from the calculation. Again, that means it is easier to modify later on.

Something else we can improve about this code is the use of the $ratio variable. We are having to keep track of what we do to it -- first add one, then subtract two in order to subtract one from the original input. In this case it is fairly easy, but with a complex program it can be difficult, and you don't want to be creating lots of variables just to track one input, for example $ratio1 , $ratio2 etc.




Parameters
One solution is to pass the subroutine parameters. In the best tradition of American columnists, who seem to have a particular affection for this phrase, 'Here's how:'
($height,$ratio)=@ARGV;
$cnv1=3.2;

&howfar($height,$ratio);
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

&howfar($height,$ratio+1);
print "With a glide ratio of ",$ratio+1,":1 you can fly $distance from $height\n";

&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

sub howfar {
print "The parameters passed to this subroutine are @_\n";
($ht,$rt)=@_;
$ht =int($ht/$cnv1);
$distance=int($ht*$rt);
}

Quite a few things have changed here. Firstly, the subroutine is being called with parameters. These are a comma-delimited list in parens after the subroutine call. The two parameters are $height and $ratio.

The parameters end up in the subroutine as the @_ array. Being an array, they are in the same order as passed. All the usual array operations work. All we will do is assign the contents of the array to two variables.
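As an aside, because @_ is a normal array you can unpack it however you like -- shift a value or two off the front, or assign the lot in one go. A small sketch (nothing to do with gliding):

sub demo {
my $first=shift; # shift defaults to @_ inside a subroutine
my ($second,@rest)=@_; # or unpack whatever is left in one assignment
print "first:$first second:$second rest:@rest\n";
}

&demo('a','b','c','d'); # prints first:a second:b rest:c d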

We have also moved the conversion function into the subroutine, because we want to put all the code for generating the distance into one place.



Namespaces
We cannot use the variable names $height and $ratio because we modify them in the subroutine and that will affect the main program. So we choose new ones to do the operation on. Finally, a small change is made to the print output.

This approach works well enough for our small program here. For larger programs, having to think of new variable names all the time is difficult. It would be even more difficult if different programmers were working on different sections of the program. It would be impossible if a program were written, then an extension created by another person somewhere else, and that same extension had to be used by many people in many different programs. Obviously, the risk of using the same variable name is too great. There are only so many logical names out there.

There is a solution. Imagine you own a house with two gardens. You have two identical dogs, one in the front garden, one in the back garden. Bear with me, this is relevant. Both dogs are called Rover, because their owner lacks imagination.

When you go to the front garden and call 'Rover!!!' or open a can of dog food, the dog in the front garden comes running. Similarly, you go to the back garden, call your dog and the other dog bounces up to you.

You have two dogs, both called Rover, and you can change either one of them. Wash one, neuter the other -- it doesn't matter, but both are dogs and both have the same name. Changes to one won't affect the other. You don't get them confused because they are in different places, in two different namespaces.




Variable Scope
To bring things back to Perl, a short diversion is necessary to illustrate the point with actual Perl code instead of canine metaphors:
$name='Rover';
$pet ='dog';
$age =3;

print "$name the $pet is aged $age\n";

{
my $age =4; # run this again, but comment this line out
my $name='Spot'; # and this one
$pet ='cat';

print "$name the $pet is aged $age\n";
}

print "$name the $pet is aged $age\n";

This is pretty straightforward until we get to the { . This marks the start of a block. One feature of a block is that it can have its own namespace. Variables declared, in other words initialised, within that block are just normal variables, unless they are declared with my .

When variables are declared with my they are visible inside the block only. Also, any variable which has the same name outside the block is ignored. Points to note from the example above:

The two my variables appear to overwrite the variables of the same name from outside the block.
The two original variables aren't really overwritten because as we prove after the block has ended, they haven't been touched.
The variable $pet is accessible inside and outside the block as usual. Of course, if we declare it with my then things will change.


my Variables
So there we have it. Namespaces. They work for all the other types of variable too, like arrays and hashes. This is how you can write code and not care about what other people use for variable names -- you just declare everything with my and have your own private party. Our original program about gliding can be improved now:
($height,$ratio)=@ARGV;
$cnv1=3.2;

&howfar($height,$ratio);
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

&howfar($height,$ratio+1);
print "With a glide ratio of ",$ratio+1,":1 you can fly $distance from $height\n";

&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

sub howfar {
my ($height,$ratio)=@_;
$height =int($height/$cnv1);
$distance=int($height*$ratio);
}


The only change is that the parameters to the subroutine, ie the contents of the array @_ , are declared with my . This means they are now only visible within that block. The block happens to also be a subroutine. Outside of the block, the original variables are still accessible. At this point I'll introduce the technical term, which is lexical scoping. That means the variable is confined to the block -- it is only visible within the block.

We still have to be concerned with what variables we use inside the subroutine. The variable $distance is created in the subroutine and used outside of it. With larger programs this will cause exactly the same problem as before -- you have to make sure the names you use inside the subroutine don't clash with names used outside it. For all the same reasons as before, like two different people working on the code and use of custom extensions to Perl, that can be difficult.

The obvious solution is to declare $distance with my , and thus lexically scope it. If we do this, then how do we get the result of the subroutine? Like so:


($height,$ratio)=@ARGV;
$cnv1=3.2;

$distance=&howfar($height,$ratio); # run this again and delete '$distance='
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

$distance=&howfar($height,$ratio+1);
print "With a glide ratio of ",$ratio+1,":1 you can fly $distance from $height\n";

$distance=&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

sub howfar {
my ($height,$ratio)=@_;
my $distance;
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000); # output result in kilometres not metres
}

First change -- $distance is declared with my . Secondly, the result of the subroutine is assigned to a variable, which is also named $distance. However, it is a $distance in a different namespace. Remember the two gardens. You may wish to delete the $distance= from the first assignment and re-run the code. The only other change is one to change the output from metres to kilometres.

We have now achieved a sort of Black Box effect, where the subroutine is given input and creates output. We pass the subroutine two numbers, which may or may not be variables. We assign the output of the subroutine to a variable. We care not what goes on inside the subroutine, what variables it uses or what magic it performs. This is how subroutines should operate. The only exception is the variable $cnv1. This is declared in the main body of the program but also used in the subroutine. This has been done in case we need to use the variable elsewhere. In larger programs it would be a good idea to pass it to subroutines along with the other parameters too.




Multiple Returns
That's all the major learning out of the way. The next step is relatively easy, but we need to add new functionality to the program in order to demonstrate it. What we will do is work out how long it will take the glider pilot to fly the distance. For this calculation, we need to know his airspeed. That can be a third parameter. The actual calculation will be part of howfar. An easy change:
($height,$ratio,$airspeed)=@ARGV;
$cnv1=3.2;
$cnv2=1.8;

($distance,$time)=&howfar($height,$ratio,$airspeed);
print "Glide ratio $ratio:1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio+1,$airspeed);
print "Glide ratio ",$ratio+1,":1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio-1,$airspeed);
print "Glide ratio ",$ratio-1,":1, $distance from $height taking $time\n";

sub howfar {
my ($height,$ratio,$airspeed)=@_;
my ($distance,$time); # how to 'my' multiple variables
$airspeed*=$cnv2; # convert knots to kmph
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000);
$time =int($distance/($airspeed/60)); # simple time conversion
# print "Time:$time, Distance:$distance\n"; # uncomment this later
}

This doesn't work correctly. First, the changes. The result from howfar is now assigned to two variables. Subroutines return a list, and so assigning to some scalar variables between parens separated by commas will work. This is exactly the same as reading the command line arguments from @ARGV .

We are also passing a new parameter, $airspeed. There is another conversion and a one-line calculation to provide the number of minutes it will take to fly $distance.

If you look carefully, you can perhaps work out what the problem is. There was a clue in the Regex section, when /e was explained.

The problem is that Perl returns the result of the last expression evaluated. In this case, the last expression is the one calculating $time, so the value of $time is the only value returned. Therefore, it is assigned to $distance, and the $time outside the subroutine never gets a value at all.

Re-run the program, but this time uncomment the line in the subroutine which prints $distance and $time. You'll notice $distance is now 1, because the print is now the last expression evaluated, and a successful print returns 1. Perl is faithfully returning the value of the last expression evaluated.

This is all well and good, but not what we need. What is required is a method of telling Perl what needs to be returned, rather than what Perl thinks would be a good idea:

($height,$ratio,$airspeed)=@ARGV;
$cnv1=3.2;
$cnv2=1.8;

($distance,$time)=&howfar($height,$ratio,$airspeed);
print "Glide ratio $ratio:1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio+1,$airspeed);
print "Glide ratio ",$ratio+1,":1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio-1,$airspeed);
print "Glide ratio ",$ratio-1,":1, $distance from $height taking $time\n";

sub howfar {
my ($height,$ratio,$airspeed)=@_;
my ($distance,$time); # how to lexically scope multiple variables
$airspeed*=$cnv2; # convert knots to kmph
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000); # output result in kilometres not metres
$time =int($distance/($airspeed/60)); # simple time conversion
return ($distance,$time); # explicit return
}

A simple fix. Now, we tell Perl what to return, with the aptly named return function. With this function we have complete control over what is returned and when. It is quite usual to use if statements to control different return values, but we won't bother with that here.

There is a subtle flaw in the program above. It is not backwards compatible with the old method of calling the subroutine. Run this:

($height,$ratio,$airspeed)=@ARGV;
$cnv1=3.2;
$cnv2=1.8;

($distance,$time)=&howfar($height,$ratio,$airspeed);
print "Glide ratio $ratio:1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio+1,$airspeed);
print "Glide ratio ",$ratio+1,":1, $distance from $height taking $time\n";

$distance=&howfar($height,$ratio-1); # old way of calling it
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

sub howfar {
my ($height,$ratio,$airspeed)=@_;
my ($distance,$time);
$airspeed*=$cnv2;
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000);
$time =int($distance/($airspeed/60));
return ($distance,$time);
}

A division by 0 results the third time around. This is because $airspeed is not passed in, so it is undefined and effectively 0. Making your subroutines backwards compatible is important in large programs, or if you are writing an add-in module for other people to use. You can't expect everyone to retrofit additional parameters to their subroutine calls just because you decided to be a bit creative one day.

There are many ways to fix the problem, and this is just one:

($height,$ratio,$airspeed)=@ARGV;
$cnv1=3.2;
$cnv2=1.8;

($distance,$time)=&howfar($height,$ratio,$airspeed);
print "Glide ratio $ratio:1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio+1,$airspeed);
print "Glide ratio ",$ratio+1,":1, $distance from $height taking $time\n";

$distance=&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

print "Direct print: ",join ",",&howfar(5000,55,60)," not bad for no engine!\n";

sub howfar {
my ($height,$ratio,$airspeed)=@_;
my ($distance,$time); # how to 'my' multiple variables
$airspeed*=$cnv2; # convert knots to kmph
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000); # output result in kilometres not metres
if ($airspeed > 0) {
$time =int($distance/($airspeed/60));
return ($distance,$time);
} else {
return $distance;
}
}

Here we just test the $airspeed to ensure we won't be doing any divisions by 0. It also affects what we return. There is also a new print statement, which shows that you don't need to assign to intermediate variables, or even pass variables as parameters. Constants, evil things that they are, work just as well. I already mentioned this, but a demonstration doesn't hurt. Unless you work for an electric chair manufacturer.
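Another common approach, which I have not used above, is to ask Perl what context the subroutine was called in, using the built-in wantarray function: it returns true when the caller wants a list and false when it wants a single scalar. That would cope with the old-style scalar call without looking at the parameters at all, although it keys on context rather than on the missing $airspeed, so treat this as a sketch rather than a drop-in replacement:

sub howfar {
my ($height,$ratio,$airspeed)=@_;
my ($distance,$time);
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000);
return $distance unless wantarray; # scalar context -- the caller only wants the distance
$airspeed*=$cnv2; # convert knots to kmph
$time =int($distance/($airspeed/60));
return ($distance,$time); # list context -- distance and time
}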

The astute reader.....:-) Every time I read that I wonder what I've missed. Usually something obscure which the author knows nobody will ever notice, but likes to belittle the reader. No exception here! Anyway, you may be wondering why this would not have sufficed instead of the if statement:

sub howfar {
my ($height,$ratio,$airspeed)=@_;
my ($distance,$time); # how to 'my' multiple variables
$airspeed*=$cnv2; # convert knots to kmph
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000); # output result in kilometres not metres
$time =int($distance/($airspeed/60)) if $airspeed > 0;
return ($distance,$time);
}

After all, the first item returned is $distance, so it should be the first one assigned via:
$distance=&howfar($height,$ratio-1);

and $time should just disappear into the bit bucket.

The answer lies with scalars and lists. We are returning a list, but assigning it to a scalar. What happens when you do that? The scalar takes on the last value of the list. The last value of the list being returned is of course $time, which has been declared but not otherwise touched. Therefore, it is nothing and appears as such in the printed output. A small program to demonstrate the point:

$word=&wordfunc("Greetings");
print "The word is $word\n";

(@words)=&wordfunc("Bonjour");
print "The words are @words\n";

sub wordfunc {
my $word=shift; # when in a subroutine, shifts @_ if no target specified
my @words; # how to my an array
@words=split //,$word; # splits on the nothings between each letter
($first,$last)=($words[0],$words[$#words]); # see section on Arrays if required
return ($first,$last); # Returns just the first and last
}

As you can see, the first call prints the letter 's', which is the last element of the list that is returned. You could of course use a list consisting of just one element:
($word)=&wordfunc("Greetings");

Now we are assigning a list to a list, so perl starts at the first element and keeps assigning till it runs out of elements. The parens turn a lonely scalar into an element of a list. You might consider always assigning the results of subroutines this way, as you never know when the subroutine might change. I know I've just evangelised about how subroutines shouldn't change, but if you take care and the subroutine writer takes care, there definitely won't be any problems!

That's about it for good old my . There is a lot more to learn about it but that's enough to get started. You now know a little about variable visibility, and I don't mean changeable weather.




Local
There is one more function that I'd like to draw to your attention, and we'll launch straight into the demonstration:
@words=@ARGV;

print "Output Field Separator is :$,:\n";
print '1. Words:', @words, "\n";

&change;

$,='_';

print "\nOutput Field Separator is :$,:\n";
print '2. Words:', @words, "\n";

&change;

sub change {
print ' Words:', @words, "\n";
}
which should be executed something like this:
perl test.pl sarcasm is the lowest form of wit

The special variable $, is the output field separator -- it defines what Perl prints between the elements of a list it is given. By default, it is nothing. So the first two prints should have no spaces between the words. Then we assign '_' to $, so the next prints have underscores between the words.
If we want to use a different value for $, in the change subroutine, and not disturb the main value, we have a little problem. This problem cannot be solved by my because global variables like $, cannot at this time be lexically scoped. So, we could manually do it:

@words=@ARGV;

print "Output Field Separator is :$,:\n";
print '1. Words:', @words, "\n";

&change;

$,="_";

print "\nOutput Field Separator is :$,:\n";
print '2. Words:', @words, "\n";

&change;

sub change {
$save=$,;
$,='*';
print ' Words:', @words, "\n";
$,=$save;
}

That works, but it is messy. Perl has a special function for occasions of this nature, called local . An example of local in action:

@words=@ARGV;

print "Output Field Separator is :$,:\n";
print '1. Words:', @words, "\n";

&change;

$,="_";

print "\nOutput Field Separator is :$,:\n";
print '2. Words:', @words, "\n";

&change;

sub change {
local $,="!-!";
print ' Words:', @words, "\n";
}

You can try it with my instead but it won't work. I'm sure you'll try it anyway, I know you learn things the hard way otherwise you a) wouldn't be programming computers and b) wouldn't be using this tutorial to do it.

The local function works in a similar way to my , but assigns temporary values to global variables. The my function creates new variables that have the same name. The distinction is important, but the reasons require perl proficiency beyond the scope of this humble tutorial. In practice, the difference is:

lexically scoped variables (those declared with my ) are faster than non-lexically scoped variables.
local variables are visible to called subroutines; my variables are not (demonstrated below).
my doesn't work on special global variables like $, so you must use local .
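That second point is worth a small demonstration. The subroutine show below has no parameters and no my variables of its own, so whatever it prints tells us which $setting it can see:

$setting='default';

sub show { print "inner sees :$setting:\n" }
sub with_local { local $setting='localised'; &show; }
sub with_my { my $setting='lexical'; &show; }

&with_local; # inner sees :localised:
&with_my; # inner sees :default:
print "after :$setting:\n"; # after :default: -- the localised value has been restored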


Returning arrays
So that's the end of subroutines and parameters. Would you believe I have only scratched the surface? There are closures, prototypes, autoloading and references to learn. Not, however, in this tutorial. At least not yet. I'll finish with one last demonstration. You may have noticed that Perl returns one long list from subroutines. This is fine, but suppose you want two separate lists, for example two arrays? This is one way to do it:
($w1,$w2)=&wordfunc("Hello World"); # Assign the array references to scalars

print "@$w1 and @$w2\n"; # deference, ie access, the arrays referred to
#print "$w1 and $w2\n"; # uncomment this next time round

sub wordfunc {
my $phrase=shift;
my (@words,@word1,@word2); # declare three variables lexically
@words=split /\s+/,$phrase; # split the phrase on whitespace
@word1=split //,$words[0]; # create array of letters from the first word
@word2=split //,$words[1]; # and the second
return (\@word1,\@word2); # return references to the two arrays -- scalars
}

There is a lot going on there. It should be clear up until the return statement. As we know, Perl only returns a single list. So, we make Perl return a list of references to the two arrays it has just created. Not the actual arrays themselves, but references to them. A bit like a shopping list is just a bit of paper, not the actual goods themselves. The reference is created by use of the \ backslash.

Having returned two array references they are assigned to scalar variables. If you uncomment the second print line you'll see two references to arrays.

The next problem is how to dereference the references, or access the arrays. The construct @$xxx does that for us. I know I said I wouldn't cover references, and I haven't -- that is just a useful trick.

This little section is not designed as a complete guide, it is just a taster of things to come. Perl is immensely powerful. If you think something can't be done, the problem is likely to be it is beyond your ability, not that of Perl.

Oneliners

A short example
You'll have noticed Perl packs a lot of power into a small amount of code. You can feed Perl code directly on the command line. This is known as a oneliner, for obvious reasons. An example:
perl -e"for (55..75) { print chr($_) }"

The -e switch tells Perl that a command is following. The command must be enclosed in doublequotes, not singles as on Unix. The command itself in this case simply prints the ASCII characters for the codes 55 to 75 inclusive.



File access
This is a simple find routine. As it uses a regex, it is infinitely superior to NT's findstr :
perl -e"while (<>) {print if /^[bv]/i}" shop.txt

Remember, the while (<>) construct will open whatever is in @ARGV . In this case, we have supplied shop.txt so it is opened and we print lines that begin with either 'b' or 'v'.
That can be made shorter. Run perl -h and you'll see a whole list of switches. The one we'll use now is -n , which puts a while (<>) { } loop around whatever code you supply with -e . So:

perl -ne"print if /^[bv]/i" shop.txt

which does exactly the same as the previous program, but uses the -n switch to put a while (<>) loop around whatever other commands are supplied.
A slightly more sophisticated version:

perl -ne"printf \"$ARGV : %3s : $_\",$. if /^[bv]/i" shop.txt

which demonstrates that doublequotes must be escaped.



Modifying files with a oneliner and $^I
If you don't remember $^I then please review the section on Files before proceeding. When you're ready, copy shop.txt to shop2.txt .

perl -i.bk -ne"printf \"%4s : $_\",$." shop2.txt

The -i switch sets the in-place edit variable $^I (the .bk becomes the extension for the backup copy). We still need -n .
If you had a typical quoted email message such as:

>> this is what was said
>> blah blah
> blaaaaahhh

The new text

and you wanted to remove the >, then:
perl -i.bk -pe"s/^>+ ?//" email.txt

does the trick. Regex recap -- the caret anchors the match to the beginning of the string, the + means one or more (no, we do not use * which means 0 or more), then we match a single literal space, but it is not necessary for the space to be there for the match to succeed, hence the ? .

What is new in terms of oneliners is the use of -p , which does exactly the same thing as -n except that it adds a print statement too. In case you were wondering why the previous example used -n and this one uses -p -- the previous example prints its data with printf, whereas this example doesn't have an explicit print statement so we provide one with -p .

Some other useful oneliners -- a calculator and an ASCII number lookup:

perl -e"print 50/200+2"
perl -e"for (50..90) { print chr($_) }"

There are plenty more oneliners, and they are an essential part of any sysadmin's toolbox. The two examples below are functionally equivalent but the lower one is perhaps a little more readable:
perl -e"for $i (50..90) { print chr($i),\" is $i\n\" }"

perl -e"for $i (50..90) { print chr($i),qq| is $i\n| }

Whatever follows qq is used as a delimiter, instead of having to escape the doublequotes with backslashes. I learnt this from the Perl-Win32-Users mailing list (see top) - I think it was Lennart Borgman who pointed it out. He also mentioned that you don't need the closing doublequote. Saves a little typing.

External Commands

Some ways to...
Perl can start external commands. There are five main ways to do this:
system
exec
Command Input, also known as `backticks`
Piping data from a process
Quote execute
We'll compare system and exec first.

Exec
Poor old exec is broken on Perl for Win32. What it should do is stop running your Perl script and start running whatever you tell it to. If it can't start the external process, it should return with an error code. This doesn't work properly under Perl for Win32. The exec function does work properly on the standard Perl distribution.



System
This runs an external command for you, then carries on with the script. It always returns, and the value it returns goes into $? . This means you can test to see if the program worked. Actually you are testing to see whether it could be started and what it exited with; what the program does while it runs is outside your control if you use system .

This example demonstrates system in action. Run the 'vol' command from a command prompt first if you are not familiar with it. Then run the 'vole' command. I'm assuming you have no cute furry executables called vole on your system, or at least in the path. If you do have an executable called 'vole', be creative and change it.

system("vole");

print "\n\nResult: $?\n\n";

system("vol");

print "\n\nResult: $?\n\n";

As you can see, a successful system call returns 0. An unsuccessful one returns a value which you need to divide by 256 to get the real exit code. Also notice you can see the output. And because system returns, the code after the first system call is executed. Not so with exec, which will terminate your perl script if it is successful. Perl's usual rules about single and double quotes and variable interpolation apply.
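The divide-by-256 is because the real exit code lives in the top byte of $? . A small sketch of pulling it out (the right shift does the same job as the division):

system("vole");
$exit=$? >> 8; # same as int($?/256)
print "vole finished with exit code $exit\n";

system("vol");
print "vol finished with exit code ", $? >> 8, "\n"; # 0 means success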

Backticks
These `` are different again to system and exec. They also start external processes, but return the output of the process. You can then do whatever you like with the output. If you aren't sure where backticks are on your keyboard, try the top left, just left of the 1 key. Often around there. Don't confuse single quotes '' with backticks `` .
$volume=`vol`;

print "The contents of the variable \$volume are:\n\n";

print $volume;

print "\nWe shall regexise this variable thus :\n\n";

$volume=~m#Volume in drive \w is (.*)#;

print "$1\n";

As you can see here, the Win32 vol command is executed. We just print it out, escaping the $ in the variable name. Then a simple regex, using # as a delimiter just in case you'd forgotten delimiters don't have to be / .
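Incidentally, I take the 'quote execute' entry in the list at the top of this section to mean the qx operator, which is just backticks with a delimiter of your own choosing -- handy if the command itself contains a backtick, or if you simply find it more readable:

$volume=qx{vol}; # exactly the same as $volume=`vol`;
print $volume;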

When to use external calls
Before you get carried away with creating elaborate scripts based on the output from NT's net commands, note there are plenty of excellent modules out there which do a very good job of this sort of thing, and that any form of external process call slows your script. Also note there are plenty of built-in functions such as readdir which can be used instead of `dir` . You should use Perl functions where possible rather than calling external programs, because Perl's built-in functions:

are portable (usually, but there are exceptions). This means you can write a script on your Mac PowerBook, test it on an NT box and then use it live on your Unix box without modifying a single line of code;
are faster, as every external process significantly slows your program;
don't usually require regexing to find the result you want;
don't rely on output in a particular format, which might change in the next version of your OS or application;
are more likely to be understood by a Perl programmer -- for example, $files=`ls`; on a Unix box means little to someone who doesn't know that ls is the Unix command for listing files, as dir is in Windows.
Don't start using backticks all over the place when system will do. You might get a very large return value which you don't need, and will consequently slurp lots of memory. Just use them when you actually want to check the returned strings.

Opening a Process
The problem with backticks is that you have to wait for the entire process to complete, then analyse the entire output. This is a big problem if you have a lot of output or a slow process. For example, the DOS command tree. If you aren't familiar with this command, run a DOS/command prompt, switch to the root directory (C:\ ) and type tree. Examine the wondrous output.
We can open a process, and pipe data in via a filehandle in exactly the same way you would read a file. The code below is exactly the same as opening a filehandle on a file, with two exceptions:

We use an external command, not a filename. That's the process name, in this case, tree.
A pipe, ie | is appended to the process name.
open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
print "$. $_";
}

Note the | which denotes that data is to be piped from the specified process. You can also pipe data to a process by using | as the first character.
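For completeness, here is a minimal sketch of piping data to a process rather than from one (assuming the standard sort command is available on your system):

open SORTER, "| sort" or die "Can't start sort :$!";
print SORTER "$_\n" for qw(pear apple banana);
close SORTER; # the sorted output appears once the pipe is closed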
As usual, $. is the line number. What we can do now is terminate our tree early. Environmentally unsound, but efficient.

open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
printf "%3s $_", $.;
last if $. == 10;
}

As soon as $. hits 10 we shut the process off by exiting the loop. Easy.

Except, maybe it won't. What if this was a long program, and you forgot about that particular line of code which exits the loop? Suppose that $. somehow went from 9 to 11, or was assigned to? It would never reach 10. So, to be safe

open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
printf "%3s $_", $.;
last if $. >= 10;
}

exit your loops in a paranoid manner, unless you really mean only to exit at exactly line ten. For maximum safety, maybe you should create your own counter variable, because $. is a global variable. I'm not necessarily advocating doing any of the above, but I am suggesting these things be considered.

You might notice the presence of a new keyword - printf . It works like print , but formats the string before printing. The formatting is controlled by such parameters as %3s , which means "pad out to a total width of three characters". After the doublequoted string comes whatever you want to be printed in the format specified. Some examples follow. Just uncomment each line in turn to see what it does. There is a lot of new stuff below, but try and work out what is happening. An explanation follows after the code.

$windir=$ENV{'WINDIR'}; # yes, you can access the environment variables !

$x=0;

opendir WDIR, "$windir" or die "Can't open $windir !!! Panic : $!";

while ($file= readdir WDIR) {
next if $file=~/^\./; # try commenting this line to see why it is there

$age= -M "$windir/$file"; # -M returns the age in days
$age=~s/(\d*\.\d{3}).*/$1/; # hmmmmm

#### %4.4d - must take up 4 columns, and pad with 0s to make up space
#### and minimum width is also 4
#### %10s - must take up 10 columns, pad with spaces
# printf "%4.4d %10s %45s \n", $x, $age, $file;

#### %-10s - left justify
# printf "%4.4d %-10s %-45s \n", $x, $age, $file;

#### %10.3 - use 10 columns, pad with 0s if less than 3 columns used
# printf "%4.4d %10.3d %45s \n", $x, $age, $file;

$x++;

last if $x==15; # we don't want to go through all the files :-)
}

There are some intentionally new functions there. When you start hacking Perl (actually, you already started if you have worked through this far) you'll see a lot of example code. Try and understand the above, then read the explanation below.
Firstly, all environment variables can be accessed and set via Perl. They are in the %ENV hash. If you aren't sure what environment variables are, refer to your friendly Microsoft documentation or books. The best known environment variable is path, and you can see its value and that of all other environment variables by simply typing set at your command prompt.
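A couple of quick lines to illustrate (the PERL_DEMO name is just an invention for this example):

print "The Windows directory is $ENV{WINDIR}\n"; # reading an environment variable
$ENV{PERL_DEMO}='hello'; # setting one -- any child process we start will see it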

The regex /^\./ bounces out unwanted entries (anything beginning with a dot, such as the . and .. directories) before we bother to do any processing on them. Good programming practice. What it matches is "anything that begins with '.'". The caret anchors the match to the beginning of the string, and as . is a metacharacter it has to be escaped.

Perl has several tests to apply on files. The -M test returns the age in days. See the documentation for similar tests. Note that the calls to readdir return just the file, not the complete pathname. As you were careful to use a variable for the directory to be opened rather than hardcoding it (horrors) it is no trouble to glue it together by using doublequotes.

Try commenting out $age=~s/(\d*\.\d{3}).*/$1/ and note the size of $age . It could do with a trim. Just for regex practice, we make it a little smaller. What the regex does is:

start capturing with (
look for 0 or more digits \d*
then a . (escaped)
followed by three digits \d{3}
and that's all we want to capture so the parens are closed. )
Finally, everything else in the string is matched .* where . is any character (almost) and * 0 or more. This is pretty much guaranteed to match to the end of the line
Having matched the entire string (and put part of it into $1 by using parens) we simply replace the string with what we have matched.
Easy !
Mention should also be made of sprintf , which is exactly like printf except it doesn't print. You just use it to format strings, which you can do something with later. For example :

open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
$line= sprintf "%3s $_", $.;
print $line;
last if $. == 10;
}



Quote execute
@opts=qw(w on ad oe b);

for (@opts) {
$result=qx(dir /$_);
print "dir /$_ resulted in:\n$result",'-' x 79;
sleep 1;
}

Anything within qx( ) is executed, and duly variable-interpolated. This sample also demonstrates qw , which is 'quote words', so the elements of @opts are delimited by whitespace rather than the usual quotes and commas. You can also use for instead of foreach if you want to save typing four characters at the expense of a little legibility.

You may have noticed that system outputs the result of the command to the screen whereas qx does not. Each to its own.

Grep and Map

Grep
If you want to search a list, and create another list of things you found, grep is one solution. This is an example, which also demonstrates join again :
@stuff=qw(flying gliding skiing dancing parties racing); # quote-worded list

@new = grep /ing/, @stuff; # Creates @new, which contains elements of @stuff
# matching with 'ing' in them.

print join ":",@stuff,"\n"; # first makes one string out of the elements of @stuff, joined
# with ':' , then prints it, then prints \n

print join ":",@new,"\n";

Remember qw means 'quote words', so word boundaries are used as delimiters instead. The grep function must be fed a list on the right hand side. On the left side, you may assign the results to a list or a scalar variable. Assigning to a list gives you each actual element, and to a scalar gives you the number of matches found:
@stuff=qw(flying gliding skiing dancing parties racing);

$new = grep /ing/, @stuff;

print join ":",@stuff,"\n";

print "Found $new elements of \@stuff which matched\n";

If you decide to modify the elements on their way through grep , you actually modify the original list. Be careful out there.
@stuff=qw(flying gliding skiing dancing parties racing);

@new = grep s/ing//, @stuff;

print join ":",@stuff,"\n";
print join ":",@new,"\n";

To determine what actually matches you can either use an expression or a block. Up to now we've been using expressions, but when things become more complicated use a block:
@stuff=qw(flying gliding skiing dancing parties racing);

@new = grep { s/ing// if /^[gsp]/ } @stuff;

print join ":",@stuff,"\n";
print join ":",@new,"\n";

Try removing the braces and you'll get an error. Notice that the comma before the list has gone. It is now obvious where the expression ends, as it is inside a block delimited with { } . The regex says if the element begins with g, s or p, then remove ing. The result is only assigned to @new if the expression is completely true - 'parties' does begin with p, so that works, but s/ing// fails so the overall result is false, and the value is not assigned to @new .

Map
Map works the same way as grep , in that they both iterate over a list, and return a list. There are two important differences however:
grep returns the value of everything it evaluates to be true;
map returns the results of everything it evaluates.
As usual, an example will assist the penny in dropping, clear the fog and turn on the light (if not make my metaphors easier to understand):
@stuff=qw(flying gliding skiing dancing parties racing);

print "There are ",scalar(@stuff)," elements in \@stuff\n";
print join ":",@stuff,"\n";

@mapped = map /ing/, @stuff;
@grepped = grep /ing/, @stuff;

print "There are ",scalar(@stuff)," elements in \@stuff\n";
print join ":",@stuff,"\n";

print "There are ",scalar(@mapped)," elements in \@mapped\n";
print join ":",@mapped,"\n";

print "There are ",scalar(@grepped)," elements in \@grepped\n";
print join ":",@grepped,"\n";

You can see that @mapped is just a list of 1's. Notice that there are five ones whereas there are six elements in the original array, @stuff. This is because @mapped contains the true results of map -- in every case the expression /ing/ is successful, except for 'parties'.

In that case the expression is false, so the result is discarded. Contrast this with the grep function, which returns the actual value, but only if it is true. Try this:

@letters=qw(a b c d e);

@ords=map ord, @letters;
print join ":",@ords,"\n";

@chrs=map chr, @ords;
print join ":",@chrs,"\n";

This uses the ord function to change each letter into its ASCII equivalent, then the chr function to convert the ASCII numbers back into characters. If you change map to grep in the example above, you can see that nothing appears to happen. What is happening is that grep is trying the expression on each element, and if it succeeds (is true) it returns the element, not the result. The expression succeeds for each element, so each element is returned in turn. Another example:
@stuff=qw(flying gliding skiing dancing parties racing);

print join ":",@stuff,"\n";

@mapped = map { s/(^[gsp])/$1 x 2/e } @stuff;
@grepped = grep { s/(^[gsp])/$1 x 2/e } @stuff;

print join ":",@stuff,"\n";
print join ":",@mapped,"\n";
print join ":",@grepped,"\n";

Recapping on regex, what that does is match the first letter of any element beginning with g, s or p, and replace that letter with itself doubled (so 'gliding' becomes 'ggliding'). The caret ^ forces a match at the beginning of the string, the [square brackets] denote a character class, and /e forces Perl to evaluate the RHS as an expression.
The output from this is a mixture of 1 and nothing for map , and a three-element array called @grepped from grep. Yet another example:

@mapped = map { chop } @stuff;
@grepped = grep { chop } @stuff;

The chop function removes the last character from a string, and returns it. So that's what you get back from map : the result of the expression, ie the chopped-off character. The grep function gives you the mangled remains of the original value.

Writing your own grep and map functions
Finally, you can write your own functions:

@stuff=qw(flying gliding skiing dancing parties racing);

print join ":",@stuff,"\n";

@mapped = map { &isit } @stuff;
@grepped = grep { &isit } @stuff;

print join ":",@mapped,"\n";
print join ":",@grepped,"\n";

sub isit {
($word)=/(^.*)ing/;

if (length $word == 3) {
return "ok";
} else {
return 0;
}
}

The subroutine isit first grabs everything up until 'ing', puts it into $word , then returns 'ok' if there are three characters in $word . If not, it returns the false value 0. You can make these subroutines (think of them as functions) as complex as you like.
Sometimes it is very useful to have map return the actual value, rather than the result. The answer is easy, but not obvious. Remember that subroutines return the value of the last expression evaluated? So do blocks. First, here is the version we already know:

@grepstuff=@mapstuff=qw(flying gliding skiing dancing parties racing);

print join " ",map { s/(^[gsp])/$1 x 2/e } @mapstuff;
print "\n";
print join " ",grep { s/(^[gsp])/$1 x 2/e } @grepstuff;

Now, make sure $_ is the last thing evaluated:
@grepstuff=@mapstuff=qw(flying gliding skiing dancing parties racing);

print join " ",map { s/(^[gsp])/$1 x 2/e;$_} @mapstuff;
print "\n";
print join " ",grep { s/(^[gsp])/$1 x 2/e } @grepstuff;

and there you have it. Now you understand that, you can go and impress your friends -- but please don't count on success.

Sorting

A Simple Sort
If I was reading this I'd be wondering about sorting. Wonder no more, and behold:

foreach (sort keys %countries) {
print "The key $_ contains $countries{$_}\n";
}

Spot the difference. Yes, sort crept in there. If you want the list sorted backwards, some cunning is called for. This is suitably foxy:
foreach (reverse sort keys %countries) {
print "The key $_ contains $countries{$_}\n";
}

Perl is just so difficult at times, don't you think ? This works because:
keys returns a list
sort expects a list -- and gets one from keys , and sorts it
reverse also expects a list, so it gets one and returns it
then the whole list is foreach 'd over.
This is a quick example to make sure the meaning of reverse is clear:
print "Enter string to be reversed: ";
$input=<STDIN>;

@letters=split //,$input; # splits on the 'nothings' in between each character of $input

print join ":", @letters; # joins all elements of @letters with \n, prints it
print reverse @letters; # prints all of @letters, but sdrawkcab )-:

Perl's list operators can just feed directly to each other, saving many lines of code but also decreasing readability to those that aren't Perl-literate:
print "Enter string to be reversed: ";
print join ":",reverse split //,$_=;

This section is about sorting, so enough of reverse . Time to go forwards instead.



Numeric Sorting -- How Sort Really Works
That's easy alphabetical sorting by the keys. If you had a hash of international access numbers like this one:

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

foreach (sort keys %countries) {
print "The key $_ contains $countries{$_}\n";
}

You might want to sort numerically. In that case, you need to understand how Perl's sort function works.

The sort function compares two variables, $a and $b . They must be called $a and $b otherwise it won't work. One chap published a book with stolen code, and he changed $a and $b to $x and $y. He obviously didn't test the program because it would have failed and he would have noticed. And this book was really published ! Don't believe everything you read in books -- but web tutorials are always 100% truthful :-)

Back to sorting. $a and $b are compared, and the result is:

1 if $a is greater than $b
-1 if $b is greater than $a
0 if $a and $b are equal
So as long as the sort function gets one of those three values back it is happy. This means we can write our own sort routines, and feed them to sort. For example, we know the default sort is alphabetical. But if we write this:
%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

foreach (sort supersort keys %countries) {
print "$_ $countries{$_}\n";
}

sub supersort {
if ($a > $b) {
return 1;
} elsif ($a < $b) {
return -1;
} else {
return 0;
}
}

then it works correctly. Of course, there is an easier way. The 'spaceship' operator <=> . It does exactly what the supersort subroutine does, namely return 1, -1 or 0 depending on the comparison of two given values.
So we can write the above much more easily as:

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

foreach (sort { $a <=> $b } keys %countries) {
print "$_ $countries{$_}\n";
}

Notice the { } braces, which mark their contents as the subroutine sort must use. Pretty short subroutine. There is a companion operator to <=> , namely cmp , which does exactly the same thing but of course compares the values as strings, not numbers. Remember, if you are comparing numbers your comparison operator should contain non-alphas ( <=> ), and if you are comparing strings the operator should contain alphas only ( cmp ). And don't talk to strangers.
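A small sketch of cmp in action, which for a list of strings gives the same result as the default sort:

@words=qw(pear Apple banana);
@sorted= sort { $a cmp $b } @words; # identical to plain sort @words
print "@sorted\n"; # Apple comes first -- uppercase sorts before lowercase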

Anyway, you now have enough knowledge to sort a hash by value instead of keys. Suppose your pointy haired manager bounced up to you and demanded a hash sorted by value ? What would you do ? OK, what should you do ?

Well, we could just sort the values.

foreach (sort values %countries) {

But Pointy Hair wants the keys too. And if all you have is a value, you can't easily find its key.
So we have to iterate over the keys. But just because we are handing the keys to sort doesn't mean sort has to compare the keys themselves -- it can compare the values they point to. What about:

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');

foreach (sort { $countries{$a} cmp $countries{$b} } keys %countries) {
print "$_ $countries{$_}\n";
}

Beautifully simple. If you want a reverse sort, just transpose $a and $b .
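Something like this, reusing the %countries hash from above:

foreach (sort { $countries{$b} cmp $countries{$a} } keys %countries) {
print "$_ $countries{$_}\n";
}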



Sorting Multiple Lists
You can sort several lists at the same time:

%countries=('976','Mongolia','52','Mexico','212','Morocco','64','New Zealand','33','France');
@nations=qw(China Hungary Japan Canada Fiji);

@sorted= sort values %countries, @nations;

foreach (@nations, values %countries) {
print "$_\n";
}

print "#----\n";

foreach (@sorted) {
print "$_\n";
}

This sorts @nations and the values from %countries into a new array.

The example also demonstrates that you can foreach over more than one list value -- each list is processed in turn. How I discovered that particular trick with Perl is instructive. I just tried it. If you think you should be able to do something with Perl, try it. Adhere to the syntax and conventions you will be familiar with from experience, in this case delimiting a list with commas, and try it. I'm always finding new shortcuts just by experimentation.

Reading Directories

Globbing
For this exercise, I suggest creating another directory where you have at least two text files and two or more binary files. Copy a couple of .dll files from your WINDIR directory if you need to, those will do for the binaries, and save a couple of random text files. Size doesn't matter, in this case.

Then run this, giving the directory as the command line argument:

$dir=shift; # shifts @ARGV, the command line arguments after the script name

chdir $dir or die "Can't chdir to $dir:$!\n" if $dir;

while (<*>) {
print "Found a file: $_\n" if -T;
}


The chdir function changes perl's working directory. You should, as ever, test to see if it worked or not. In this case we only try and change directory if $dir is true.

The <*> construct reads all files from a given directory, and prints if it passes the file test -T , which returns true if the file is a non-binary, ie text file. You can be more specific:

$dir =shift;
$type='txt';

chdir $dir or die "Can't chdir to $dir:$!\n" if $dir;

while (<*.$type>) {
print "Found a file: $_\n";
}

like so. But, there is a better way to read from directories. The method above is rather slow and inflexible.

readdir : How to read from directories
Instead, there is readdir . Another version of the previous example:
$dir= shift || '.';

opendir DIR, $dir or die "Can't open directory $dir: $!\n";

while ($file= readdir DIR) {
print "Found a file: $file\n";
}

The first difference is the first line, which essentially says if shift is false, then $dir = '.', which is of course the current directory. Then the directory is opened, and we have the chance to trap the error. It is assigned a directory handle, DIR. The readdir function reads each filename into $file -- note there is no while (<DIR>) construct, because directory handles are read with readdir rather than the angle brackets.

We can also apply the text file test. Run this once without giving a directory, and a second time giving a directory path other than the one the script is in:

$dir= shift || '.';

opendir DIR, $dir or die "Can't open directory $dir: $!\n";

while ($file= readdir DIR) {
print "Found a file: $file\n" if -T $file ;
}

Firstly, because the filename is now not in $_ we have to explicitly apply the -T test to it with -T $file.

Why did this not work the second time? Look at the code carefully. You are testing $file. If perl doesn't get a fully qualified pathname, it assumes you are still in the directory the script was run from, or that of the last successful chdir . Not necessarily where you are readdir'ing from. So, to fix it:


print "Found a file: $dir/$file\n" if -T "$dir/$file" ;


where we now specify the pathname, both in the printout and in the file test itself. The "" are used because otherwise perl tries to divide $file by $dir.

Try running this on a directory with only a few files in it:

$dir= shift || '.';

opendir DIR, $dir or die "Can't open directory $dir: $!\n";

while ($file= readdir DIR) {
print "Found a file: '$file'\n";
}

Notice that two entries are found which have interesting names, namely . and .. . These are the current directory and the parent directory respectively. Nothing new, they have always been there -- run the DOS command dir if you don't believe me. You don't usually want to know about them, so:
while ($file= readdir DIR) {
next if $file=~/^\./;
print "Found a file: '$file'\n";
}

is the usual workaround. You can also use list context to dump everything into an array in one go:
$dir= shift || '.';

opendir DIR, $dir or die "Can't open directory $dir: $!\n";

@files=readdir(DIR);

print "@files";

but that includes the . files, so it is best to ensure they aren't included:
@files=grep !/^\./, readdir(DIR);

This is grep at work again -- it runs the expression against each name returned by readdir and only lets an element through if the expression is true. In this case, if the filename doesn't begin with . then !/^\./ is true, so the name goes into @files.

There are other functions associated with reading directories, such as telldir and seekdir , which tell you where in a directory you are and let you jump back to that position later. You should be aware of their existence, because you never know when you might need them. The one other function of note is closedir , which closes a directory handle. Optional, but recommended for clarity.
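A minimal sketch with closedir (telldir, seekdir and rewinddir take a directory handle in the same way):

opendir DIR, '.' or die "Can't open the current directory: $!\n";
@files= grep !/^\./, readdir DIR;
closedir DIR; # optional, but tidy
print "$_\n" for @files;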

Associative Arrays

The Basics
Very, very useful. First, a quick recap on arrays. Arrays are an ordered list of scalar variables, which you access by their index number starting at 0. The elements in arrays always stay in the same order.

Hashes are a list of scalars, but instead of being accessed by index number, they are accessed by a key. The tables below illustrate the point:


@myarray

Index No.   Value
0           The Netherlands
1           Belgium
2           Germany
3           Monaco
4           Spain

%myhash

Key   Value
NL    The Netherlands
BE    Belgium
DE    Germany
MC    Monaco
ES    Spain



So if we want 'Belgium' from @myarray and also from %myhash , it'll be:

print "$myarray[1]";
print "$myhash{'BE'}";

Notice that the $ prefix is used, because it is a scalar variable. Despite the fact it is part of a list, it is still a scalar variable. The hash syntax is simply to use braces { } instead of square brackets.

So why use hashes ? When you want to look something up by a keyword. Suppose we wanted to create a program which returns the name of the country when given a country code. We'd input ES, and the program would come back with Spain.

You could do it with arrays. It would be messy however. One possible approach:

create @country , and give it values such as 'ES,Spain'
Iterate over the entire array and
split each element of the array, and check the first part to see if it matches the input
If so, print the country name
@countries=('NL,The Netherlands','BE,Belgium','DE,Germany','MC,Monaco','ES,Spain');

print "Enter the country code:";
chop ($find=<STDIN>);

foreach (@countries) {
($code,$name)=split /,/;
if ($find=~/$code/i) {
print "$name has the code $code\n";
}
}

Complex and slow. We could also store a reference to another array in each element of @countries , but that is not efficient. Whatever way we choose, you still need to search the whole thing. And what if @countries is a big array ? See how much easier a hash is:

A Hash in Action
%countries=('NL','The Netherlands','BE','Belgium','DE','Germany','MC','Monaco','ES','Spain');

print "Enter the country code:";
chop ($find=<STDIN>);

$find=~tr/a-z/A-Z/;
print "$countries{$find} has the code $find\n";

Very easy. All we need to do is make sure everything is in uppercase with tr and we are there. Notice the way %countries is defined - exactly the same as a normal array, except that the values are put into the hash in key/value pairs.




When you should use hashes
So why use arrays ? One excellent reason is that when an array is created, its elements stay in the same order you created them in. With a hash, perl stores the elements in whatever order gives quick access. Add print %countries; to the end of that program above and run it. See what I mean ? No recognisable sequence at all. It's like trying to herd cats. If you are writing code that stores a list of values over time and you want them back in the order you stored them, don't use a hash.

Finally, you should know that each key of a hash must be unique. Stands to reason, if you think about it. You are accessing the hash via keys, so how can you have two keys named 'NL' or something ? If you do define a certain key twice, the second value overwrites the first. This is a feature, and useful. The values of a hash can be duplicates, but never the keys.
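A quick sketch of that overwriting behaviour (the hash here is just an illustration):

%countries=('NL','The Netherlands','NL','Holland'); # the key NL appears twice
print "$countries{NL}\n"; # prints 'Holland' -- the second value won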

If you want to assign to a hash, there is of course no concept of push , pop and splice etc. Instead:


Hash Hacking Functions
Assigning:   $countries{PT}='Portugal';
Deleting:    delete $countries{NL};


Accessing Your Hash

Assuming you keep the same %countries hash as above, here are some useful ways to access it:

All the keys:             print keys %countries;
All the values:           print values %countries;
A slice of hash :-) :     print @countries{'NL','BE'};
How many elements ?       print scalar(keys %countries);
Does the key exist ?      print "It's there !\n" if exists $countries{'NL'};


Well, that last one is not an access as such, but useful anyway.




More Hash Access: Iteration, keys and values
You may have noticed that keys and values return a list. And we can iterate over a list, using foreach :
foreach (keys %countries) {
print "The key $_ contains $countries{$_}\n";
}

which is useful. Note how any list can be fed to foreach , and off it goes. As usual, there is another way to do the above:
while (($code,$name)=each %countries) {
print "The key $code contains $name\n";
}

The each function returns each key/value pair of the hash, and is slightly faster. In this example we assign them to a list (you spotted the parens ?) and away we go. Eventually there are no more pairs, which returns false to the while loop and it stops.

If you are into brevity, both the above can be accomplished in a single line:

print "The key $code contains $name\n" while ($code,$name)=each %countries;

print "The key $_ contains $countries{$_}\n" foreach keys %countries;


Note -- this won't win any prizes for easily readable code by non-programmers of Perl.

Files


Opening
Perl is very good at handling files. Create, in your perl scripts directory c:\scripts, a file called stuff.txt. Copy the following into it :

The Main Perl Newsgroup:comp.lang.perl.misc
The Perl FAQ:http://www.perl.com/faq/
Where to download perl:http://www.activestate.com/

Now, to open and do things with this file. First, we must open the file and assign it to a filehandle. All operations will be done on the file via the filehandle. Earlier, we used STDIN as a filehandle - we read from it.
$stuff="c:\scripts\stuff.txt";

open STUFF, $stuff;

while (<STUFF>) {
print "Line number $. is : $_";
}

What this script does is fail. What it should do is open the file defined in $stuff , assign it to the filehandle STUFF and then, while there are still lines left in the file, print the line number $. and the current line.



An unforgivable error
It fails. That's not so bad, everything fails sometimes. What is unforgivable is NOT CHECKING THE ERROR CODE !
This is a better version:

open STUFF, $stuff or die "Cannot open $stuff for read :$!";

If the open operation fails, the or means that the code on the RHS (right hand side) is evaluated. Perl dies. This means it exits the script, printing your message and the line number at which it died. The reason for the failure is in $! , put there by the failed open. Just because $! contains useful information doesn't mean it is automagically printed, in true perl fashion -- you have to include it in the message yourself. Usually you will wish to avail yourself of the information inside, as it is of great help when working out why something is not going according to plan. The moral of the chapter is:
Always check your return codes !


\\ or / in pathnames -- your choice
The problem should now be apparent. The backslashes, being escape characters, never make it into the string (print $stuff and see for yourself). There are two ways to fix this:

Escape the backslashes, like so $stuff="c:\\scripts\\stuff.txt";
Convert backslashes into forward slashes : $stuff="c:/scripts/stuff.txt";
The forward slashes are the preferred option, even under Win32, because you can then port the script directly to Unix or other platforms (assuming you don't use drive letters), and it is less typing. If you are passing the path to an external Win32 program you may need backslashes (and therefore the \\ method), but this variable is only used inside a Perl program, so forward slashes are fine. Changing the $stuff variable results in a working script. Always check your return codes !

Reading a file
$stuff="c:/scripts/stuff.txt";

open STUFF, $stuff or die "Cannot open $stuff for read :$!";

while (<STUFF>) {
print "Line $. is : $_";
}

A little more detail on what is happening here. The file is opened for read. You can append and write too. You don't have to use a variable for the filename, but I always do because it is then easy to insert into the or die message and easy to change later on. Hardcoding things is not the best way to write a maintainable and flexible program. Just ask the Year 2000 people about code that lived a little longer than the authors imagined :-).
open STUFF, "c:/scripts/stuff.txt" or die "Cannot open stuff.txt for read :$!";

is just as good but more work if you want to change anything.
The line input operator (that's the angle brackets <> ) reads from the beginning of the file up until and including the first newline. The read data goes into $_ , and you can do what you want with it there. On the next iteration of the loop, data is read from where the last read left off, up to the next newline. And so on until there is no more data. When that happens the condition is false and the loop terminates. That's the default behaviour, but we can change this.

This means that you can open a 200Mb file in perl and run through it without having to load the entire file into memory. 200Mb of memory is quite a bit. If you really want to load the entire 200Mb file into one variable, Perl lets you. Limits are not the Perl Way.
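If you really do want the whole file in one variable, here is a hedged sketch (it uses the $/ variable, which is covered properly later on):

$stuff="c:/scripts/stuff.txt";
open STUFF, $stuff or die "Cannot open $stuff for read :$!";
{
local $/; # undefine the input record separator for this block only
$whole=<STUFF>; # the entire file arrives in one go
}
print length($whole), " characters read\n";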

The special variable $. is the current line number, starting at 1.

As usual, there is a quicker way to do the previous program.

$STUFF="c:/scripts/stuff.txt";

open STUFF or die "Cannot open $STUFF for read :$!";

while (<STUFF>) {
print "Line $. is : $_";
}

This saves a little bit of typing, but does tie your filehandle to the variable name. In fact, that entire program could be compressed further, but that's for later.

If you are really into shortness, try this:

$STUFF="c:/scripts/stuff.txt";

open STUFF or die "Cannot open $STUFF for read :$!";

print "Line $. is : $_" while ();





Writing to a File


A simple write
$out="c:/scripts/out.txt";

open OUT, ">$out" or die "Cannot open $out for write :$!";

for $i (1..10) {
print OUT "$i : The time is now : ",scalar(localtime),"\n";
}

Note the addition of > to the filename. This opens it for writing. If we want to print to the file we now just specify the filehandle name. You print to the filehandle, which is a gateway to the file.

Filehandles don't have to be capitalised, but it is wise. All Perl functions are lowercase, and Perl is case-sensitive. So if you choose uppercase names they are guaranteed not to conflict with current or future function words.

And a neat way to grab the date sneaked in there too. You should be aware that writing to a file overwrites the file. It does not append data! However, you may append:


Appending
$out="c:/scripts/out.txt";

&printfile;

open OUT, ">>$out" or die "Cannot open $out for append :$!";

print OUT 'The time is now : ',scalar(localtime),"\n";

close OUT;

&printfile;

sub printfile {
open IN, $out or die "Cannot open $out for read :$!";
while (<IN>) {
print;
}
close IN;
}

This script demonstrates subroutines again, and how to append to a file, that is write additional data at the end. The close function is introduced here. This, well, closes a filehandle. You don't have to close a filehandle - just leave it open until the script finishes, or the next open command to the same filehandle will close it for you.

@ARGV: Command Line Arguments
Perl has a special array called @ARGV . This is the list of arguments passed along with the script name on the command line. Run the following perl script as:

perl myscript.pl hello world how are you


foreach (@ARGV) {
print "$_\n";
}

Another useful way to get parameters into a program -- this time without user input. The relevance to filehandles is as follows. Run the following perl script as:
perl myscript.pl stuff.txt out.txt

while (<>) {
print;
}

Short and sweet ? If you don't specify anything in the angle brackets, whatever is in @ARGV is used instead. And after it finishes with the first file, it will carry on with the next and so on. You'll need to remove non-file elements from @ARGV before you use this.
It can be shorter still:

perl myscript.pl stuff.txt out.txt

print while <>;

Read it right to left. It is possible to shorten it even further !
perl myscript.pl stuff.txt out.txt

print <>;

This takes a little explanation. As you know, many things in Perl, including filehandles, can be evaluated in list or scalar context. The result that is returned depends on the context.
If a filehandle is evaluated in scalar context, it returns the first line of whatever file it is reading from. If it is evaluated in list context, it returns a list, the elements of which are the lines of the files it is reading from.

The print function is a list operator, and therefore evaluates everything it is given in list context. As the filehandle is evaluated in list context, it is given a list !
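In other words, assuming STUFF is a filehandle already opened for read:

$first=<STUFF>; # scalar context: just the first line
@rest=<STUFF>; # list context: every remaining line, one line per element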

Who said short is sweet? Not my girlfriend, but that's another story. The shortest scripts are not usually the easiest to understand, and not even always the quickest. Aside from knowing what you want to achieve with the program from a functional point of view, you should also know whether you are coding for maximum performance, easy maintenance or whatever -- because chances are those goals will be to some extent mutually exclusive.


Modifying a File with $^I
One of the most frequent Perl tasks is to open a file, make some changes and write it back to the original filename. You already have enough knowledge to do this. The steps would be:

Make a backup copy of the file
Open the file for read
Open a new temporary file for write
Go through the read file, and write it and any changes to the temp file
When finished, close both files
Delete the original file
Rename the temp file to the original filename
If you have managed to get this far and assiduously work through the examples, the above will be child's play. Play if you want, but there is a Better Way.
Make sure you have data in c:\scripts\out.txt then run this:

@ARGV="c:/scripts/out.txt";

$^I=".bk"; # let the magic begin

while (<>) {
tr/A-Z/a-z/; # another new function sneaked in
print; # this goes to the temp filehandle, ARGVOUT,
# not STDOUT as usual, so don't mess with it !
}

So, what's happening? First, we load up @ARGV with the name of a file. It doesn't matter how @ARGV is loaded -- we could just as easily have left it alone and supplied the filename on the command line.
The $^I is a special variable. You knew that just by looking at it. Its name is the Inplace Edit variable, and when it has a value the effects are:

The name of the file to be in-place edited is taken from the first element of @ARGV. In this case, that is c:/scripts/out.txt. The file is renamed to its existing name plus the value of $^I, ie out.txt.bk.
The file is read as usual by the diamond operator <>, placing a line at a time into $_.
A new filehandle is opened, called ARGVOUT, and no prizes for guessing it is opened on a file called out.txt. The original out.txt is renamed.
The print prints automatically to ARGVOUT, not STDOUT as it would usually.
At the end of the operation you have neatly edited the file and made a backup. If you don't want a backup, assign a null string to $^I but don't go crying on any mailing lists if you lose data.
The usual method of in-place editing would involve just printing everything back where it came from until your regex finds whatever needs changing. You could of course slurp the whole file into memory and play with it there, which could be a lot easier but if you are dealing with files of more than a few megabytes this is probably not a feasible approach.

Now take a look at out.txt . Notice how all capital letters have been transliterated into lowercase. This is the tr operator at work, which is more efficient than regex for changing single characters. But that's only a small part of the tr function's value to the world. More later.

You should also have an out.txt.bk file. And finally, notice the way @ARGV has been created. You don't have to create it from the command line arguments -- it can be treated like an ordinary array, for that is what it is.




$/ -- Changing what is read into $_
On a different note, what if your input file doesn't look like this:
Beer
Wine
Pizza
Catfood

which is nicely delimited with a newline each time, but like this:
shorts
t-shirt
blouse

pizza
beer
wine
catfood

Viz
Private Eye
The Independent
Byte

toothpaste
soap
towel

which is delimited by TWO newlines, not one. You don't have to save the above as shop.txt, but if you don't, the examples will be difficult to follow.
Now, if you want each set of items as elements in an array you'll have to do something like this:

$SHOP="shop.txt";
$x=0;

open SHOP or die "Can't open $SHOP for read: $!\n";

while (<SHOP>) {
if (/^\n/) { # does line begin with newline ?
$x++; # if so, increment $x. Rest of if statement not executed.
} else {
$list[$x].=$_; # glue $_ on the end of whatever is in $list[$x], using a .
}
}

foreach (@list) {
print "Items are:\n$_\n\n";
}

which works, but there is a much easier way to do it. You knew I was going to say that.
$SHOP="shop.txt";
$/="\n\n";

open SHOP or die "Can't open $SHOP for read: $!\n";

while (<SHOP>) {
push (@list, $_);
}

foreach (@list) {
print "Items are:\n$_\n\n";
}

The $/ variable is a special variable (it even looks special). It is the Default Input Record Separator. Remember the operation of the angle brackets being to read a file in up until the next newline ? Time to come clean. What the angle brackets actually do is read up until whatever $/ is set to. It is set to a newline by default.
So if we set it to two newlines, as above, then it reads up until it finds two consecutive newlines, then puts the data into $_ . This makes the program a lot shorter and quicker. You can set $/ to just about anything, not just a newline. If you want to hack this list for example:

Tea:Beer:Wine:Pizza:Catfood:Coffee:Chicken:Salmon:Icecream
you could just leave $/ as a newline and slurp it into memory in one go, but imagine the above items are a list of clothes that your girlfriend wants to buy, or a list of clothes your boyfriend should have thrown away by now. Either is going to be a really big file, and you might not want to read it all into memory in one go. So set $/=":"; and all will be well. There are also read and seek functions, but they aren't covered here. Those are useful for files where you read in a precise number of bytes.
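A hedged sketch of reading that colon-separated list one item at a time (it assumes the line above has been saved as c:/scripts/list.txt):

$/=':';

open LIST, "c:/scripts/list.txt" or die "Can't open list.txt for read: $!\n";

while (<LIST>) {
chomp; # chomp removes whatever $/ is set to, not just newlines
print "Item: $_\n";
}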
We'll go back to the shop.txt example for a moment. It is useful to know how to read just one line (well, up to $/ ) at a time:

$SHOP="shop.txt";
$/="\n\n";

open SHOP or die "Can't open $SHOP for read: $!\n";

$clothes=<SHOP>; # everything up until the first occurrence of $/ goes into $clothes

$food=<SHOP>; # everything from the first occurrence of $/ to the second goes into $food

print "We need...\n",$clothes,"...and\n",$food;

And now we know that, there is an even quicker way to achieve the aim of the original program :

$SHOP="shop.txt";
$/="\n\n";

open SHOP or die "Can't open $SHOP for read: $!\n";

@list=<SHOP>; # dumps *all* of $SHOP into @list, not just one line.

foreach (@list) {
print "Items are:\n$_\n\n";
}

and you don't need to grab it all :
@list[0..2]=<SHOP>;

We haven't mentioned list context for a while. Whether the line input operator <> returns a single value or a list depends on the context you use it in. When you supply @xxxxx then this must be a list. If you supply $xxxxx then that's a scalar variable. You can force it into list context by using parens.
The two lines below are provided so you can paste them into the above program. They demonstrate how parens force list context. Remember to replace the foreach with something that prints the variables.

($first, $second) = <SHOP>;
$first, $second = <SHOP>;




HERE Docs
The problem:
print "This is a long line of text which might be too long to fit on just one line\n";
print "and I was right, it was too long to fit on one line. In fact, it looks like it\n";
print "might very well take up to FOUR, yes FOUR lines to print. That's four print\n";
print "statements, which takes up even more room. But wait! I'm wrong! It will take\n";
print "FIVE lines to print this statement! Or is that six lines? I'm not sure....\n";

The solution:
$var='variable interpolated';

print <<PRT;
This is a long line of text which might be too long to fit on just one line
and I was right, it was too long to fit on one line. In fact, it looks like
it might very well take up to FOUR, yes FOUR lines to print.

That's four print statements, which takes up even more room. But wait! I'm
wrong! It will take FIVE lines to print this statement! Or maybe six lines?
I'm not sure....but anyway, just to prove this can be $var.
PRT

That's called a 'here' document and you don't need to use PRT, you can use whatever you like within reason. You don't need to put in explicit newlines, although if you do they perform as usual. Now you know about here docs you can stop wearing the print function out by calling it every couple of lines. You don't have to use here docs to print to files, just anywhere you'd normally put a more than one print statement.