Wednesday, August 6, 2008

Bondage and Discipline

Perl is a very flexible language. It is designed as a hacking tool, for quick sysadmin magic. It can do quite a bit more besides, but being small and powerful is a core Perl feature. Earlier on I said Perl is not a bondage and discipline language -- to qualify that, it doesn't have to be. However, there is a time and place for everything.

For tiny scripts you don't want to be declaring variables, typecasting and generally spending more time obeying rules than you do getting the job done. So, Perl doesn't force you to follow all of these good programming practices. However, not all your programs are going to be five-minute hacks. Some will be pretty large. Therefore, some Discipline is in order.

Perl has two primary methods of enforcing discipline. They are:

-w for Warnings
use strict;


-w
Consider for a moment this little program:
@input=@ARGV;

$outfile='outfile.txt';
open OUT, ">$outfile" or die "Can't open $outfile for write:$!\n";

$input2++;
$delay=2 if $input[0] eq 'sleep';

sleep $delay;

print "The first element of \@input is $input[0]\n";
print OUY "Slept $delay!\n";

It doesn't do much: it just prints out the first argument supplied and demonstrates the uninspiring sleep function. The program is only a few lines long, yet it is full of holes. How many errors can you spot? Try and count them. When you are finished, execute the program with warnings enabled:
perl -w script.pl hello

Perl finds quite a few errors. The -w switch finds, among other heinous sins:
Variables used only once. In the example, $input2 is used only once. It is a useless variable.
Filehandles used incorrectly. With print OUY I'm trying to print to a non-existent filehandle. With -w an alarm is raised, as it would be if I tried to write to a filehandle which was read-only.
Use of uninitialised variables. The variable $delay is uninitialised if 'sleep' is not the first parameter. Making variables spring into the air on demand is not good programming practice. They should be defined carefully first.
So, generally, -w is a Good Thing. It forces you to write cleaner code. So use it, but don't be afraid not to for very short programs.
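
For comparison, here is one way the little program might be tidied up so that perl -w has nothing to complain about. Treat it as a sketch rather than the one true fix: the delay_for subroutine, the $shown variable and the default delay of 0 are my own assumptions, not part of the original script.

```perl
# Cleaned-up sketch of the example above; run it with: perl -w script.pl hello

# Hypothetical helper: decide how long to sleep. It always returns a number,
# so $delay can never be uninitialised.
sub delay_for {
    my ($first) = @_;
    return (defined $first && $first eq 'sleep') ? 2 : 0;
}

@input=@ARGV;

$outfile='outfile.txt';
open OUT, ">$outfile" or die "Can't open $outfile for write:$!\n";

# The once-only $input2 is gone, and $delay always gets a value.
$delay=delay_for($input[0]);

sleep $delay;

# Guard against the script being run with no arguments at all.
$shown=defined $input[0] ? $input[0] : '(nothing)';
print "The first element of \@input is $shown\n";
print OUT "Slept $delay!\n"; # OUT, not OUY

close OUT or die "Can't close $outfile:$!\n";
```

Every variable is now used more than once, the filehandle name matches the one that was opened, and nothing is printed or slept on before it has a value.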



Shebang
You know that you can turn warnings on with -w on the command line. You can also turn them on within the script itself. For that matter, you can give perl most of its command line options within the script itself. For example, running:
perl script.pl hello

to execute this:
#!perl -w

@input=@ARGV;

$outfile='outfile.txt';
open OUT, ">$outfile" or die "Can't open $outfile for write:$!\n";

$input2++;
$delay=2 if $input[0] eq 'sleep';

sleep $delay;

print "The first element of \@input is $input[0]\n";
print OUY "Slept $delay!\n";

has the same effect as:
perl -w script.pl hello

It may be more convenient for you to put the flag inside the script. It doesn't have to be just -w; most of the switches Perl supports can go there. Run
perl -h
for a full list.

The first line, #!perl -w, is the shebang line. This is derived from UNIX, where Perl was first developed. UNIX systems make a script executable by changing an attribute. The operating system then loads the file and works out how to execute it -- in this case by looking at the first line, then loading the perl interpreter. Windows systems instead know that all files with a certain extension must be passed to a certain program for execution, e.g. all .bat files are passed to the command interpreter (cmd.exe, or command.com on older versions), and all .xls files are passed to Excel. The point of all this being that on Windows you don't need a shebang line to get the script running, but it doesn't hurt, and perl itself still reads any switches it finds there.



use strict;
So what's strict and how do you use it? The module strict restricts 'unsafe constructs', according to the perldocs. The strict module is a pragma, which is a hint that must be obeyed. Like when your girlfriend says 'oh, that ring is *far* too expensive'.

There is no need to be frightened about unsafe code if you don't mind endless hours of debugging unstructured programs. When you enable the strict module, the three things that Perl becomes strict about are:

Variables 'vars'
References 'refs'
Subroutines 'subs'
This tutorial doesn't presently cover references (and let's hope I remember to remove this sentence if I do cover it in later versions) so we won't worry about refs.
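
Strict subs is the easiest of the three to see in action. Roughly speaking, it stops Perl from quietly treating a bare, unquoted word as a string. The following is only a sketch to show the difference: try_code is a hypothetical helper of my own, and it uses a string eval so that the failure can be caught and printed instead of stopping the script.

```perl
use strict;

# Hypothetical helper: evaluate a snippet of Perl and return its value,
# or the error text if it failed to compile or run.
sub try_code {
    my ($code) = @_;
    my $value = eval $code;
    return defined $value ? $value : "error: $@";
}

# With strict subs switched off, the bareword Hello is taken as the string 'Hello'.
print "Relaxed: ", try_code(q{ no strict 'subs'; my $word = Hello; $word }), "\n";

# With strict in force (inherited from the top of the file), the same line is refused.
print "Strict:  ", try_code(q{ my $word = Hello; $word });
```

In a real program you would not wrap anything in eval; the bareword would simply stop the script from compiling, which is the whole point.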

Strict variables are useful. Essentially, this means that all variables must be declared, that is, introduced before use rather than springing into existence as required. Furthermore, each variable must be declared with my or referred to by its fully qualified name. This is an example of a program that is not strict, and should be executed something like this:

perl script.pl "Alain James Smith"

where the "" enclose the string as a single parameter as opposed to three separate space-delimited parameters.
#use strict; # uncomment after running a couple of times

$name=shift; # outside a subroutine, shift with no argument takes from @ARGV

print "The name is $name\n";
$inis=&initials($name);

$luck=int(rand(10)) if $inis=~/^(?:[a-d]|[n-p]|[x-z])/i;

print "The initials are $inis, lucky number: $luck\n";

sub initials {
my $name=shift;
$initials.=$1 while $name=~/(\w)\w+\s?/g;
return $initials;
}

By now you should be able to work out what the above does. When you uncomment the use strict; pragma, and re-run the program, you will get output something like this:
Global symbol "$name" requires explicit package name at n1.pl line 3.
Global symbol "$inis" requires explicit package name at n1.pl line 6.
Global symbol "$luck" requires explicit package name at n1.pl line 8.
Global symbol "$initials" requires explicit package name at n1.pl line 14.
Execution of n1.pl aborted due to compilation errors.

These errors mean Perl is not exactly clear about what the scope of the variables is. If Perl is not clear, you might not be either. So you need to be explicit about your variables, which means either declaring them with my so they are restricted to the current block, or referring to them with their fully qualified name. An example, using both methods:
use strict;

$main::name=shift; # outside a subroutine, shift with no argument takes from @ARGV

print "The name is ",$main::name,"\n";
my $inis='';
my $luck='';

$inis=&initials($main::name);

$luck=int(rand(10)) if $inis=~/^(?:[a-d]|[n-p]|[x-z])/i;

print "The initials are $inis, lucky number: $luck\n";

sub initials {
my $name=shift;
my $initials='';
$initials.=$1 while $name=~/(\w)\w+\s?/g;
return $initials;
}


The my variables in the subroutine are nothing new. The my variables outside the subroutine are. If you think about it, the main program itself is also a kind of block, and therefore variables can be lexically scoped to be visible only within the block.

The other interesting bit is the $main::name business. This, as you might expect, is the fully qualified name of the variable. The first part is the package name, in this case main, which is the package your code lives in unless you say otherwise. Package names are case-sensitive, so it has to be lowercase: $MAIN::name would also get past strict, but it would be a different variable in a different package. The second part is the actual variable name. Personally, I've never needed to refer to a variable this way. I'm not saying you'll never use the syntax, but I would suggest that knowing this is not on a Perl student's Top 10 list of Things to Master.

The important thing about use strict is that it does enforce more discipline than you have been used to, and for all but the smallest of programs, that is most definitely a Good Thing.
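
To see the sort of slip that strict saves you from, here is a small sketch. The strict_error helper and the $total and $totl variables are made up for the illustration, and the string eval is only there so the script can report the complaint instead of dying from it.

```perl
use strict;

# Hypothetical helper: compile and run a snippet under strict.
# Returns an empty string if all went well, or the error text if not.
sub strict_error {
    my ($code) = @_;
    return '' if eval "use strict; $code; 1";
    return $@;
}

my $typo  = 'my $total = 5; $totl = $total + 1;';  # $totl is a misspelling of $total
my $fixed = 'my $total = 5; $total = $total + 1;';

print "With the typo: ", strict_error($typo);
print "Fixed version: no complaints\n" if strict_error($fixed) eq '';
```

Without strict, the misspelt $totl would quietly spring into existence as a brand new variable, and you would be left wondering why $total never changed.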
