Wednesday, August 6, 2008

Subroutines and Parameters

In Perl, subroutines are functions are subroutines. If you like, a subroutine is a user-defined function. It's a bit like calling a script a program, or a program a script. For the purposes of this tutorial we'll refer to functions as subroutines, except when we call them functions. Hope that's made the point.

For the purposes of this section we will develop a small program which, by the end, will demonstrate how subroutines work. It also serves to demonstrate how many programs are built, namely a little at a time, in manageable sections. At least, that method works for me.

The chosen theme is gliding. That's aeroplanes without engines. A subject close to every glider pilot's heart is how far they can fly from the altitude they are at. Our program will calculate this. To make it easy we'll assume the air is perfectly calm. Wind would be a complication we don't need, especially when in a crowded lift.

What we need in order to calculate the distance we can fly is:

How high we are (in feet)
How many metres we travel forward for every metre we drop. This is the glide ratio, for example 24:1 would mean travelling 24 metres forward for every 1 metre of height lost.

Obviously input is needed. We can either prompt the user or grab the input from the command line. The latter is easier so we'll just look at @ARGV for the command line parameters. Like so:

($height,$angle)=@ARGV; # @ARGV is the command line parameters

$distance=$height*$angle; # an easy calculation

print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

The above should be executed thus:
perl yourscript.pl 5000 24

or whatever your script is called, with whatever parameters you choose to use. I'm a poet and I don't even know it.

That works. What about a slight variation? The pilot does have some control over the glide ratio. For example, he can fly faster, at the cost of a lower glide ratio. So we should perhaps show a couple of options either side of the given glide ratio:

($height,$angle)=@ARGV;

$distance=$height*$angle;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle++; # add 1 to $angle
$distance=$height*$angle;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle-=2; # subtract 2 from $angle so it is 1 less than the original
$distance=$height*$angle;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

That's cumbersome code. We repeat exactly the same calculation three times. This wastes space, and if we want to change the calculation there are three changes to be made. A better option is to put it into a subroutine:
($height,$angle)=@ARGV;

&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle++;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle-=2;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

sub howfar { # sub subroutinename
$distance=$height*$angle;
}
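The calls above use &howfar; with an ampersand and no parentheses. Called that way, a subroutine does not get an empty argument list. It receives the caller's current @_ instead. The minimal sketch below shows the difference. The names count_args and outer are hypothetical, made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args { return scalar(@_) }

sub outer {
    my $with_amp    = &count_args;    # & and no parens: shares outer's @_
    my $with_parens = count_args();   # empty parens: an empty argument list
    return ($with_amp, $with_parens);
}

my ($amp, $parens) = outer(5000, 24);
print "&count_args saw $amp arguments, count_args() saw $parens\n";
```

In the main program @_ is normally empty, so for the gliding examples the two styles behave the same.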

This is a basic subroutine, and you could stop here having learnt a very useful programming technique. Now, when changes are made they are made in one place: less work, and fewer chances of error. Improvements can always be made. For example, many glider pilots read their height in feet but are usually concerned with how many kilometres they can travel over the ground. So we can adapt our program to accept height in feet and work in metres, which is the first step towards giving the distance in kilometres:
($height,$angle)=@ARGV;

$height/=3.2; # divide feet by 3.2 to get metres

&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle++;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

$angle-=2;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

sub howfar {
$distance=$height*$angle;
}

When you run this you'll probably get a result with a fair few digits after the decimal point. This is messy, and we can fix it with the int function. It returns the integer part of a number, ie it simply drops those irritating digits after the decimal point. Note that it truncates rather than rounds.

You might have also noticed a small bit of Bad Programming Practice slipped into the last example. It was the evil Constant, the '3.2' used to convert feet to metres. Why, I don't hear you ask, is this bad? Surely the conversion will never change?

It won't change, but our use of it might. We may decide that it should be 3.281, which is closer to the true number of feet in a metre, instead of 3.2. We may decide to convert from feet to nautical miles instead. You don't know what could happen. Therefore, code with flexibility in mind, and that means avoiding constants buried in your calculations.
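Here is a small sketch of both points. It assumes the more accurate factor of 3.281 feet per metre, keeps the conversion factor in a variable, and compares int with rounding via sprintf. The helper feet_to_metres is hypothetical and written only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: the conversion factor is a parameter, not a
# constant buried in the formula.
sub feet_to_metres {
    my ($feet, $feet_per_metre) = @_;
    return $feet / $feet_per_metre;
}

my $cnv1   = 3.281;                         # assumed, more accurate factor
my $metres = feet_to_metres(5000, $cnv1);   # roughly 1523.9

print "raw:     $metres\n";
print "int:     ", int($metres), "\n";               # 1523: fraction dropped
print "rounded: ", sprintf("%.0f", $metres), "\n";   # 1524: nearest whole number
print "int(-2.7) is ", int(-2.7), "\n";              # -2: truncates toward zero
```

If you want the nearest whole number rather than the truncated one, sprintf("%.0f", ...) is one option. The tutorial sticks with int.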

The new improved version with int and constant removed:

($height,$ratio)=@ARGV;
$cnv1=3.2; # now it is a variable. Could easily be a cmd line
# parameter too. We have the flexibility.
$height =int($height/$cnv1); # divide feet by 3.2 to get metres

&howfar;
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

$ratio++;
&howfar;
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

$ratio-=2;
&howfar;
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

sub howfar {
$distance=int($height*$ratio);
}

We could of course build the print statement into the subroutine, but I usually separate output presentation from the calculation. Again, that means it is easier to modify later on.

Something else we can improve about this code is the use of the $ratio variable. We have to keep track of what we do to it: first add one, then subtract two in order to end up one below the original input. In this case it is fairly easy, but in a complex program it can be difficult, and you don't want to be creating lots of variables such as $ratio1, $ratio2 and so on just to track one input.




Parameters
One solution is to pass the subroutine parameters. In the best tradition of American columnists, who seem to have a particular affection for this phrase, 'Here's how:'
($height,$ratio)=@ARGV;
$cnv1=3.2;

&howfar($height,$ratio);
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

&howfar($height,$ratio+1);
print "With a glide ratio of ",$ratio+1,":1 you can fly $distance from $height\n";

&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

sub howfar {
print "The parameters passed to this subroutine are @_\n";
($ht,$rt)=@_;
$ht =int($ht/$cnv1);
$distance=int($ht*$rt);
}

Quite a few things have changed here. Firstly, the subroutine is being called with parameters. These are a comma-delimited list in parens after the subroutine call. The two parameters are $height and $ratio.

The parameters end up in the subroutine as the @_ array. Being an array, they are in the same order as passed. All the usual array operations work. All we will do is assign the contents of the array to two variables.

We have also moved the feet-to-metres conversion into the subroutine, because we want all the code for generating the distance to be in one place.
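One detail about @_ is worth knowing here. Its elements are aliases to the variables the caller passed, not copies. Copying them into your own variables, as howfar does, is what keeps the caller's values safe. The two hypothetical subroutines below show the difference. Modifying $_[0] also dies if the caller passed a literal such as 5000 rather than a variable.

```perl
use strict;
use warnings;

# Works on a copy: the caller's variable is untouched.
sub double_copy {
    my ($n) = @_;
    $n *= 2;
    return $n;
}

# Works on $_[0] directly: elements of @_ are aliases, so this
# changes the caller's variable.
sub double_in_place {
    $_[0] *= 2;
    return $_[0];
}

my $height = 5000;
double_copy($height);
print "after double_copy:     $height\n";   # 5000
double_in_place($height);
print "after double_in_place: $height\n";   # 10000
```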



Namespaces
We cannot use the variable names $height and $ratio because we modify them in the subroutine and that will affect the main program. So we choose new ones to do the operation on. Finally, a small change is made to the print output.

This approach works well enough for our small program here. For larger programs, having to think of new variable names all the time is difficult. It would be even more difficult if different programmers were working on different sections of the program. It would be impossible if a program were written, then an extension created by another person somewhere else, and that same extension had to be used by many people in many different programs. Obviously, the risk of using the same variable name is too great. There are only so many logical names out there.

There is a solution. Imagine you own a house with two gardens. You have two identical dogs, one in the front garden, one in the back garden. Bear with me, this is relevant. Both dogs are called Rover, because their owner lacks imagination.

When you go to the front garden and call 'Rover!!!' or open a can of dog food, the dog in the front garden comes running. Similarly, you go to the back garden, call your dog and the other dog bounces up to you.

You have two dogs, both called Rover, and you can change either one of them. Wash one, neuter the other -- it doesn't matter, but both are dogs and both have the same name. Changes to one won't affect the other. You don't get them confused because they are in different places, in two different namespaces.




Variable Scope
To bring things back to Perl, a short diversion is necessary to illustrate the point with actual Perl code instead of canine metaphors:
$name='Rover';
$pet ='dog';
$age =3;

print "$name the $pet is aged $age\n";

{
my $age =4; # run this again, but comment this line out
my $name='Spot'; # and this one
$pet ='cat';

print "$name the $pet is aged $age\n";
}

print "$name the $pet is aged $age\n";

This is pretty straightforward until we get to the { . This marks the start of a block. One feature of a block is that it can have its own namespace. Variables that are simply assigned a value within that block are just normal (global) variables, unless they are declared with my .

When variables are declared with my they are visible inside the block only. Also, any variable with the same name outside the block is hidden while inside the block. Points to note from the example above:

The two my variables appear to overwrite the variables of the same name from outside the block.
The two original variables aren't really overwritten because as we prove after the block has ended, they haven't been touched.
The variable $pet is accessible inside and outside the block as usual. Of course, if we declare it with my then things will change.
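Arrays and hashes follow the same rules. In this sketch, pets_report is a made-up subroutine. Its inner block declares its own @pets and %age with my, so the outer ones are hidden inside the block and untouched afterwards. Unlike the example above, everything here is declared with my, so it also runs under use strict.

```perl
use strict;
use warnings;

# Hypothetical demo: an inner block with its own @pets and %age.
sub pets_report {
    my @pets = ('dog');
    my %age  = (Rover => 3);
    my $inside;
    {
        my @pets = ('cat', 'goldfish');   # hides the outer @pets
        my %age  = (Spot => 4);           # hides the outer %age
        $inside = "@pets, Spot is $age{Spot}";
    }
    my $outside = "@pets, Rover is $age{Rover}";   # outer ones untouched
    return ($inside, $outside);
}

my ($in, $out) = pets_report();
print "inside:  $in\n";    # cat goldfish, Spot is 4
print "outside: $out\n";   # dog, Rover is 3
```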


my Variables
So there we have it. Namespaces. They work for all the other types of variable too, like arrays and hashes. This is how you can write code and not care about what other people use for variable names -- you just declare everything with my and have your own private party. Our original program about gliding can be improved now:
($height,$ratio)=@ARGV;
$cnv1=3.2;

&howfar($height,$ratio);
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

&howfar($height,$ratio+1);
print "With a glide ratio of ",$ratio+1,":1 you can fly $distance from $height\n";

&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

sub howfar {
my ($height,$ratio)=@_;
$height =int($height/$cnv1);
$distance=int($height*$ratio);
}


The only change is that the parameters to the subroutine, ie the contents of the array @_ , are declared with my . This means they are now only visible within that block. The block happens to also be a subroutine. Outside of the block, the original variables are still accessible. At this point I'll introduce the technical term, which is lexical scoping. That means the variable is confined to the block -- it is only visible within the block.

We still have to be concerned with what variables we use inside the subroutine. The variable $distance is created in the subroutine and used outside of it. With larger programs this will cause exactly the same problem as before. You have to be careful that the variables the subroutine sets don't clash with variables of the same name used elsewhere for something else. For all the same reasons as before, like two different people working on the code and the use of custom extensions to Perl, that can be difficult.

The obvious solution is to declare $distance with my , and thus lexically scope it. If we do this, then how do we get the result of the subroutine? Like so:


($height,$ratio)=@ARGV;
$cnv1=3.2;

$distance=&howfar($height,$ratio); # run this again and delete '$distance='
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";

$distance=&howfar($height,$ratio+1);
print "With a glide ratio of ",$ratio+1,":1 you can fly $distance from $height\n";

$distance=&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

sub howfar {
my ($height,$ratio)=@_;
my $distance;
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000); # output result in kilometres not metres
}

First change: $distance is declared with my . Secondly, the result of the subroutine is assigned to a variable, which is also named $distance. The subroutine hands back the value of the last expression it evaluates, here the calculation of $distance. However, the variable receiving it is a $distance in a different namespace. Remember the two gardens. You may wish to delete the $distance= from the first assignment and re-run the code. The only other change divides by 1000 to give the output in kilometres rather than metres.

We have now achieved a sort of Black Box effect, where the subroutine is given input and creates output. We pass the subroutine two numbers, which may or may not be variables. We assign the output of the subroutine to a variable. We care not what goes on inside the subroutine, what variables it uses or what magic it performs. This is how subroutines should operate. The only exception is the variable $cnv1. This is declared in the main body of the program but also used in the subroutine. This has been done in case we need to use the variable elsewhere. In larger programs it would be a good idea to pass it to subroutines along with the other parameters too.
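Following that suggestion, here is a minimal sketch of howfar with the conversion factor passed in as a third parameter, so the subroutine touches no outside variables at all. The parameter name $feet_per_metre is an assumption for illustration. The use strict line is added to catch any stray globals.

```perl
use strict;
use warnings;

# howfar: height in feet, glide ratio, feet-per-metre factor -> distance in km.
# Everything it needs arrives through @_, so it is a complete black box.
sub howfar {
    my ($height, $ratio, $feet_per_metre) = @_;
    my $metres = int($height / $feet_per_metre);
    return int($metres * $ratio / 1000);
}

my $feet_per_metre = 3.2;
for my $ratio (23, 24, 25) {
    my $distance = howfar(5000, $ratio, $feet_per_metre);
    print "With a glide ratio of $ratio:1 you can fly $distance km from 5000 feet\n";
}
```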




Multiple Returns
That's all the major learning out of the way. The next step is relatively easy, but we need to add new functionality to the program in order to demonstrate it. What we will do is work out how long it will take the glider pilot to fly the distance. For this calculation, we need to know his airspeed in knots, which can be a third parameter. The actual calculation will be part of howfar. An easy change:
($height,$ratio,$airspeed)=@ARGV;
$cnv1=3.2;
$cnv2=1.8;

($distance,$time)=&howfar($height,$ratio,$airspeed);
print "Glide ratio $ratio:1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio+1,$airspeed);
print "Glide ratio ",$ratio+1,":1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio-1,$airspeed);
print "Glide ratio ",$ratio-1,":1, $distance from $height taking $time\n";

sub howfar {
my ($height,$ratio,$airspeed)=@_;
my ($distance,$time); # how to 'my' multiple variables
$airspeed*=$cnv2; # convert knots to kmph
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000);
$time =int($distance/($airspeed/60)); # simple time conversion
# print "Time:$time, Distance:$distance\n"; # uncomment this later
}

This doesn't work correctly. First, the changes. The result from howfar is now assigned to two variables. Subroutines return a list, so assigning to a list of scalar variables, separated by commas inside parentheses, will work. This is exactly the same as reading the command line arguments from @ARGV .

We are also passing a new parameter, $airspeed. There is another conversion and a one-line calculation to provide the number of minutes it will take to fly $distance.

If you look carefully, you can perhaps work out what the problem is. There was a clue in the Regex section, when /e was explained.

The problem is that, unless told otherwise, Perl returns the result of the last expression evaluated. In this case, the last expression is the one calculating $time, so the value of $time is returned, and it is the only value returned. Therefore, the subroutine's $time is assigned to the main program's $distance, and the main program's $time doesn't get a value at all.

Re-run the program, but this time uncomment the line in the subroutine which prints $distance and $time. You'll notice that $distance in the main program now shows as 1. The last expression evaluated is now the print, which returns 1 (true) when it succeeds, and Perl is faithfully returning that value.
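To see the "last expression" rule in isolation, here is a sketch with two hypothetical subroutines. They use fixed, illustrative numbers instead of real calculations. The first ends with the $time assignment, and the second ends with a debugging print.

```perl
use strict;
use warnings;

# No return statement: Perl hands back the last expression evaluated.
sub howfar_implicit {
    my ($distance, $time);
    $distance = 37;   # illustrative numbers only
    $time     = 20;   # last expression: its value is all that comes back
}

# Same again, but a debugging print is now the last thing evaluated.
sub howfar_with_print {
    my ($distance, $time);
    $distance = 37;
    $time     = 20;
    print "Time:$time, Distance:$distance\n";   # print returns 1 on success
}

my ($d1, $t1) = howfar_implicit();
print "distance=$d1, time is ", (defined $t1 ? $t1 : 'undef'), "\n";   # distance=20, time is undef
my ($d2) = howfar_with_print();
print "distance=$d2\n";                                                # distance=1
```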

This is all well and good, but not what we need. What is required is a method of telling Perl what needs to be returned, rather than what Perl thinks would be a good idea:

($height,$ratio,$airspeed)=@ARGV;
$cnv1=3.2;
$cnv2=1.8;

($distance,$time)=&howfar($height,$ratio,$airspeed);
print "Glide ratio $ratio:1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio+1,$airspeed);
print "Glide ratio ",$ratio+1,":1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio-1,$airspeed);
print "Glide ratio ",$ratio-1,":1, $distance from $height taking $time\n";

sub howfar {
my ($height,$ratio,$airspeed)=@_;
my ($distance,$time); # how to lexically scope multiple variables
$airspeed*=$cnv2; # convert knots to kmph
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000); # output result in kilometres not metres
$time =int($distance/($airspeed/60)); # simple time conversion
return ($distance,$time); # explicit return
}

A simple fix. Now, we tell Perl what to return, with the aptly named return function. With this function we have complete control over what is returned and when. It is quite usual to use if statements to control different return values, but we won't bother with that here.

There is a subtle flaw in the program above. It is not backwards compatible with the old method of calling the subroutine. Run this:

($height,$ratio,$airspeed)=@ARGV;
$cnv1=3.2;
$cnv2=1.8;

($distance,$time)=&howfar($height,$ratio,$airspeed);
print "Glide ratio $ratio:1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio+1,$airspeed);
print "Glide ratio ",$ratio+1,":1, $distance from $height taking $time\n";

$distance=&howfar($height,$ratio-1); # old way of calling it
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

sub howfar {
my ($height,$ratio,$airspeed)=@_;
my ($distance,$time);
$airspeed*=$cnv2;
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000);
$time =int($distance/($airspeed/60));
return ($distance,$time);
}

A division by 0 results third time around. This is because no third parameter is passed, so $airspeed is undefined, and an undefined value is treated as 0 in arithmetic. Making your subroutines backwards compatible is important in large programs, or if you are writing an add-in module for other people to use. You can't expect everyone to retrofit additional parameters to their subroutine calls just because you decided to be a bit creative one day.

There are many ways to fix the problem, and this is just one:

($height,$ratio,$airspeed)=@ARGV;
$cnv1=3.2;
$cnv2=1.8;

($distance,$time)=&howfar($height,$ratio,$airspeed);
print "Glide ratio $ratio:1, $distance from $height taking $time\n";

($distance,$time)=&howfar($height,$ratio+1,$airspeed);
print "Glide ratio ",$ratio+1,":1, $distance from $height taking $time\n";

$distance=&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";

print "Direct print: ",join(",",&howfar(5000,55,60))," not bad for no engine!\n"; # parens stop join swallowing the last string

sub howfar {
my ($height,$ratio,$airspeed)=@_;
my ($distance,$time); # how to 'my' multiple variables
$airspeed*=$cnv2; # convert knots to kmph
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000); # output result in kilometres not metres
if ($airspeed > 0) {
$time =int($distance/($airspeed/60));
return ($distance,$time);
} else {
return $distance;
}
}

Here we just test the $airspeed to ensure we won't be doing any divisions by 0. It also affects what we return. There is also a new print statement, which shows that you don't need to assign to intermediate variables, or even pass variables as parameters. Constants, evil things that they are, work just as well. I already mentioned this, but a demonstration doesn't hurt. Unless you work for an electric chair manufacturer.
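The if test above is one fix. Another uses Perl's built-in wantarray, which is true when the caller wants a list and false when it wants a single scalar. This is offered as an alternative, not as the tutorial's own method. With it, old-style calls such as $distance=&howfar(...) get just the distance, whatever happens to $time. The conversion factors are the tutorial's own approximations.

```perl
use strict;
use warnings;

my $cnv1 = 3.2;   # feet per metre, as in the tutorial
my $cnv2 = 1.8;   # knots to km/h, as in the tutorial

sub howfar {
    my ($height, $ratio, $airspeed) = @_;
    my $metres   = int($height / $cnv1);
    my $distance = int($metres * $ratio / 1000);   # kilometres
    my $time;                                       # minutes, if we can work it out
    if (defined $airspeed && $airspeed > 0) {
        $time = int($distance / ($airspeed * $cnv2 / 60));
    }
    # wantarray is true when the caller wants a list, false for a scalar
    return wantarray ? ($distance, $time) : $distance;
}

my ($distance, $time) = howfar(5000, 24, 60);   # list context: both values
print "Glide ratio 24:1, $distance km taking $time minutes\n";

my $old_style = howfar(5000, 23);               # scalar context: distance only
print "Glide ratio 23:1, $old_style km\n";
```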

The astute reader.....:-) Every time I read that I wonder what I've missed. Usually something obscure which the author knows nobody will ever notice, but likes to belittle the reader. No exception here! Anyway, you may be wondering why this would not have sufficed instead of the if statement:

sub howfar {
my ($height,$ratio,$airspeed)=@_;
my ($distance,$time); # how to 'my' multiple variables
$airspeed*=$cnv2; # convert knots to kmph
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000); # output result in kilometres not metres
$time =int($distance/($airspeed/60)) if $airspeed > 0;
return ($distance,$time);
}

After all, the first item returned is $distance, so therefore it should be the first one assigned via:
$distance=&howfar($height,$ratio-1);

and $time should just disappear into the bit bucket.

The answer lies with scalars and lists. We are returning a list, but assigning it to a scalar. What happens when you do that? The return expression is evaluated in scalar context, and a comma-separated list in scalar context gives its last value. The last value of the list being returned is of course $time, which has been declared but not otherwise touched. Therefore, it is undefined and prints as nothing. A small program to demonstrate that point:

$word=&wordfunc("Greetings");
print "The word is $word\n";

(@words)=&wordfunc("Bonjour");
print "The words are @words\n";

sub wordfunc {
my $word=shift; # when in a subroutine, shifts @_ if no target specified
my @words; # how to my an array
@words=split //,$word; # splits on the nothings between each letter
my ($first,$last)=($words[0],$words[$#words]); # $#words is the last index; see section on Arrays if required
return ($first,$last); # Returns just the first and last
}

As you can see, the first call prints the letter 's', which is the last element of the list that is returned. You could of course use a list consisting of just one element:
($word)=&wordfunc("Greetings");

Now we are assigning a list to a list, so Perl starts at the first element and keeps assigning until it runs out of elements. The parentheses turn a lonely scalar into an element of a list. You might consider always assigning the results of subroutines this way, as you never know when the subroutine might change. I know I've just evangelised about how subroutines shouldn't change, but if you take care and the subroutine writer takes care, there definitely won't be any problems!
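The "last value" rule applies to a comma-separated list like ($first,$last). An array is different: in scalar context it gives its number of elements. Here is a quick check using two hypothetical subroutines.

```perl
use strict;
use warnings;

sub list_literal { my ($first, $last) = ('G', 's'); return ($first, $last) }
sub list_array   { my @letters = ('G', 'r', 's'); return @letters }

my $from_literal = list_literal();   # comma list in scalar context: last value
my $from_array   = list_array();     # array in scalar context: element count
my ($first)      = list_array();     # list assignment: first element

print "$from_literal $from_array $first\n";   # s 3 G
```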

That's about it for good old my . There is a lot more to learn about it, but that's enough to get started. You now know a little about variable visibility, and I don't mean changeable weather.




Local
There is one more function that I'd like to draw to your attention, and we'll launch straight into the demonstration:
@words=@ARGV;

print "Output Field Separator is :$,:\n";
print '1. Words:', @words, "\n";

&change;

$,='_';

print "\nOutput Field Separator is :$,:\n";
print '2. Words:', @words, "\n";

&change;

sub change {
print ' Words:', @words, "\n";
}
which should be executed something like this:
perl test.pl sarcasm is the lowest form of wit

The special variable $, (the output field separator) defines what print puts between the items of the list it is given. By default, it is empty. So the prints before the assignment should have no spaces between the words. Then we assign '_' to $, so the next prints have underscores between the words.

If we want to use a different value for $, in the change subroutine without disturbing the main value, we have a little problem. This problem cannot be solved by my , because special global variables like $, cannot be lexically scoped. So, we could do it manually:

@words=@ARGV;

print "Output Field Separator is :$,:\n";
print '1. Words:', @words, "\n";

&change;

$,="_";

print "\nOutput Field Separator is :$,:\n";
print '2. Words:', @words, "\n";

&change;

sub change {
$save=$,;
$,='*';
print ' Words:', @words, "\n";
$,=$save;
}

That works, but it is messy. Perl has a special function for occasions of this nature, called local . An example of local in action:

@words=@ARGV;

print "Output Field Separator is :$,:\n";
print '1. Words:', @words, "\n";

&change;

$,="_";

print "\nOutput Field Separator is :$,:\n";
print '2. Words:', @words, "\n";

&change;

sub change {
local $,="!-!";
print ' Words:', @words, "\n";
}

You can try it with my instead, but it won't work: Perl refuses to compile my $, . I'm sure you'll try it anyway. I know you learn things the hard way, otherwise you a) wouldn't be programming computers and b) wouldn't be using this tutorial to do it.

The local function works in a similar way to my , but it assigns temporary values to global variables, and the old value is restored when the enclosing block ends. The my function creates new variables that happen to have the same name. The distinction is important, but the reasons require Perl proficiency beyond the scope of this humble tutorial. In practice, the difference is:

lexically scoped variables (those declared with my ) are faster than non-lexically scoped variables.
local variables are visible to called subroutines.
my doesn't work on global variables like $, so you must use local .
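The second point in the list above is the one most worth seeing in action. In this sketch the package variable $level is declared with our so that use strict accepts it, and all the names are made up for the example. A subroutine called while local is in effect sees the temporary value, whereas a my variable stays private to the block that declared it.

```perl
use strict;
use warnings;

our $level = 'global';   # package (global) variable; our keeps use strict happy

sub report { return $level }   # reads whatever the global $level holds right now

sub with_local {
    local $level = 'temporary';   # temporary value, restored when this sub ends
    return report();              # report() sees 'temporary'
}

sub with_my {
    my $level = 'private';        # a new lexical variable, visible only in here
    return report();              # report() still sees the global 'global'
}

print with_local(), "\n";   # temporary
print with_my(), "\n";      # global
print report(), "\n";       # global: local's value has been restored
```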


Returning arrays
So that's the end of subroutines and parameters. Would you believe I have only scratched the surface? There are closures, prototypes, autoloading and references to learn. Not, however, in this tutorial. At least not yet. I'll finish with one last demonstration. You may have noticed that Perl returns one long list from subroutines. This is fine, but suppose you want two separate lists, for example two arrays? This is one way to do it:
($w1,$w2)=&wordfunc("Hello World"); # Assign the array references to scalars

print "@$w1 and @$w2\n"; # dereference, ie access, the arrays referred to
#print "$w1 and $w2\n"; # uncomment this next time round

sub wordfunc {
my $phrase=shift;
my (@words,@word1,@word2); # declare three variables lexically
@words=split /\s+/,$phrase; # split the phrase on whitespace
@word1=split //,$words[0]; # create array of letters from the first word
@word2=split //,$words[1]; # and the second
return (\@word1,\@word2); # return references to the two arrays -- scalars
}

There is a lot going on there. It should be clear up until the return statement. As we know, Perl only returns a single list. So, we make Perl return a list of the arrays it has just created. It doesn't return the actual arrays themselves, but references to the arrays. A shopping list is similar: it is just a bit of paper, not the goods themselves. The reference is created with the \ backslash.

Having returned two array references, we assign them to scalar variables. If you uncomment the second print line you'll see the references themselves, printed as something like ARRAY(0x...), rather than the letters.

The next problem is how to dereference the references, or access the arrays. The construct @$xxx does that for us. I know I said I wouldn't cover references, and I haven't -- that is just a useful trick.
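For reference, here is the same wordfunc written under use strict, with a few other ways to look through an array reference. ref reports what kind of reference it is, $w1->[0] fetches a single element, and scalar(@$w2) counts the elements. Treat this as a taster in the same spirit as the section, not a full treatment of references.

```perl
use strict;
use warnings;

sub wordfunc {
    my $phrase = shift;
    my @words = split /\s+/, $phrase;   # split the phrase on whitespace
    my @word1 = split //, $words[0];    # letters of the first word
    my @word2 = split //, $words[1];    # letters of the second word
    return (\@word1, \@word2);          # two references, ie two scalars
}

my ($w1, $w2) = wordfunc("Hello World");
print ref($w1), "\n";                          # ARRAY
print "@$w1\n";                                # H e l l o
print "First letters: $w1->[0]$w2->[0]\n";     # First letters: HW
print scalar(@$w2), "\n";                      # 5
```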

This little section is not designed as a complete guide. It is just a taster of things to come. Perl is immensely powerful. If you think something can't be done, the limit is more likely to be your current ability than Perl's.
