Wednesday, August 6, 2008

Arrays

Lists, herds -- what are arrays?
Perl has two types of array, associative arrays (hashes) and arrays. Both types are lists. A list is just a collection of variables referred to as the collection, not as individual elements.

You can think of Perl's lists as a herd of animals. List context refers to the entire herd, scalar context refers to a single element. A list is a herd of variables. The variables don't have to be all of the same type -- you might have a herd of ten sheep, three lions and two wolves. It would probably be just three lions and one wolf before long, but bear with me. In the same way, you might have a Perl list of three scalar variables, two array elements and ten hash elements.

Certain types of lists are known by certain names. Just as a herd of sheep is called a flock, a herd of lions is called a pride, a herd of wolves is called a pack and a herd of managers a confusion, some types of Perl list have special names.




Basic Array Work
For example, an array is an ordered list of scalar variables. This list can be referred to as a whole, or you can refer to individual elements in the list. The program below defines an array called @names and puts five values into it.

@names=("Muriel","Gavin","Susanne","Sarah","Anna");

print "The elements of \@names are @names\n";
print "The first element is $names[0] \n";
print "The third element is $names[2] \n";
print 'There are ',scalar(@names)," elements in the array\n";



Firstly, notice how we define @names. As it is in a list context, we are using parens. Each value is separated by a comma, which is Perl's list separator. Without use strict, Perl would accept simple names like these unquoted, but as they are string values the double quotes make the code easier to read and to change later on.

Next, notice how we print it. Simply refer to it as a whole, that is, in list context. List context means referring to more than one element of a list at a time. The code print @names; will work perfectly well too. But...

I usually learn something about Perl every time I work with it. When running a course, a student taught me this trick which he had discovered:


@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");

print @names;
print "\n";
print "@names";

When a list is placed inside double quotes, its elements are separated by spaces when interpolated. Without the quotes, print runs the elements together. Useful.
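As a small sketch of why this happens: interpolation joins the elements with the value of the special variable $" (a single space by default), while a bare print puts nothing between them. The variable names $plain, $quoted and $joined are only there for illustration.

```perl
use strict;
use warnings;

my @names = ("Muriel", "Gavin", "Susanne");

# What print @names outputs by default: the elements run together.
my $plain  = join("", @names);
# Interpolation in double quotes inserts $" (a space by default).
my $quoted = "@names";
# join lets you choose any separator explicitly.
my $joined = join(", ", @names);

print "$plain\n";
print "$quoted\n";
print "$joined\n";

{
    # Temporarily change the list separator for this block only.
    local $" = "-";
    print "@names\n";
}
```

If you want something other than a space, join is usually the clearer choice, since changing $" affects every interpolated list in its scope.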

If we want to do anything with the array as a list, that is doing something with more than one value, then refer to the array as @array . That's important. The @ prefix is used when you want to refer to more than one element of a list.

When you refer to more than one, but not all elements of an array that is known as a slice . Cake analogies are appropriate. Pie analogies are probably healthier but equally accurate.




Elements of Arrays
Arrays are not much use unless we can get to individual elements. Firstly, we are dealing with a single element of the list, so we cannot use @ which refers to multiple elements of the array. It is a single, scalar variable, so $ is used. Secondly, we must specify which element we want. That's easy - $array[0] for the first, $array[1] for the second and so forth. Array indexes start at 0, unless you do something which is so highly deprecated ('deprecated' means allowed, usually for backwards compatibility, but disapproved of because there are better ways) I'm not even going to mention it.

Finally, in the last line of that first program, we force what is normally list context (more than one element) into scalar context (a single value) to give us the number of elements in the array. Without the scalar, the elements themselves would be printed, much as in the second line of the program.
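Here is a minimal sketch of an array being read in scalar context. The names $count, $also and $last_index are made up for the example.

```perl
use strict;
use warnings;

my @names = ("Muriel", "Gavin", "Susanne", "Sarah", "Anna");

# Assigning an array to a scalar gives the number of elements.
my $count = @names;
# scalar() forces the same thing where list context would otherwise apply.
my $also = scalar(@names);
# $#names is the index of the last element, so it is one less than the count.
my $last_index = $#names;

print "Count: $count, forced: $also, last index: $last_index\n";
```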


How to refer to elements of an array
Please understand this:


$myvar="scalar variable";
@myvar=("one","element","of","an","array","called","myvar");

print $myvar; # refers to the contents of a scalar variable called myvar
print $myvar[1]; # refers to the second element of the array myvar
print @myvar; # refers to all the elements of array myvar


The two variables $myvar and @myvar are not, in any way, related. Not even distantly. Technically, they are in different namespaces.

Going back to the animal analogy, it is like having a dog named 'Myvar' and a goldfish called 'Myvar'. You'll never get the two mixed up because when you call 'Myvar !!!!' or open a can of dog food the 'Myvar' dog will come running and goldfish won't. Now, you couldn't have two dogs called 'Myvar' and in the same way you can't have two Perl variables in the same namespace called 'Myvar'.



More ways to access arrays
The element number can be a variable.
print "Enter a number :";
chomp ($x=<STDIN>);

@names=("Muriel","Gavin","Susanne","Sarah","Anna");

print "You requested element $x who is $names[$x]\n";

print "The index number of the last element is $#names \n";


This is useful. Notice the last line of the example. It returns the index number of the last element. Of course you could always write $last=scalar(@names)-1; but $#names is more efficient. It also gives an easy way to get the last element, as follows:

print "Enter the number of the element you wish to view :";
chomp ($x=<STDIN>);

@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");

print "The first two elements are @names[0,1]\n";
print "The first three elements are @names[0..2]\n";
print "You requested element $x who is $names[$x-1]\n"; # starts at 0
print "The elements before and after are : @names[$x-2,$x]\n";
print "The first, second, third and fifth elements are @names[0..2,4]\n";

print "a) The last element is $names[$#names]\n"; # one way
print "b) The last element is $names[-1]\n"; # different way


It looks complex, but it is not. Really. Notice you can have multiple values separated by a comma. As many as you like, in whatever order. The range operator .. gives you everything between and including the values. And finally look at how we print the last element - remember $#names gives us a number? Simply enclose it inside square brackets and you have the last element. A negative index counts back from the end of the array, so index -1 also gives the last element.
Do also note that because element accesses such as [0,1] are more than one variable, we cannot use the scalar prefix, namely the $ symbol. We are accessing the array in list context, so we use the @ symbol. Doesn't matter that it is not the entire array. Remember, accessing more than one element of an array but not the entire array is called a slice. I won't go over the food analogies again.
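The rule about $ for one element and @ for several can be sketched like this. The variable names are invented for the example.

```perl
use strict;
use warnings;

my @names = ("Muriel", "Gavin", "Susanne", "Sarah", "Anna");

my $last          = $names[-1];            # a single element, so $
my @last_two      = @names[-2, -1];        # a slice, so @
my @all_but_first = @names[1 .. $#names];  # a slice built from a range

print "Last: $last\n";
print "Last two: @last_two\n";
print "All but first: @all_but_first\n";
```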




For Loops


A for Loop demonstrated
All well and good, but what if we want to load each element of the array in turn ? Well, we could build a for loop like this:
@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");

for ($x=0; $x <= $#names; $x++) {
    print "$names[$x]\n";
}

which sets $x to 0, runs the loop once, then adds one to $x, checks it is less than or equal to $#names, and if so carries on. By the way, that was your introduction to for loops. Just to go into a little detail there, the for loop has three parts to it:
Initialisation
Test Condition
Modification
In this case, the variable $x is initialised to 0. It is immediately tested to see if it is smaller than, or equal to $#names . If that is true, then the block is executed once. Critically, if it is not true the block is not executed at all.
Once the block has been executed, the modification expression is evaluated. That's $x++ . Then, the test condition is checked to see if the block should be executed or not.
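One way to see the three parts at work is to write the same loop by hand with while. This is only a sketch of the order of events, not how you would normally write it, and $printed is a made-up counter.

```perl
use strict;
use warnings;

my @names = ("Muriel", "Gavin", "Susanne");
my $printed = 0;

my $x = 0;                    # initialisation, done once
while ($x <= $#names) {       # test condition, checked before every pass
    print "$names[$x]\n";
    $printed++;
    $x++;                     # modification, done after every pass
}

print "Printed $printed names, and \$x finished at $x\n";
```

Note that $x ends up one past the last index, which is exactly what makes the test fail and the loop stop.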


For loops with .. , the range operator
There is another version:

for $x (0 .. $#names) {
    print "$names[$x]\n";
}

which takes advantage of the range operator .. (two dots together). This simply gives $x the value of 0, then increments $x by 1 until it is equal to $#names .
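To see what the range operator actually produces, you can store its result in an array. A small sketch follows. The names @indexes and @countdown are invented, and reverse is a built-in function that returns a list in the opposite order.

```perl
use strict;
use warnings;

my @names = ("Muriel", "Gavin", "Susanne", "Sarah", "Anna");

# The range operator builds a list of every integer from 0 to $#names.
my @indexes = (0 .. $#names);
# reverse turns that list around, which is handy for counting down.
my @countdown = reverse(0 .. $#names);

print "Indexes: @indexes\n";
print "Countdown: @countdown\n";
```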

foreach
For true beauty we must use foreach .

foreach $person (@names) {
    print "$person";
}

This goes through each element ('iterates', another good technical word to use) of @names , and assigns each element in turn to the variable $person . Then you can do what you like with the variable. Much easier. You can use
for $person (@names) {
    print "$person";
}

if you want. Makes no difference at all, aside from a little clarity.

The infamous $_
In fact, that gets shorter. And now I need to introduce you to $_ , which is the Default Input and Pattern Searching Variable.

foreach (@names) {
    print "$_";
}

If you don't specify a variable to put each element into, $_ is used instead as it is the default for this operation, and many, many others in Perl. Including the print function :
foreach (@names) {
    print ;
}

As we haven't supplied any arguments to print, $_ is printed as default. You'll be seeing a lot of $_ in Perl. Actually, that statement is not exactly true. You will be seeing a lot of places where $_ is used, but quite often when it is used, it is not actually written. In the above example, you don't actually see $_ but you know it is there.
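print is not the only function that falls back on $_. As a hedged illustration, the built-in length does the same when called without an argument. The counter $total is made up for this sketch.

```perl
use strict;
use warnings;

my @names = ("Muriel", "Gavin", "Susanne");
my $total = 0;

foreach (@names) {
    # length with no argument measures $_
    $total += length;
    # print with no argument prints $_
    print;
    print "\n";
}

print "Total characters: $total\n";
```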

A Premature End to your loop
A loop, by its nature, continues. If that didn't make sense, start reading this sentence again.

The old jokes are the best, aren't they?

The joke above is a loop. You continue re-reading the sentence until you realise I'm trying to be funny. Then you exit the loop. Or maybe somebody doesn't exit it. Whatever, loops always run until the expression they are testing returns false. In the case of the examples above, a false value is returned when all the elements of the array have been cycled through, and the loop ends.

If you want an everlasting loop, just test a condition you know will always be true:

while (1) {
    $x++;
    print "$x: Did you know you can press CTRL-C to interrupt a perl program?\n";
}

Another way to exit a loop is a simple foreach over the elements, as we have seen. But what if we don't know in advance when we want to exit a loop? For example, suppose we want to print out a list of names but stop when we find one with a particular title? You are throwing a huge party, someone is allergic to vodka, and this person has drunk from the punch bowl despite being assured by someone holding two empty bottles of Absolut that he was just using the bottles to convey yet more orange juice into said punch bowl. So you need a doctor, and so you write a Perl script to find one from the list of attendees, wanting the doctor's name to be the last item printed:
@names=('Mrs Smith','Mr Jones','Ms Samuel','Dr Jansen','Sir Philip');

foreach $person (@names) {
    print "$person\n";
    last if $person=~/Dr /;
}

The last operator is our friend. Don't worry about the /Dr / business -- that is a regular expression which we cover next. All you need to know is that it returns true if the name contains 'Dr ', which for this list means the names that begin with 'Dr '. When it does return true, last is executed and the loop ends early.
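Strictly speaking, /Dr / matches 'Dr ' anywhere in the name. If you want only names that begin with it, the pattern can be anchored with ^, as in this sketch. The guest 'Prof Dr Klein' is a hypothetical addition to show the difference, and the arrays @contains and @begins are made up too.

```perl
use strict;
use warnings;

# 'Prof Dr Klein' is an invented guest, added only to show the difference.
my @names = ('Mrs Smith', 'Prof Dr Klein', 'Dr Jansen');

my (@contains, @begins);
foreach my $person (@names) {
    push @contains, $person if $person =~ /Dr /;    # 'Dr ' anywhere in the name
    push @begins,   $person if $person =~ /^Dr /;   # 'Dr ' at the start only
}

print "Contains 'Dr ': @contains\n";
print "Begins with 'Dr ': @begins\n";
```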




A little more control over the premature ending: Labels
So that's easy enough. But wait! We need a medical, human-fixer type doctor, not just anyone with a PhD. So, the same principle applies in this example here:
@names =('Mrs Smith','Mr Jones','Ms Samuel','Dr Jansen','Sir Philip');
@medics =('Dr Black','Dr Waymour','Dr Jansen','Dr Pettle');

foreach $person (@names) {
    print "$person\n";
    if ($person=~/Dr /) {
        foreach $doc (@medics) {
            print "\t$doc\n";
            last if $doc eq $person;
        }
    }
}

Aside from showing one way to indent your code, this also demonstrates a nested loop. A nested loop is a loop within a loop. What happens is that the @names array is searched for a 'Dr ', and if it is found then the @medics array is searched to make sure the doctor is a human-fixing doctor not a professor of physics or something. The regular expression has been shifted into an if statement, where it works nicely as it only returns true or false.

The problem with the code is that after we find our medical doctor we want it to stop. But it doesn't. It only stops the loop it is in, so Dr Pettle never gets printed. However, the code just carries on with Sir Philip who is terribly sorry old chap, but can't be of any bally use at all, what ho! What we need is a way to break out of the entire loop from within a nest. Like so:

@names =('Mrs Smith','Mr Jones','Ms Samuel','Dr Jansen','Sir Philip');
@medics =('Dr Black','Dr Waymour','Dr Jansen','Dr Pettle');

LBL: foreach $person (@names) {
    print "$person\n";
    if ($person=~/Dr /) {
        foreach $doc (@medics) {
            print "\t$doc\n";
            last LBL if $doc eq $person;
        }
    }
}

Only two changes here. We have defined a label, namely LBL. Instead of breaking out from the current loop, which is the default, we specify a label to break out to, which is in the outer loop. This works with as many nested loops as your brain can handle. You don't have to use uppercase names but for namespace reasons it is recommended, and you can call your labels whatever you please. I was just being unimaginative with the name of LBL, feel free to invent labels called DORIS or MATILDA if that's what floats your personal boat.
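As a variation, here is a sketch that remembers which doctor was found rather than just stopping. It assumes a label called GUEST and a variable called $found, both invented for the example, and it also uses next, the counterpart of last that skips ahead to the next pass of the labelled loop.

```perl
use strict;
use warnings;

my @names  = ('Mrs Smith', 'Mr Jones', 'Dr Jansen', 'Sir Philip');
my @medics = ('Dr Black', 'Dr Jansen', 'Dr Pettle');

my $found;   # stays undefined if no medical doctor attends

GUEST: foreach my $person (@names) {
    next GUEST unless $person =~ /Dr /;   # not a doctor, try the next guest
    foreach my $doc (@medics) {
        if ($doc eq $person) {
            $found = $doc;
            last GUEST;                   # leave both loops at once
        }
    }
}

print defined($found) ? "Fetch $found!\n" : "No medic found\n";
```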



Changing the Elements of an Array
So we have @names . We want to change it. Run this:
print "Enter a name :";
chomp ($x=<STDIN>);

@names=("Muriel","Gavin","Susanne","Sarah");

print "@names\n";

push (@names, $x);

print "@names\n";

Fairly self explanatory. The push function just adds a value on to the end of the array. Of course, Perl being Perl, it doesn't have to be just the one value:
print "Enter a name :";
chomp ($x=<STDIN>);

@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");

print "@names\n";

push (@names, $x, 10, @cities[2..4]);

print "@names\n";

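This is worth looking at in more detail. The slice @cities[2..4] asks for a fifth element of @cities, which does not exist. Perl does not complain: it supplies an undefined value for the missing element and pushes that along with the rest. To see this, add the following to the end of the example: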

print "There are ",scalar(@names)," elements in \@names\n";

The printed list appears to hold 8 elements, yet the count shows there are in fact 9. The ninth is the undefined value that came from the non-existent element of @cities, and Perl has quite happily extended @names to suit. The array @cities remains unchanged. Try popping the array if you don't believe me.
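If you would rather count than take my word for it, here is a minimal sketch. It uses the fixed name "Paul" in place of the typed input, and the built-in functions grep and defined to count the undefined elements. The variable names are invented.

```perl
use strict;
use warnings;

my @names  = ("Muriel", "Gavin", "Susanne", "Sarah");
my @cities = ("Brussels", "Hamburg", "London", "Breda");

# Index 4 does not exist in @cities, so the slice yields undef there.
push(@names, "Paul", 10, @cities[2 .. 4]);

my $count        = scalar(@names);
my $undefined    = grep { !defined($_) } @names;   # how many elements are undef
my $cities_count = scalar(@cities);

print "Elements in \@names: $count\n";
print "Undefined elements: $undefined\n";
print "Elements in \@cities: $cities_count\n";
```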
So that's push . Now for some...


Jiggerypokery with Arrays
@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");

&look;

$last=pop(@names);
unshift (@cities, $last);

&look;

sub look {
    print "Names : @names\n";
    print "Cities: @cities\n";
}

Now we have two arrays. The pop function removes the last element of an array and returns it, which means you can do something like assign the returned value to a variable. The unshift function adds a value to the beginning of the array. Hope you didn't forget that &subroutinename calls a subroutine. Presented below are the functions you can use to work with arrays:

A table of array hacking functions
push      Adds value to the end of the array
pop       Removes and returns value from end of array
shift     Removes and returns value from beginning of array
unshift   Adds value to the beginning of array
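All four in one place, as a quick sketch. The array @queue and the two scalars are invented names.

```perl
use strict;
use warnings;

my @queue = ("Gavin", "Susanne");

push(@queue, "Sarah");            # now: Gavin Susanne Sarah
unshift(@queue, "Muriel");        # now: Muriel Gavin Susanne Sarah
my $from_end   = pop(@queue);     # removes and returns "Sarah"
my $from_start = shift(@queue);   # removes and returns "Muriel"

print "Removed $from_start and $from_end, leaving @queue\n";
```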


Now, accessing other elements of arrays. May I present the splice function ?


Splice
@names=("Muriel","Sarah","Susanne","Gavin");

&look;

@middle=splice (@names, 1, 2);

&look;

sub look {
    print "Names : @names\n";
    print "The Splice Girls are: @middle\n";
}

The first argument for splice is an array. The second is the offset, which is the index number of the list element to begin splicing at. In this case it is 1. Then comes the number of elements to remove, which is sensibly 1 or more; in this case it is 2. You can set it to 0 and perl, in true perl style, won't complain. Setting it to 0 is handy because splice can add elements to the middle of an array, and if you don't want any deleted, 0 is the number to use. Like so:
@names=("Muriel","Gavin","Susanne","Sarah");
@cities=("Brussels","Hamburg","London","Breda");

&look;

splice (@names, 1, 0, @cities[1..3]);

&look;

sub look {
    print "Names : @names\n";
    print "Cities: @cities\n";
}

Notice how the assignment to @middle has gone -- it is no longer relevant.

If you assign the result of a splice to a scalar then:

@names=("Muriel","Sarah","Susanne","Gavin");

&look;

$middle=splice (@names, 1, 2);

&look;

sub look {
    print "Names : @names\n";
    print "The Splice Girls are: $middle\n";
}

then the scalar is assigned the last element removed, or undef if it doesn't work at all.
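To pull the splice variations together, here is a sketch that deletes, inserts and replaces in turn. The extra names "Anna", "Paul" and "Trish" and the variables @removed and $last_removed are just for illustration.

```perl
use strict;
use warnings;

my @names = ("Muriel", "Sarah", "Susanne", "Gavin");

# Delete: remove 1 element at index 1 and return it in list context.
my @removed = splice(@names, 1, 1);

# Insert: remove nothing, add two elements starting at index 1.
splice(@names, 1, 0, "Anna", "Paul");

# Replace: remove 2 elements starting at index 3 and put one in their place.
# In scalar context splice returns the last element removed.
my $last_removed = splice(@names, 3, 2, "Trish");

print "Removed first: @removed\n";
print "Last removed in scalar context: $last_removed\n";
print "Names now: @names\n";
```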

The splice function is also a way to delete elements from an array. In fact, a discussion of :
