Wednesday, August 6, 2008

Associative Arrays

The Basics
Very, very useful. First, a quick recap on arrays. An array is an ordered list of scalars, which you access by index number, starting at 0. The elements of an array always stay in the same order.

Hashes also hold a collection of scalars, but instead of being accessed by index number, they are accessed by a key. The tables below illustrate the point:


@myarray
Index No.   Value
0           The Netherlands
1           Belgium
2           Germany
3           Monaco
4           Spain

%myhash
Key   Value
NL    The Netherlands
BE    Belgium
DE    Germany
MC    Monaco
ES    Spain



So if we want 'Belgium' from @myarray and also from %myhash, it'll be:

print "$myarray[1]";
print "$myhash{'BE'}";

Notice that the $ prefix is used, because each element is a scalar. Even though it is part of a list, it is still a scalar variable. The hash syntax simply uses braces { } instead of square brackets.
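
To make the comparison concrete, here is a small self-contained sketch that builds both structures from the tables above and pulls 'Belgium' out of each. It turns on strict and warnings and declares its variables with my, which the short snippets in this post leave out; the names $from_array and $from_hash are made up for the example.

```perl
use strict;
use warnings;

# The same five countries, once as an array and once as a hash.
my @myarray = ('The Netherlands', 'Belgium', 'Germany', 'Monaco', 'Spain');
my %myhash  = ('NL', 'The Netherlands', 'BE', 'Belgium', 'DE', 'Germany',
               'MC', 'Monaco', 'ES', 'Spain');

# A single element of either structure is a scalar, hence the $ prefix.
my $from_array = $myarray[1];      # square brackets, numeric index
my $from_hash  = $myhash{'BE'};    # braces, string key

print "$from_array\n";
print "$from_hash\n";
```

Both print statements should give the same country, one found by position and one found by key.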

So why use hashes ? When you want to look something up by a keyword. Suppose we wanted to create a program which returns the name of the country when given a country code. We'd input ES, and the program would come back with Spain.

You could do it with arrays. It would be messy however. One possible approach:

Create @countries, and give it values such as 'ES,Spain'
Iterate over the entire array
Split each element of the array, and check the first result to see if it matches the input
If so, report the match
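@countries=('NL,The Netherlands','BE,Belgium','DE,Germany','MC,Monaco','ES,Spain');

print "Enter the country code:";
chomp ($find=<STDIN>);    # read a line and strip the trailing newline

# every element is checked, even after a match has been found
foreach (@countries) {
    ($code,$name)=split /,/;      # 'ES,Spain' becomes 'ES' and 'Spain'
    if ($find=~/^$code$/i) {      # whole input must match, ignoring case
        print "$name has the code $code\n";
    }
}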

Complex and slow. We could also store a reference to another array in each element of @countries, but that is not efficient either. Whichever way we choose, we still need to search the whole thing. And what if @countries is a big array ? See how much easier a hash is:

A Hash in Action
%countries=('NL','The Netherlands','BE','Belgium','DE','Germany','MC','Monaco','ES','Spain');

print "Enter the country code:";
chomp ($find=<STDIN>);    # read a line and strip the trailing newline

$find=~tr/a-z/A-Z/;       # the keys are uppercase, so uppercase the input
print "$countries{$find} has the code $find\n";

Very easy. All we need to do is make sure everything is in uppercase with tr and we are there. Notice the way %countries is defined - exactly the same as a normal array, except that the values are put into the hash in key/value pairs.
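
One thing the short program does not deal with is a code that is not in the hash at all. The sketch below is one possible way to handle that, not part of the original program: country_name is a hypothetical helper that returns undef for an unknown code. It also writes the hash with =>, which is just a comma that quotes the bare word on its left, so the key/value list is the same as before.

```perl
use strict;
use warnings;

# => is a comma that also quotes the bare word on its left,
# so this builds the same key/value list as the plain-comma version.
my %countries = (
    NL => 'The Netherlands',
    BE => 'Belgium',
    DE => 'Germany',
    MC => 'Monaco',
    ES => 'Spain',
);

# country_name is a hypothetical helper for this example.
# It returns the country name, or undef if the code is not a key.
sub country_name {
    my ($code) = @_;
    $code = uc $code;    # uppercase the input, as tr/a-z/A-Z/ does above
    return exists $countries{$code} ? $countries{$code} : undef;
}

for my $input ('es', 'XX') {
    my $name = country_name($input);
    if (defined $name) {
        print "$name has the code ", uc($input), "\n";
    } else {
        print "No country with the code ", uc($input), "\n";
    }
}
```

The lookup itself is still a single step; the only extra work is the exists check before the value is used.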




When you should use hashes
So why use arrays ? One excellent reason is that the elements of an array stay in the order you created them in. A hash makes no such promise: perl stores the pairs in whatever internal order gives quick access, and in recent versions of Perl that order can even differ from one run to the next. Add print %countries; to the end of that program above and run it. See what I mean ? No recognisable sequence at all. It's like trying to herd cats. If you are writing code that stores a list of values over time and you want them back in the order you found them in, don't use a hash.
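
If all you need is a predictable order rather than the original one, sorting the keys is a common approach. A minimal sketch, assuming the same %countries hash as above; @sorted_codes is a name invented for the example:

```perl
use strict;
use warnings;

my %countries = ('NL', 'The Netherlands', 'BE', 'Belgium', 'DE', 'Germany',
                 'MC', 'Monaco', 'ES', 'Spain');

# keys hands back the codes in no particular order;
# sort puts them in alphabetical order before we loop.
my @sorted_codes = sort keys %countries;

foreach my $code (@sorted_codes) {
    print "$code $countries{$code}\n";
}
```

This gives the same sequence on every run, but it is alphabetical order, not the order the pairs were put in. If the original order matters, an array is still the right tool.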

Finally, you should know that each key of a hash must be unique. Stands to reason, if you think about it. You are accessing the hash via keys, so how can you have two keys named 'NL' or something ? If you do define a certain key twice, the second value overwrites the first. This is a feature, and useful. The values of a hash can be duplicates, but never the keys.
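
A quick way to convince yourself of the overwriting behaviour is to define a key twice and look at what is left. This is only an illustrative sketch; the value 'Holland' and the variable $count are made up for it.

```perl
use strict;
use warnings;

# NL appears twice in the list; the later pair replaces the earlier one.
my %countries = ('NL', 'Holland', 'BE', 'Belgium', 'NL', 'The Netherlands');

my $count = scalar(keys %countries);    # number of distinct keys

print "$countries{NL}\n";
print "$count\n";
```

Only two keys survive, and NL holds the value from its second appearance.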

If you want to assign to a hash, there is of course no concept of push, pop, splice and so on. Instead:


Hash Hacking Functions
Assigning   $countries{PT}='Portugal';
Deleting    delete $countries{NL};
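
Here are the two operations in a small runnable sketch. It starts from a cut-down two-country hash to keep things short, and $removed is a name invented for the example. Note that delete hands back the value it removed, which is sometimes handy.

```perl
use strict;
use warnings;

my %countries = ('NL', 'The Netherlands', 'BE', 'Belgium');

$countries{PT} = 'Portugal';              # new key: adds a pair
$countries{BE} = 'Kingdom of Belgium';    # existing key: replaces the value
my $removed = delete $countries{NL};      # removes the pair, returns its value

print join(',', sort keys %countries), "\n";
print "$removed\n";
```

After this runs the hash holds BE and PT, and $removed holds the name that used to live under NL.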


Accessing Your Hash

Assuming you keep the same %countries hash as above, here are some useful ways to access it:

All the keys           print keys %countries;
All the values         print values %countries;
A Slice of Hash :-)    print @countries{'NL','BE'};
How many elements ?    print scalar(keys %countries);
Does the key exist ?   print "It's there !\n" if exists $countries{'NL'};


Well, that last one is not an access as such, but useful anyway.
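
The one-liners above print their results straight out, all run together. The sketch below stores each result in a variable first so you can see what comes back; the variable names are invented for the example, and the keys and values are sorted only to make the output repeatable.

```perl
use strict;
use warnings;

my %countries = ('NL', 'The Netherlands', 'BE', 'Belgium', 'DE', 'Germany',
                 'MC', 'Monaco', 'ES', 'Spain');

my @codes  = sort keys %countries;        # all the keys
my @names  = sort values %countries;      # all the values
my @slice  = @countries{'NL', 'BE'};      # a slice: note the @, it is a list
my $count  = scalar(keys %countries);     # how many pairs
my $has_nl = exists $countries{'NL'};     # true
my $has_pt = exists $countries{'PT'};     # false, and PT is not created

print "Keys: @codes\n";
print "Values: @names\n";
print "Slice: @slice\n";
print "Count: $count\n";
print "NL is ", ($has_nl ? "there" : "missing"), "\n";
print "PT is ", ($has_pt ? "there" : "missing"), "\n";
```

The slice uses @ rather than $ because it returns several values, in the order the keys were asked for.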




More Hash Access: Iteration, keys and values
You may have noticed that keys and values return a list. And we can iterate over a list, using foreach:
foreach (keys %countries) {
print "The key $_ contains $countries{$_}\n";
}

which is useful. Note how any list can be fed to foreach, and off it goes. As usual, there is another way to do the above:
while (($code,$name)=each %countries) {
print "The key $code contains $name\n";
}

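The each function returns the next key/value pair of the hash every time it is called, and can be slightly faster because it does not have to build the whole list of keys first. In this example we assign the pair to a list (you spotted the parens ?) and away we go. Eventually there are no more pairs, each returns an empty list, the assignment evaluates to false and the while loop stops.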
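
To see the loop stop by itself, here is a sketch that counts the pairs as each hands them over. It assumes the same %countries hash; %seen and $pairs are names invented for the example, and the loop variables are declared with my inside the while condition because strict is on.

```perl
use strict;
use warnings;

my %countries = ('NL', 'The Netherlands', 'BE', 'Belgium', 'DE', 'Germany',
                 'MC', 'Monaco', 'ES', 'Spain');

my %seen;         # copy of every pair the loop visited
my $pairs = 0;    # how many times the loop body ran

# each returns one (key, value) pair per call, then an empty list
# once the hash is exhausted, which ends the loop.
while (my ($code, $name) = each %countries) {
    print "The key $code contains $name\n";
    $seen{$code} = $name;
    $pairs++;
}

print "$pairs pairs visited\n";
```

The pairs come out in the hash's internal order, so the lines may appear in a different sequence from run to run, but every pair is visited exactly once.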

If you are into brevity, both the above can be accomplished in a single line:

print "The key $code contains $name\n" while ($code,$name)=each %countries;

print "The key $_ contains $countries{$_}\n" foreach keys %countries;


Note -- one-liners like these won't win any prizes for readability among people who don't program in Perl.
