Wednesday, August 6, 2008

Grep and Map

Grep
If you want to search a list and create another list of the things you found, grep is one solution. Here is an example, which also demonstrates join again:
@stuff=qw(flying gliding skiing dancing parties racing); # quote-worded list

@new = grep /ing/, @stuff; # Creates @new, which contains elements of @stuff
# matching with 'ing' in them.

print join ":",@stuff,"\n"; # makes one string out of the elements of @stuff and "\n",
# joined with ':' , then prints it (so a ':' comes just before the newline)

print join ":",@new,"\n";

Remember qw means 'quote words', so whitespace is used as the delimiter instead of quotes and commas. The grep function must be fed a list on the right hand side. On the left side, you may assign the results to a list or a scalar variable. Assigning to a list gives you each matching element, and assigning to a scalar gives you the number of matches found:
@stuff=qw(flying gliding skiing dancing parties racing);

$new = grep /ing/, @stuff;

print join ":",@stuff,"\n";

print "Found $new elements of \@stuff which matched\n";
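The list-versus-scalar distinction is easy to trip over, so here is a small sketch of it. The variable names ($count, $first, $has_skiing) are my own, and I have added use strict and my, which the examples in this post leave out:

```perl
use strict;
use warnings;

my @stuff = qw(flying gliding skiing dancing parties racing);

my $count      = grep { /ing/ } @stuff;            # scalar context: number of matches
my ($first)    = grep { /ing/ } @stuff;            # list context: first matching element
my $has_skiing = grep { $_ eq 'skiing' } @stuff;   # a count used as a true/false test

print "count=$count first=$first\n";               # count=5 first=flying
print "skiing is in the list\n" if $has_skiing;
```

Note the parentheses around $first. Without them the assignment is in scalar context and you get the count again, not an element.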

If you decide to modify the elements on their way through grep, you actually modify the original list, because $_ is an alias for each element rather than a copy. Be careful out there.
@stuff=qw(flying gliding skiing dancing parties racing);

@new = grep s/ing//, @stuff;

print join ":",@stuff,"\n";
print join ":",@new,"\n";
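If you want the trimmed words without touching @stuff, one option is to do the changing on a copy. This is only a sketch: it uses map (covered further down) because we want the changed values back, and the names @trimmed and $copy are mine:

```perl
use strict;
use warnings;

my @stuff = qw(flying gliding skiing dancing parties racing);

# Keep only the words containing 'ing', then strip 'ing' from a copy of each
# one, so the elements of @stuff are never modified.
my @trimmed = map { (my $copy = $_) =~ s/ing//; $copy } grep { /ing/ } @stuff;

print join(":", @stuff), "\n";    # flying:gliding:skiing:dancing:parties:racing
print join(":", @trimmed), "\n";  # fly:glid:ski:danc:rac
```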

To determine what actually matches you can either use an expression or a block. Up to now we've been using expressions, but when things become more complicated use a block:
@stuff=qw(flying gliding skiing dancing parties racing);

@new = grep { s/ing// if /^[gsp]/ } @stuff;

print join ":",@stuff,"\n";
print join ":",@new,"\n";

Try removing the braces and you'll get an error. Notice that the comma before the list has gone. It is now obvious where the expression ends, as it is inside a block delimited with { } . The regex says if the element begins with g, s or p, then remove ing. The element is only assigned to @new if the block as a whole is true. 'parties' does begin with p, so that part works, but s/ing// fails, so the overall result is false and the value is not assigned to @new .
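To see the decision made for each element, the same logic can be spelled out as an ordinary loop. This is a sketch of my own rather than what grep does internally, and @report, $original and $word are made-up names:

```perl
use strict;
use warnings;

my @stuff = qw(flying gliding skiing dancing parties racing);
my @report;

for my $original (@stuff) {
    my $word = $original;    # work on a copy so @stuff stays intact
    # Same test as the grep block: must start with g, s or p AND lose an 'ing'.
    my $kept = ($word =~ /^[gsp]/ && $word =~ s/ing//) ? "kept" : "dropped";
    push @report, "$original -> $word ($kept)";
}

print "$_\n" for @report;
# flying -> flying (dropped)
# gliding -> glid (kept)
# skiing -> ski (kept)
# dancing -> dancing (dropped)
# parties -> parties (dropped)
# racing -> racing (dropped)
```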

Map
Map works the same way as grep , in that they both iterate over a list and return a list. There is one important difference, however:
grep returns each element for which the expression is true;
map returns the result of every expression it evaluates.
As usual, an example will assist the penny in dropping, clear the fog and turn on the light (if not make my metaphors easier to understand):
@stuff=qw(flying gliding skiing dancing parties racing);

print "There are ",scalar(@stuff)," elements in \@stuff\n";
print join ":",@stuff,"\n";

@mapped = map /ing/, @stuff;
@grepped = grep /ing/, @stuff;

print "There are ",scalar(@stuff)," elements in \@stuff\n";
print join ":",@stuff,"\n";

print "There are ",scalar(@mapped)," elements in \@mapped\n";
print join ":",@mapped,"\n";

print "There are ",scalar(@grepped)," elements in \@grepped\n";
print join ":",@grepped,"\n";

You can see that @mapped is just a list of 1's. Notice that there are five ones, whereas there are six elements in the original array, @stuff. This is because @mapped contains only the true results of map : the expression /ing/ is successful in every case except for 'parties'.

In that case the expression is false. A failed match in list context returns an empty list, so nothing is added to @mapped. Contrast this with the grep function, which returns the actual element, but only if the expression is true. Try this:

@letters=qw(a b c d e); # quoted words; bare a,b,c,d,e fails under 'use strict'

@ords=map ord, @letters;
print join ":",@ords,"\n";

@chrs=map chr, @ords;
print join ":",@chrs,"\n";

This uses the ord function to change each letter into its ASCII code, then the chr function to convert the ASCII codes back to characters. If you change map to grep in the example above, you can see that nothing appears to happen. What is happening is that grep is trying the expression on each element, and if it succeeds (is true) it returns the element, not the result. The expression succeeds for each element, so each element is returned in turn. Another example:
@stuff=qw(flying gliding skiing dancing parties racing);

print join ":",@stuff,"\n";

@mapped = map { s/(^[gsp])/$1 x 2/e } @stuff;
@grepped = grep { s/(^[gsp])/$1 x 2/e } @stuff;

print join ":",@stuff,"\n";
print join ":",@mapped,"\n";
print join ":",@grepped,"\n";

Recapping on regex, what that does is match any element beginning with g, s or p, and replace that first letter with two copies of itself. The caret ^ forces a match at the beginning of the string, the [square brackets] denote a character class, and /e forces Perl to evaluate the RHS as an expression.
The output from this is a mixture of 1 and nothing for map , and a three-element array called @grepped from grep. Because map has already modified @stuff by the time grep runs, the elements in @grepped start with three copies of the letter. Yet another example:

@mapped = map { chop } @stuff;
@grepped = grep { chop } @stuff;

The chop function removes the last character from a string, and returns it. So that's what you get back from map , the result of the expression. The grep function gives you the mangled remains of the original value.
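Here is the chop example written out in full so you can see what ends up where. It is a sketch with use strict and my added by me. Bear in mind that both calls change @stuff, so grep sees the words after map has already shortened them:

```perl
use strict;
use warnings;

my @stuff = qw(flying gliding skiing dancing parties racing);

my @mapped = map { chop } @stuff;    # the characters that were removed
print join(":", @mapped), "\n";      # g:g:g:g:s:g
print join(":", @stuff), "\n";       # flyin:glidin:skiin:dancin:partie:racin

my @grepped = grep { chop } @stuff;  # chops again, keeps the shortened elements
print join(":", @grepped), "\n";     # flyi:glidi:skii:danci:parti:raci
```

One caveat: grep keeps an element only if the chopped character is true, so a word ending in "0" would be dropped.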

Writing your own grep and map functions
Finally, you can write your own functions:

@stuff=qw(flying gliding skiing dancing parties racing);

print join ":",@stuff,"\n";

@mapped = map { &isit } @stuff;
@grepped = grep { &isit } @stuff;

print join ":",@mapped,"\n";
print join ":",@grepped,"\n";

sub isit {
($word)=/(^.*)ing/;

if (length $word == 3) {
return "ok";
} else {
return 0;
}
}

The subroutine isit first grabs everything up until 'ing', puts it into $word , then returns 'ok' if there are three characters in $word . If not, it returns the false value 0. You can make these subroutines (think of them as functions) as complex as you like.
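If you run with use strict and warnings, a variant along these lines may suit you better. It is a sketch only: three_before_ing, $candidate and $stem are names I made up, and the word is passed in as an argument instead of being read from $_ :

```perl
use strict;
use warnings;

my @stuff = qw(flying gliding skiing dancing parties racing);

# Hypothetical variant of isit: takes the word as an argument and keeps
# its working variable private with 'my'.
sub three_before_ing {
    my ($candidate) = @_;
    my ($stem) = $candidate =~ /^(.*)ing/;   # undef if there is no 'ing'
    return (defined $stem && length($stem) == 3) ? "ok" : 0;
}

my @mapped  = map  { three_before_ing($_) } @stuff;
my @grepped = grep { three_before_ing($_) } @stuff;

print join(":", @mapped), "\n";   # ok:0:ok:0:0:ok
print join(":", @grepped), "\n";  # flying:skiing:racing
```

The defined check is there so that 'parties', which has no 'ing' in it, does not trigger an uninitialized-value warning.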
Sometimes it is very useful to have map return the actual value, rather than the result. The answer is easy, but not obvious. Remember that subroutines return the value of the last expression evaluated? So do blocks. Here is the earlier substitution again, where map still returns the result:

@grepstuff=@mapstuff=qw(flying gliding skiing dancing parties racing);

print join " ",map { s/(^[gsp])/$1 x 2/e } @mapstuff;
print "\n";
print join " ",grep { s/(^[gsp])/$1 x 2/e } @grepstuff;

Now, make sure $_ is the last thing evaluated:
@grepstuff=@mapstuff=qw(flying gliding skiing dancing parties racing);

print join " ",map { s/(^[gsp])/$1 x 2/e;$_} @mapstuff;
print "\n";
print join " ",grep { s/(^[gsp])/$1 x 2/e } @grepstuff;

And there you have it. Now that you understand that, you can go and impress your friends, but please don't count on success.
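One more option, assuming you are on Perl 5.14 or later: the /r modifier makes s/// return the changed copy and leave the original alone, so map hands back the actual values without any need for a trailing $_ . A small sketch, with @words and @doubled as names of my own:

```perl
use strict;
use warnings;

my @words = qw(flying gliding skiing dancing parties racing);

# /r returns the modified copy (or the unchanged string if nothing matched);
# /e evaluates the replacement as an expression, as before.
my @doubled = map { s/(^[gsp])/$1 x 2/er } @words;

print join(" ", @doubled), "\n";  # flying ggliding sskiing dancing pparties racing
print join(" ", @words), "\n";    # flying gliding skiing dancing parties racing
```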
