Wednesday, August 6, 2008

External Commands

Some ways to...
Perl can start external commands. There are five main ways to do this:
system
exec
Command Input, also known as `backticks`
Piping data from a process
Quote execute
We'll compare system and exec first.

Exec
Poor old exec is broken on Perl for Win32. What it should do is stop running your Perl script and start running whatever you tell it to. If it can't start the external process, it should return false, with the reason in $! . This doesn't work properly under Perl for Win32. The exec function does work properly on the standard Perl distribution.



System
This runs an external command for you, then carries on with the script. It always returns, and the status it returns also goes into $? . This means you can test to see if the program worked. Strictly speaking, you are testing whether it could be started and what exit status it reported; what the program does while it runs is outside your control if you use system .

This example demonstrates system in action. Run the 'vol' command from a command prompt first if you are not familiar with it. Then run the 'vole' command. I'm assuming you have no cute furry executables called vole on your system, or at least in the path. If you do have an executable called 'vole', be creative and change it.

system("vole");

print "\n\nResult: $?\n\n";

system("vol");

print "\n\nResult: $?\n\n";

As you can see, a successful system call returns 0. An unsuccessful one returns a value which you need to divide by 256 to get the real return value. Also notice you can see the output of the command on screen. And because system returns, the code after the first system call is executed. Not so with exec, which will terminate your Perl script if it is successful. Perl's usual rules for single and double quotes apply to variable interpolation in the command string.
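The divide-by-256 step can be sketched as a small helper. This is only an illustration: exit_code_of is a made-up name, and the child command is perl itself (via $^X) rather than vol, so that the sketch does not depend on Windows. It uses the list form of system, which skips the shell.

```perl
use strict;
use warnings;

# Run a command and decode $?.
# Returns the exit code, or undef if the command could not be started.
# (exit_code_of is a hypothetical helper name, not a Perl built-in.)
sub exit_code_of {
    my @cmd = @_;
    no warnings 'exec';          # we inspect $? ourselves instead of reading the warning
    system(@cmd);
    return undef if $? == -1;    # -1 means the command could not be started at all
    return $? >> 8;              # the high byte is the exit code (same as dividing by 256)
}

# $^X is the path of the running perl, used here as a command that surely exists.
my $code = exit_code_of($^X, '-e', 'exit 3');
print "child exited with code $code\n";
```

The low bits of $? (for example $? & 127) describe a signal that killed the child, which this sketch ignores.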

Backticks
These `` are different again to system and exec. They also start external processes, but return the output of the process. You can then do whatever you like with the output. If you aren't sure where backticks are on your keyboard, try the top left, just left of the 1 key. Often around there. Don't confuse single quotes '' with backticks `` .
$volume=`vol`;

print "The contents of the variable \$volume are:\n\n";

print $volume;

print "\nWe shall regexise this variable thus :\n\n";

$volume=~m#Volume in drive \w is (.*)#;

print "$1\n";

As you can see here, the Win32 vol command is executed. We just print it out, escaping the $ in the variable name. Then a simple regex, using # as a delimiter just in case you'd forgotten delimiters don't have to be / .
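One thing the regex above leaves out is a check that the match actually succeeded before $1 is used. A minimal sketch of that check follows; volume_label is a made-up name, and the sample text is an assumed stand-in for vol output rather than something captured from a real machine.

```perl
use strict;
use warnings;

# Return the volume label from vol-style output, or undef if none is found.
# (volume_label is a hypothetical helper name.)
sub volume_label {
    my ($text) = @_;
    # Only trust $1 when the match succeeded; otherwise it may still hold
    # the capture from some earlier, unrelated match.
    if ($text =~ m#Volume in drive \w is (.*)#) {
        return $1;
    }
    return undef;
}

# Assumed sample text standing in for real `vol` output.
my $sample = " Volume in drive C is WINDOWS\n Volume Serial Number is 1234-ABCD\n";
my $label  = volume_label($sample);
print defined $label ? "Label: $label\n" : "No label found\n";
```

Because . does not match a newline, the capture stops at the end of the first line.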

When to use external calls
Before you get carried away with creating elaborate scripts based on the output from NT's net commands, note that there are plenty of excellent modules out there which do a very good job of this sort of thing, and that any form of external process call slows your script. Also note there are plenty of built-in functions such as readdir which can be used instead of `dir` . You should use Perl functions where possible rather than calling external programs because Perl's functions:

are portable (usually, but there are exceptions). This means you can write a script on your Mac PowerBook, test it on an NT box and then use it live on your Unix box without modifying a single line of code;
are faster, as every external process significantly slows your program;
don't usually require regexing to find the result you want;
don't rely on output in a particular format, which might be changed in the next version of your OS or application;
are more likely to be understood by a Perl programmer. For example, $files=`ls`; on a Unix box means little to someone who doesn't know that ls is the Unix command for listing files, as dir is in Windows.
Don't start using backticks all over the place when system will do. You might get a very large amount of output back which you don't need, and will consequently slurp lots of memory. Just use them when you actually want to examine the returned strings.
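As an illustration of using readdir rather than `dir` or `ls`, here is a minimal sketch. The name list_names is made up for this example, and it lists the current directory only for demonstration.

```perl
use strict;
use warnings;

# Return the sorted names in a directory, skipping entries that start with a dot.
# (list_names is a hypothetical helper name.)
sub list_names {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my @names = sort grep { !/^\./ } readdir $dh;
    closedir $dh;
    return @names;
}

# No external process, no output format to parse, and the same code on any OS.
print "$_\n" for list_names('.');
```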

Opening a Process
The problem with backticks is that you have to wait for the entire process to complete, then analyse the entire output. This is a big problem if you have large amounts of output or slow processes. A good example is the DOS command tree. If you aren't familiar with this command, run a DOS/command prompt, switch to the root directory (C:\ ) and type tree. Examine the wondrous output.
We can open a process, and pipe data in via a filehandle in exactly the same way you would read a file. The code below is exactly the same as opening a filehandle on a file, with two exceptions:

We use an external command, not a filename. That's the process name, in this case, tree.
A pipe, ie | is appended to the process name.
open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
print "$. $_";
}

Note the | which denotes that data is to be piped from the specified process. You can also pipe data to a process by using | as the first character.
As usual, $. is the line number. What we can do now is terminate our tree early. Environmentally unsound, but efficient.

open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
printf "%3s $_", $.;
last if $. == 10;
}

As soon as $. hits 10 we shut the process off by exiting the loop. Easy.

Except, maybe it won't. What if this was a long program, and you forgot about that particular line of code which exits the loop? Suppose that $. somehow went from 9 to 11, or was assigned to? It would never reach 10. So, to be safe

open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
printf "%3s $_", $.;
last if $. >= 10;
}

exit your loops in a paranoid manner, unless you really mean to exit only when at line ten. For maximum safety, maybe you should create your own counter variable, because $. is a global variable. I'm not necessarily advocating doing any of the above, but I am suggesting that these things be considered.
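The private counter idea might look like the sketch below. It is only one way to do it: first_lines is a made-up name, the command is perl itself (via $^X) instead of tree so the example is not tied to Windows, and it uses a lexical filehandle with the list form of a piped open, which older Perls on Windows may not support.

```perl
use strict;
use warnings;

# Read at most $max lines from a command's output, counting them ourselves.
# (first_lines is a hypothetical helper name.)
sub first_lines {
    my ($max, @cmd) = @_;
    open(my $from, '-|', @cmd) or die "Can't start @cmd: $!";
    my @lines;
    my $count = 0;                 # private counter, unaffected by other filehandles
    while (my $line = <$from>) {
        push @lines, $line;
        last if ++$count >= $max;  # >= rather than == so a skipped value can't slip past
    }
    close $from;                   # may report failure if the command was cut short
    return @lines;
}

# The child prints twenty lines; we stop listening after three.
my @top = first_lines(3, $^X, '-e', 'print "line $_\n" for 1 .. 20');
print @top;
```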

You might notice the presence of a new keyword - printf . It works like print , but formats the string before printing. The formatting is controlled by such parameters as %3s , which means "pad the string with spaces to a minimum width of three characters". After the doublequoted format string comes whatever you want to be printed in the format specified. Some examples follow. Just uncomment each line in turn to see what it does. There is a lot of new stuff below, but try and work out what is happening. An explanation follows after the code.

$windir=$ENV{'WINDIR'}; # yes, you can access the environment variables !

$x=0;

opendir WDIR, "$windir" or die "Can't open $windir !!! Panic : $!";

while ($file= readdir WDIR) {
next if $file=~/^\./; # try commenting this line to see why it is there

$age= -M "$windir/$file"; # -M returns the age in days
$age=~s/(\d*\.\d{3}).*/$1/; # hmmmmm

#### %4.4d - must take up 4 columns, and pad with 0s to make up space
#### and minimum width is also 4
#### %10s - must take up 10 columns, pad with spaces
# printf "%4.4d %10s %45s \n", $x, $age, $file;

#### %-10s - left justify
# printf "%4.4d %-10s %-45s \n", $x, $age, $file;

#### %10.3d - use 10 columns, show at least 3 digits, pad with 0s if fewer
# printf "%4.4d %10.3d %45s \n", $x, $age, $file;

$x++;

last if $x==15; # we don't want to go through all the files :-)
}

There are some intentionally new functions there. When you start hacking Perl (actually, you already started if you have worked through this far) you'll see a lot of example code. Try and understand the above, then read the explanation below.
Firstly, all environment variables can be accessed and set via Perl. They are in the %ENV hash. If you aren't sure what environment variables are, refer to your friendly Microsoft documentation or books. The best known environment variable is path, and you can see its value and that of all other environment variables by simply typing set at your command prompt.
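Setting a value in %ENV also affects any process the script starts afterwards. The sketch below shows this under some assumptions: child_env_value and DEMO_GREETING are made-up names, the child process is perl itself (via $^X), and the list form of a piped open is used, which older Perls on Windows may not support.

```perl
use strict;
use warnings;

# Ask a child perl process what value it sees for an environment variable.
# (child_env_value is a hypothetical helper name.)
sub child_env_value {
    my ($name) = @_;
    my $code = 'print defined $ENV{$ARGV[0]} ? $ENV{$ARGV[0]} : ""';
    open(my $from, '-|', $^X, '-e', $code, $name)
        or die "Can't start child perl: $!";
    my $value = do { local $/; <$from> };   # slurp everything the child printed
    close $from;
    return $value;
}

# DEMO_GREETING is an invented variable name used only for this sketch.
$ENV{DEMO_GREETING} = 'hello from the parent';
print "Child sees: ", child_env_value('DEMO_GREETING'), "\n";
```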

The regex /^\./ bounces out unwanted entries before we bother doing any processing on them. Good programming practice. What it matches is "anything that begins with '.'". The caret anchors the match to the beginning of the string, and as . is a metacharacter it has to be escaped.

Perl has several tests to apply on files. The -M test returns the age in days. See the documentation for similar tests. Note that the calls to readdir return just the file, not the complete pathname. As you were careful to use a variable for the directory to be opened rather than hardcoding it (horrors) it is no trouble to glue it together by using doublequotes.
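A few of the other file tests can be tried out with a sketch like this one. The name file_facts and the hash keys are made up for the example; -e, -d, -s and -M are the real Perl file-test operators.

```perl
use strict;
use warnings;

# Collect a few file-test results for one path.
# (file_facts and its key names are hypothetical.)
sub file_facts {
    my ($path) = @_;
    return {
        present  => (-e $path ? 1 : 0),   # -e: does it exist?
        is_dir   => (-d $path ? 1 : 0),   # -d: is it a directory?
        size     => (-s $path) || 0,      # -s: size in bytes (false if empty or missing)
        age_days => (-M $path),           # -M: days since last change, relative to script start
    };
}

my $facts = file_facts('.');
printf "current directory: present=%d is_dir=%d age=%.3f days\n",
    $facts->{present}, $facts->{is_dir}, $facts->{age_days};
```

Note that -M is measured from the time the script started, so a file created while the script runs can show a slightly negative age.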

Try commenting out $age=~s/(\d*\.\d{3}).*/$1/ and note the size of $age . It could do with a trim. Just for regex practice, we make it a little smaller. What the regex does is:

start capturing with (
look for 0 or more digits \d*
then a . (escaped)
followed by three digits \d{3}
and that's all we want to capture so the parens are closed. )
Finally, everything else in the string is matched by .* where . is any character (almost) and * means 0 or more. This is pretty much guaranteed to match to the end of the line.
Having matched the entire string (and put part of it into $1 by using parens) we simply replace the string with what we have matched.
Easy !
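Worth knowing is that the regex truncates text rather than rounding a number. The sketch below compares it with sprintf (described next); the helper names are made up, and the second sample value is an assumed example of the exponent notation Perl uses when it prints a very small number, such as the age of a file modified about a second before the script started.

```perl
use strict;
use warnings;

# Truncate to three decimal places with the regex from the text.
sub trim_with_regex {
    my ($age) = @_;
    $age =~ s/(\d*\.\d{3}).*/$1/;
    return $age;
}

# Round to three decimal places numerically instead.
sub trim_with_sprintf {
    my ($age) = @_;
    return sprintf '%.3f', $age;
}

for my $age ('12.3456789', '1.15740740740741e-05') {
    printf "%-22s regex: %-8s sprintf: %s\n",
        $age, trim_with_regex($age), trim_with_sprintf($age);
}
```

For the second value the regex throws away the exponent and leaves 1.157, while sprintf gives 0.000, so the regex is fine for practice but the numeric route is safer for real use.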
Mention should also be made of sprintf , which is exactly like printf except it doesn't print. You just use it to format strings, which you can do something with later. For example :

open TRIN, "tree c:\\ /a |" or die "Can't see the tree :$!";

while (<TRIN>) {
$line= sprintf "%3s $_", $.;
print $line;
last if $. == 10;
}
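Because sprintf returns the formatted string instead of printing it, it is a handy way to see exactly what each format from the earlier printf examples produces. In this sketch show_format is a made-up helper that wraps the result in bars so the padding is visible.

```perl
use strict;
use warnings;

# Each entry pairs a format with a value.
my @samples = (
    [ '%4.4d',  7     ],   # at least four digits, zero-padded
    [ '%10s',   'abc' ],   # right-justified in ten columns
    [ '%-10s',  'abc' ],   # left-justified in ten columns
    [ '%10.3d', 7     ],   # ten columns wide, at least three digits
);

# Format one value and wrap it in bars. (show_format is a hypothetical name.)
sub show_format {
    my ($format, $value) = @_;
    return '|' . sprintf($format, $value) . '|';
}

print "$_->[0]\t", show_format(@$_), "\n" for @samples;
```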



Quote execute
@opts=qw(w on ad oe b);

for (@opts) {
$result=qx(dir /$_);
print "dir /$_ resulted in:\n$result",'-' x 79;
sleep 1;
}

Anything within qx( ) is executed, with variables duly interpolated. This sample also demonstrates qw, which is 'quote words', so the elements of @opts are delimited by whitespace, not the usual commas. You can also use for instead of foreach if you want to save typing four characters for the sake of legibility.

You may have noticed that system outputs the result of the command to the screen whereas qx does not. Each to its own.
