Wednesday, August 6, 2008

The Tutorial

Your First Time
Assuming all has gone to plan, you can now create your first Perl script. Read these instructions through once before you start, then begin. That's a good idea with any form of computer-related procedure. So, to begin:


Create a new directory for your Perl scripts, separate from your data files and the Perl installation. For example, c:\scripts\, which is what I'll assume you are using in this tutorial.
Start up whatever text editor you're going to hack Perl with. Notepad.Exe is just fine. If you can't find Notepad on your Start menu, press the Start button, then select Run, type in 'notepad' and click OK.
Type the following in Notepad
print "My first Perl script\n";
Save the file to c:\scripts\myfirst.pl. Be careful! Notepad may add a .txt extension when it saves, so you could end up with myfirst.pl.txt by default. Perl won't mind, it'll still execute the file as long as you give it that full name. If your version of Notepad does this, select "All files" before saving or rename the file then load it again. Better yet, use a decent text editor!
You don't need to exit Notepad -- keep it open, as we'll be making changes very soon.
Switch to your command prompt. If you don't know how to start a command prompt, click 'Start' and then 'Run'. If using Windows 9x, type in 'command' and press Enter. If using NT, type in 'cmd' and press Enter.
Change to your perl scripts directory, for example cd \scripts .
Hold your breath, and execute the script: perl myfirst.pl

and you'll see the output. Welcome to the world of Perl ! See what I mean about it being easy to start ? However, it is difficult to finish with Perl once you begin :-)




What if it doesn't...?
So you typed in perl myfirst.pl and you didn't see My first Perl script on the screen. If you saw "bad command or filename" then either you haven't installed Perl or perl.exe is not in your path. Probably the latter. Reboot, then try again.

If you saw Can't open perl script "xxxx.pl": No such file or directory then perl is definitely installed, but you have either got the name of the script wrong or the script is not in the directory you are trying to run it from. For example, maybe you saved the script in c:\windows and you are in c:\scripts so of course Perl complains it can't find the script. Could you? Well, don't expect Perl to then. You don't have to run the script from the directory in which it resides, but it is easier.


Assuming it's now all right...
We need to analyse what's going on here a little. First note that the line ends with a semicolon ; . Almost all lines of code in Perl have to end with semicolons, and those that don't have to will accept semicolons anyway. The moral is -- use semicolons. Sorry; the moral is; use semicolons.

Oh, one more thing -- if you haven't already done so, continue breathing.

Also note the \n . This is the code to tell Perl to output a newline. What's a newline? Delete the \n from the program and run it again:

print "My first Perl script";
and all should become clear. You have now written your first Perl script.



Shebang
Almost every Perl book is written for UN*X, which is a problem for Win32. This leads to scripts like:

#!c:/perl/perl.exe

print "I'm a cool Perl hacker\n";


The function of the 'shebang' line is to tell the shell how to execute the file. Under UNIX, this makes sense. Under Win32, the system must already know how to execute the file before it is loaded so the line is not needed.

However, the line is not completely ignored, as it is searched for any switches you may have given Perl (for example -w to turn on warnings).
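
As a small sketch of that, here is the same sort of script with a switch on the shebang line. The path /usr/bin/perl is only an assumption about where a UNIX box might keep perl, and $greeting is a name I made up for the illustration.

```perl
#!/usr/bin/perl -w
# Perl scans the line above for switches, so -w turns on warnings.
# On Win32 the path part is not what finds perl; only the switch matters.
$greeting = "I'm a cool Perl hacker\n";
print $greeting;
```

Run it with perl in the usual way; the -w only makes itself heard when Perl spots something suspicious.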

You may also choose to add the line so your scripts run directly on UNIX without modification, as UNIX boxes probably do need it. Win32 systems do not. We shall continue with the lesson.


Variables


Scalars
So Perl is working, and you are working with Perl. Now for something more interesting than simple printing. Variables. Let's take simple scalar variables first. A scalar variable is a single value. Like $var=10 which sets the variable $var to the value of 10. Later, we'll look at lists like arrays and hashes, where @var refers to more than one value. For the moment, remember that Scalar is Singular. If weird metaphors help, think of lots of scaly snakes at a singles bar. If that didn't help, I apologise for putting the thought into your mind.



$ % @ are Good Things
If you have any experience with other programming languages you might be surprised by the code $var=10. With most languages, if you want to assign the value 10 to a variable called var you'd write var=10.

Not so in Perl. This is a Feature. All variables are prefixed with a symbol such as $ @ % . This has certain advantages, like making programs easier to read. Honestly, I'm serious! It just takes some getting used to. The prefixes mean that you can see where the variables are quite easily. And not only that, what sort of variable it is. The human language German has a similar principle (except nouns are capitalised, not prefixed with $ and Perl is easier to pronounce). You'll agree later, I think.

So, ever onwards. Time to try some more variables:

$string="perl";
$num1=20;
$num2=10.75;
print "The string is $string, number 1 is $num1 and number 2 is $num2\n";



Typing
A closer look...notice you don't have to say what type of variable you are declaring. In other languages you need to say if the variable is a string, array, what sort of number it is and so on. As an example, in Java you'd be saying things like int var=10 which defines the variable var as an integer, with the value 10.

So, why do these other programming languages force you to declare exactly what your variables are? Wouldn't it be easier if we could just not bother?

For short programs, yes. For really big projects with many programmers working on the same application, no. That's because forcing variable type declaration also forces a certain discipline and rigour which is what you need on big projects.

As you know, Perl is not designed for gigantic software engineering efforts. It is all about small, quick programs. For these purposes you don't need the rigour of variable controls as much, so Perl doesn't bother.

This idea of forcing a programmer to declare what sort of variable is being created is called typing. As Perl doesn't by default enforce any rules on typing, it is said to be a loosely typed language, as opposed to something like C++ which is strongly typed.
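
Here is a rough sketch of what loose typing looks like in practice. The variable names are invented for the example, and it borrows the dot operator for joining strings, which we haven't met properly yet.

```perl
$var = 10;              # $var holds a number
$var = "ten";           # now the same variable holds a string, no complaints

$text   = "10";         # a string that happens to look like a number
$sum    = $text + 5;    # + treats it as a number, giving 15
$joined = $text . "5";  # . treats it as a string, giving 105

print "\$var is $var, \$sum is $sum and \$joined is $joined\n";
```

Perl decides how to treat a value from the operator you use on it, not from any declaration you made earlier.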



Variable Interpolation
We still haven't finished learning from that humble bit of code. To refresh your memory, here it is again:

$string="perl";
$num1=20;
$num2=10.75;
print "The string is $string, number 1 is $num1 and number 2 is $num2\n";

Notice the way the variables are used in the string. Sticking variables inside of strings has a technical term - "variable interpolation". Now, if we didn't have the handy $ prefix we'd have to do something like the example below, which is pseudocode. Pseudocode is code to demonstrate a concept, not designed to be run. Like certain Microsoft software.

print "The string is ".string." and the number is ".num."\n";

which is much more work. Convinced about those prefixes yet ?
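
If you'd like to see both approaches side by side in real Perl rather than pseudocode, this sketch builds the same line twice. Real Perl still wants its $ prefixes, of course, and the names $easy and $hard are mine, made up for the comparison.

```perl
$string = "perl";
$num    = 20;

# With interpolation: the variables go straight inside the double quotes.
$easy = "The string is $string and the number is $num\n";

# Without it: every piece has to be joined by hand with the dot operator.
$hard = "The string is " . $string . " and the number is " . $num . "\n";

print $easy;
print $hard;
```

Both lines come out identical, so the only thing the second version buys you is extra typing.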

Try running the following code:

$string="perl";
$num=20;
print "Doubles: The string is $string and the number is $num\n";
print 'Singles: The string is $string and the number is $num\n';


Double quotes allow the aforementioned variable interpolation. Single quotes do not. Both have their uses as you will see later, depending on whether you wish to interpolate anything.
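
One detail worth seeing for yourself is that single quotes leave the \n alone as well as the variables. The sketch below uses length, a built-in function we haven't covered, purely to count characters; $price, $double and $single are names invented for the example.

```perl
$price  = 5;
$double = "Cost: $price\n";   # becomes Cost: 5 followed by a real newline
$single = 'Cost: $price\n';   # stays exactly as typed, backslash and all

print $double;
print $single, "\n";
print "Lengths: ", length($double), " and ", length($single), "\n";
```

The single-quoted version is longer because $price and \n are stored as the literal characters you typed.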



Changing Variables


Auto(de|in)crements
If you want to add 1 to a variable you can, logically, do this; $num=$num+1 . There is a shorter way to do this, which is $num++. This is an autoincrement. Guess what this is; $num-- . Yes, an autodecrement.

This example illustrates the above:

$num=10;
print "\$num is $num\n";

$num++;
print "\$num is $num\n";

$num--;
print "\$num is $num\n";

$num+=3;
print "\$num is $num\n";

The last example demonstrates that it doesn't have to be just 1 you can add or subtract.




Escaping
There's something else new in the code above. The \ . You can see what this does -- it 'escapes' the special meaning of $ .

Escaping means that just the $ symbol is printed instead of it referring to a variable.

Actually \ has a deeper meaning -- it escapes all of Perl's special characters, not just $ . Also, it turns some non-special characters into something special. Like what ? Like n . Add the magic \ and the humble 'n' becomes the mighty NewLine ! The \ character can also escape itself. So if you want to print a single \ try:

print "the MS-DOS path is c:\\scripts\\";

Oh, '\' is also used for other things like references. But that's not even covered here.
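
A few more escapes in one place, as a sketch. The sentence and the address bob@example.com are made up; the point is only which characters need a backslash inside double quotes.

```perl
# \" keeps the quote from ending the string, \$ and \@ stop Perl
# from looking for variables called $5 or @example.
$line = "She said \"hello\", paid \$5 and mailed bob\@example.com\n";

# Each \\ comes out as a single backslash.
$path = "c:\\scripts\\myfirst.pl";

print $line;
print "$path\n";
```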

There is a technical term for these 'special characters' such as @ $ %. They are called metacharacters. Perl uses plenty of metacharacters. In fact, you'll wear your keyboard pretty evenly during a night's perl hacking. I think it is safe to say that Perl uses every possible keystroke and shifted keystroke on a standard US PC keyboard.

You'll be working with all sorts of obscure characters in your Perl hacking career, and I also mean those on your keyboard. This has earned perl a reputation for being difficult to understand. That's entirely true. Perl does have such a reputation, no doubt about it.

Is the reputation justified? In my opinion, Perl does have a short but steep learning curve to begin with simply because it is so different. However, once you learn the character meanings reading perl code becomes much easier precisely because of all these strange characters.




Context: About Perl and @^$%&~`/?
Perl uses so many weird characters that there aren't enough to go round. So sometimes the same character has two or more meanings, depending on its context. As an example, the humble dot . can join two variables together, act as a wildcard or become a range operator if there are two of them together. The caret ^ has different effects in [^abc] as opposed to [a^bc] .
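
Here is a quick sketch of those dots and carets at work. It leans on pattern matching with =~ , which we haven't covered, so take the details on trust for now; all the variable names are invented for the illustration.

```perl
$joined = "Per" . "l";       # one dot between strings joins them: Perl
@range  = (1 .. 5);          # two dots make a range: 1 2 3 4 5
$wild   = "cat" =~ /c.t/;    # a dot inside a pattern matches any one character
$other  = "d" =~ /[^abc]/;   # ^ first inside [ ]: any character except a, b or c
$caret  = "^" =~ /[a^bc]/;   # ^ elsewhere inside [ ]: just a literal ^

print "$joined, @range, $wild $other $caret\n";
```

Each successful match hands back 1, so the last three values printed are all 1.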

If this sounds crazy, think about the English language. What do the following mean to you ?

MEAN
POLISH
LIKE

Mean, in one context, is a word used to describe the purpose of something. It is also another word for average. Furthermore, it describes a nasty person, or a person who doesn't like spending money, and is used in slang to refer to something impressive and good.

That's five different uses for 'mean', and you don't have any trouble understanding which one I mean due to context.

Polish, when capitalised, can either mean pertaining to the country Poland, or the act of making something shiny. And 'like' can mean similar to, or affection for.

So, when you speak or write English (think of two, to and too) you know what these words mean by their context. It is exactly the same way with Perl. Just don't assume a given metacharacter always means what you first thought it did.

To finish off this section, try the following:


Strings and Increments
$string="perl";
$num=20;
$mx=3;

print "The string is $string and the number is $num\n";

$num*=$mx;
$string++;
print "The string is $string and the number is $num\n";

Note the easy shortcut *= meaning 'multiply $num by $mx' or, $num=$num*$mx . Of course Perl supports the usual + - * / ** % operators. The last two are exponentiation (to the power of) and modulus (remainder of x divided by y). Also note the way you can increment a string ! Is this language flexible or what ?
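
For the curious, a short sketch of the last two operators and of string increments. The numbers and the names $power, $rest, $word and $wrap are just picked for the example.

```perl
$power = 2 ** 10;   # exponentiation: 1024
$rest  = 17 % 5;    # modulus: 17 divided by 5 leaves 2

$word = "perl";
$word++;            # the last letter moves on by one: perm

$wrap = "az";
$wrap++;            # z wraps round to a and carries to the left: ba

print "$power $rest $word $wrap\n";
```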

Print: A List Operator
The print function is a list operator. That means it accepts a list of things to print, separated by commas. As an example:
print "a doublequoted string ", $var, 'that was a variable called var', $num," and a newline \n";

Of course, you could just put all the above inside a single doublequoted string:
print "a doublequoted string $var that was a variable called var $num and a newline \n";

to achieve the same effect. The advantage of using the print function in list context is that expressions are evaluated before being printed. For example, try this:
$var="Perl";
$num=10;
print "Two \$nums are $num * 2 and adding one to \$var makes $var++\n";
print "Two \$nums are ", $num * 2," and adding one to \$var makes ", $var++,"\n";

You might have been slightly surprised by the result of that last experiment. In particular, what happened to our variable $var ? It should have been incremented by one, resulting in Perm. The reason being that 'm' is the next letter after 'l' :-)

Actually, it was incremented by 1. We are postincrementing the variable with $var++ , rather than preincrementing it.

The difference is that with postincrements, the value of the variable is returned, then the operation is performed on it. So in the example above, the current value of $var was returned to the print function, then 1 was added. You can prove this to yourself by adding the line print "\$var is now $var\n"; to the end of the example above.

If we want the operation to be performed on $var before the value is returned to the print function, then preincrement is the way to go. ++$var will do the trick.
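
To watch the two side by side, here is a minimal sketch that catches the value each form hands back. $post and $pre are names made up for the purpose.

```perl
$var  = "Perl";
$post = $var++;   # $post receives Perl, and only then does $var become Perm
$pre  = ++$var;   # $var becomes Pern first, so $pre receives Pern

print "post: $post, pre: $pre, \$var is now $var\n";
```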



Subroutines -- A First Look
Let's take another look at the example we used to show how the autoincrement system works. Messy, isn't it ? This is Batch File Writing Mentality. Notice how we use exactly the same code four times. Why not just put it in a subroutine?

$num=10; # sets $num to 10
&print_results; # prints variable $num

$num++;
&print_results;

$num*=3;
&print_results;

$num/=3;
&print_results;

sub print_results {
    print "\$num is $num\n";
}

Easier and neater. The subroutine can go anywhere in your script, at the beginning, end, middle...makes no difference. Personally I put all mine at the bottom and reserve the top part for setting variables and main program flow.

A subroutine is just some code you want to use more than once in the same script. In Perl, a subroutine is a user-defined function. There is no difference. For the purposes of clarity I'll refer to them as subroutines.

A subroutine is defined by starting with sub then the name. After that you need a curly left bracket { , then all the code for your subroutine. Finish it off with a closing brace } . The area between the two braces is called a block. Remember this. There are such things as anonymous subroutines but not here. Everything here has a name.

Subroutines in this tutorial are called by prefixing their name with an ampersand, that is one of these -- & , like so &print_results; . You will also see them called without the & but with parentheses, like so print_results(); , which is the form more recent Perl style guides tend to prefer. Both work for the examples here. Whichever you choose, be consistent, because ambiguity can hurt you if you don't avoid it.
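
As a sketch of calling a subroutine, here is a cut-down version of the earlier example that calls print_results in two ways. The $calls counter is hypothetical, added only so you can see that the subroutine really ran each time.

```perl
$num   = 10;
$calls = 0;         # hypothetical counter, not part of the original example

&print_results;     # called with the & prefix
$num++;
print_results();    # the same call written with parentheses instead

print "print_results ran $calls times\n";

sub print_results {
    $calls++;
    print "\$num is $num\n";
}
```

Note that the subroutine sits at the bottom of the script and still works when called from above it.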

If you are worrying about variable visibility, don't. All the variables we are using so far are visible everywhere. You can restrict visibility quite easily, but that's not important right now. If you weren't worrying about variable visibility, please don't start. I'd tell you it's not important but that'll only make you worried. (paranoid ?) We'll cover it later.




Comments
Did you see that a # crept in there? That's a comment. Everything after a # on that line is ignored. You can't continue it onto a new line however, so if your comment won't fit on one line start a new one with # . There are ways to create Plain Old Documentation (POD) and more ways to comment but they are not detailed here.
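
A small sketch of comments in action. The variables are invented for the example; the last one shows that a # inside quotes is just another character, not the start of a comment.

```perl
# A comment on a line of its own: Perl ignores all of it.
$num = 10;    # a comment can also follow code on the same line

# A comment that won't fit on one line simply
# carries on with a fresh # on the next line.
$text = "a # inside quotes is not a comment";

print "$num: $text\n";
```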
