Perl Mini-Tutorial

Written 2/2013 by Wayne Pollock, Tampa Florida USA.

 

Perl was invented by Larry Wall to solve some Unix scripting problems.  Other methods involve learning a variety of filter commands, some quite complex (such as awk and sed), and learning how to “glue” these pieces together with shell constructs.  This is difficult; passing values from one part of a script to another often involves complex quoting, named pipes, or temporary files.

Perl was designed as a single scripting language that combines all the features (and then some) of the other filters plus the shell.  Now you only need to know a single filter command.  While complex, Perl is forgiving of style.  The motto is “There’s more than one way to do it.”

In addition to the powerful built-in string, regular expression, and file processing capabilities in Perl, it is extensible with modules.  A vast number have been written and tested, and are available through the Comprehensive Perl Archive Network (CPAN), discussed below.

Perl is so adept at these tasks that it became the standard scripting language for CGI programming (for websites).  Perl’s regular expression syntax is second to none and has been adopted by many other languages and tools (where it is referred to as Perl Compatible REs, or PCRE).

Fortunately, you don’t have to learn all of Perl to create very useful “one liners”.  Perl is fully documented in a variety of formats, including man pages (see perltoc, perlintro, perlretut, and perlfaq) and perldoc -f func (which describes the built-in function func).

Perl was invented by a linguist who felt that languages should be flexible.  In Perl you can say “if (expr) { cmd; }” but you can also say “cmd if (expr);”.  Over time Perl has grown and supports different styles such as object-oriented programming.  Because of the size of the language, most people only understand a subset of Perl; if the reader doesn’t know the same subset as the author, a Perl script can be unreadable.

Simpler (but nearly as powerful) scripting languages have started to become popular, including Python and Ruby (show demos).

 

The following (very) brief intro to Perl is adapted from How to Set Up and Maintain a Web Site 2nd edition, by Lincoln D. Stein, (C)1997 Addison Wesley, pages 469-472.

Perl supports three basic kinds of variables:  simple variables known as scalars, array variables (which are lists of values), and hashes (also called associative arrays).  The names of variables start with a character to indicate their type: $scalar, @array, and %hash.  Variables are automatically initialized: an unset scalar holds the undefined value (which acts as "" or 0 when used), and unset arrays and hashes are empty.

When referring to elements of arrays and hashes the leading character indicates the type of the element: $ary[1] and $hash{'foo'}.  Notice how Perl uses square brackets to index into an array, and curly braces to return a value from a hash.
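As a rough illustration of the three variable kinds and their leading characters, here is a minimal sketch.  The names (@colors, %age) are made up for the example, and the my declarations plus the strict and warnings pragmas are additions made only as good habit; none of them is required by the text above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count  = 3;                             # scalar: a single value
my @colors = ('red', 'green', 'blue');      # array: an ordered list
my %age    = ('alice' => 30, 'bob' => 25);  # hash: values indexed by string keys

# One element is itself a scalar, so it is written with a leading $.
my $second = $colors[1];    # square brackets index an array (zero-based)
my $years  = $age{'bob'};   # curly braces look up a hash key

print "count=$count second=$second years=$years\n";
```

This should print count=3 second=green years=25.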

Perl scripts allow blank lines, and comments that start with a “#” and continue through the end of that line.

Like awk, and unlike Python, strings and numbers are converted back and forth as needed.

Strings in single quotes are taken literally; with double quotes, the string is scanned for variables and escape-sequences (e.g., “\n”) which get replaced with their values.  (When printing whole arrays, double quotes work best.)
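The difference between the two quoting styles can be sketched as follows.  The variable names are invented for the example, and the my declarations and pragmas are additions, not something the text requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = 'World';
my @nums = (1, 2, 3);

my $single = 'Hello, $name!\n';   # taken literally: nothing is replaced
my $double = "Hello, $name!\n";   # $name and \n are replaced
my $list   = "Numbers: @nums";    # array elements are joined with spaces

print $single, "\n";   # prints: Hello, $name!\n
print $double;         # prints: Hello, World!
print $list, "\n";     # prints: Numbers: 1 2 3
```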

In Perl, statements end with a semicolon.  So a simple (first) Perl script:

     #!/usr/bin/perl -TW
     print "Hello, World!\n";

The options above enable extra checks (“Taint mode”) and warnings.  (Note: if you run the script from the command line as “perl script”, you need to specify “-T” there too if it is also in the she-bang line.)

You can also list the script on the command line, using the “-e” option:

     perl -TWe 'print "Hello, World!\n";'

(The semicolon after the last statement of a script or block is optional.)

Functions such as print can have parentheses around the argument list, but that is optional.  So:

     print( "Hello, World!\n" );

and:

     print "Hello, World!\n";

also:

     $msg = "Hello, World!\n";  print $msg;

(Using single quotes would print the backslash-N literally.)

Arrays hold ordered lists of values, using a zero-based index:

     @stooges = ( 'moe', 'larry', 'curly' );
     print "@stooges";      # print the list
     print @stooges;        # print w/o spaces
     print @stooges . "\n"; # print 3
     print $stooges[0], "\n";
     print $stooges[@stooges - 1], "\n";
     ($moe, $larry) = @stooges;

Hashes hold unordered lists of values, each indexed by a string key:

%partner = ( "Laurel", "Hardy", "Abbot", "Costello" );
%partner = ("Laurel" => "Hardy", "Abbot"=>"Costello");
$partner{"Adam"} = "Eve";
print "$partner{'Abbot'}\n";
print keys(%partner), "\n";
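Because a hash is unordered, keys comes back in no particular order; sorting the keys is one common way to get repeatable output.  The sketch below reuses the %partner data from above; the sort, the loop, and the exists test are illustrative additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %partner = ('Laurel' => 'Hardy', 'Abbot' => 'Costello');
$partner{'Adam'} = 'Eve';                  # add one more pair

my @lines;
for my $key (sort keys %partner) {         # sorted, so the order is predictable
    push @lines, "$key => $partner{$key}";
}
print "$_\n" for @lines;

# exists tests for a key without creating it
print "No entry for Moe\n" unless exists $partner{'Moe'};
```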

Perl flattens nested lists, so inner parentheses are redundant and the following are equivalent:

     @list = ( 'a', ('b', 'c'), 'd' );
     @list = ( 'a', 'b', 'c', 'd' );

To generate arrays of arrays, you need to store a reference to the sub-array.  These are generated by using square brackets instead of parentheses:

     @list = ( 'a', ['b', 'c'], 'd' );

(print "@list\n"; shows a and d, but between them a reference to a list, displayed as ARRAY followed by a hexadecimal address in parentheses, not “b c”!)  To dereference a reference use curly braces around the reference, like this:

     print "$list[0] | @{$list[1]} | $list[2]\n";
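Besides the @{...} form, individual elements of a nested array can be reached by subscripting through the reference.  A minimal sketch, with variable names chosen only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = ('a', ['b', 'c'], 'd');

my $inner_ref   = $list[1];             # the reference stored in slot 1
my $kind        = ref $inner_ref;       # 'ARRAY'
my $first_inner = $list[1][0];          # 'b': subscripts can be chained
my $same        = $inner_ref->[0];      # 'b': arrow form on the reference
my $inner_count = scalar @{$inner_ref}; # 2: size of the sub-array

print "$kind $first_inner $same $inner_count\n";   # prints: ARRAY b b 2
```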

Besides the usual math operators (including “**” for exponentiation) Perl uses a period for string concatenation: "a" . "b" and an x for repetition: 'a' x 3 (= "aaa").

You can define a range in Perl with: @range = (1 .. 10); or for $i (1 .. 10).
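The concatenation, repetition, exponentiation, and range operators can be tried together in a short sketch (variable names are illustrative only):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $joined = "a" . "b";     # concatenation: "ab"
my $triple = 'a' x 3;       # repetition:    "aaa"
my $power  = 2 ** 10;       # exponent:      1024
my @range  = (1 .. 10);     # the list 1, 2, ..., 10

my $sum = 0;
$sum += $_ for @range;      # add up the range: 55

print "$joined $triple $power $sum\n";   # prints: ab aaa 1024 55
print '-' x 20, "\n";                    # a 20-character rule
```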

For logical comparisons Perl uses the standard operators for numeric comparisons (“==”, “!=”, “<”, etc.) but the following for string comparisons: eq ne lt le gt ge cmp.  (The $a cmp $b operator and its numerical equivalent $a <=> $b return -1, 0, or +1 when $a is less than, equal to, or greater than $b.)  You can also test files with: -e file (exists), -r file (readable), -d file (directory), and others.  There are also two sets of logical operators: “&&”, “||”, and “!”, plus the lower-precedence words and, or, and not.
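Choosing the numeric or the string operator matters most when sorting.  The following sketch (sample data invented for the example) sorts the same list both ways and shows a few of the other tests:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @nums = (10, 9, 100);

my @by_number = sort { $a <=> $b } @nums;   # numeric order
my @by_string = sort { $a cmp $b } @nums;   # character-by-character order

print "numeric: @by_number\n";   # prints: numeric: 9 10 100
print "string:  @by_string\n";   # prints: string:  10 100 9

my $num_equal = ('10' == 10.0)   ? 'yes' : 'no';   # yes: same number
my $str_equal = ('10' eq '10.0') ? 'yes' : 'no';   # no: different strings
my $order     = 'apple' cmp 'banana';              # -1: apple sorts first
my $is_dir    = (-d '.') ? 'yes' : 'no';           # file test on the current directory

print "$num_equal $str_equal $order $is_dir\n";
```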

To test an expression for true or false, Perl (in effect) converts it to a string.  If the result is "0" or "" it is false; otherwise it is true (so 0.0, which converts to "0", is false but "0.0" is true).  The undefined value also counts as false.
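The truth rule is easy to check by experiment.  This sketch runs a handful of values (picked for illustration) through a conditional and records the verdicts:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each case is a [label, value] pair; the labels are only for display.
my @cases = (
    [ '0 (number)',   0     ],
    [ '0.0 (number)', 0.0   ],
    [ '"0"',          '0'   ],
    [ '""',           ''    ],
    [ '"0.0"',        '0.0' ],
    [ '"00"',         '00'  ],
);

my @verdicts;
for my $case (@cases) {
    my ($label, $value) = @{$case};
    my $verdict = $value ? 'true' : 'false';
    push @verdicts, $verdict;
    printf "%-14s is %s\n", $label, $verdict;
}
```

The first four values come out false and the last two true, since only the strings "0" and "" are false.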

Like awk, Perl breaks up input lines into fields you can play with or test (if you ask it to; in awk, it happens automatically).  The current line is put into $_.

You must ask Perl to break the line into fields by calling the split function.  With no arguments, this splits the current line (“$_”) into fields separated by white-space and returns them as a list (older versions of Perl implicitly set “@_”, but no longer).  So to print the second field of each input line (the “-n” means run for each line):

cat file | perl -ne \
'chomp; @words = split; print "$words[1]\n";'

You can print the last word like this:

cat file | perl -ne \
'chomp; @words = split; print "$words[@words-1]\n";'
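The same field handling can be written as a script.  In this sketch the input lines are hard-coded sample data (an assumption, so the example runs without a file), and a negative index is used as a shorter way to reach the last field:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @input = ("alpha beta gamma\n", "  one   two\n", "solo\n");

my (@second, @last);
for (@input) {                 # each line is aliased to $_ in turn
    chomp;                     # drop the trailing newline from $_
    my @words = split;         # split $_ on white-space

    my $field2 = defined $words[1] ? $words[1] : '(none)';
    my $final  = @words ? $words[-1] : '(none)';   # index -1 is the last element

    push @second, $field2;
    push @last,   $final;
    print "second=$field2 last=$final\n";
}
```

Note that a line with only one word has no second field, which is why the sketch checks with defined before printing.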

Loops (while, until, for, foreach)

foreach $i (1..5) {print "$i ";}  # Uses $_ if no $i
while (expr) {...}       until (expr) {...}
for (init; test; incr) {...}
# can use for instead of foreach

if (expr) { statement...}

There are also else and elsif clauses for if, and last (= break) and next (= continue) for loops.

Use statement if (cond);  or statement unless (cond);
or statement while (expr);
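The control structures listed above can be combined as in the following sketch; the numbers and variable names are arbitrary choices for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @seen;
for my $i (1 .. 10) {
    next if $i % 2;        # next (= continue): skip odd numbers
    last if $i > 8;        # last (= break): stop once past 8
    push @seen, $i;
}
print "seen: @seen\n";     # prints: seen: 2 4 6 8

my $n = 3;
my @countdown;
until ($n == 0) {          # until loops while the test is false
    push @countdown, $n;
    $n--;
}
print "countdown: @countdown\n";   # prints: countdown: 3 2 1

my $size;
if    (@seen > 5) { $size = 'large';  }
elsif (@seen > 2) { $size = 'medium'; }
else              { $size = 'small';  }
print "size: $size\n" unless @seen == 0;   # statement modifier form
```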

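Input:  <> means read a line (incl. EOL) from the files named on the command line, or from stdin if there are none; it returns the undefined value (undef) on EOF.  To read from a file:

open(NAME, "filename") or die ("msg: $!\n");
$line = <NAME>;            # read one line
while ( <NAME> ) { ... }   # read each remaining line into $_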
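A variant that is common in newer Perl code uses a lexical filehandle and the three-argument form of open instead of the bareword NAME shown above; this is a swapped-in style, not the one the text uses.  The sketch writes its own scratch file with the core File::Temp module purely so that it is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so there is something to read.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "first line\nsecond line\n";
close($out) or die "Cannot close $path: $!\n";

open(my $in, '<', $path) or die "Cannot open $path: $!\n";
my @lines;
while (my $line = <$in>) {   # the loop ends when <$in> returns undef
    chomp $line;
    push @lines, $line;
}
my $after_eof = <$in>;       # reading again at EOF gives undef
close($in);

print scalar(@lines), " lines read; last was '$lines[-1]'\n";
print "read at EOF returned undef\n" unless defined $after_eof;
```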

(Note: when the read is the entire condition of a while loop, the line goes into $_.)  Some common idioms:

while (<>) { # reads a line into $_
   print;    # prints $_
}

if (/foo/) { ... }            # means: if $_ matches /foo/
if ( m/foo/ ) { ... }         # a.k.a.
if ( $_ =~ m/foo/ ) { ... }   # a.k.a.
Use chomp $var to remove trailing newline (if any) from $var (or $_).

“=~” means “bind to”.  So value =~ m/foo/ means match value against /foo/.  It is also used with s/// (substitution operator), tr/// (translate operator), and others.  A match returns true (1) if it succeeds and false (the empty string) if it does not.  So:

$bar =~ s/a/b/  # changes the first a to b in $bar; returns 1 if a change was made.
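The return values of the bound operators can be inspected directly.  In this sketch the sample string and variable names are invented; the parentheses around each operation are only there for readability.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $bar = 'banana';

my $found = ($bar =~ m/nan/) ? 'yes' : 'no';   # match only: $bar is unchanged

my $first       = ($bar =~ s/a/o/);    # first a only; returns 1
my $after_first = $bar;                # 'bonana'
my $rest        = ($bar =~ s/a/o/g);   # /g: every remaining a; returns 2
my $zeros       = ($bar =~ tr/o/0/);   # translate o to 0; returns 3
my $missed      = ($bar =~ s/z/y/);    # nothing to change; returns false

print "$found $first $after_first $rest $zeros $bar\n";
print "no z found, so s/z/y/ returned false\n" unless $missed;
```

The first print should show yes 1 bonana 2 3 b0n0n0: with /g the substitution returns the number of replacements, and tr/// returns the number of characters it translated.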

A key benefit of Perl in shell scripts is the powerful regular expression language.

Perl has some command line arguments that wrap a “one-liner” in one or another type of loop, allowing Perl to operate just like sed (“-p”) or sed -n (“-n”).

Command Line Options

She-bang:   #!/usr/bin/env perl  or #!/usr/bin/perl -Tw

-c                   check the script for syntax errors

-e 'script'          the script to run; repeat for multiple scripts on one cmd line.

-i[ext]   process input (“<>”) in place by renaming the input by appending ext (for example, -i.bak appends “.bak”) and redirecting the output to a new file with the original name.  If no ext is given then the original isn’t saved.

-n                   Puts a loop around the script: LINE: while(<>){script}  (Can use LINE in next and last.)  This is much like sed -n.

-p                   Similar to -n, but this makes Perl act like sed (process then print each line).  This is the same as the above loop, plus: continue{print or die "-p destination:$!\n"}

-T                   Force taint checks even if not running suid/sgid (which does -T by default).  This makes sure no un-processed user input can be used in dangerous ways.  (Very useful for CGI!)

-w                   Turns on several useful warnings.

-W                   Turns on all possible warnings.
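The loop that -n wraps around a one-liner can also be written out by hand.  The sketch below behaves roughly like sed -n 's/colour/color/p'; the sample text, the scratch file (made with the core File::Temp module), and the assignment to @ARGV are assumptions added so that the example runs without any command line arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch input file standing in for a file named on the command line.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "colour one\nno match here\ncolour two\n";
close($fh) or die "Cannot close $path: $!\n";

@ARGV = ($path);        # <> reads the files named in @ARGV (stdin if empty)

my @printed;
LINE: while (<>) {                        # the loop -n would supply
    next LINE unless s/colour/color/;     # skip lines with no substitution
    print;                                # print $_ only when it changed
    push @printed, $_;
}
```

With -p instead of -n, the print would move into a continue block and run for every line, changed or not.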

Perl also has a directive you can add to the top of your script:

   use strict;

Forces you to declare variables before use (with my or our), and enforces other “best practices”.
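One practical benefit is that a misspelled variable name becomes a compile-time error rather than a silent new variable.  The sketch below shows this; the string eval is used only so the script can report the error and keep running, and the names $total and $totl are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $total = 0;              # declared with my, as strict requires
$total += $_ for (1, 2, 3);
print "total = $total\n";   # prints: total = 6

# The code in the string below misspells $total as $totl.  Under strict it
# fails to compile, so eval returns undef and the message lands in $@.
my $ok    = eval q{ $totl = 5; 1 };
my $error = $@;
print "strict rejected the typo: $error" unless $ok;
```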

Using CPAN:

Run cpan (or “perl -MCPAN -e shell”) once to configure it interactively.  The defaults are usually good enough.  To re-configure run the cpan command “o conf init”.  cpan is best run as root, so installed stuff can be automatically put into the correct places.

The cpan command can be run interactively (then you say “install foo” or “make foo”), or just run cpan moduleName.  To install the latest version of cpan run “cpan CPAN”.  To be able to validate downloads, run as root “cpan Digest::SHA Module::Signature”.

Qu: I am not root, how can I install a module in a personal directory?  Ans: You need to use your own configuration, not the one for the root user.  CPAN’s configuration script uses the system config (set by root) for all defaults, saving your choices to your ~/.cpan/CPAN/MyConfig.pm file.  (Show.)

You can also manually initiate this process with the following command: perl -MCPAN -e 'mkmyconfig' or by running “mkmyconfig” from the CPAN shell, or even using “o conf init”.  You will need to configure the makepl_arg setting to install stuff in your home dir, something like this:

o conf makepl_arg "PREFIX=~/perl"

or:
o conf makepl_arg "LIB=~/perl/lib \
INSTALLMAN1DIR=~/man/man1 \
INSTALLMAN3DIR=~/man/man3"

(Don’t forget to create these directories.)  If you change individual settings with o conf, you make those settings permanent (like all “o conf” settings) with “o conf commit”.  You will also have to add ~/man to the MANPATH environment variable and tell your Perl programs to look into ~/perl/lib, by including the following at the top of your Perl scripts:

     use lib "$ENV{HOME}/perl/lib";

or by setting the PERL5LIB environment variable.
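To check that the personal directory is really being searched, a script can print Perl's module search path, @INC.  This is a small sketch that assumes the ~/perl/lib location used above and that the HOME environment variable is set:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use lib "$ENV{HOME}/perl/lib";   # same line as shown above

my $personal = "$ENV{HOME}/perl/lib";
my $listed   = grep { $_ eq $personal } @INC;   # count of matching entries

print "Perl will search these directories for modules:\n";
print "  $_\n" for @INC;
print "personal directory is on the search path\n" if $listed;
```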

Examples

Search and replace strings in many files:

perl -pi -e 's/text1/text2/g;' *.ext

or:

perl -pi.bak -e 's/text1/text2/g;' *.ext

or:

find / search_criteria | xargs \
  perl -pi -e 's/text1/text2/g'

Show fix-style.pl.

Show Perl/Tk hellotk.pl  (demo via Knoppix, or install Windows Perl & Tk)

Show urldecode.

Show url2html.

Sending email with Perl:

cpan Net::Cmd; cpan Net::Config; cpan Net::SMTP

Then show mail.pl.