2012/04/13

Using a DVCS for your code and documents: Part 1

This document is the first in a series that outlines the very basics of using a Distributed Version Control System (DVCS) to manage and store changes and updates to documents you write. Although these systems are used primarily for software code, I've come to use one for my blog posts, poetry, documentation, etc. If you've never heard of revision control systems before, you might want to do a quick search online for what they do and what they are for.

Back in the day, I used software such as CVS and SVN to house my code changes. This was tedious though. I had to set up a server, keep it running 24x7, and ensure it was properly backed up and maintained. I worked at an ISP, so these things weren't a problem for me. However, for others, having a dedicated server doesn't make a lot of sense.

Recently, I decided to give Bitbucket a try. It, like GitHub, provides free hosting of your document repositories. The most interesting and useful feature of a DVCS is that it is indeed distributed. If my CVS or SVN server went down, that was it... work stopped. With a DVCS, I can clone my repository, and if the remote server goes down, I can use my current clone just as if it were the original repository. I can clone it again, or even use it to store new changes.

Go on over to Bitbucket and set yourself up an account. Once you've done that, navigate to where you can create a new repository. I'm going to name mine "Test" for this example. I am going to make it public, so that you can see my repository at the end of this post. I don't need a bug tracker, so I'm leaving that option unchecked, as well as the Wiki option. Although I've written patches against Git repositories, I like Mercurial, so I'm going to use that. (The command "hg" is for Mercurial, so you'll see it often in my examples.)

Once you've created the repository (hereinafter: repo), click the link that states "I'm starting from scratch".

Create a repo directory on your computer, and change into it:

$ cd ~
$ mkdir repos
$ cd repos

Now copy the 'clone' line that you see on the Bitbucket page, and paste it on your command line:

steve@ub:~/repos$ hg clone https://bitbucket.org/spek/test

Output:

destination directory: test
no changes found
updating to branch default
0 files updated, 0 files merged, 0 files removed, 0 files unresolved

We've cloned our new, empty repository. It created a new sub-directory called "test". Change into this new directory:

steve@ub:~/repos$ cd test

steve@ub:~/repos/test$ ls -la
total 12
drwxrwxr-x 3 steve steve 4096 2012-04-13 17:59 .
drwxrwxr-x 3 steve steve 4096 2012-04-13 17:59 ..
drwxrwxr-x 3 steve steve 4096 2012-04-13 17:59 .hg

The '.hg' directory is where the important information is stored.

Let's get right into using your repository. I will go through the basic commands as we encounter them.

Start by creating a new file, and adding some text to it. I use vim, but you can of course use any editor of your choosing.

steve@ub:~/repos/test$ vim test.pl

Save the file. Here's what my new file looks like:

#!/usr/bin/perl

use warnings;
use strict;

print "Hello, world!\n";

Ok, we have a new file with some text in it. Let's check the status of the file in relation to our repository. The command 'hg' is for Mercurial:

steve@ub:~/repos/test$ hg status
? test.pl

The "?" before the filename means that this file is unknown to the repository. There will be many cases where you won't want to add certain files to a repository, but we'll deal with that in a later post. For now, we want to add this file:

steve@ub:~/repos/test$ hg add test.pl 

Note that you can also call "hg add" with no filenames. This will include ALL files (recursively). Now let's re-check the status of our repository:

steve@ub:~/repos/test$ hg status
A test.pl

The "A" before the filename means that you have added a new file. It has not been committed to the repository yet. Let's do this now:

steve@ub:~/repos/test$ hg commit -m "-initial import"

Output:

abort: no username supplied (see "hg help config")

Whoops! What happened? Well, Mercurial (hg) needs to know who you are so that it can record an author with each commit. We'll discuss how to set this up momentarily. First, let's focus on the "commit" command to hg. The "-m" flag tells hg that you want to add an inline message for this change. If you omit the -m and the following message, you will be dropped into your default editor to write one out there. You can cancel a commit simply by exiting your editor without saving. Now, back to adding your user information. While in your repository, create a new file named "hgrc", and add your information. Mine looks like this:

steve@ub:~/repos/test$ cat hgrc 
[paths]
default = https://spek@bitbucket.org/spek/test
[ui]
username = steveb <steveb@cpan.org>

The "default" directive under the [paths] section is the link to your repository on Bitbucket. Under the [ui] section, the "username" is the name and email address that Mercurial records as the author of each commit; I use the email address/account I signed up to Bitbucket with. I don't add anything further... I prefer to just type my password out manually when I need to. Once your 'hgrc' file is created, move it into the .hg directory:

steve@ub:~/repos/test$ mv hgrc .hg/

Now rerun your commit:

steve@ub:~/repos/test$ hg commit -m "-initial import"

That went off without a hitch. Committing saves your changes in a changeset in your local repository. To push them to the master (in this case, Bitbucket), we use "push". Let's upload the local commits now:

steve@ub:~/repos/test$ hg push

Output:

pushing to https://spek@bitbucket.org/spek/test
searching for changes
http authorization required
realm: Bitbucket.org HTTP
user: spek
password: 
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
remote: bb/acl: spek is allowed. accepted payload.

Done. We added a file with "hg add", committed the changes via "hg commit", and uploaded the single changeset with "hg push". There's a problem though. My program was supposed to say hello to the universe, not just the world! Edit the test.pl file to print "Hello, universe!\n"; instead of "Hello, world!\n";, and then save the changes.

Now commit this update ("hg commit"), this time without the '-m' flag so it opens your editor. Add the following line in the commit message, and then save:

- replaced world with universe in print statement

Oh, man! I wanted to insert a comment saying what the print line is doing, but I forgot. Edit test.pl so it looks like this:

#!/usr/bin/perl

use warnings;
use strict;

# say "hi" to the universe
print "Hello, universe!\n";

Let's check the status again:

steve@ub:~/repos/test$ hg status
M test.pl

The 'M' prior to the filename signifies that we have a Modified file that hasn't been committed yet. Do that now:

steve@ub:~/repos/test$ hg commit -m "- added comment for print universe"

We committed two changes (which created two changesets), but these changes are local only. Let's push them up to our master repository:

steve@ub:~/repos/test$ hg push
pushing to https://spek@bitbucket.org/spek/test
searching for changes
http authorization required
realm: Bitbucket.org HTTP
user: spek
password: 
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 2 changesets with 2 changes to 1 files
remote: bb/acl: spek is allowed. accepted payload.

Notice this time the output found two changesets. This is because we committed two changes prior to pushing the first one. A rule of thumb is "commit early, commit often". I follow the same rule with push.
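So far we've met three status codes: "?", "A" and "M". If you ever want to check them from a script, here is a minimal Perl sketch. The parse_status() sub and the sample filenames are my own inventions for illustration, and the sketch only knows about the three codes covered in this post; it assumes each line of "hg status" output is a one-character code, a space, then the filename, as in the examples above.

```perl
#!/usr/bin/perl

use warnings;
use strict;
use feature 'say';

# meanings of the status codes covered in this post only
my %status_meaning = (
    '?' => 'unknown to the repository',
    'A' => 'added, not yet committed',
    'M' => 'modified, not yet committed',
);

# parse_status() is a hypothetical helper: it takes the text
# printed by "hg status" and returns an href of filename => code
sub parse_status {

    my $output = shift;
    my %files;

    for my $line ( split /\n/, $output ){

        # skip anything that isn't "code, space, filename"
        next if $line !~ /^(\S) (.+)$/;
        $files{ $2 } = $1;
    }
    return \%files;
}

# sample text standing in for real "hg status" output
my $files = parse_status( "M test.pl\n? notes.txt\n" );

for my $file ( sort keys %{ $files } ){

    my $code    = $files->{ $file };
    my $meaning = $status_meaning{ $code } // 'not covered in this post';
    say "$file: $meaning";
}
```

In real use you would feed parse_status() the captured output of the command instead of the sample string.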

So, we have our program created, and it runs great. We have made changes, and saved these changes. Let's see the basics on viewing the changes we've made. "hg log" shows you a list with a brief set of details for all the changesets you've committed. They appear in reverse chronological order. The hexadecimal string after the "N:" in the "changeset:" line identifies the specific changeset. This is much more complicated than how I'm describing it, so we'll focus on these details in a later post.

steve@ub:~/repos/test$ hg log

changeset:   2:52eec25c7d12
tag:         tip
user:        steveb@cpan.org
date:        Fri Apr 13 19:30:58 2012 -0400
summary:     - added comment for print universe

changeset:   1:b558f5695e13
user:        steveb@cpan.org
date:        Fri Apr 13 19:29:09 2012 -0400
summary:     - replaced world with universe in print statement

changeset:   0:739f47eadd48
user:        steveb@cpan.org
date:        Fri Apr 13 19:14:08 2012 -0400
summary:     -initial import

The log is great for history, but what if we need to see more information... such as the list of files changed, and all the lines in the commit message as opposed to just the first? Adding the "-v" flag to 'hg log' will show you the files changed, as well as all the lines you added to your commit message. Here's an example from one of my real repositories:

steve@ub:~/devel/repos/devel-trace-method$ hg log -v | more

changeset:   13:2ca86cf74c83
user:        steveb 
date:        Sat Mar 03 10:54:36 2012 -0500
files:       Changes Makefile.PL README lib/Devel/Trace/Method.pm
description:
- 0.08 POD cleanup, Makefile.PL fix
- added meta section to Makefile.PL allowing us to tell CPAN
  that we use a different tracker than rt.cpan.org
- cleaned up POD so that LICENSE would appear correctly on
  CPAN

Within each commit, we can now see what files we changed, and the list of comments we made per changeset. What if we need to see the actual changes themselves? No problem... add the "-p" (patch) flag to 'hg log':

steve@ub:~/repos/test$ hg log -p

...wait! That lists ALL of our changesets (commits). That's too much information for what we want. I want to know about the last commit only right now. In Mercurial, "tip" is a name that always refers to the most recently added changeset in the repository. Other revision control systems have a roughly comparable notion called HEAD. Let's look at the actual changes like we tried above, but only the most recent change. Again, the '-p' flag means "patch". The '-r' flag means "revision". We want to see the actual changes (-p) made in the most recent revision (-r):

steve@ub:~/repos/test$ hg log -p -r tip

changeset:   2:52eec25c7d12
tag:         tip
user:        steveb@cpan.org
date:        Fri Apr 13 19:30:58 2012 -0400
summary:     - added comment for print universe

diff -r b558f5695e13 -r 52eec25c7d12 test.pl
--- a/test.pl Fri Apr 13 19:29:09 2012 -0400
+++ b/test.pl Fri Apr 13 19:30:58 2012 -0400
@@ -3,4 +3,5 @@
 use warnings;
 use strict;
 
+# say "hi" to the universe
 print "Hello, universe!\n";

What if I want to see the more verbose changeset information (all files and all comments)? Can I? You betcha:

steve@ub:~/repos/test$ hg log -p -v

Here are a few "howtos" regarding the "hg log" command. You can add the verbose (-v) and patch (-p) flags to any of these:

# view the first commit
steve@ub:~/repos/test$ hg log -r 0

# review the 1st and 3rd commit
steve@ub:~/repos/test$ hg log -r 0 -r 2

# review the most recent commit
steve@ub:~/repos/test$ hg log -r tip

This tutorial series is primarily designed to describe the command-line usage of a DVCS application. The web-based display of the storage facility is outside the scope of this document, but it can be very handy. Here is what my online repo looks like after completing the examples in this post.

I'll end this post here. You've learnt the very basics on how to clone a Mercurial repository from your free Bitbucket account, how to commit changes into changesets, how to push the changesets back into the master repository, and how to do some basic review of the changes that you've made. In the next episode, we'll delve into how you can perform more advanced reviews of your changes, revert your working directory to a previous change, and create branches to manage different change tracks. I'll also explain, with examples, how DVCS differs from non-distributed versioning systems. We'll also touch on the ".hgignore" file, which allows you to use "hg add" without adding files you don't want included.

Thanks for reading. If you've read any of my other posts, you know I appreciate all feedback, good and/or bad in either the comments below, or privately via email.

2012/04/09

use Perl; Poetry: Method to my $madness

Back in the day,
I was unable to say(),
I had to print() everything to find the err of my ways,
but things changed,
and now I can return(),
to the main() program of life,
where nearly everything burns,
I'm not rapping this on a mic,
I'm just writing to be free,
before I have to split(),
and perform other tasks that bore me.

So I grab my map(),
and put two and two together,
hashing into keys the locations,
that I can remember,
but I've %seen these places,
now what do I do?,
I pick a rand()om place that none of us have ever been to,
but the rules are strict,
I keep getting warnings,
something about being literal when I'm trying to be corny,
but I'm out of the ord()inary,
I'm a chr()acter in reverse(),
I know what Larry wanted,
but my linguistics are the worst,
this is freeverse,
I DESTROY()ed the English language,
I just hope I get eval()uated to true,
before I die() and then get laughed at,
but it isn't like that,
all I seek() is some closure,
I'm not an array of elements trying to get sort()ed into order,
I'm random,
I try to please for() each(),
it's like a disease,
I've thrown away all the keys(),
I write in these confinements,
it's a very strict type of squeeze.

The warnings teach me lessons,
correct my spelling and my messes,
but if Larry's crew would finish Perl6,
to my $madness I'd have methods.

- stevieb 20120409

2012/04/08

use Perl; Guide to references: Part 5

This is the final part in my five part guide to Perl references. It's a complete program that contains a menu system along with the card game 'war'. This is pretty serious spaghetti code, so I will likely replace it as soon as I come up with something that uses all of the examples in this series but has a more logical flow and is easier on the eyes :)

  • Part 1 - The basics
  • Part 2 - References as subroutine parameters
  • Part 3 - Nested data structures
  • Part 4 - Code references
  • Part 5 - Concepts put to use (this document)

Please leave any corrections, criticisms, improvements, additions, questions and requests for further clarity in the comments section below, or in an email.

The following program code can be copy/pasted without all of the comments from my scripts repository.

#!/usr/bin/perl

use warnings;
use strict;
use 5.10.0;

# create a master dispatch table, using a reference to an
# external sub, and two inline subs

my %dispatch_table = (
                        play    => \&play_game,
                        hello   => sub { say "\nHello, world!\n"; },
                        'exit'  => sub { say "\nGoodbye!\n"; exit; },
                    );

# create an href to the dispatch table hash

my $dt_ref = \%dispatch_table;

# take a reference to the closure within the games_played() sub

my $games_played = games_played();

# loop over the menu until the user exits

while ( 1 ){

    system( "clear" );

    # get the dispatch table options by dereferencing the
    # dispatch table href

    my @options = keys %{ $dt_ref };

    say "Enter one of these options: " . join( ' ', @options );
    chomp ( my $command = <STDIN> );

    # exit if an illegal option was entered by the user

    exit if ! exists $dt_ref->{ $command };
    
    # otherwise, execute the sub the user selected

    $dt_ref->{ $command }->();

    # check to see if any games have been played through
    # the $games_played closure cref

    if ( $games_played->() ){
        say "You've played " . $games_played->() . " games.\n";
    }

    print "Please press ENTER...";
    <STDIN>;
}
sub play_game {

    # this is the main game sub, called through the dispatch
    # table

    system( "clear" );

    # create a deck of cards using a hash, and assign
    # a numeric value to the face value key

    my %deck;
    my $card_value = 14;

    for ( qw( A K Q J ), ( reverse 2..10 ) ){

        $deck{ $_ } = $card_value;
        $card_value--;
    }

    # a list of the card faces (without their numeric values)
 
    my @cards = keys %deck;

    print "Enter your name: ";
    chomp ( my $player = <STDIN> );

    print "Enter number of rounds (default: 5): ";
    chomp ( my $rounds = <STDIN> );
    $rounds = 5 if $rounds !~ /^\d+$/;

    # create a nested HoH for the players, using an href as
    # the top level

    my $players = { 
                    $player => {
                                score    => 0,
                                card     => undef,
                            },
                    npc      => {
                                score    => 0, 
                                card     => undef,
                            },
                    };

    my @player_names = keys %{ $players };

    for my $round ( 1 .. $rounds ){

        print "Round $round: ";

        for my $player ( @player_names ){

            # call deal(), passing in an aref of the cards array

            my $card = deal( \@cards );
            print "$player $card   ";

            # set the players current card in their card slot in the
            # players HoH

            $players->{ $player }{ card } = $card;
        }

        # call the compare_hands() sub by passing in an anonymous
        # hash (reference) inline in the call, with three parameters.
        # All three values are references

        compare_hands({ 
                        player_names => \@player_names,
                        players      => $players,
                        deck         => \%deck,
                     });

        print "\n";
    }

    print "\n";

    # loop over players, and get each of their final
    # scores out of the players HOH

    for my $player ( @player_names ){

        my $score = $players->{ $player }{ score };
        say "$player won $score rounds.";
    }

    print "\n";

    # update games played

    $games_played->( 1 );
}
sub deal {

    # take an aref of @cards, and return a random one

    my $deck_of_cards = shift; # aref
    return $deck_of_cards->[ rand @{ $deck_of_cards } ];
}
sub compare_hands {

    my $named_params = shift;
    
    # separate out the data from the named parameters
    # in the href we got passed in

    my $player_names    = $named_params->{ player_names };
    my $players         = $named_params->{ players };
    
    # we convert the last named param back into a hash
    # by dereferencing it

    my %deck            = %{ $named_params->{ deck } };

    my ( $player1, $player2 ) = @{ $player_names };

    # get each player's card

    my $p1_card = $players->{ $player1 }{ card };
    my $p2_card = $players->{ $player2 }{ card };

    # check the face of the card to the %deck hash to
    # retrieve the numerical value

    my $p1_card_val = $deck{ $p1_card };
    my $p2_card_val = $deck{ $p2_card };

    # nobody wins this round... its a tie

    return if $p1_card_val == $p2_card_val;

    if ( $p1_card_val > $p2_card_val ){
        # player 1 wins
        $players->{ $player1 }{ score }++;
    }
    else {
        # player 2 wins
        $players->{ $player2 }{ score }++;
    }
}    
sub games_played {
    
    # state data

    my $games_played = 0;

    # our games_played closure

    return sub {
                my $add = shift;
                $games_played += $add if $add;
                return $games_played;
               }
}
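
The call to compare_hands() above uses an anonymous hash as a set of named parameters. Here is that one technique on its own, as a minimal sketch. The describe_round() sub, its parameter names and the sample scores are all made up for illustration and are not part of the game.

```perl
#!/usr/bin/perl

use warnings;
use strict;
use feature 'say';

# describe_round() is a hypothetical sub that takes a single
# href of named parameters, like compare_hands() does

sub describe_round {

    my $named_params = shift;

    # dereference the aref into a real array
    my @names  = @{ $named_params->{ names } };

    # keep the href as a reference and use -> on it
    my $scores = $named_params->{ scores };

    return join ', ', map { $_ . ': ' . $scores->{ $_ } } @names;
}

my @names  = qw( steve npc );
my %scores = ( steve => 3, npc => 2 );

# the { } inside the call creates the anonymous hash inline
say describe_round({
                    names  => \@names,
                    scores => \%scores,
                  });
```

Because the parameters are looked up by name, the caller can list them in any order.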

Thank you very much for reading. I have received a lot of great feedback on the series, both from people informing me they have learnt a great deal, and others with corrections and additions. I appreciate you all. I hope you have enjoyed my Guide to references tutorials. Please feel free to provide me feedback so I may improve on my style for future posts.

Regards and thanks,

-stevieb

2012/04/07

use Perl; Guide to references: Part 4

This is part four in my five part series on Perl references. In this post, we will be discussing code references (coderef, or just cref), some of the benefits they provide, and some interesting use cases, including closures and dispatch tables. If you haven't already, you may want to review the other parts in the series:

  • Part 1 - The basics
  • Part 2 - References as subroutine parameters
  • Part 3 - Nested data structures
  • Part 4 - Code references (this document)
  • Part 5 - Concepts put to use

As with all of the other parts in the series, I request that you leave corrections, criticisms, improvements, additions, questions and requests for further clarity in the comments section below, or in an email.

CODE REFERENCES

A code reference in Perl is no different than any of the other references we've discussed in the previous episodes, but instead of pointing to a data variable, the ref points to a subroutine. You take a reference to a subroutine the same way you take a reference to anything else:

sub hello {
    say "Hello, world!";
}

my $cref = \&hello;

The & sigil represents a sub, and it is needed when we take the reference. Using the ref is also the same as before: we use the -> deref operator to access the item the reference points to.

# use an aref
$aref->[ 0 ];

# use an href
$href->{ a };

# use a cref
$cref->();
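
To see all three side by side in something you can run, here is a minimal sketch. The variable names and sample data are mine, chosen only for illustration. It also uses Perl's built-in ref() function, which reports what kind of thing a reference points to.

```perl
#!/usr/bin/perl

use warnings;
use strict;
use feature 'say';

my @array = ( 10, 20, 30 );
my %hash  = ( a => 'apple' );

sub hello {
    return "Hello, world!";
}

# take a reference to each kind of thing

my $aref = \@array;
my $href = \%hash;
my $cref = \&hello;

# use each reference through the -> deref operator

say $aref->[ 0 ];   # first element of @array
say $href->{ a };   # value of key 'a' in %hash
say $cref->();      # return value of hello()

# ref() returns ARRAY, HASH or CODE respectively

say ref( $aref );
say ref( $href );
say ref( $cref );
```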

We can also assign an anonymous sub to a cref in cases where we don't necessarily have to define the function with a name:

my $cref = sub { say "Hello, world!"; };

Now that we have that out of the way, let's move on to some practical and interesting uses for code references.

CLOSURES

The most common type of closure is a sub that returns a reference to an inner sub. They are often used in Object Oriented Programming (OOP) (which is outside the scope of this tutorial) to keep state data. State data is data that persists after the program has exited the scope in which the data was defined. I can explain it better with some code:

sub persist {

    my $count = 0;
    return sub { say $count++; }
}

my $count_cref = persist();

$count_cref->();
$count_cref->();
$count_cref->();

First, we define a subroutine named persist(). Inside that sub we define a lexical variable, $count. (A lexical variable is one that cannot be seen outside the scope of the block it is declared in; in this case, nothing outside of persist() can see $count.) After defining $count, we create and return an anonymous sub that prints the value of $count, and then adds one to it. We then call persist(), assigning its return value to $count_cref. The return value of persist() is a reference to the anonymous subroutine.

Because $count_cref points to the inner anonymous sub returned from persist() and not to persist() itself, the $count variable is never reset, and the sub that $count_cref points to will always keep its own version of $count, incremented each time the anon sub is executed through the reference.

To show how the $count variable retains its value as long as $count_cref is alive, here is the output from the above code snip:

0
1
2

Closures aren't only handy for OOP. We can use the same persist() sub to create multiple counters.

sub persist {

    my $count = 0;
    return sub { $count++; }
}

my $count_a_cref = persist();
my $count_b_cref = persist();
my $count_c_cref = persist();

say "Count A: " . $count_a_cref->();
say "Count A: " . $count_a_cref->();
say "Count B: " . $count_b_cref->();
say "Count B: " . $count_b_cref->();
say "Count B: " . $count_b_cref->();
say "Count C: " . $count_c_cref->();

Output:

Count A: 0
Count A: 1
Count B: 0
Count B: 1
Count B: 2
Count C: 0

Calls to one cref do not affect the state variables held by the other crefs.
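
Since each closure keeps its own state, the outer sub can also take parameters that configure that state. Here is a minimal sketch; make_counter() and its $start and $step parameters are hypothetical names of my own, not something used elsewhere in this series.

```perl
#!/usr/bin/perl

use warnings;
use strict;
use feature 'say';

# make_counter() is a hypothetical variation on persist().
# Each call returns a closure with its own private $count
# and $step

sub make_counter {

    my ( $start, $step ) = @_;

    my $count = $start // 0;  # default to starting at 0
    $step //= 1;              # default to counting by 1

    return sub {
        my $current = $count;
        $count += $step;
        return $current;
    };
}

my $evens    = make_counter( 0, 2 );
my $from_ten = make_counter( 10 );

say "Evens: "    . $evens->();      # 0
say "Evens: "    . $evens->();      # 2
say "From ten: " . $from_ten->();   # 10
say "Evens: "    . $evens->();      # 4
```

The call to $from_ten in the middle has no effect on where $evens picks up afterwards.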

Here's an example that shows a more practical case where closures with state variables could be useful. If you're thinking that globals would do the trick here, you're right; that isn't the point of this tutorial though ;). I'm sticking with simple here. In my fifth and final installment, we'll write something far more realistic that brings all aspects of the series together.

sub write_line {

    my $count = 0;
    return sub { return ++$count; }
}

# call the function twice, each time receiving
# a separate anonymous sub, along with separate
# state variables

my $steve_lines = write_line();
my $sarah_lines = write_line();

# steve writes two lines of code

my $steve_total;
$steve_total = $steve_lines->();
$steve_total = $steve_lines->();

# sarah writes one

my $sarah_total;
$sarah_total = $sarah_lines->();

say "Steve wrote $steve_total lines of code";
say "Sarah wrote $sarah_total lines of code";

Output:

Steve wrote 2 lines of code
Sarah wrote 1 lines of code

As an aside: In the preceding example, I had to declare the sub prior to using it. To understand why and how to get around that, see my Purpose and practical use of Perl's named blocks post.

Closures aren't limited to being returned from outer subs though. Any block of code that creates an anonymous sub, where that sub refers to lexical data from the enclosing scope, creates a closure. Here's an example:

my %h;
for my $color ( qw(red green blue) ){
   $h{$color} = sub { say $color };
}

$h{ blue }->();
$h{ red }->();
$h{ green }->();

In that example, we iterate over three colours. For each colour, we set a hash key as the colour and set that key's value as an anonymous sub that, when called, prints the colour. The three lines after the loop execute the closures. Each sub closes over its own copy of $color; although we could have, we didn't use any further lexical data to keep track of changing state. Also note that this code auto-generated a dispatch table, which we are going to learn about next.
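
If we did want closures created in a loop to keep track of something, a lexical declared inside the loop body would do it, because every iteration gets a fresh copy. Here is a minimal sketch; build_tally() and $seen are names I made up for illustration.

```perl
#!/usr/bin/perl

use warnings;
use strict;
use feature 'say';

# build_tally() is a hypothetical helper. It takes a list of
# colours and returns an href of colour => closure

sub build_tally {

    my %tally;

    for my $color ( @_ ){

        my $seen = 0;  # a fresh $seen for every iteration

        $tally{ $color } = sub {
            $seen++;
            return "$color seen $seen time(s)";
        };
    }
    return \%tally;
}

my $tally = build_tally( qw( red green blue ) );

say $tally->{ red }->();
say $tally->{ red }->();
say $tally->{ blue }->();
```

Calling the red closure twice does not change what the blue closure reports.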

DISPATCH TABLES

Dispatch tables are hashes whose values are references to subroutines. A dispatch table is like a table of contents that allows you to execute code through the hash keys.

my %dt = (
            hello => sub { say "Hello, world!"; },
            add   => \&add,
        );

# call the functions

my $more = $dt{ add }->( 5, 5 );
$dt{ hello }->();

sub add {

    my $x = shift;
    my $y = shift;
    return ( $x + $y );
}

First we define the dispatch table hash. The first key has a value of an anonymous sub, which shows how short, one-line type subs can be housed within the dispatch table itself. The second key contains a cref that points to the add() sub, which has been defined to take two parameters. When we call add, we go through the hash key, which executes the sub through the cref it contains as its value, and we pass in the parameters as normal.
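
Calling through a key that doesn't exist in the table is a fatal error, because there is no cref to execute. Here is a minimal sketch of one way to guard against that. The dispatch() wrapper and its "unknown command" message are my own assumptions for illustration, not a standard function.

```perl
#!/usr/bin/perl

use warnings;
use strict;
use feature 'say';

my %dt = (
            hello => sub { return "Hello, world!"; },
            add   => sub { return $_[0] + $_[1]; },
        );

# dispatch() is a hypothetical wrapper. It takes an href to a
# dispatch table, the key to look up, and any arguments for
# the sub stored under that key

sub dispatch {

    my ( $table, $command, @args ) = @_;

    # check the key before trying to execute its value

    return "unknown command: $command" if ! exists $table->{ $command };

    return $table->{ $command }->( @args );
}

say dispatch( \%dt, 'add', 5, 5 );
say dispatch( \%dt, 'hello' );
say dispatch( \%dt, 'bogus' );
```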

An example of the benefits of a dispatch table is a menu system, where a user must select from a range of options. You give the user a list of options to select from, and in your dispatch table, you name your keys after the options you provided. The user's choice is then used directly as the hash key, and the corresponding subroutine runs.

In the following example, we have three operations the user can perform where the hash value is a cref to an external sub. The fourth option, exit, is short and simple, so we create it as an anonymous sub within the table itself. After a couple of sanity checks to ensure the input is legal, running the correct operation is as simple as putting the user's input into the dispatch table.

Here is a fully working menu program based on the concept of a dispatch table. I've kept it as simple and as basic as possible for clarity.

#!/usr/bin/perl

use warnings;
use strict;
use 5.10.0;

my %dt = (
            add      => \&add,
            subtract => \&subtract,
            multiply => \&multiply,
            'exit'   => sub { say "\nGoodbye!\n"; exit; },
        );
    
while (1) {

    system( "clear" );
    
    print "Please enter either add, subtract, multiply or exit: ";
    chomp ( my $operation = <STDIN> );

    # exit if told to

    $dt{ $operation }->() if $operation eq 'exit';

    # exit if illegal param

    if ( ! exists $dt{ $operation } ){
        say "\nIllegal input... exiting\n";
        exit;
    }
    
    print "Type in your first number: ";
    chomp ( my $x = <STDIN> );

    print "Type in your second number: ";
    chomp ( my $y = <STDIN> );

    # run the command selected by the user

    my $result = $dt{ $operation }->( $x, $y );

    say "\nPerforming $operation on $x and $y = $result\n";

    print "\nPress ENTER to continue...\n";
    <STDIN>;
    
}
sub add {
    my ( $x, $y ) = @_;
    return $x + $y;
}
sub subtract {
    my ( $x, $y ) = @_;
    return $x - $y;
}
sub multiply {
    my ( $x, $y ) = @_;
    return $x * $y;
}

Hopefully that simplistic example was enough to at least give you an idea of what dispatch tables could be capable of.
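
The menu program checks the operation name, but it trusts whatever the user types for the two numbers. Here is a minimal sketch of checking those as well before dispatching. The calculate() wrapper is a hypothetical name of mine; looks_like_number() comes from the core Scalar::Util module.

```perl
#!/usr/bin/perl

use warnings;
use strict;
use feature 'say';
use Scalar::Util qw( looks_like_number );

my %dt = (
            add      => sub { return $_[0] + $_[1]; },
            subtract => sub { return $_[0] - $_[1]; },
            multiply => sub { return $_[0] * $_[1]; },
        );

# calculate() is a hypothetical wrapper. It returns undef
# instead of running a sub on bad input

sub calculate {

    my ( $operation, $x, $y ) = @_;

    return undef if ! exists $dt{ $operation };
    return undef if ! looks_like_number( $x );
    return undef if ! looks_like_number( $y );

    return $dt{ $operation }->( $x, $y );
}

# one valid attempt, and one with a non-numeric parameter

for my $attempt ( [ 'add', 2, 3 ], [ 'multiply', 4, 'four' ] ){

    my $result = calculate( @{ $attempt } );

    if ( defined $result ){
        say "result: $result";
    }
    else {
        say "rejected: @{ $attempt }";
    }
}
```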

That's it for this episode, thanks for reading. In my next and last post in the series, we'll bring everything together in a single program that utilizes most of the concepts of what we have learnt throughout.

Update: Thanks to maximum-solo for pointing out that I had limited my definition of closures to only a single use case, and for the example code that returns closures from a for loop.

Update: Thanks to Jay Scott for sending typographical and grammatical corrections, and for numerous logical code description fixes.

use Perl; Guide to references: Part 3

This is Part 3 of my five part guide to references series. In Part 1 we learnt the basic syntax for using references, in Part 2 we saw how to use references in subroutine calls, and in this episode we'll focus solely on nested data structures.

  • Part 1 - The basics
  • Part 2 - References as subroutine parameters
  • Part 3 - Nested data structures (this document)
  • Part 4 - Code references
  • Part 5 - Concepts put to use

At this point, it is rather imperative that you have a firm grasp on both the concepts and the syntax for creating, dereferencing and otherwise using references. If you are unfamiliar with any of these, I recommend you see Part 1.

As with the other parts in the series, I request that you leave corrections, criticisms, improvements, additions, questions and requests for further clarity in the comments section below, or in an email.

NESTED DATA STRUCTURES

The two most elementary complex data structures are an array of arrays (AoA) and a hash of hashes (HoH). An AoA is simply an array where each element contains a reference to another array. Here's an example based on some of the concepts we've already learnt:

my @a;
my @a_0 = ( 1, 2, 3 );
my @a_1 = ( 4, 5, 6 );
my @a_2 = ( 7, 8, 9 );

$a[0] = \@a_0;
$a[1] = \@a_1;
$a[2] = \@a_2;

Using Data::Dumper, we see the contents of @a as follows. (I've inserted the comments for clarity)

$VAR1 = [ # the top @a array
          [ # $a[0]
            1,
            2,
            3
          ],
          [ # $a[1]
            4,
            5,
            6
          ],
          [ # $a[2]
            7,
            8,
            9
          ]
        ];

AoAs are good for storing multiple lists of data where the items will always retain their order. To access individual elements of the nested arrays, we need the -> deref operator again:

my $x = $a[0]->[0]; # value is 1

Note the positioning. We access the first element of @a as normal, but since $a[0] is a reference to another array, we must dereference here. Again:

my $y = $a[2]->[2]; # value is 9
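
The same structure can also be built in one statement using anonymous arrays, without naming each inner array first. Here is a minimal sketch; the cell() helper is a hypothetical name of mine, added to show the same access working when the top level is passed around as an aref.

```perl
#!/usr/bin/perl

use warnings;
use strict;
use feature 'say';

# each [ ] creates an anonymous array and returns an aref,
# so @a ends up holding three arefs just like before

my @a = (
            [ 1, 2, 3 ],
            [ 4, 5, 6 ],
            [ 7, 8, 9 ],
        );

# cell() is a hypothetical helper that takes an aref to an
# AoA, plus the two positions to look up

sub cell {

    my ( $aoa, $row, $col ) = @_;
    return $aoa->[ $row ]->[ $col ];
}

say $a[0]->[0];          # first element of the first aref
say cell( \@a, 2, 2 );   # last element of the last aref

# negative indexes count from the end, at both levels

say $a[-1]->[-1];

# number of elements in the second nested array

say scalar @{ $a[1] };
```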

Still using the above AoA structure, here's how to loop over each aref within the array. Note in the nested for() loop we see the @{} dereference operators again to access the data that each aref points to:

my $x = 0;

for my $aref ( @a ){

    say "in top level of a, elem $x";
    $x++;

    my $y = 0;

    for my $aref_elem ( @{ $aref } ){

        say "in second level elem $y, elem is: $aref_elem";
        $y++;
    }
}

Output:

in top level of a, elem 0
in second level elem 0, elem is: 1
in second level elem 1, elem is: 2
in second level elem 2, elem is: 3
in top level of a, elem 1
in second level elem 0, elem is: 4
in second level elem 1, elem is: 5
in second level elem 2, elem is: 6
in top level of a, elem 2
in second level elem 0, elem is: 7
in second level elem 1, elem is: 8
in second level elem 2, elem is: 9

You can compare that output to the loop itself, and also to the Data::Dumper output above to get a better idea of the nested structure.
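The same structure can also be built in one statement with anonymous arrays (the square-bracket constructors covered in Part 2) and walked by index rather than by element. This is only a minimal sketch; the names @grid, $row_count and $col_count are mine, made up for illustration:

```perl
use strict;
use warnings;
use feature 'say';

# same data as @a above, built from anonymous arrays (hypothetical name @grid)
my @grid = (
    [ 1, 2, 3 ],
    [ 4, 5, 6 ],
    [ 7, 8, 9 ],
);

# number of rows, and number of elements in the first row
my $row_count = scalar @grid;
my $col_count = scalar @{ $grid[0] };

# walk the structure by index; $#{ ... } is the last index of a dereferenced array
for my $i ( 0 .. $#grid ){
    for my $j ( 0 .. $#{ $grid[$i] } ){
        my $value = $grid[$i]->[$j];
        say "row $i, col $j holds $value";
    }
}
```

Walking by index is handy when you need the position as well as the value, at the cost of a little more dereferencing syntax.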

More interesting and (imho) far more useful than the AoA is the HoH. Here's where significant usefulness begins.

my %person; # top level hash container

my %clothes  = ( shirt => 'red', pants => 'black', );
my %schedule = ( work => '0800', home => '0500', sleep => '2300', );
my %skills   = ( programming => 'poor', social => 'good' );

$person{ clothes  } = \%clothes;
$person{ schedule } = \%schedule;
$person{ skills }   = \%skills;

The Dumper output for a HoH looks much more interesting and easy to follow than the AoA:

$VAR1 = { # %person

          'skills' => {
                        'programming' => 'poor',
                        'social' => 'good'
                      },
          'clothes' => {
                         'pants' => 'black',
                         'shirt' => 'red'
                       },
          'schedule' => {
                          'work' => '0800',
                          'home' => '0500',
                          'sleep' => '2300'
                        }
        };

Here are a few examples of how to use the data:

# get the person's shirt

my $shirt_colour = $person{ clothes }->{ shirt }; # red

# change the person's shirt

$person{ clothes }->{ shirt } = 'black';

# list the person's skills

say "Person has the following skills: ";

for my $skill ( keys %{ $person{ skills } } ){
    print "$skill ";
}
print "\n";

# list each skill with the ability to perform the skill

say "Person's ";

while ( my ( $skill, $ability ) = each %{ $person{ skills } } ){

    print "$skill is $ability\n";
}
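The examples above each look at one inner hash. To visit every inner hash in turn, the two techniques can be nested. The following is a rough sketch, assuming a cut-down version of %person; the @lines array is a hypothetical helper I've added so the result is easy to inspect:

```perl
use strict;
use warnings;
use feature 'say';

# cut-down copy of the %person structure, built here so the sketch runs on its own
my %clothes = ( shirt => 'red', pants => 'black' );
my %skills  = ( programming => 'poor', social => 'good' );

my %person;
$person{ clothes } = \%clothes;
$person{ skills }  = \%skills;

my @lines; # hypothetical: collects one line per inner key

# outer loop: the top level keys, sorted for predictable output
for my $category ( sort keys %person ){

    # inner loop: dereference the inner href with %{} to get at its keys
    for my $item ( sort keys %{ $person{ $category } } ){

        my $value = $person{ $category }->{ $item };
        push @lines, "$category.$item=$value";
    }
}

for my $line ( @lines ){
    say $line;
}
```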

When dealing with a simple HoH, the deref operator (->) is not required. Due to the fact that Perl knows that a hash can never directly contain another hash, it is not ambiguous to type $person{ clothes }{ shirt }; Perl can identify that the nested key is a reference to another hash. Where the -> is required, is when the top level of the structure is a reference itself:

# create hrefs to anonymous hash

my $inner_1 = { a => 1, b => 2 };
my $inner_2 = { z => 26, y => 25 };

# add hrefs to hash

my %h = ( ref_1 => $inner_1, ref_2 => $inner_2 );

# take a ref to the %h hash

my $href = \%h;

# because $href is now a reference itself, we MUST use the dereference operator

say $href->{ ref_2 }{ z }; # prints 26

What if you wanted to keep track of all the classes in a school, and for each class, keep a list of all the student names? A HoH isn't needed, because all we want are the student names. The student names don't need a value. In this case, we would use a hash of arrays, or HoA:

# define the classrooms

my @room_1 = qw( steve mike dawn megan );
my @room_2 = qw( chris alexa melissa dave );
my @room_3 = qw( brittany hakim francois );

# declare the school. we'll declare it as a scalar
# because we're going to use an anonymous hash

my $school; # will become an href

# add the classrooms to the school

$school->{ room1 } = \@room_1;
$school->{ room2 } = \@room_2;
$school->{ room3 } = \@room_3;

# who's in room 2?

for my $student ( @{ $school->{ room2 } } ){
    say $student;
}

# output:
chris
alexa
melissa
dave

Notice the use of the array deref operator @{} in the for line. Things are starting to look a little more complex. Because $school->{ room2 } contains a reference to an array, we must dereference the entire thing. That example of dereferencing an array within a hash is where I see the most difficulty for programmers who are just starting to grasp refs. It is the misunderstanding of what is actually happening here that leads programmers to make syntax errors that generate output such as the following:

Not dereferencing the array ref prior to printing it:

ARRAY(0x8fba97c) 

Not using -> to dereference the $school reference to access the anonymous hash it points to. When an error like the following appears, it is a loud warning that you forgot to dereference the scalar $school, and that there is no %school counterpart... indeed, $school points to an unnamed (anonymous) hash:

Global symbol "%school" requires explicit package name at ./hoa.pl line 29.
Execution of ./hoa.pl aborted due to compilation errors.

Forgetting to dereference the array ref prior to pushing a new value onto it:

Type of arg 1 to push must be array (not hash element) at ./hoa.pl line 31, near "'jeremy';"

Let's go back to school. Class three just got a new student. Let's add him to the roster.

# with push

push @{ $school->{ room3 } }, 'jeremy'; 

# or directly to the element, if we already know its position

$school->{ room3 }[3] = 'jeremy';

Let's print out all the classes.

# get the keys by dereferencing $school

for my $room_name ( keys %{ $school } ){
    
    say "Students in $room_name: ";
    print "    ";

    # get each student name from each class by
    # dereferencing each class aref

    for my $student ( @{ $school->{ $room_name } } ){
        print "$student ";
    }
    print "\n";
}

Output:

Students in room3: 
    brittany hakim francois jeremy 
Students in room1: 
    steve mike dawn megan 
Students in room2: 
    chris alexa melissa dave 

Notice that the names from the room arrays are still in their original order, but the classrooms are not. Arrays keep their elements in the order in which you assign them, whereas hashes store their keys in no particular order, so you can't rely on keys() returning them in the order they were added. To ensure the rooms are listed in order in this case, we simply add sort() to the for() line:

for my $room_name ( sort keys %{ $school } ){

A side note on dereferencing nested structures. The following are equivalent:

my $x = $href->{ aref }->[0];
my $x = $href->{ aref }[0];

In other words, you only need to use the -> deref operator for the first reference encountered. Perl implicitly dereferences everything thereafter without the explicit ->. This is because everything underneath the first data structure is always a reference, and Perl knows this.
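Here is a small sketch of that rule applied one level deeper. The structure (an href pointing to a hash that holds an array of hashes) and the names $long and $short are assumptions of mine, purely for illustration:

```perl
use strict;
use warnings;
use feature 'say';

# hypothetical structure: href -> hash -> array -> hash
my $href = {
    rooms => [
        { name => 'room1', size => 4 },
        { name => 'room2', size => 3 },
    ],
};

# every arrow written out
my $long  = $href->{ rooms }->[1]->{ name };

# arrows between the subscripts omitted; the first arrow must stay,
# because $href itself is a reference
my $short = $href->{ rooms }[1]{ name };

say "long form:  $long";
say "short form: $short";
```

Both lines print room2. Which form you use is a matter of taste; I find the shorter one easier to read once the structure gets deep.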

There is no limit to the depths and complexity you can conceive with these nested data structures thanks to references. Almost all objects in Object Oriented Programming in Perl use storage mechanisms just like this.

Thanks for reading part three of my series. In part four, we'll focus on subroutine references (coderef) and dispatch tables. Then we'll build a menu system using all of the concepts we've learnt that you can incorporate into your own programs. Once again, please leave feedback in comments, or send me an email.

use Perl; Guide to references: Part 2

This is part two in my five part series on Perl references. In Part 1, we went through the basics: how to take references to items and access the items through their references. In this episode, we'll explain some of the differences and benefits of sending references into subroutines, as opposed to the list-type data variables themselves. It's divided up into three sub-sections: references as subroutine parameters, named parameters and anonymous data.

  • Part 1 - The basics
  • Part 2 - References as subroutine parameters (this document)
  • Part 3 - Nested data structures
  • Part 4 - Code references
  • Part 5 - Concepts put to use

This episode assumes that you have at least a minimal understanding of how subroutines (functions) work in Perl; both how to send data into a function, and the standard methods of accessing the data once the function has accepted it. As before, I urge you to leave corrections, criticisms, improvements, questions and requests for further clarity in the comments section below, or in an email.

From this point forward, I will often substitute certain terms with abbreviations: ref for reference, deref for dereference, aref for array reference, href for hash reference and sub or function for subroutine.

REFERENCES AS SUBROUTINE PARAMETERS

Let's start off this section with a sample piece of code:

my @a = ( 1, 2, 3 );
my %h = ( a => 10, b => 20, c => 30 );

hello( @a, %h );

sub hello {

    my @array = shift;
    my %hash  = shift;

    # do stuff
}

As it appears, you are calling the hello() function with two parameters; an array as parameter one, and a hash as parameter two. We then proceed to take the parameters and assign them accordingly. However, in Perl, this does not work as you may think. Perl doesn't keep the parameters as separate parts. Instead, it flattens all the parameters together into a single list. In the case above, if we printed the parameter list before we took anything from it, it would appear as one long list of individual items:

1 2 3 c 30 a 10 b 20 

So in the above code, @array would contain 1, while we would have forced 2 into %hash. The rest of the flattened parameters (that are essentially one long list of scalar values) remain unused.
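To see the flattening for yourself, you can simply count what arrives in the parameter list. A minimal sketch; show_params() is a hypothetical helper of mine, not part of the example above:

```perl
use strict;
use warnings;
use feature 'say';

my @a = ( 1, 2, 3 );
my %h = ( a => 10, b => 20, c => 30 );

# show_params() is a made-up helper: it reports how many
# individual values were received in @_
sub show_params {
    my @received = @_;
    return scalar @received;
}

# 3 array elements + 3 hash keys + 3 hash values
my $count = show_params( @a, %h );

say "the sub received $count separate values";
```

The sub has no way of telling where the array stopped and the hash began; all it sees is nine scalars.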

Because refs are simple individual scalars that only point to a data structure, we can pass the ref in as opposed to the list of the data structure's contents.

my @a = ( 1, 2, 3 );
my %h = ( a => 10, b => 20, c => 30 );

my $aref = \@a;
my $href = \%h;

hello( $aref, $href );

sub hello {

    my $aref_param = shift;
    my $href_param = shift;
}

In the first example, we thought we were passing in two parameters, but perl took the values from our parameters and merged them into one long list. By passing refs, our sub receives only two parameters as intended, and we can easily differentiate our array data and our hash data. This is termed "passing by reference", and it is the most common method to pass parameters to a function when the function needs more than just a few scalar values. We can now work on the refs within the sub the same way we were doing in Part 1.
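As a quick sketch of working on the refs inside the sub, the following counts what each ref points to and confirms nothing else was passed in. The function name summarize() and its return values are assumptions for illustration only:

```perl
use strict;
use warnings;
use feature 'say';

my @a = ( 1, 2, 3 );
my %h = ( a => 10, b => 20, c => 30 );

# summarize() is a hypothetical function name
sub summarize {

    my $aref_param = shift;
    my $href_param = shift;

    # anything still in @_ would be an unexpected extra parameter
    my $remaining = scalar @_;

    # dereference each ref as a whole to count its contents
    my $elem_count = scalar @{ $aref_param };
    my $key_count  = scalar keys %{ $href_param };

    return ( $elem_count, $key_count, $remaining );
}

my ( $elems, $keys, $leftover ) = summarize( \@a, \%h );

say "array has $elems elements, hash has $keys keys, $leftover parameters left over";
```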

When passing by reference, any change the sub makes to the data the ref points to is permanent; it remains in effect after the subroutine returns. Passing data into a sub directly (not via a ref) and copying it out of the parameter list into lexical variables, as we did with shift above, gives the sub its own *copy* of the data, so the original is not modified when the copy is changed. If it is necessary to keep your original data intact when passing by reference, you can make a copy of the data by dereferencing it within the function, and returning either the copy, or a reference to the copy:

my @a = ( 1, 2, 3 );

my $aref = \@a;

my @b = hello( $aref );

say "Original array:";
for my $x ( @a ){
    print "$x ";
}

say "\nReturned copy:";
for my $y ( @b ){
    print "$y ";
}

sub hello {

    my $aref = shift;
    
    # make a copy of the referenced array
    my @array = @{ $aref };

    $array[ 0 ] = 99;

    return @array;
}

Output:

Original array:
1 2 3 
Returned copy:
99 2 3
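For contrast, here is a sketch of the opposite case, where the sub works directly through the ref and no copy is made. The function name change_first() is made up for this illustration:

```perl
use strict;
use warnings;
use feature 'say';

my @a = ( 1, 2, 3 );

# change_first() is a hypothetical name; it modifies the caller's
# array because it never copies the data the ref points to
sub change_first {

    my $aref = shift;

    $aref->[0] = 99;

    return;
}

change_first( \@a );

say "Original array after the call: @a";
```

This prints "Original array after the call: 99 2 3". The change made inside the sub is still there after it returns.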

Although we've now modified our code so that we can take data structures as a parameter via their refs, we're still using "positional" function arguments, meaning that the parameters must be sent into the function in a specified order. Here's a brief code snippet of a similar example:

sub goodbye {
    my $mandatory_param_aref = shift;
    my $optional_param_aref  = shift;
}

# call it like this

goodbye( $aref1, $aref2 );

Now, what happens if we want to modify the code to accept a second optional argument?

sub goodbye {
    my $mandatory_param_aref = shift;
    my $optional_param_aref  = shift;
    my $second_optional_aref = shift;
}

# call it like this

goodbye( $aref1, $aref2, $aref3 );

No problem. However, what happens if you don't want to use the first optional parameter? You can't just do this:

goodbye( $aref1, $aref3 );

The function would take $aref3 and shift it off as the first optional parameter, potentially causing all kinds of grief. You could send in undef in the optional positions that you don't want to supply data for, so that the second optional parameter is assigned to the correct variable within the function:

goodbye( $aref1, undef, $aref3 );

But how about in a case with five optional parameters where you only want to supply the third and fifth?

goodbye( $param1, undef, undef, $param4, undef, $param6 );

Not only is that unsightly, but it is potentially very unstable code. You can see that it wouldn't be hard to position those incorrectly. There is a solution though.

NAMED PARAMETERS USING HASH REFERENCES

my %data = (
            user => 'stevieb',
            year => 2012,            
        );

my $data_ref = \%data;

user_of_the_year( $data_ref );

sub user_of_the_year {
    my $p = shift;

    my $user = $p->{ user };
    my $year = $p->{ year };

    say "Our luser of $year is $user";
}

We created a hash with the data we want to send in to our function, then we take a reference to that hash. The hash reference is what we send into the function. Inside the function, we shift off the only parameter we received (the href), and proceed to extract the values and assign them to lexical variables through the ref using the deref operator ->.

A few things to note here. First, the positional problem is gone. The function will only ever accept a single parameter; the href. Also, if the function has optional parameters, there's no undef trickery to reposition the remaining parameters. Simply omit the named key in the hash.

In the above function definition, it isn't mandatory to dereference the hash and extract its values to scalars right away. The last line could just as easily have been written like this:

say "Our luser of $p->{ year } is $p->{ user }";

However, I personally opt to extract immediately, therefore I can very quickly see what the function expects the data to look like without having to wade through the function code. Extracting in one place also makes it very easy to visually verify that your POD function use statements are accurate.
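To illustrate the point about optional parameters, here is a minimal sketch of a function where one named parameter may be left out. The function award_line(), the 'score' key and its default of 0 are all assumptions of mine, not part of the example above:

```perl
use strict;
use warnings;
use feature 'say';

# award_line() is a hypothetical function; 'score' is an optional
# named parameter that falls back to a default when omitted
sub award_line {

    my $p = shift;

    my $user  = $p->{ user };
    my $year  = $p->{ year };
    my $score = defined $p->{ score } ? $p->{ score } : 0;

    return "$user ($year) scored $score";
}

my %with_score    = ( user => 'stevieb', year => 2012, score => 199 );
my %without_score = ( user => 'stevieb', year => 2012 );

my $line_1 = award_line( \%with_score );
my $line_2 = award_line( \%without_score );

say $line_1;
say $line_2;
```

The caller who doesn't care about the score simply leaves the key out; there are no undef placeholders and no positions to get wrong.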

ANONYMOUS DATA

Often it is the case that you need to make a data structure on the fly, but don't need to assign a temporary name to it. We can skip steps by using references.

Instead of this two step process:

my %h = ( a => 1, b => 2 );
my $href = \%h;

We can take a reference directly from an unnamed (anonymous) hash:

my $href = { a => 1, b => 2 };

So, to create an href to an anonymous hash, we surround the data within braces instead of parens. Note that the braces are also used to distinguish hash keys. Arrays are similar, but they use their element brackets instead:

my $aref = [ 1, 2, 3 ];
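References to anonymous data behave exactly like the references we took with a backslash in Part 1. A short sketch; the variable names on the left are mine, chosen only for illustration:

```perl
use strict;
use warnings;
use feature 'say';

my $href = { a => 1, b => 2 };
my $aref = [ 1, 2, 3 ];

# single items, using the -> deref operator
my $b_value = $href->{ b };
my $last    = $aref->[2];

# the whole structure, using the %{} and @{} deref blocks
my $key_count  = scalar keys %{ $href };
my $elem_count = scalar @{ $aref };

say "b is $b_value, the last element is $last";
say "the hash has $key_count keys, the array has $elem_count elements";
```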

In the function example above, I created the hash, took a ref to the hash, and passed the ref into the function as a parameter. Using anonymous data, I can skip creating the hash and taking a ref to it by inserting the ref to the anonymous data right within the function call:

user_of_the_year( { user => 'stevieb', year => 2012 } );

Or for more complex function calls with named parameters, you can put it on multiple lines:

user_of_the_year({
                    user    => 'stevieb',
                    year    => 2012,
                    score   => 199,
                    awards  => 3,
                });

Thank you for reading. Again, if you have any improvements or questions, leave me comments or send me an email.

2012/04/06

use Perl; Guide to references: Part 1

Understanding references and their subtleties in Perl is one of the more difficult concepts to fully wrap one's head around. However, once the blossoming developer fully understands them, a whole new level of capability and power opens up to exploit and explore.

I often see newer programmers struggle with the concept of references on the Perl help sites I frequent. Some still have a ways to go, but many are at the stage where perhaps one more tutorial may push them over the edge and give them that 'Ahhhh' moment of clarity. My moment of clarity came when I read Randal Schwartz's "Learning Perl Objects, References & Modules" book for something like the 8th time. Even once the concept of references is understood, though, the syntax and use cases can still be confusing for quite some time, especially in Perl, because There Is More Than One Way To Do It.

This tutorial is the first in a five part series. This part will focus on the basics, preparing you for more complex uses in the following four parts. I've created a cheat sheet that summarizes what you'll learn in this document.

  • Part 1 - The basics (this document)
  • Part 2 - References as subroutine parameters
  • Part 3 - Nested data structures
  • Part 4 - Code references
  • Part 5 - Concepts put to use

I will stick with a single consistent syntax throughout the series and will refrain from using one-line shortcuts and other simplification techniques in loops and other structures, in the hope of keeping any confusion to a minimum. Part one assumes that you have a very good understanding of the Perl variable types, when they are needed, and how they are used. Some exposure to references may also prove helpful, but shouldn't be required.

If you find anything in this document that you feel could use improvement, or if you have any questions or you feel the document needs further clarity, please feel free to provide any and all feedback via the comments section below, or send me an email.

THE BASICS

A reference in Perl is nothing more than a scalar variable that, instead of containing a usable value, 'points' to a different variable. When you perform an action on a reference, you are actually performing the action on the variable that the reference points to. A Perl reference is similar to a shortcut to a file or program on your computer. When you double click the shortcut, the shortcut doesn't open; it's the file that the shortcut points to that does.

We'll start with arrays, and I'll get right into the code.

We'll define an array as normal, and then print out its contents.

my @array = ( 1, 2, 3 );

for my $elem ( @array ){
    say $elem;
}

Prepending the array with a backslash is how we take a reference to the array and assign the reference to a scalar. The scalar $aref now is a reference that points to @array.

my $aref = \@array;

At this point, if you tried to print out the contents of $aref, you would get the location of the array being pointed to. You know you have a reference if you ever try to print a scalar and you get output like the following:

ARRAY(0x9bfa8c8)
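If you'd rather ask Perl directly than read the printed address, the built-in ref() function reports what kind of thing a scalar points to. A minimal sketch; the variable names are mine, for illustration only:

```perl
use strict;
use warnings;
use feature 'say';

my @array = ( 1, 2, 3 );
my %hash  = ( a => 1 );

my $aref  = \@array;
my $href  = \%hash;
my $plain = 'just a string';

# ref() returns the type of the thing pointed to,
# or an empty string if the scalar is not a reference
my $aref_type  = ref $aref;
my $href_type  = ref $href;
my $plain_type = ref $plain;

say "aref points to: $aref_type";
say "href points to: $href_type";

if ( $plain_type eq '' ){
    say "plain is not a reference";
}
```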

Before we can use the array the reference points to, we must dereference the reference. To gain access to the array and use it as normal, we use the array dereference operator @{}. Put the array reference inside of the dereference braces and we can use the reference just as if it was the array itself:

for my $elem ( @{ $aref } ){
    say $elem;
}

The standard way of assigning an individual array element to a scalar:

my $x  = $array[0];

To access individual elements of the array through the reference, we use a different dereference operator:

my $y = $aref->[1];

Assign a string to the second element of the array in traditional fashion:

$array[1]  = "assigning to array element 2";

To do the same thing through an array reference, we dereference it the same way we did when we were taking an element from the array through the reference:

$aref->[1] = "assigning to array element 2";

You just learnt how to take a reference to an array (by prepending the array with a backslash), how to dereference the entire array reference by inserting the reference within the dereference block @{}, and how to dereference individual elements of the array through the reference with the -> dereference operator. That is all there is to it. Hashes are extremely similar. Let's look at them now.

Create and initialize a normal hash, and iterate over its contents:

my %hash = ( a => 1, b => 2, c => 3 );

while ( my ( $key, $value ) = each %hash ){

    say "key: $key, value: $value";
}

Take a reference to the hash, and assign it to a scalar variable:

my $href = \%hash;

Now we'll iterate over the hash through the reference. To access the hash, we must dereference it just like we did the array reference above. The dereference operator for a hash reference is %{}. Again, just wrap the reference within its dereferencing block:

while ( my ( $key, $value ) = each %{ $href } ){

    say "key: $key, value: $value";
}

Access an individual hash value:

my $x = $hash{ a };

Access an individual hash value through the reference. The dereference operator for accessing individual elements of a hash through a reference is the same one we used for an array (->).

my $y = $href->{ a };

Assign a value to hash key 'a':

$hash{ a }  = "assigning to hash key a";

Assign a value to hash key 'a' through the reference:

$href->{ a } = "assigning to hash key a";

That's essentially the basics of taking a reference to something, and then dereferencing the reference to access the data it points to.
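The each() loops above walk the hash in no particular order. As a small sketch, assuming you want predictable output, both hash dereference forms can be combined to walk the hash through its reference with sorted keys. The @seen array is a made-up helper that records the order:

```perl
use strict;
use warnings;
use feature 'say';

my %hash = ( a => 1, b => 2, c => 3 );
my $href = \%hash;

my @seen; # hypothetical helper array, used only to record the order

# keys() needs the whole hash, so dereference with %{}
for my $key ( sort keys %{ $href } ){

    # a single value is fetched with ->
    my $value = $href->{ $key };

    push @seen, "$key=$value";

    say "key: $key, value: $value";
}
```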

When we operate on a reference, we are essentially operating on the item being pointed to directly. Here is an example that shows, in action, how operating directly on the item has the same effect as operating on the item through the reference.

my @b = ( 1, 2, 3 );
my $aref = \@b;

# assign a new value to $b[0] through the reference

$aref->[0] = 99;

# print the array

for my $elem ( @b ){
    say $elem;
}

Output:

99
2
3

As you can see, the following two lines are equivalent:

$b[0] = 99;
$aref->[0] = 99;
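The same holds when more than one reference points at the same array. In this sketch (variable names are mine), a change made through one ref is visible through the other, and comparing the two refs numerically compares the addresses they point to:

```perl
use strict;
use warnings;
use feature 'say';

my @b = ( 1, 2, 3 );

# two separate references to the same array
my $first_ref  = \@b;
my $second_ref = \@b;

# change an element through the first reference...
$first_ref->[1] = 'changed';

# ...and read it back through the second
my $seen = $second_ref->[1];

# in numeric context a reference evaluates to the address it points to
my $same = ( $first_ref == $second_ref ) ? 'yes' : 'no';

say "second_ref sees: $seen";
say "both refs point to the same array: $same";
```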

CHEAT SHEET

Here's a little cheat sheet for review before we move on to the next part in the series.

my @a = ( 1, 2, 3 );
my %h = ( a => 1, b => 2, c => 3 );

# take a reference to the array
my $aref = \@a;

# take a reference to the hash
my $href = \%h;

# access the entire array through its reference
my $elem_count = scalar @{ $aref };

# access the entire hash through its reference
my $keys_count = keys %{ $href };

# get a single element through the array reference
my $element = $aref->[0];

# get a single value through the hash reference
my $value = $href->{ a };

# assign to a single array element through its reference
$aref->[0] = 1;

# assign a value to a single hash key through its ref
$href->{ a } = 1;

This concludes Part 1 of our Guide to Perl references. My goal was not to compete with all the other reference guides available, but instead to complement them, with the hope that perhaps I may have said something in such a way that it helps further even one person's understanding. Next episode, we'll learn about using references as subroutine parameters.

Update: An astute reader sent me an email after noticing that this tutorial does not mention scalar references at all. This was a design choice. I didn't feel it necessary to justify the extra space to explain them, as they are very rarely used. They do exist though :) Thanks Asbjørn Thegler for the kind email!

2012/04/05

use Perl6; A few very welcome changes in Perl5++

In the 10 years I've been programming Perl off and on, I've heard a fair amount about Perl 6. There are those who love it, and those who dislike (fear?) it. For me, I had always wanted to look further into it but never found the time. Don't get me wrong, I absolutely love Perl 5, and will likely be using it until we see the day that it fades into the same level of obscurity that some of my code resembles.

Over the last couple of weeks, I've been constantly tempted to follow the Perl6 link in moritz's PerlMonks signature. Yesterday I broke down and decided to see what colour I wanted my bikeshed. Here are a few of the really interesting differences I've found so far.

In this post, I'll touch on strict, sigils, how variables are objects and have methods, types, and a bit on control structures. In a couple of following posts, I'll describe the basics of other changes, and then get into more advanced aspects of the new language. When I'm comfortable enough and can change as much as possible from 5 to 6, in my last post on the subject, I'll include the code of one of my short Perl 5 modules translated into Perl6.

STRICT

Out of the box, the first really nice feature is that strict is enabled by default.

% cat no_strict.pl

#!/home/steve/perl6/perl6
say $hello;

Output:

% ./no_strict.pl

===SORRY!===
Variable $hello is not declared
at ./no_strict.pl:2

SIGILS

In Perl6, variables retain their sigils regardless of what operation you perform on them. To access an element of an array or the value of a hash in Perl 5, you had to use the scalar sigil to signify you intend to access it as such:

Perl 5 way:

my @a = qw( 1 2 3 );
say $a[0];

my %h = ( key => 'value' );
say $h{ key };

But in Perl6:

my @a = 1, 2, 3;
say @a[0];

my %h = 'key' => 'value';
say %h{ key };

Output:

===SORRY!===
CHECK FAILED:
Undefined routine '&key' called (line 7)

Oh, oh! What happened? The array portion of the code is fine, but we broke at the hash code. Well, in Perl6, hash keys are not automatically quoted like they are in Perl 5 when attempting to access the hash values. Instead of retrieving the value, it attempts to call the sub key(), looking for it to return the name of the key to be used. The proper way to access the hash values through a key is as such:

# the old faithful

say %h{ 'key' };

# or the new auto-quote syntax

say %h< key >;

VARIABLES ARE OBJECTS (and have methods)

Here are a few examples of the new variable object methods in action, and their corresponding Perl 5 syntax (which still works in Perl6). I'll show a few examples of arrays first, then hashes. Also worth noting is the lack of parens around the array elements in the definition. Surrounding the elements in parens is still valid, but the qw() function is missing.

Variable methods, arrays

my @a = 2, 3, 1;

# number of array elements

say @a.elems;
say scalar @a;

# sort array

say @a.sort;
say sort @a;

# map array

say @a.map({ $_ + 10 });
say map { $_ + 10, ' ' } @a;

# or even

say @a.sort.map({ $_ + 10 });
say map { $_ + 10, ' ' } sort @a;

I found an interesting difference while building those code examples. In Perl 5:

perl -E 'my @a=qw( 1 2 3 ); my $x=@a; say $x'
3

...but in Perl6:

perl6 -e 'my @a=1,2,3; my $x=@a; say $x'
1 2 3

However, using the array in numeric comparisons evaluates the array as its number of elements:

perl6 -e 'my @a=1,2,3; say "ok" if @a == 3'
ok

Variable methods, hashes and their Perl 5 syntax counterparts

my %h = z => 26, b => 5, c => 2, a => 9;

say %h.keys;
say $_ for keys %h;
# could also be written as:
say keys %h; # but the spacing is different in 5

say %h.values;
say $_ for values %h;

say %h.keys.sort;
say $_ for sort keys %h;

Note: Most of the variable object methods also still act as functions, so the following are equivalent:

say %h.keys;
say keys %h;

EVERYTHING IS AN OBJECT, AND HAS A TYPE (and can optionally be constrained)

To give an extremely clear example of how everything is an object and has a type before I get further into how types are handy, I'll use some syntax that I tried and was surprised to find that it worked. The WHAT() method when called on something informs you of its type.

# calling methods on literals w00t! :)

say 25.WHAT;
say 'string'.WHAT;
say (1,2,3).WHAT;

Output:

Int()
Str()
Parcel()

We can do simple type checking:

my $quote = "I am liking Perl6";

if $quote ~~ Str {
    say "it's a string";
}

Note the lack of parens again, in the if() condition this time. More on this shortly. For now, just know that parens can still be used, but there are gotchas, so it is recommended that you leave them out.

Constraining variables to certain types is also easy.

# define $x as an Int
my Int $x = 5;

# try to assign it a string
$x = "Hello, world!";

Output:

Type check failed in assignment to '$x'; expected 'Int' but got 'Str'
  in block  at ./types.pl:15

Types have an inheritance hierarchy, but I am not too familiar with it yet; for example, an Int also counts as a Numeric. I'll update this post as I learn more.

CONTROL STRUCTURES

I briefly touched on using parens with the if statement above. Take this example:

my $x = 5;
if ($x < 10){
    # do stuff
}

In Perl6, placing the opening paren directly against the keyword with no whitespace, as in if($x < 10), tells the compiler to try to call a function named 'if'. This is true for all of the control structures (if, while, for etc). If you leave at least one whitespace between the keyword and the opening paren, as in the example above, things will work as normal. However, to protect against mistakes, it is advised you omit the parens entirely. Here are some interesting changes:

In Perl 5, for the most part, we'd use named lexicals in our for loops like this:

for my $elem ( @a ){ say $elem; }

In Perl6, to avoid use of $_, we use a "pointy block":

for @a -> $elem { say $elem; }

Because I've been testing each code snip before I paste it into this document, I of course just ensured that my Perl 5 for() example was written correctly lest my eyes miss something. Against perl6, the typical Perl 5 for structure above gave me this output:

===SORRY!===
This appears to be Perl 5 code
at ./control.pl:15

So it looks like the pointy block is the way forward. Another note about for(); it is now only used for lists. Perl6 separated the C-style for loop into a loop() structure.

It is now possible to use more than one loop variable:

for @a -> $first, $second, $third { 
    say "$first, $second, $third: I'm greedy on each iteration!"; 
}

Or iterate over a hash without a while/each

for %h.kv -> $k, $v {
    say "$k :: $v"
}

While I was throwing out the use of the kv() method, one of my readers who opted to remain Anonymous pointed out a fantastic feature that I had missed. kv() can be used against arrays as well as hashes. When used against an array, the key is the index number of the array, and the value is the contents of the element. How many times have you sighed at the fact that you have to declare an iteration scalar prior to a for() loop, and then waste another line increasing it upon each loop? No more! The first potential use case that came to me after reading Anon's comment was using the index of the array and the element to create a hash:

my @a = 'a', 'b', 'c';
my %h;

for @a.kv -> $index, $elem {
    %h{ $index } = $elem;
}

There are so many cases I can think of where we'd benefit from not having to define "$i = 0;" and then add a second line, "$i++". Two lines saved. If you are like me, you dislike using variables for temporary assignments.

Thanks for reading. I hope you enjoyed my little beginning venture into the world of Perl6.

For the most part, the resource I'm using to base my tests and code on can be found here.

20120405

Update 20120406: Thanks to Daniel Ruoso for clarifying what really happens when a bareword is used as a hash key.

Update 20120410: Thanks to a kind Anonymous reader who pointed out that I missed that kv() could be called against an array, and the details of its implementation. They also provided sample code to describe a use-case, which I used as a baseline to create my array kv() example. Cheers!

2012/04/03

use Perl; Purpose and practical use of the built-in named blocks

This post attempts to explain the subtleties of Perl's five named blocks. You'll learn which phase of operation each one runs in, the order of execution, and the reasons each may be needed, with code examples.

The sample code assumes perl version 5.10 or higher.

Perl's five named blocks (in order of execution) are BEGIN, UNITCHECK, CHECK, INIT and END. We'll begin with BEGIN :)
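Before looking at each block individually, here is a small sketch that records the order in which the blocks fire when the file is run as a script. The blocks are deliberately written in the "wrong" order in the file, and the @order array is a hypothetical helper of mine:

```perl
use strict;
use warnings;
use feature 'say';

# package variable, so it exists by the time the compile-phase blocks run
our @order;

# written in reverse on purpose; Perl runs each block in its own phase
END       { say "END block runs last" }
INIT      { push @order, 'INIT' }
CHECK     { push @order, 'CHECK' }
UNITCHECK { push @order, 'UNITCHECK' }
BEGIN     { push @order, 'BEGIN' }

# normal runtime code
push @order, 'runtime';

say join ' -> ', @order;
```

Run as a script, this prints "BEGIN -> UNITCHECK -> CHECK -> INIT -> runtime", followed by the line from the END block.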

BEGIN: These blocks are executed during compilation, as soon as the definition of the block is complete.

Initializing lexical state data that shares an outer block with a subroutine is a perfect example of where a BEGIN block makes sense. Here is an example of what happens if the sub uses its state data before the data has been defined in normal program flow. (Hint: it is reset when the definition is finally reached.)

Code:

persist();
{
    my $store = 0;
    sub persist {
        say $store++;
    }
}
persist();
persist();

Output:

./begin.pl
0 0 1

By using the BEGIN block, the inner code is defined during compile, so it is available and ready to use before runtime even starts. We can now safely call persist() as many times as we like regardless of the layout of the code, and the state variable will never be reset.

Code:

persist();
BEGIN {
    my $store = 0;
    sub persist {
        say $store++;
    }
}
persist();
persist();

Output:

./begin.pl
0
1
2

NOTE: An INIT block would do the same job as the BEGIN block in this case, but INIT runs at the very beginning of runtime rather than during compilation. Although INIT could be used here, it is more common to see BEGIN. BEGIN is only *mandatory* when you need code to execute before runtime starts, e.g. before any other files or modules are imported. See the INIT section below for an example case where INIT *must* be used.

END: This block executes as late as possible, after the program has finished running and just before the interpreter exits, even if the exit is caused by die(). For instance, if I need the program to write to a log file whether it fails or not, I could use an END block to ensure this happens.

Code:

say "Doing work";
other_work();

# record that we've finished (never reached, because other_work() dies)
write_log();

sub other_work {
    say "Doing other work";
    die() if 1; # fatal error!
}

# named write_log() because write() is a Perl built-in
sub write_log {
    # '>>' appends, so earlier log entries are kept
    open my $fh, '>>', 'file.log' or die "Can't open file: $!";
    say $fh "Program run at " . time();
}

END {
    # inside END, $? holds the pending exit status (non-zero after die)
    my $status = $? ? 'failed' : 'finished';
    open my $fh, '>>', 'file.log' or die "Can't open file: $!";
    say $fh "Program $status at " . time();
}

Because the program terminates via die() before write_log() is called, that log entry is never written, so on its own it can't tell us whether the program ran today. Since we need to know that the program ran even if it exited prematurely, the END block writes an entry instead. END blocks run on a normal exit, on exit() and after die(). They do not run if the process is killed by a signal it doesn't handle, or if it replaces itself via exec().

INIT/CHECK/UNITCHECK: These work like BEGIN or END blocks, but run during different phases, and in different orders.

UNITCHECK: These blocks run (in reverse order of definition) immediately after the compilation unit that defines them has been compiled. The main program file, each module it loads, and each string eval is its own compilation unit. I suppose this would be used if one needed to change the environment in steps to set things up before the next file is loaded. I've never seen it used.

CHECK: These blocks run in reverse order of definition immediately after all of the code (both use()d code and main code) is compiled. I have read that CHECK blocks are used specifically by people writing and working on the insides of compilers, but don't quote me.

INIT: These blocks run in the order they are defined, after compilation but before the main code starts executing. Realistically, INIT would be the natural choice for my BEGIN example above, because nothing in that example is needed during compilation. However, most coders are more familiar with seeing BEGIN blocks.
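The ordering of all five block types can be checked with a short script. This is only a sketch: record() and @order are names made up for the illustration, and the blocks are deliberately written in the "wrong" order in the file to show that position in the source does not decide when they run.

```perl
use strict;
use warnings;
use feature 'say';

our @order;    # each block records its label here as it runs

# record() is a hypothetical helper for this sketch. It is defined
# first so that it already exists when the compile-time blocks call it.
sub record { push @order, $_[0]; say $_[0]; }

record('main (runtime)');

END       { record('END') }
INIT      { record('INIT 1') }
INIT      { record('INIT 2') }
CHECK     { record('CHECK 1') }
CHECK     { record('CHECK 2') }
UNITCHECK { record('UNITCHECK') }
BEGIN     { record('BEGIN') }
```

Run as a normal script, this prints BEGIN, UNITCHECK, CHECK 2, CHECK 1, INIT 1, INIT 2, main (runtime) and END, one per line. Note that the two CHECK blocks run in reverse order of definition, while the two INIT blocks run in the order they were written.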

There are, however, distinct situations where INIT must be used instead of BEGIN. If the code within the BEGIN block calls a sub that has not been compiled yet when the BEGIN block runs (for example, one defined further down the file), compilation will fail, e.g.:

BEGIN {
    my $store = init_store();
    sub persist {
        say $store++;
    }
}

sub init_store { 0; }

Output:

Undefined subroutine &main::init_store called at ./init.pl line 10.
BEGIN failed--compilation aborted at ./init.pl line 14.

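The state data cannot be initialized within the BEGIN block, because init_store() is defined further down the file and has not been compiled yet when the BEGIN block runs. Remember that a BEGIN block executes as soon as its definition is complete, before the rest of the file has been parsed. An INIT block works in this case because it runs only after the whole file has been compiled. (Moving init_store() above the BEGIN block would also work.)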
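For comparison, here is a minimal sketch of the same example with INIT in place of BEGIN. One change is mine and not part of the original example: persist() returns the value instead of printing it, so the caller does the say().

```perl
use strict;
use warnings;
use feature 'say';

INIT {
    # Runs after the whole file is compiled, so init_store() exists by now
    my $store = init_store();
    sub persist {
        return $store++;    # hand back the old value, then increment
    }
}

sub init_store { 0; }

say persist();
say persist();
say persist();
```

This prints 0, 1 and 2 on separate lines, with no compilation error, because init_store() has been compiled by the time the INIT block calls it.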

See perldoc perlmod

Thanks to JavaFan for the INIT section code example.