Friday, September 28, 2007

Operators and types

Ruby and Python overload the + operator for a large number of things, the most common ones being addition of numbers, concatenation of strings, and concatenation of tuples. Very different things are represented by the same syntax. In Perl these three roles are occupied by three different operators (+, . and ,). For that matter, almost all operators on numbers are separate from operators on strings in Perl. This leads to one of the most common misconceptions among non-natives about Perl's type system: that it is weakly typed. It is not. This piece of code:

my $foo = "1";
return $foo + 0;

does not cause an implicit type conversion, nor is the last statement in any way ambiguous. The addition operator causes an explicit conversion of its argument. It's not an implicit one for a simple reason: it's the Perl idiom for converting any variable to a number.
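
As a small sketch of that point (the variable names are only illustrative), the operator, not the value, decides how the operands are treated:

```perl
use strict;
use warnings;

my $foo = "1";

# Each operator fixes how its operands are treated, whatever they hold.
my $sum    = $foo + 1;      # numeric addition
my $joined = $foo . 1;      # string concatenation
my @list   = ($foo, 1);     # list construction

print "$sum $joined ", scalar(@list), "\n";   # prints "2 11 2"
```

The reader can tell from the operator alone which of the three operations is meant, without knowing what $foo holds.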

I think this is an excellent example of the waterbed theory of complexity. To reduce the number of operators in the language, Python and Ruby use runtime polymorphism on data whose behavior is already known to the programmer (though not to the compiler) at compile time. I cannot think of real-world code where you don't know if your variable is a number, a string or a tuple but want to do addition/concatenation nonetheless. It is trading semantic clarity for syntactic clarity. It's a valid choice, just as Perl's choice of separating them is. I think a lot of Rubyists and Pythonistas fail to see that their choice has its disadvantages too.

Monday, September 17, 2007

Control structures in Perl

As I said in my previous entry, there is a need for education in social coding skills in Perl. Therefore I'll put my money where my mouth is ;-)

Basically there are four patterns for branching in Perl.

  1. Conditional statements: if(condition) { true_action } else { false_action }
  2. Statement modifiers: action if condition;
  3. Logical operators: condition and action
  4. Ternary operator: condition ? true_action : false_action
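
To make the four patterns concrete, here is a minimal sketch that writes the same branch once per pattern. The sub names are hypothetical and exist only for this illustration. Each sub returns an empty string for a comment line and the line itself otherwise.

```perl
use strict;
use warnings;

# 1. Conditional statement
sub with_if {
    my ($line) = @_;
    if ($line =~ /^#/) { return "" }
    else               { return $line }
}

# 2. Statement modifier
sub with_modifier {
    my ($line) = @_;
    return "" if $line =~ /^#/;
    return $line;
}

# 3. Logical operator
sub with_logical {
    my ($line) = @_;
    $line =~ /^#/ and return "";
    return $line;
}

# 4. Ternary operator
sub with_ternary {
    my ($line) = @_;
    return $line =~ /^#/ ? "" : $line;
}
```

All four behave identically. They differ only in what the reader sees first.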

How to decide which one to use? That depends on the situation, of course. Only conditional statements and the ternary operator can provide an else clause. If you need one, your choices are already limited. All four put their emphasis differently. For example, print $line unless $line =~ /^#/; may be better than $line =~ /^#/ or print $line; because of the principle of prominence. This is a linguistic notion that tells us that people tend to prefer important things in front and details at the end, so they can skip over the details when scanning the code. When the reader skips the second part, the code still makes sense (even though it may not be correct anymore). In general, statement modifiers are best used when the action is much more important than the condition.

Logical operators are useful in two situations. The first one is exactly the opposite of the statement modifier: when the condition is vastly more important than the action. This usage typically uses the low-precedence version. For example, open my $filehandle, $filename or die "Can't open $filename: $!"; is better than die "Can't open $filename: $!" unless open my $filehandle, $filename; because the first one communicates the intent of the programmer (opening a file) better. Error handling is not important for understanding the big picture of the code.
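
Why the low-precedence version matters can be sketched as follows. The path is a hypothetical one chosen so that open fails, and the three-argument form of open is used here instead of the two-argument form from the text.

```perl
use strict;
use warnings;

# Hypothetical path, chosen so that open fails.
my $filename = "/nonexistent/example.txt";

# 'or' binds more loosely than the argument list, so this parses as
# open(...) or die ..., and the failure is reported.
my $checked = eval {
    open my $fh, '<', $filename or die "Can't open $filename: $!\n";
    1;
};

# '||' binds tighter: the test applies to $filename alone, so this parses
# as open(my $fh, '<', ($filename || die ...)) and the failure goes unnoticed.
my $unchecked = eval {
    open my $fh, '<', $filename || die "Can't open $filename: $!\n";
    1;
};

print defined $checked   ? "first form: no error\n"  : "first form: died\n";
print defined $unchecked ? "second form: no error\n" : "second form: died\n";
```

With ||, parentheses around the arguments of open would be needed to get the same behavior as the or version.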

It has another trait that is important for deciding when to use this pattern. Unlike the previous two patterns, logical operators are expressions, not statements. As such they can be used in places where the former cannot. parse($filename || "default.filename") is significantly easier to read than if(!$filename) { $filename = "default.filename"; } parse($filename);
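
A runnable sketch of the default-value idiom follows. The parse sub is a hypothetical stand-in for whatever the real function does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parse function mentioned in the text.
sub parse {
    my ($filename) = @_;
    return "parsing $filename";
}

my $filename;   # nothing supplied by the caller

# The logical operator is an expression, so it fits inside the argument list.
print parse($filename || "default.filename"), "\n";   # prints "parsing default.filename"
```

Note that || tests for truth, so a filename of "0" or "" would also be replaced by the default. From Perl 5.10 on, the defined-or operator // replaces only undefined values.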

Similarly, my $id; if($input >= 0) { $id = $input; } else { $id = 1; } can be simplified using the ternary operator to my $id = $input >= 0 ? $input : 1;
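
As a sketch, the same ternary can be wrapped in a sub, together with a chained variant. Both sub names are hypothetical.

```perl
use strict;
use warnings;

# The ternary from the text, wrapped in a hypothetical sub.
sub pick_id {
    my ($input) = @_;
    return $input >= 0 ? $input : 1;
}

# Ternaries chain from left to right in reading order, like if/elsif/else.
sub sign_name {
    my ($n) = @_;
    return $n <  0 ? "negative"
         : $n == 0 ? "zero"
         :           "positive";
}

print pick_id(7), " ", pick_id(-3), " ", sign_name(0), "\n";   # prints "7 1 zero"
```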

You may wonder now: when should I use old-fashioned conditional statements then? First of all, when the action contains multiple statements and isn't suitable for putting in a function. This pattern puts equal emphasis on the condition and the action, so it makes sense to use it if you don't have a reason to do otherwise.

Summary

                     Conditional statements  Statement modifiers  Logical operators  Ternary operator
Emphasis             None                    Action               Condition          Condition
Expression           No                      No                   Yes                Yes
Else clause          Yes                     No                   No                 Yes
Nestable/Chainable   Yes, very well          No                   Yes                Yes
Multiple statements  Yes                     No                   Yes                No

Monday, May 21, 2007

When Perl is beautiful

Programming is hard. Writing maintainable programs is even harder. This is because code tends to be easier to write than to read. Writing easily readable code is almost as hard as reading it.

When writing code, one has to take two kinds of readers into consideration: computers and humans. Writing a correct program is only a matter of making your intentions clear to the computer. Writing a maintainable program, however, requires making it readable for your fellow humans (including yourself). For any program you're not going to throw away really soon, the latter is just as important as the former. "Programs must be written for people to read, and only incidentally for machines to execute." [1]

The syntaxes of Perl and Python are quite different from each other (even though under the hood the two languages are much more similar than many zealots would like to admit). This difference stems mostly from a difference in how each tries to make code more accessible for humans.

Python has a philosophy of readability that is minimalist: "There should be one-- and preferably only one --obvious way to do it" [2], so don't ask for another one. Python tries to make what the programmer reads exactly the same as what the compiler reads.

Perl on the other hand tries something very different. Perl's creator (Larry Wall) was not only educated as a computer scientist, but also as a linguist. As such, Perl behaves more like a natural language than pretty much any other programming language out there. Where some languages such as COBOL have tried to do this by abusing half of the English dictionary, Perl does so by having a 'natural' structure. Thus it tries to fit in with the way humans naturally think. This structure, with its plenitude of operators and other syntax, gives rise to the Perl motto: There Is More Than One Way To Do It (or TIMTOWTDI).

Perl has a lot more syntax than Python, but they have more or less the same functionality. This provides Perl with a lot more bandwidth than Python has to talk to the human reader. This bandwidth comes at a price: programmers who don't know how to make use of this bandwidth will emit line noise without realising it. This phenomenon has given Perl a bad name in much of the programming world.

When people learn to program they learn to talk to the computer, but sadly most books, courses and websites forget to teach the novice programmer how to talk to other humans (Damian Conway's Perl Best Practices is the welcome exception). Because of its design, Perl is more affected than other programming languages by this lack of what I call social coding skills.

Good Perl code is a thing of beauty and beautiful Perl code is almost always good code. For Perl to lose its bad reputation, novice programmers need to learn how to communicate on the human channel.

1. Structure and Interpretation of Computer Programs - Abelson & Sussman
2. PEP 20: The Zen of Python - Tim Peters