Sunday, July 11, 2010

Perl sucks

This is not new territory, I admit. But given how much I hate Perl now, I figured I had to write something.

I came to Perl in the early 1990s from a background in C and Unix sh, and compared to sh, sed, awk, etc., Perl was wonderful. You could do all sorts of cool things, and didn't have to shoot yourself worrying about quoting conventions (the bane of shell scripts) or spend enormous amounts of time writing basic code to manipulate strings, like you did in C.

But I remember when I started trying to do more with Perl, especially involving arrays. I could never quite remember the syntax for array manipulations, and had difficulty figuring out what the actual internal model for arrays was -- how were they stored, when were they being copied vs. passed by reference, etc.? The Programming Perl book wasn't much help, and worse yet made reference to bizarre things like "typeglobs" that somehow allowed you to do other weird tricks with arrays but whose internal model was even more opaque to me. And what if I wanted to store an array with other arrays in it? Well, if you try this, you find that your arrays get automatically flattened, and if you don't want this, you have to use pointers and dereferencing -- again, something I could never quite work out the internal model of.

Someone eventually pointed me to Python, and the instant I started playing around with it, I switched languages and never went back to Perl, except for one-liners like

perl -pi -e 's/foo\((.*?),(.*?)\)/foo($2,$1)/'

which are still quite useful.

Recently I went back to take a look at Perl, and now that I'm used to Python, I'm flabbergasted by how bad it is. For example, it makes sense to me that adding two strings or arrays together concatenates them -- although I can equally see the logic of people complaining that this should be an error, and there should be different operators/functions for these operations. But it certainly should be the case that

  1. Adding two arrays or two strings should either concatenate them or produce an error
  2. Adding unlike types (numbers and strings, strings and arrays, etc.) should produce an error

The reason for both of these things is obvious. The second one is especially important -- in dynamic languages it's very easy to introduce a wrongly typed object by mistake, and logic errors like this should be caught early. It's true that dynamic languages trade off some safety for convenience, but it's hard to see what the sensible result of adding e.g. a number and an array should be, and any operation without a sensible result should be an error.

For example, in Python:

>>> print "a" + "b"
ab
>>> print ("a") + ("b")
ab
>>> print (1,2) + (3,4)
(1, 2, 3, 4)
>>> print "a" + (1,2)
TypeError: cannot concatenate 'str' and 'tuple' objects
>>> print 1 + (1,2)
TypeError: unsupported operand type(s) for +: 'int' and 'tuple'
>>> print [1,2] + [3,4]
[1, 2, 3, 4]
>>> print "12" + 1
TypeError: cannot concatenate 'str' and 'int' objects
>>> print "12" + (1,2)
TypeError: cannot concatenate 'str' and 'tuple' objects
>>> print "12" + [1,2]
TypeError: cannot concatenate 'str' and 'list' objects

In this case, the result of print ("a") + ("b") may not seem obvious if you come from a Perl world. In Python, parentheses can be used to create tuples (similar to lists), but only if there's a comma somewhere. Without a comma, parentheses are just used for grouping, so ("a") is the same as just "a".
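To make the comma rule concrete, here is a minimal sketch. It assumes Python 3 (the session above is Python 2, so print is a function here), and the variable names are just for illustration.

```python
# Python 3 sketch: parentheses only group; the comma is what builds a tuple.
grouped = ("a")      # no comma, so this is just the string "a"
single = ("a",)      # trailing comma, so this is a one-element tuple
pair = (1, 2)        # two-element tuple

print(type(grouped).__name__)   # str
print(type(single).__name__)    # tuple
print(grouped + ("b"))          # ab
print(pair + (3, 4))            # (1, 2, 3, 4)
```

So ("a") + ("b") is plain string concatenation, while (1,2) + (3,4) is tuple concatenation.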

Now, let's try the same thing in Perl:

% perl -e '
use strict;
print "a" + "b"; print "\n";
print ("a") + ("b"); print "\n";
print (1,2) + (3,4); print "\n";
print "a" + (1,2); print "\n";
print 1 + (1,2); print "\n";
print [1,2] + [3,4]; print "\n";
print "12" + 1; print "\n";
print "12" + (1,2); print "\n";
print "12" + [1,2]; print "\n";
'

0
a
12
2
3
537744328
13
14
269031628


What the fuck???? None of the results makes any obvious sense, and even worse, not one of the operations triggers an error, even with use strict! Now, I know from experience that Perl automatically converts strings to integers, which is horrible behavior but at least explains why "12" + 1 produces 13. But some of the others? Presumably 537744328 and 269031628 are memory locations, but how can it possibly be useful for them to appear? And how in the hell does (1,2) + (3,4) become 12?

Speaking of arrays, I remember that Perl has both (1,2) and [1,2], and that the latter is actually syntax for a reference to an array, which you need to do if you don't want auto-flattening behavior. But what's the syntax for actually using them?

perl -e 'my $a=("a","b"); print $a[1];'

-> no output

WTF? Oh yeah, you need to use the @ sigil when assigning an array to a variable, and I would have gotten an error if I had remembered to use strict. Ok ...

perl -e 'use strict; my @a=("a","b"); print $a[1];'

-> b

Now what about that [] syntax?

perl -e 'use strict; my @a=["a","b"]; print $a[1];'

-> no output

WTF again?? Even with use strict, no error. Just incredibly hostile behavior.

Oh, fuck, now I remember that the @ sigil is only for actual arrays, not references, which use the $ sigil. But still, why don't I get an error when using the wrong sigil? Hmm, let's do some experimenting:

perl -e 'use strict; my @a=["a","b"]; print $a[0];'

-> ARRAY(0x10043af0)

Huh? Oh yeah, from painful experience I now remember that junk like ARRAY(0x10043af0) means you tried to print out a reference. Again, very hostile (why doesn't it print something readable like [a,b] or ['a','b'], like Python does?). But at least I work out that:

1. The [] "array", which is actually a reference to an array, is sitting in the first element of the @a array; therefore

2. Assigning a non-array to an array variable automatically converts it to an array of size 1. (Yuck!)
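For contrast, here is roughly what the same experiment looks like on the Python side. This is a sketch assuming Python 3, and the names a and nested are made up for illustration.

```python
# Python 3 sketch: printing a list shows its contents, and nothing gets
# wrapped in an extra container unless you ask for it.
a = ["a", "b"]
print(a)                 # ['a', 'b']
print(a[1])              # b

nested = [a]             # explicit wrapping: a one-element list holding a list
print(nested)            # [['a', 'b']]
print(nested[0] is a)    # True: the inner list is stored by reference, not copied
```

The wrapping that Perl did silently with @a=[...] has to be written out here as [a].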

I also remember now that a syntax like @$a ought to dereference an array reference. So I try this:

perl -e 'use strict; my $a=["a","b"]; print @$a;'

-> ab

OK, that works. Maybe there's actually some logic to this. So, back to my previous example, this means I can use @$a[0] to dereference the array, right?

perl -e 'use strict; my @a=["a","b"]; print @$a[0];'

-> no output

Fuck me! Why doesn't this work? I thought I had gotten a sense of how this shit worked. Maybe the precedence is wrong, and I need to put in some parens?

perl -e 'use strict; my @a=["a","b"]; print @($a[0]);'

-> Scalar found where operator expected at -e line 1, near "@($a"
(Missing operator before $a?)
syntax error at -e line 1, near "@($a"

God damn it all!!!! This utterly sucks. I have no idea why this fails, or what this error message means. Let's just give up on trying to understand this crap, and try to get something working.

perl -e 'use strict; my $a=["a","b"]; print $a[1];'

-> Global symbol "@a" requires explicit package name at -e line 1.

What the ...? If you weren't already Perl-literate, would you ever in God's name figure out what this error message actually means? Aside from the $ gobbledygook, this looks totally sensible, and in fact it's exactly like what you'd do in Python. You have to remember that

1. $a and @a are not just extra line noise you have to stick in front of a variable to indicate its type, but are actually different variables.

2. An expression like $a[1] actually refers to the @a variable, not $a.

3. The error message is actually telling you that @a isn't defined, even though that's not at all what it says.

Ok, now I remember from before about dereferencing a reference to an array ...

perl -e 'use strict; my $a=["a","b"]; print @$a[1];'

-> b

At least it works, but it's butt-ugly.

The whole point of all this [] mess is so I can put arrays in arrays, so with the little tiny bit of energy I've got left after this sordid affair, let me try this:

perl -e 'use strict; my $a=[["a","b"], 1]; print @$a[1];'

-> 1

Ok, so far, so good.

perl -e 'use strict; my $a=[["a","b"], 1]; print @$a[0];'

-> ARRAY(0x10043af0)

Oops, there's that garbagey reference stuff again. Just got to dereference it ...

perl -e 'use strict; my $a=[["a","b"], 1]; print @@$a[0];'

-> Scalar found where operator expected at -e line 1, near "@@$a"
(Missing operator before $a?)
syntax error at -e line 1, near "@@$a"
Global symbol "@a" requires explicit package name at -e line 1.

Oh no!!!!!!!!!!! Help!!!!!!!!!!!!!! Doesn't work, gives me two errors, and I have no idea what the right syntax is.

Well, fuck it, can I at least figure out how to get the value that should logically be a[0][0], or at least $a[0][0], or @$a[0][0], or something like that?

perl -e 'use strict; my $a=[["a","b"], 1]; print $a[0][0];'

-> Global symbol "@a" requires explicit package name at -e line 1.

perl -e 'use strict; my $a=[["a","b"], 1]; print @$a[0][0];'

-> syntax error at -e line 1, near "]["

perl -e 'use strict; my $a=[["a","b"], 1]; print @$@$a[0][0];'

-> Array found where operator expected at -e line 1, at end of line
(Missing operator before ?)
syntax error at -e line 1, near "@$@$"

perl -e 'use strict; my $a=[["a","b"], 1]; print @@$a[0][0];'

-> Scalar found where operator expected at -e line 1, near "@@$["
(Missing operator before $[?)
Number found where operator expected at -e line 1, near "$[0"
(Missing operator before 0?)
syntax error at -e line 1, near "@@$["
Unmatched right square bracket at -e line 1, at end of line

Fuck fuck fuck fuck fuck fuck fuck fuck! None of these fucking possibilities work, and each one gives different (but equally meaningless) errors. And fuck you, compiler, there is not an unmatched right square bracket! And I bet you $1,000,000 that taking off that bracket is just going to make you complain about an unmatched left bracket.

perl -e 'use strict; my $a=[["a","b"], 1]; print @@$a[0][0;'

-> Scalar found where operator expected at -e line 1, near "@@$["
(Missing operator before $[?)
Number found where operator expected at -e line 1, near "$[0"
(Missing operator before 0?)
syntax error at -e line 1, near "@@$["
Unmatched right square bracket at -e line 1, at end of line
Missing right curly or square bracket at -e line 1, at end of line

Hah!!! I was right! And now you're managing to say that there is a right bracket at EOL that shouldn't be there (I sure don't see it, though ...), and simultaneously there isn't a right bracket at EOL that should be there. Damn, you're the stupidest piece of crap washed up in this here backwater since Vern's pathetic excuse for a nephew crashed through back in '37.

Now the hero goes for one last desperate Hail Mary pass:

perl -e 'use strict; my $a=[["a","b"], 1]; print @$a[0,0];'

-> ARRAY(0x10043af0)ARRAY(0x10043af0)

????????????? Ok, compiler, I give up, you beat me, fair and square. In fact, you pounded me into a great, sobbing, gibbering piece of pulp, not even worthy of tearfully huddling around his mother's knee.

BTW, I finally had to ask the powerful guru RTFM, who said the magic incantation is

perl -e 'use strict; my $a=[["a","b"], 1]; print $a->[0][0];'

-> a

I would never in a million years have guessed that, since it looks like a total train wreck, with two operators butting up against each other. Turns out you can also use $a->[0]->[0] or even @$a[0]->[0]. But for unknown reasons, no @$a[0][0]. And what if you want to dereference the inner array?

perl -e 'use strict; my $a=[["a","b"], 1]; print $a->[0];'

-> ARRAY(0x10043af0)

perl -e 'use strict; my $a=[["a","b"], 1]; print @$a->[0];'

-> ARRAY(0x10043af0)

Huhhh?? Try to dereference, get exactly the same result? Do we have precedence problems? I remember the exact same situation, where I thought there might be precedence problems and I stuck in parens, but just got a syntax error. Let's pray ...

perl -e 'use strict; my $a=[["a","b"], 1]; print @($a->[0]);'

-> Scalar found where operator expected at -e line 1, near "@($a"
(Missing operator before $a?)
syntax error at -e line 1, near "@($a"

Nope.

Once again, I came groveling to the mighty RTFM, who said, Thou shalt use the curly rather than the rounded brackets for grouping dereference operators. Lo and behold:

perl -e 'use strict; my $a=[["a","b"], 1]; print @{$a->[0]};'

-> ab

Eureka! Now I understand how to get the above expressions to work that I was banging my head against:

perl -e 'use strict; my $a=[["a","b"], 1]; print ${@$a[0]}[0];'

-> a

But what a stressful and exhausting journey it's been, and the result is ungodly ugly. Truly a Pyrrhic victory.
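For comparison, here is the same nested structure in Python, where the syntax I kept guessing at just works. A minimal sketch, assuming Python 3; the name inner is only for illustration.

```python
# Python 3 sketch: the nested structure from the Perl examples.
a = [["a", "b"], 1]

print(a[1])        # 1
print(a[0])        # ['a', 'b']
print(a[0][0])     # a

# Roughly what print @{$a->[0]} did in Perl: print the inner elements run together.
inner = a[0]
print("".join(inner))   # ab
```

No sigils, no arrows, no curly braces: indexing a nested list is just indexing twice.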

3 comments:

  1. I'm trying to learn Perl, and I must admit it's a much more painful experience than learning Python. I still haven't figured out how to avoid putting "my" everywhere, for example.
    Maybe Perl 6 will be better?

  2. I first picked up Perl about two years ago. It was unpopular and had a bad reputation and I wanted to see what the fuss was about. References are somewhat consistent, but unnecessarily complex. It's easy to get them wrong and not notice, and hard to debug if you do make a mistake. Using references to arrays and hashes is more convenient in general, but you frequently have to dereference them to use the builtin functions and wind up having two syntaxes for many things. The Perl documentation is in some ways not helpful since it's geared more towards people trying to get an in-depth understanding of all the minutiae than actually trying to use the language. Here's my attempt at a condensed explanation.

    my @a = ("a", "b"); # array of "a" and "b"
    print $a[0];

    # pathological autoflattening behavior
    my @a = ("a", ("b", "c")); # array of "a", "b", and "c"
    print $a[2]; # prints "c"

    # array reference
    # sigils and -> both dereference scalars.
    # $,@,% indicates the type of the expression you are getting back,
    # not the type of container.
    my $a = ["a", "b"];
    print $a->[0]; # prints "a".
    print $$a[0]; # also prints "a". can also be explicitly written ${$a}[0]

    @{$a}[0] # one element slice of $a ... the @ means you're getting an array.

    # indexing is in some ways pretty consistent.
    # if you want an array of values corresponding to keys from a hash
    # you can take a hash slice
    my %h = (key0 => 'value0', key1 => 'value1', key2 => 'value2');
    @h{'key0', 'key1', 'key0'} # ('value0', 'value1', 'value0')

    # values of a hash sorted by key.
    @h{sort keys %h}
    # same thing for a hash reference
    @{ $hashref }{sort keys %$hashref}

  3. perl -e 'use strict; my $a=[["a","b"], 1]; print $a->[0]->[0]'

    Also, perldoc warnings. All code written since 2010 should have warnings turned on, and it would have told you about at least 50% of your mistakes.

    It sucks it's not on by default, sure, but it can't be made on by default due to legacy compat reasons.

    It looks like you ended up with an archaic book that was simply no longer relevant and you got a lot of outdated advice, and then dumped your existing preconceptions on top of that. (For instance, anyone with rudimentary knowledge of perl will tell you "a" + "b" is math, not concatenation; you want "." for that.)
