Sunday, July 11, 2010

Perl sucks

This is not new territory, I admit. But given how much I hate Perl now, I figured I had to write something.

I came to Perl in the early 1990s from a background in C and Unix sh, and compared to sh, sed, awk, etc., Perl was wonderful. You could do all sorts of cool things, and didn't have to shoot yourself worrying about quoting conventions (the bane of shell scripts) or spend enormous amounts of time writing basic code to manipulate strings, like you did in C.

But I remember when I started trying to do more with Perl, especially involving arrays. I could never quite remember the syntax for array manipulations, and had difficulty figuring out what the actual internal model for arrays was -- how were they stored, when were they being copied vs. passed by reference, etc.? The Programming Perl book wasn't much help, and worse yet made reference to bizarre things like "typeglobs" that somehow allowed you to do other weird tricks with arrays but whose internal model was even more opaque to me. And what if I wanted to store an array with other arrays in it? Well, if you try this, you find that your arrays get automatically flattened, and if you don't want this, you have to use pointers and dereferencing -- again, something I could never quite work out the internal model of.

Someone eventually pointed me to Python, and the instant I started playing around with it, I switched languages and never went back to Perl, except for one-liners like

perl -pi -e 's/foo\((.*?),(.*?)\)/foo($2,$1)/'

which are still quite useful.
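For comparison, here is a rough Python 3 equivalent of that one-liner's substitution, applied line by line the way -p does. The in-place file rewriting that -i provides is left out, and swap_foo_args and rewrite_text are made-up names for this sketch:

```python
import re

# Same pattern as the one-liner: foo( <first> , <second> )
FOO_CALL = re.compile(r"foo\((.*?),(.*?)\)")


def swap_foo_args(line: str) -> str:
    """Swap the two arguments of the first foo(...) call on a line.

    count=1 mirrors s/.../.../ without the /g flag: one substitution per line.
    """
    return FOO_CALL.sub(r"foo(\2,\1)", line, count=1)


def rewrite_text(text: str) -> str:
    """Apply swap_foo_args to each line, roughly as perl -p would."""
    return "".join(swap_foo_args(line) for line in text.splitlines(keepends=True))


print(rewrite_text("x = foo(a,b);\ny = bar(c,d);\n"), end="")
```

It prints x = foo(b,a); followed by the untouched y = bar(c,d); line.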

Recently I went back to take a look at Perl, and now that I'm used to Python, I'm flabbergasted by how bad it is. For example, it makes sense to me that adding two strings or arrays together concatenates them -- although I can equally see the logic of people complaining that this should be an error, and there should be different operators/functions for these operations. But it certainly should be the case that

  1. Adding two arrays or two strings should either concatenate them or produce an error
  2. Adding unlike types (numbers and strings, strings and arrays, etc.) should produce an error

The reason for both of these things is obvious. The second one is especially important -- in dynamic languages it's very easy to introduce a wrongly typed object by mistake, and logic errors like this should be caught early. It's true that dynamic languages trade off some safety for convenience, but it's hard to see what the sensible result of adding e.g. a number and an array should be, and any operation without a sensible result should be an error.

For example, in Python:

>>> print "a" + "b"
ab
>>> print ("a") + ("b")
ab
>>> print (1,2) + (3,4)
(1, 2, 3, 4)
>>> print "a" + (1,2)
TypeError: cannot concatenate 'str' and 'tuple' objects
>>> print 1 + (1,2)
TypeError: unsupported operand type(s) for +: 'int' and 'tuple'
>>> print [1,2] + [3,4]
[1, 2, 3, 4]
>>> print "12" + 1
TypeError: cannot concatenate 'str' and 'int' objects
>>> print "12" + (1,2)
TypeError: cannot concatenate 'str' and 'tuple' objects
>>> print "12" + [1,2]
TypeError: cannot concatenate 'str' and 'list' objects

In this case, the result of print ("a") + ("b") may not seem obvious if you come from a Perl world. In Python, parentheses can be used to create tuples (similar to lists), but only if there's a comma somewhere. Without a comma, parentheses are just used for grouping, so ("a") is the same as just "a".
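The session above is Python 2, where print was still a statement. Here is a rough Python 3 restatement of the same checks. It is written as assertions so it doesn't depend on the exact wording of the TypeError messages, which has changed between versions. adds_cleanly is a made-up helper for this comparison:

```python
def adds_cleanly(x, y) -> bool:
    """Return True if x + y works, False if Python raises TypeError."""
    try:
        x + y
    except TypeError:
        return False
    return True


# Without a comma, parentheses only group: ("a") is just the string "a".
assert ("a") == "a"
# A comma is what makes a tuple, even a one-element one.
assert isinstance(("a",), tuple)

# Like types concatenate ...
assert ("a") + ("b") == "ab"
assert (1, 2) + (3, 4) == (1, 2, 3, 4)
assert [1, 2] + [3, 4] == [1, 2, 3, 4]

# ... and every mixed-type case from the session above is rejected.
mixed = [("a", (1, 2)), (1, (1, 2)), ("12", 1), ("12", (1, 2)), ("12", [1, 2])]
assert not any(adds_cleanly(x, y) for x, y in mixed)
print("all checks passed")
```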

Now, let's try the same thing in Perl:

% perl -e '
use strict;
print "a" + "b"; print "\n";
print ("a") + ("b"); print "\n";
print (1,2) + (3,4); print "\n";
print "a" + (1,2); print "\n";
print 1 + (1,2); print "\n";
print [1,2] + [3,4]; print "\n";
print "12" + 1; print "\n";
print "12" + (1,2); print "\n";
print "12" + [1,2]; print "\n";
'

0
a
12
2
3
537744328
13
14
269031628


What the fuck???? None of the results makes any obvious sense, and even worse, not one of the operations triggers an error, even with use strict! Now, I know from experience that Perl automatically converts strings to numbers, which is horrible behavior but at least explains why "12" + 1 produces 13. But some of the others? Presumably 537744328 and 269031628 come from memory addresses, but how can it possibly be useful for them to appear? And how in the hell does (1,2) + (3,4) become 12?
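In Python the conversion has to be spelled out, and a list can't be used as a number at all. Here is a minimal Python 3 sketch of that difference. The variable names are just for illustration:

```python
value = "12"

# Python will not silently turn a string into a number ...
try:
    value + 1
    converted_silently = True
except TypeError:
    converted_silently = False

# ... so you say which operation you mean.
as_number = int(value) + 1   # 13, what Perl's "12" + 1 gives
as_text = value + str(1)     # "121", string concatenation instead

# A list is not a number either, so + refuses instead of using an address.
try:
    [1, 2] + 0
    list_as_number = True
except TypeError:
    list_as_number = False

print(converted_silently, as_number, as_text, list_as_number)  # False 13 121 False
```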

Speaking of arrays, I remember that Perl has both (1,2) and [1,2], and that the latter is actually syntax for a reference to an array, which you need to do if you don't want auto-flattening behavior. But what's the syntax for actually using them?

perl -e 'my $a=("a","b"); print $a[1];'

-> no output

WTF? Oh yeah, you need to use the @ sigil when assigning an array to a variable, and I would have gotten an error if I had remembered to use strict. Ok ...

perl -e 'use strict; my @a=("a","b"); print $a[1];'

-> b

Now what about that [] syntax?

perl -e 'use strict; my @a=["a","b"]; print $a[1];'

-> no output

WTF again?? Even with use strict, no error. Just incredibly hostile behavior.

Oh, fuck, now I remember that the @ sigil is only for actual arrays, not references, which use the $ sigil. But still, why don't I get an error when using the wrong sigil? Hmm, let's do some experimenting:

perl -e 'use strict; my @a=["a","b"]; print $a[0];'

-> ARRAY(0x10043af0)

Huh? Oh yeah, from painful experience I now remember that junk like ARRAY(0x10043af0) means you tried to print out a reference. Again, very hostile (why doesn't it print something readable like [a,b] or ['a','b'], like Python does?). But at least I work out that:

1. The [] "array", which is actually a reference to an array, is sitting in the first element of the @a array; therefore

2. Assigning a non-array to an array variable automatically converts it to an array of size 1. (Yuck!)
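For contrast, the nearest Python 3 spelling of my @a=["a","b"] is a one-element list holding a list. Printing it shows its contents rather than an address. This is a minimal sketch, and a and b are just illustrative names:

```python
# Roughly what my @a=["a","b"]; builds: a one-element list whose element is a list.
a = [["a", "b"]]
print(len(a))     # 1
print(a[0])       # ['a', 'b'], readable instead of ARRAY(0x...)

# Nesting needs no references or special syntax, and nothing gets flattened.
b = [["a", "b"], 1]
print(len(b))     # 2
print(b)          # [['a', 'b'], 1]
```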

I also remember now that a syntax like @$a ought to dereference an array reference. So I try this:

perl -e 'use strict; my $a=["a","b"]; print @$a;'

-> ab

OK, that works. Maybe there's actually some logic to this. So, back to my previous example, this means I can use @$a[0] to dereference the array, right?

perl -e 'use strict; my @a=["a","b"]; print @$a[0];'

-> no output

Fuck me! Why doesn't this work? I thought I had gotten a sense of how this shit worked. Maybe the precedence is wrong, and I need to put in some parens?

perl -e 'use strict; my @a=["a","b"]; print @($a[0]);'

-> Scalar found where operator expected at -e line 1, near "@($a"
(Missing operator before $a?)
syntax error at -e line 1, near "@($a"

God damn it all!!!! This utterly sucks. I have no idea why this fails, or what this error message means. Let's just give up on trying to understand this crap, and try to get something working.

perl -e 'use strict; my $a=["a","b"]; print $a[1];'

-> Global symbol "@a" requires explicit package name at -e line 1.

What the ...? If you weren't already Perl-literate, would you ever in God's name figure out what this error message actually means? Aside from the $ gobbledygook, this looks totally sensible, and in fact it's exactly like what you'd do in Python. You have to remember that

1. $a and @a are not just extra line noise you have to stick in front of a variable to indicate its type, but are actually different variables.

2. An expression like $a[1] actually refers to the @a variable, not $a.

3. The error message is actually telling you that @a isn't defined, even though that's not at all what it says.

Ok, now I remember from before about dereferencing a reference to an array ...

perl -e 'use strict; my $a=["a","b"]; print @$a[1];'

-> b

At least it works, but it's butt-ugly.

The whole point of all this [] mess is so I can put arrays in arrays, so with the little tiny bit of energy I've got left after this sordid affair, let me try this:

perl -e 'use strict; my $a=[["a","b"], 1]; print @$a[1];'

-> 1

Ok, so far, so good.

perl -e 'use strict; my $a=[["a","b"], 1]; print @$a[0];'

-> ARRAY(0x10043af0)

Oops, there's that garbagey reference stuff again. Just got to dereference it ...

perl -e 'use strict; my $a=[["a","b"], 1]; print @@$a[0];'

-> Scalar found where operator expected at -e line 1, near "@@$a"
(Missing operator before $a?)
syntax error at -e line 1, near "@@$a"
Global symbol "@a" requires explicit package name at -e line 1.

Oh no!!!!!!!!!!! Help!!!!!!!!!!!!!! Doesn't work, gives me two errors, and I have no idea what the right syntax is.

Well, fuck it, can I at least figure out how to get the value that should logically be a[0][0], or at least $a[0][0], or @$a[0][0], or something like that?

perl -e 'use strict; my $a=[["a","b"], 1]; print $a[0][0];'

-> Global symbol "@a" requires explicit package name at -e line 1.

perl -e 'use strict; my $a=[["a","b"], 1]; print @$a[0][0];'

-> syntax error at -e line 1, near "]["

perl -e 'use strict; my $a=[["a","b"], 1]; print @$@$a[0][0];'

-> Array found where operator expected at -e line 1, at end of line
(Missing operator before ?)
syntax error at -e line 1, near "@$@$"

perl -e 'use strict; my $a=[["a","b"], 1]; print @@$a[0][0];'

-> Scalar found where operator expected at -e line 1, near "@@$["
(Missing operator before $[?)
Number found where operator expected at -e line 1, near "$[0"
(Missing operator before 0?)
syntax error at -e line 1, near "@@$["
Unmatched right square bracket at -e line 1, at end of line

Fuck fuck fuck fuck fuck fuck fuck fuck! None of these fucking possibilities work, and each one gives different (but equally meaningless) errors. And fuck you, compiler, there is not an unmatched right square bracket! And I bet you $1,000,000 that taking off that bracket is just going to make you complain about an unmatched left bracket.

perl -e 'use strict; my $a=[["a","b"], 1]; print @@$a[0][0;'

-> Scalar found where operator expected at -e line 1, near "@@$["
(Missing operator before $[?)
Number found where operator expected at -e line 1, near "$[0"
(Missing operator before 0?)
syntax error at -e line 1, near "@@$["
Unmatched right square bracket at -e line 1, at end of line
Missing right curly or square bracket at -e line 1, at end of line

Hah!!! I was right! And now you're managing to say that there is a right bracket at EOL that shouldn't be there (I sure don't see it, though ...), and simultaneously there isn't a right bracket at EOL that should be there. Damn, you're the stupidest piece of crap washed up in this here backwater since Vern's pathetic excuse for a nephew crashed through back in '37.

Now the hero goes for one last desperate Hail Mary pass:

perl -e 'use strict; my $a=[["a","b"], 1]; print @$a[0,0];'

-> ARRAY(0x10043af0)ARRAY(0x10043af0)

????????????? Ok, compiler, I give up, you beat me, fair and square. In fact, you pounded me into a great, sobbing, gibbering piece of pulp, not even worthy of tearfully huddling around his mother's knee.

BTW, I finally had to ask the powerful guru RTFM, who said the magic incantation is

perl -e 'use strict; my $a=[["a","b"], 1]; print $a->[0][0];'

-> a

I would never in a million years have guessed that, since it looks like a total train wreck, with two operators butting up against each other. Turns out you can also use $a->[0]->[0] or even @$a[0]->[0]. But for unknown reasons, no @$a[0][0]. And what if you want to dereference the inner array?

perl -e 'use strict; my $a=[["a","b"], 1]; print $a->[0];'

-> ARRAY(0x10043af0)

perl -e 'use strict; my $a=[["a","b"], 1]; print @$a->[0];'

-> ARRAY(0x10043af0)

Huhhh?? Try to dereference, get exactly the same result? Do we have precedence problems? I remember exactly the same situation where I thought there might be precedence problems and I stuck in parens, but just got a syntax error. Let's pray ...

perl -e 'use strict; my $a=[["a","b"], 1]; print @($a->[0]);'

-> Scalar found where operator expected at -e line 1, near "@($a"
(Missing operator before $a?)
syntax error at -e line 1, near "@($a"ARRAY(0x10043af0)

Nope.

Once again, I came groveling to the mighty RTFM, who said, Thou shalt use the curly rather than the rounded brackets for grouping dereference operators. Lo and behold:

perl -e 'use strict; my $a=[["a","b"], 1]; print @{$a->[0]};'

-> ab

Eureka! Now I understand how to get those expressions I was banging my head against to work:

perl -e 'use strict; my $a=[["a","b"], 1]; print ${@$a[0]}[0];'

-> a

But what a stressful and exhausting journey it's been, and the result is ungodly ugly. Truly a Pyrrhic victory.
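For the record, here is a Python 3 sketch of the same data structure and the accesses this whole section was fighting for. The Perl in the comments is the working syntax found above:

```python
# Same structure as: my $a=[["a","b"], 1];
a = [["a", "b"], 1]

print(a[1])           # 1            Perl: $a->[1]
print(a[0])           # ['a', 'b']   Perl prints ARRAY(0x...) for $a->[0]
print(a[0][0])        # a            Perl: $a->[0][0] or ${@$a[0]}[0]
print("".join(a[0]))  # ab           Perl: print @{$a->[0]};
```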

I do not like git

Recently I had to use git for the first time. I'm used to using Mercurial instead. For those of you not familiar with these programs, they're both distributed source control managers (SCMs), which are used for maintaining a history of changes you've made to an application or other project.

In many ways, git is extremely hostile. If you're considering git or Mercurial, use the latter.

Examples:
  1. To commit all changes in Mercurial, do hg commit. In git, git commit only commits changes you've explicitly staged (such as files added with git add or moved with git mv); to commit all changes, you have to do git commit -a or git commit . (why?).
  2. To see all changes in Mercurial, do hg diff. Added or moved files are shown appropriately (using diffs to/from null, or using a move foo to bar statement + a cross-file diff if you use hg diff --git -- ironically, since plain git diff won't show you this!). In git, git diff does NOT show added or moved files that you've staged but not yet committed. You have to know to use git diff --cached (or git diff HEAD) to see them, or git status for a summary of what will be committed.
  3. To revert a file back to the latest repository version in Mercurial, do e.g. hg revert foo.c. To revert all files, do hg revert --all. In git, to revert a file, you have to do git checkout HEAD -- foo.c (super obvious, no?), but to revert all files, you use a totally different incantation: git reset --hard. Note that neither of these git incantations uses git revert, which does something else entirely (it creates a new commit that undoes the changes made by an earlier commit).
  4. Commits in Mercurial are given sequential numbers starting from 0, to be used locally for convenience, as well as a long (40-hex-digit) globally unique commit identifier. git, naturally, is trying to be hostile, so it doesn't have anything like Mercurial's sequential numbers. The only way to identify a commit is using some fairly long prefix of the commit ID (long enough to be unique; git log --abbrev-commit will show you such prefixes), or using something like HEAD~3, i.e. 3 commits before the current one. The latter works OK if you're referring to very recent commits, but for older commits your only option is the long hexadecimal commit IDs.
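Mercurial-style local numbers can be approximated on the git side. Below is a minimal Python 3 sketch. It assumes you feed it the output of git rev-list --reverse HEAD (one commit ID per line, oldest first). number_commits and shortest_unique_prefix are hypothetical helpers, not git commands. The numbering follows rev-list's ordering, which won't necessarily match the order Mercurial would assign. The commit IDs in the example are made up:

```python
from __future__ import annotations


def number_commits(rev_list_output: str) -> dict[int, str]:
    """Map local sequential numbers (0 = oldest) to full commit IDs.

    Expects text shaped like `git rev-list --reverse HEAD` output:
    one commit ID per line, oldest first.
    """
    ids = [line.strip() for line in rev_list_output.splitlines() if line.strip()]
    return dict(enumerate(ids))


def shortest_unique_prefix(commit_id: str, all_ids: list[str], min_len: int = 7) -> str:
    """Shortest prefix of commit_id (at least min_len chars) that no other ID shares.

    min_len=7 roughly mirrors git's traditional default abbreviation length.
    """
    others = [other for other in all_ids if other != commit_id]
    for length in range(min_len, len(commit_id) + 1):
        prefix = commit_id[:length]
        if not any(other.startswith(prefix) for other in others):
            return prefix
    return commit_id  # fall back to the full ID


# Made-up 40-hex-digit IDs, just for illustration.
fake_output = "\n".join([
    "1a2b3c4d" + "0" * 32,
    "1a2b3c4e" + "0" * 32,
    "9f8e7d6c" + "0" * 32,
])
numbers = number_commits(fake_output)
all_ids = list(numbers.values())
for n, commit_id in numbers.items():
    print(n, shortest_unique_prefix(commit_id, all_ids))
```

With these IDs it prints 0 1a2b3c4d, 1 1a2b3c4e and 2 9f8e7d6. The first two need an eighth digit because they share their first seven.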