It would be interesting to see how Julia's performance compares to C++ compiled or JITed with LLVM, via G++ 4.6 + DragonEgg or Clang 3.0/svn. That would be more apples-to-apples, since both would share the same middle and back end. G++ 4.2.1 is a bit obsolete at this point, but since it probably came by default on the MBP they tested on, it's understandable.
That would certainly be doable. If the performance is better, we can switch to using it for our benchmarks. The idea of the benchmarks is to compare against a "gold standard", which is why the best results are taken across all optimization levels. We could even take the best results across multiple C compilers to give ourselves the absolute hardest comparison :-)
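To make the "best across everything" idea concrete, here is a rough sketch of how the gold-standard time could be picked. This is not the actual benchmark harness: the function names, the `timings` layout, and the numbers in the example are all made up for illustration.

```python
# Hypothetical sketch: pick the fastest C/C++ result across every
# (compiler, optimization level) pair and compare another language to it.
# Nothing here is taken from the real benchmark scripts.


def gold_standard(timings):
    """Return (best_time, compiler, opt_level) for the fastest configuration.

    `timings` maps (compiler, opt_level) -> elapsed seconds.
    """
    if not timings:
        raise ValueError("no timings given")
    # min() over the dict items, keyed on the elapsed time.
    (compiler, opt_level), best = min(timings.items(), key=lambda kv: kv[1])
    return best, compiler, opt_level


def relative_to_gold(candidate_time, timings):
    """Express a candidate's time as a multiple of the gold-standard time."""
    best, _, _ = gold_standard(timings)
    return candidate_time / best


# Placeholder numbers only, NOT measurements.
example_timings = {
    ("g++", "-O2"): 2.0,
    ("g++", "-O3"): 1.5,
    ("clang++", "-O3"): 1.0,
}
example_ratio = relative_to_gold(3.0, example_timings)
```

Adding another compiler is just a matter of adding more keys to the dict. The baseline can only get faster, so the comparison only gets harder.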