Dart vs Java — the DeltaBlue Benchmark

At the time of this writing, the performance page on dartlang.org tracks Dart VM performance as measured by the DeltaBlue benchmark.

I ported the benchmark_harness Dart package (including the DeltaBlue benchmark) to Java and ran it against the latest Java 7 and 8 JDKs.

The experience of translating Dart to Java was surprisingly smooth. Some of the most common small porting tasks included:

Dart bool to Java boolean;
Dart C++-like super call syntax;
Dart constructor syntactic sugar;
Dart shorthand (=>) functions to Java full format;
Wrapping Dart top-level functions and variables inside a Java top-level class;
Changing the use of the Dart Function type to a Java Runnable;
The Dart truncating division operator ~/, which is equivalent to plain division (/) in Java when applied to integers;
Dart list access [] operator to Java List.get().
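
To make a few of these concrete, here is a minimal Java sketch of what such translations can look like. All names in it (PortingSketch, half, repeat, nameAt) are made up for illustration and are not taken from the actual port; the Dart lines in the comments are likewise hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; this is not code from the actual port.
// Dart top-level functions and variables become static members of a class.
final class PortingSketch {
    // Dart: bool verbose = false;
    static boolean verbose = false;

    // Dart: int half(int n) => n ~/ 2;
    // Java needs a full method body, and / on two ints already truncates.
    static int half(int n) {
        return n / 2;
    }

    // Dart: void repeat(int times, Function body) { ... }
    static void repeat(int times, Runnable body) {
        for (int i = 0; i < times; i++) {
            body.run();
        }
    }

    // Dart: names[index]
    static String nameAt(List<String> names, int index) {
        return names.get(index);
    }

    public static void main(String[] args) {
        final int[] counter = {0};
        // Anonymous class rather than a lambda, so this also compiles on Java 7.
        repeat(3, new Runnable() {
            @Override
            public void run() {
                counter[0]++;
            }
        });
        System.out.println(half(7));
        System.out.println(counter[0]);
        System.out.println(nameAt(Arrays.asList("a", "b", "c"), 1));
    }
}
```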

The trickiest part of the translation was the following piece of code that appeared absolutely befuddling at first sight:

<Strength>[WEAKEST, WEAK_DEFAULT, NORMAL, STRONG_DEFAULT,
           PREFERRED, STRONG_REFERRED][value]

As it turns out, this is simply an array literal

[WEAKEST, WEAK_DEFAULT, NORMAL, STRONG_DEFAULT,
 PREFERRED, STRONG_REFERRED]

prefixed by a generic type parameter specifying the type of the elements in the list

<Strength>

and followed by the list access ([]) operator, getting the element of the list at index value:

[value]
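
One possible Java rendering of that expression is sketched below. The Strength class here is a stripped-down stand-in with placeholder names, and StrengthLookup and byIndex are hypothetical; the real benchmark class carries more state.

```java
import java.util.Arrays;

// Stand-in for the benchmark's Strength class; only a name is kept here.
final class Strength {
    final String name;

    Strength(String name) {
        this.name = name;
    }
}

final class StrengthLookup {
    // Placeholder instances, listed in the same order as the Dart literal.
    static final Strength WEAKEST = new Strength("weakest");
    static final Strength WEAK_DEFAULT = new Strength("weakDefault");
    static final Strength NORMAL = new Strength("normal");
    static final Strength STRONG_DEFAULT = new Strength("strongDefault");
    static final Strength PREFERRED = new Strength("preferred");
    static final Strength STRONG_REFERRED = new Strength("strongPreferred");

    // Dart: <Strength>[WEAKEST, ..., STRONG_REFERRED][value]
    // Builds the list, then reads the element at index value.
    static Strength byIndex(int value) {
        return Arrays.asList(WEAKEST, WEAK_DEFAULT, NORMAL, STRONG_DEFAULT,
                             PREFERRED, STRONG_REFERRED).get(value);
    }

    public static void main(String[] args) {
        System.out.println(byIndex(2).name);
    }
}
```

Like the Dart original, this builds a fresh list on every call; hoisting the list into a static final field would avoid that.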

After working my way through this, the translation went smoothly until I ran the benchmark and hit a NullPointerException. In DeltaBlue, the BinaryConstraint constructor calls addConstraint(), which is overridden in its subclasses. The ScaleConstraint subclass implementation of addConstraint(), in particular, accesses ScaleConstraint fields that are initialized in the constructor. This pattern works in Dart, where apparently “this” constructor arguments are stored in their corresponding instance fields before the super constructor is invoked. This is not possible in Java, where the super call must be the first statement in the constructor, so I moved the addConstraint() call from BinaryConstraint to each of the subclass constructors. With that fix, the port was complete and I was able to run the Java version of the benchmark.
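
The initialization-order problem can be reproduced in a few lines. The sketch below uses simplified stand-ins for the benchmark classes; the single String field, the boolean flag that switches between the two behaviours, and the throwsNpe helper are assumptions made for the demonstration and do not appear in the real code.

```java
// Simplified stand-ins for the benchmark classes. The field, the flag, and the
// method bodies are assumptions chosen to expose the initialization order.
abstract class BinaryConstraintSketch {
    BinaryConstraintSketch(boolean addInSuperConstructor) {
        if (addInSuperConstructor) {
            // Dispatches to the subclass override before the subclass
            // constructor body has assigned any fields.
            addConstraint();
        }
    }

    abstract void addConstraint();
}

final class ScaleConstraintSketch extends BinaryConstraintSketch {
    private final String scale;
    int scaleLength;

    ScaleConstraintSketch(String scale, boolean addInSuperConstructor) {
        super(addInSuperConstructor); // has to come first in Java 7 and 8
        this.scale = scale;
        if (!addInSuperConstructor) {
            addConstraint(); // the fix: call it once the fields are set
        }
    }

    @Override
    void addConstraint() {
        // Throws NullPointerException while scale is still null.
        scaleLength = scale.length();
    }

    // Returns true if constructing the object hits the NullPointerException.
    static boolean throwsNpe(boolean addInSuperConstructor) {
        try {
            new ScaleConstraintSketch("scale", addInSuperConstructor);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(true));  // prints true: call made inside super(...)
        System.out.println(throwsNpe(false)); // prints false: call made after assignment
    }
}
```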

Here are the DeltaBlue numbers for Dart and Java on my ThinkPad W510:

VM                     Runtime (us)   Score (runs/second)
Dart (22416)           2,810.39       355.82
Dart (22577)           2,283.11       438.00
Java (1.7.0_21-b11)    2,728.51       366.50
Java (1.8.0-ea)        2,693.14       371.31
Java (1.7 32-bit)      3,555.95       281.22

Update 5/11: More numbers. Running for 45 seconds improves the performance of the 64-bit JVM (1.7,45s) but not the 32-bit one (1.7 32-bit,45s). The 32-bit Server JVM (32-server,45s) performs just as fast as the 64-bit JVM. The xxgreg version (xxgreg,45s) of DeltaBlue runs slower on the 64-bit JVM than my version ported from Dart. The xxgreg benchmark (xxgreg-run) uses a different harness, and its measurements include VM startup and warmup time.

VM                      Runtime (us)   Score (runs/second)
Java (1.7 32-bit,45s)   3,533.99       282.97
Java (32-server,45s)    2,701.67       370.14
Java (1.7,45s)          2,559.38       390.72
Java (xxgreg,45s)       2,780.61       359.63
Dart (xxgreg-run)       2,356.70       424.32
Java (xxgreg-run)       2,800.10       357.13

The number in the first column is the runtime in microseconds (us) as reported by the benchmark harness at the end of a run. The second number is the score as defined on the dartlang.org performance page: “runs/second,” that is, 1,000,000 divided by the runtime. I ran the benchmark on each VM multiple times and, as the variance between runs was small, I picked the result from a random execution for each VM.
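
How the two columns relate can be sketched in Java as follows. This is a rough approximation of a timing loop, not the actual benchmark_harness port: the names HarnessSketch, measureFor, score and work, the 100 ms warmup, and the 1 second measurement window are all assumptions for illustration.

```java
// Rough timing-loop sketch; names and durations are assumptions, not the port.
final class HarnessSketch {
    static volatile long sink; // keeps the JIT from discarding the work

    // Runs body repeatedly for at least minimumMillis and returns the
    // average time per run in microseconds.
    static double measureFor(Runnable body, long minimumMillis) {
        long minimumNanos = minimumMillis * 1_000_000L;
        long start = System.nanoTime();
        long elapsed;
        long runs = 0;
        do {
            body.run();
            runs++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < minimumNanos);
        return (elapsed / 1000.0) / runs;
    }

    // Converts microseconds per run into runs per second.
    static double score(double microsPerRun) {
        return 1_000_000.0 / microsPerRun;
    }

    // Placeholder workload standing in for one benchmark run.
    static void work() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += i;
        }
        sink = sum;
    }

    public static void main(String[] args) {
        Runnable body = new Runnable() {
            @Override
            public void run() {
                work();
            }
        };
        measureFor(body, 100);                 // warmup, result discarded
        double micros = measureFor(body, 1000); // measured window
        System.out.printf("%.2f us  %.2f runs/second%n", micros, score(micros));
    }
}
```

For example, a runtime of 2,810.39 us corresponds to a score of about 355.82 runs/second.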

The first Dart VM (22416) is the current public release available on the Dart website, while 22577 is the current nightly build. I included the nightly build, as it is clearly visible on the dartlang.org performance page that Dart saw a major gain in performance as of build 22437. My test confirmed this observation.

The results are truly impressive. Dart, still a baby at 2 years of age and pre-1.0, already exhibits roughly 15% better performance than Java, a veteran of 18 years, in its current nightly build. I think this truly deserves to be called a case of David vs Goliath.

Update: Both Dart VMs tested are 32-bit, while the two original Java VMs are 64-bit. I also tested the 32-bit Java 1.7.0_21 VM, with even more disappointing results for Java.

20 comments

Jakob says:

May 10, 2013 at 8:11 am

Did you consider JVM warmup times in your benchmarks?

1. nikolay_botev says:
  
  May 11, 2013 at 2:57 am
  
  @Jakob Yes and no. The benchmark_harness that was ported from Dart warms up for up to 100ms, which might not be enough for Java (if I remember correctly, there is a magic number of 10,000 executions before a method gets JIT-ed). I have more numbers that show some improvement in speed with a warmup time of 15 seconds.
  
greg says:

May 10, 2013 at 3:31 pm

There’s already a Java version of DeltaBlue from Sun. I did some benchmarking recently. http://github.com/xxgreg/deltablue
Looks like Dart has got faster in the meantime.

1. nikolay_botev says:
  
  May 11, 2013 at 2:49 am
  
  @greg Thanks, I added your version so it can be executed with the Dart-based benchmark_harness. The numbers look similar.
  
Charles Oliver Nutter says:

May 10, 2013 at 9:27 pm

I tried running your code, and the first thing I noticed was the huge rate of allocation it’s doing. During the 1.7s it takes on my system per iteration, it’s doing 15-30 young GC runs, each evacuating about 75MB of objects. On the low end that’s in the range of 750MB of allocation every second.

I also set it up to run multiple times in the same VM, and the second+ runs were all about 15-20% faster than the first.

I’m going to see if I can clean up some of the allocation.

Charles Oliver Nutter says:

May 11, 2013 at 12:02 am

I made various improvements, and it was rather easy to make the Java version run faster than the Dart version.

I posted my findings on Reddit here: http://www.reddit.com/r/programming/comments/1e2jhr/dart_vs_java_the_deltablue_benchmark/c9wloq3

In the end, though, both Dart and Java are getting close to irreducible complexity without redesigning the algorithm itself. They’re both forced to do some amount of allocation, and beyond that the logic is not particularly complex. Both JITs should be expected to handle it easily and produce similar optimized code.

1. nikolay_botev says:
  
  May 11, 2013 at 3:06 am
  
  @nutter I did notice that there was excessive object allocation, most egregiously in the nextWeaker() method. I did try extracting the array into a constant, and when I did not see a difference in performance I stopped there. I actually just ran the official DeltaBlue port from Oracle Labs, optimized further by Greg 2 months ago, and that actually ran slower than my port of the benchmark from Dart.
  
  I understand the Java version can be optimized to do less allocation, but that immediately means the Dart version can be optimized the same way. Both languages are very similar in the way object allocation is done. How about applying the same edits you did to the Java code to Dart and comparing the resulting performance? It would be interesting to see if the Dart VM can benefit from fewer allocations too.
  
Charles Munger says:

May 11, 2013 at 2:36 am

Out of idle curiosity, does adding the -server flag to the JVM change the numbers?

1. nikolay_botev says:
  
  May 11, 2013 at 2:48 am
  
  @Munger Yes, I added more numbers that show the 32-bit Server JVM performs just as fast as the 64-bit one.
  
Isaac Gouy says:

May 11, 2013 at 8:24 am

A well-known Java port of the benchmark code has been available since 1996 (Benchmarking Java with Richards and DeltaBlue. Sun Microsystems Laboratories).

https://community.cablelabs.com/svn/OCAPRI/tags/stable_ctp_no_upnp/ri/ODLSrc/OCAP-1.0/jvm/Sun/src/share/javavm/test/COM/sun/labs/kanban/DeltaBlue/DeltaBlue.java

1. nikolay_botev says:
  
  May 16, 2013 at 10:25 pm
  
  @Isaac thanks for the link. I believe greg used that port of DeltaBlue in his code at github.com/xxgreg. I ran his version of DeltaBlue and saw no major difference in the results. My goal was to make a direct Java port from Dart for an apples-to-apples comparison, and also to get a feel for Dart the language itself.
  
greg says:

May 12, 2013 at 4:24 am

Tom M posted some results to the Dart mailing list showing different numbers: JVM at 1.12x gcc -O3 vs Dart VM at 1.24x gcc -O3. I wonder if anyone else can reproduce those results.

I wonder if it could be an AMD/Intel thing?

tawek says:

May 12, 2013 at 6:06 pm

What Charles has done with respect to optimizing allocations of ArrayList is not something that would be done by an average Joe programmer writing typical web applications in Java.

More interesting would be some questions about differences between java and dart vms:

Does dart have native implementation of lists which outperforms java ArrayList library class? – probably not as there is list.dart

Does dart perform better escape analysis of (apparently) temporary objects than java (and eliminates heap allocation)? – afaik yes : https://codereview.chromium.org//14935005

1. nikolay_botev says:
  
  May 16, 2013 at 10:26 pm
  
  @tawek thanks for the link and sharing your thoughts. I tend to agree with what you are saying.
  
tawek says:

May 12, 2013 at 7:07 pm

I’ve done some more research. Apparently the DeltaBlue benchmark benefited much more from something called “polymorphic inlining” (+20%) than from “allocation sinking”. However, the Tracer benchmark saw a +26% improvement from “allocation sinking”.

http://www.dartlang.org/performance/

Two interesting points on those graphs:

1. 22412:22434 deltablue +20% performance improvement

r22434 | kmillikin@google.com | 2013-05-06 19:22:18 +0200 (Mon, 06 May 2013) | 13 lines

Initial support for polymorphic inlining.

Review URL: https://codereview.chromium.org//14740005

2. 22480:22488 tracer +26% performance improvement
which correlates to allocation sinking (see above)

So if I understand correctly, polymorphic inlining is similar in nature to what can be done with invokedynamic on the JVM, but the Java compiler does not generate such bytecodes (yet).

It would also be interesting to compare java vs dart on tracer benchmark…

1. nikolay_botev says:
  
  May 16, 2013 at 10:21 pm
  
  @tawek, AFAIK Java does polymorphic inlining already. Invokedynamic is more about exposing that to dynamically-typed languages.
  
  Take a look at these links for details:
  
  http://en.wikipedia.org/wiki/Inline_caching
  
  http://mechanical-sympathy.blogspot.com/2012/04/invoke-interface-optimisations.html
  
  http://stackoverflow.com/questions/7772864/would-java-inline-methods-during-optimization
  
  http://www.azulsystems.com/blog/cliff/2007-11-06-clone-or-not-clone
  
  http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem
  
  Great findings btw, thanks a lot for sharing!
  
Michael says:

May 12, 2013 at 7:25 pm

What’s the memory usage? If Dart is 15% faster but also consumes 15% or more memory than Java…

1. nikolay_botev says:
  
  May 16, 2013 at 10:12 pm
  
  @michael that’s a good question. I have not taken the time to measure memory usage.
  
tawek says:

May 17, 2013 at 5:30 pm

I asked headius about polymorphic inlining in the JVM on his blog (http://blog.headius.com/2013/05/on-languages-vms-optimization-and-way.html), and this is what he said:

“1. Hotspot will inline up to 2 targets. Past that, the site is considered polymorphic and Hotspot will use other mechanisms to optimize the dispatch…but not inlining (I’m not sure if it still inlines the hot two targets)”

That is probably not an answer from “the source”, but it is probably quite close to it.

One way or another, DeltaBlue was designed to test polymorphic dispatch. It has 6 different classes extending the base Constraint class – 2 classes on the first level and 2 x 2 classes on the second level.

From the benchmark results it seems that in that case the Dart VM is ahead of the JVM.

BTW. http://www.dartlang.org/performance/ shows another +5% performance improvement on deltablue around 22672 however I wasn’t able to find commit causing that.

Seems like Vyacheslav (mraleph) and kmillikin are working full ahead on performance 🙂

PS. Now if only Google had acquired Sun a couple of years back we would already have Java 12 (vide Chrome progress) with all those performance improvements and more, lambdas and what not already at our fingertips…

PS2. Anybody for that tracer benchmark comparison, chances are there would be another big splash…

1. nikolay_botev says:
  
  May 17, 2013 at 9:57 pm
  
  @tawek, thanks for the inside stories. I just posted numbers for Richards and Tracer yesterday.
  