Benchmarking Scala against Java
A question recently came up at work about benchmarks between Java and Scala. Maybe you came across my blog post because you too are wanting to know which is faster, Java or Scala. Well I'm sorry to say this, but if that is you, you are asking the wrong question. In this post, I will show you that Scala is faster than Java. After that, I will show you why the question was the wrong question and why my results should be ignored. Then I will explain what question you should have asked.
The benchmark
Today we are going to choose a very simple algorithm to benchmark, the quick sort algorithm. I will provide implementations both in Scala and Java. Then with each I will sort a list of 100000 elements 100 times, and see how long each implementations takes to sort it. So let's start off with Java:
public static void quickSort(int[] array, int left, int right) {
if (right <= left) {
return;
}
int pivot = array[right];
int p = left;
int i = left;
while (i < right) {
if (array[i] < pivot) {
if (p != i) {
int tmp = array[p];
array[p] = array[i];
array[i] = tmp;
}
p += 1;
}
i += 1;
}
array[right] = array[p];
array[p] = pivot;
quickSort(array, left, p - 1);
quickSort(array, p + 1, right);
}
Timing this, sorting a list of 100000 elements 100 times on my 2012 MacBook Pro with Retina Display, it takes 852ms. Now the Scala implementation:
def sortArray(array: Array[Int], left: Int, right: Int) {
if (right <= left) {
return
}
val pivot = array(right)
var p = left
var i = left
while (i < right) {
if (array(i) < pivot) {
if (p != i) {
val tmp = array(p)
array(p) = array(i)
array(i) = tmp
}
p += 1
}
i += 1
}
array(right) = array(p)
array(p) = pivot
sortArray(array, left, p - 1)
sortArray(array, p + 1, right)
}
It looks very similar to the Java implementation, slightly different syntax, but in general, the same. And the time for the same benchmark? 695ms. No benchmark is complete without a graph, so let's see what that looks like visually:
So there you have it. Scala is about 20% faster than Java. QED and all that.
The wrong question
However this is not the full story. No micro benchmark ever is. So let's start off with answering the question of why Scala is faster than Java in this case. Now Scala and Java both run on the JVM. Their source code both compiles to bytecode, and from the JVMs perspective, it doesn't know if one is Scala or one is Java, it's just all bytecode to the JVM. If we look at the bytecode of the compiled Scala and Java code above, we'll notice one key thing, in the Java code, there are two recursive invocations of the quickSort routine, while in Scala, there is only one. Why is this? The Scala compiler supports an optimisation called tail call recursion, where if the last statement in a method is a recursive call, it can get rid of that call and replace it with an iterative solution. So that's why the Scala code is so much quicker than the Java code, it's this tail call recursion optimisation. You can turn this optimisation off when compiling Scala code, when I do that it now takes 827ms, still a little bit faster but not much. I don't know why Scala is still faster without tail call recursion.
This brings me to my next point, apart from a couple of extra niche optimisations like this, Scala and Java both compile to bytecode, and hence have near identical performance characteristics for comparable code. In fact, when writing Scala code, you tend to use a lot of exactly the same libraries between Java and Scala, because to the JVM it's all just bytecode. This is why benchmarking Scala against Java is the wrong question.
But this still isn't the full picture. My implementation of quick sort in Scala was not what we'd call idiomatic Scala code. It's implemented in an imperative fashion, very performance focussed - which it should be, being code that is used for a performance benchmark. But it's not written in a style that a Scala developer would write day to day. Here is an implementation of quick sort that is in that idiomatic Scala style:
def sortList(list: List[Int]): List[Int] = list match {
case Nil => Nil
case head :: tail => sortList(tail.filter(_ < head)) ::: head :: sortList(tail.filter(_ >= head))
}
If you're not familiar with Scala, this code may seem overwhelming at first, but trust me, after a few weeks of learning the language, you would be completely comfortable reading this, and would find it far clearer and easier to maintain than the previous solution. So how does this code perform? Well the answer is terribly, it takes 13951ms, 20 times longer than the other Scala code. Obligatory chart:
So am I saying that when you write Scala in the "normal" way, your codes performance will always be terrible? Well, that's not quite how Scala developers write code all the time, they aren't dumb, they know the performance consequences of their code.
The key thing to remember is that most problems that developers solve are not quick sort, they are not computation heavy problems. A typical web application for example is concerned with moving data around, not doing complex algorithms. The amount of computation that a piece of Java code that a web developer might write to process a web request might take 1 microsecond out of the entire request to run - that is, one millionth of a second. If the equivalent Scala code takes 20 microseconds, that's still only one fifty thousandth of a second. The whole request might take 20 milliseconds to process, including going to the database a few times. Using idiomatic Scala code would therefore increase the response time by 0.1%, which is practically nothing.
So, Scala developers, when they write code, will write it in the idiomatic way. As you can see above, the idiomatic way is clear and concise. It's easy to maintain, much easier than Java. However, when they come across a problem that they know is computationally expensive, they will revert to writing in a style that is more like Java. This way, they have the best of both worlds, with the easy to maintain idiomatic Scala code for the most of their code base, and the well performaning Java like code where the performance matters.
The right question
So what question should you be asking, when comparing Scala to Java in the area of performance? The answer is in Scala's name. Scala was built to be a "Scalable language". As we've already seen, this scalability does not come in micro benchmarks. So where does it come? This is going to be the topic of a future blog post I write, where I will show some closer to real world benchmarks of a Scala web application versus a Java web application, but to give you an idea, the answer comes in how the Scala syntax and libraries provided by the Scala ecosystem is aptly suited for the paradigms of programming that are required to write scalable fault tolerant systems. The exact equivalent bytecode could be implemented in Java, but it would be a monstrous nightmare of impossible to follow anonymous inner classes, with a constant fear of accidentally mutating the wrong shared state, and a good dose of race conditions and memory visibility issues.
To put it more concisely, the question you should be asking is "How will Scala help me when my servers are falling over from unanticipated load?" This is a real world question that I'm sure any IT professional with any sort of real world experience would love an answer to. Stay tuned for my next blog post.
Re: Benchmarking Scala against Java
i'm far from being expert in functional programming, but isn't
def sortList(list: List[Int]): List[Int] = list match {
case Nil => Nil
case head :: tail => sortList(tail.filter(_ < head)) ::: head :: sortList(tail.filter(_ >= head))
}
is going through the same list (doing filter) 2 times instead of 1? Shouldn't you rewrite it in something like
let splitted = splitBy((>head), tail) in (sortList (fst splitted)) ++ [head] ++ (sortList (snd splitted))
To make traversal only 1 time?
Re: Benchmarking Scala against Java
Yes. In fact there are many ways to rewrite it to make it more performant, if you really want it to perform, don't use List, because List is implemented using a linked list, which uses far more memory than an array, the ::: operation runs in n time, and the whole thing jumps all over the memory so plays really poorly with CPU caching. All of this and more is why the idiomatic Scala solutions do not perform well, but my point wasn't to try and create a idiomatic solution that performed well, but rather something that is clear and easy to read.
Obviously in the inner loop of quick sort traversing a list twice is really bad. In a typical web app though for example, if you have a list of things that you loaded from the database, you can traverse them a hundred times over, the amount of time it takes to do that is negligable compared to the amount of time it takes to load them from the database. It doesn't make sense to worry about these sorts of optimisations in those situations because they yield so little help, and are often at the cost of readibility. This is my point. Idiomatic Scala often does not perform well, but often it doesn't need to perform well either.
Re: Benchmarking Scala against Java
"Scalable language" refers to its ability to be used for anything from a small one-off script (where Java would be hopelessly clunky) all the way up to massive multi-team projects where typical scripting languages start to collapse.
It has nothing to do with being distributed, fault tolerant, etc. The language itself doesn't have such concepts built in any more than Java does.
The standard library does have actors built in, but they should not be used for anything but concurrency in the smallest of projects.
Re: Benchmarking Scala against Java
Try to write your code by making it tail recursive, then the @tailrec annotation on the methode. You should have better performance. Martin Odersky has been teaching a functional prrogramming course on coursera, and he's been emphasizing on writing functional code in tail recursive way. And it's not just a scala problem, since tail recursioon is not natively supported on the jvm.
Re: Benchmarking Scala against Java
Hotspot was definitely kicking in, I ran each benchmark in a loop 3 times and only took the time from the third run, by the time it ran the inner loops would have executed several hundred thousand times, the JVM benchmarking guide recommends 10000, so I'm pretty confident that hotspot had kicked in. In addition, I always do a System.gc() immediately before a benchmark, to ensure that each run starts with the same heap state.
Re: Benchmarking Scala against Java
In case you didn't read my blog post correctly, I'd just like to repeat, my point is that these benchmarks should be ignored, and in fact whatever you do to compare Java and Scala from a raw performance perspective, including using the tool you linked to, is irrelavent because it's all just byte code run by the same JVM. I'm not sure what point you're trying to make.
Re: Benchmarking Scala against Java
<i>I'm not sure what point you're trying to make.</i>
You said - "I'm pretty confident that hotspot had kicked in" - and my point was that was just your guess.
<i>because it's all just byte code run by the same JVM</i>
So we'd get the same performance as Scala programs if we used any other language implementation that targets JVM, for example <a href="https://developer.mozilla.org/en-US/docs/Rhino">Rhino</a>?
I don't think so.
Re: Benchmarking Scala against Java
Ok, so I reran the benchmarks with the "-XX:+PrintCompilation" flag, which logs every method that is JITed when it is JITed. In the warm up run, everything was JITed, in the actual run, nothing was JITed, so this is no longer a guess, hotspot had definitely kicked in.
No, Rhino is not just byte code. Nor is jruby or jython. They may compile some of the code to bytecode, but that bytecode needs their associated runtimes still to run, because the JVM was not designed for dynamic languages, method/field lookups must be resolvable at classloading time, whereas method/field lookups in dynamic languages are done at runtime, and so they need a Rhino/jruby/jython runtime to do those lookups. This adds a lot of overhead to method invocations and field lookups.
Scala on the other hand translates directly to byte code, no runtime needed. A class in Scala is just a class in bytecode. A method in Scala is just a method in bytecode. A method invocation in Scala is just a bytecode invoke operation, no need to go into a runtime. A field in Scala is just a field in bytecode, and a field access is just a bytecode set or get operation, no need to go into a runtime (ok, actually it's more often a method invocation for Scala because of automatic encapsulation, but this would usually be inlined by the JVM anyway). After compiling a Scala class, you can use it from Java as if it was Java code, no runtime required. The same can't be said for Rhino, Jython or JRuby, because they aren't just byte code. Scala and Java are just byte code.
Nice example!
Excellent work, James! I refered to this post in another post on Scala performance which responds to an allegation that Scala is 100x slower than Java.
Re: Benchmarking Scala against Java
Just to be clear, sortArray() calls itself twice, and only the second one could be optimized by scalac. Obviously the first one is not in tail position -- there is more work to be done. So @tailrec would cause the compiler to bark about the first one. Still, it helps to have even one of them optimized.
You could make the method fully tail-recursive by pushing onto a list (or, for more efficiency, into arrays) a description of the work to be done on the two sub-arrays and then recursing to process the next one.
Also, scalac can only do tail-recursion optimization on a method if it is final or private, because only then can scalac be assured that the method is not overridden in a subclass. (If the method were overridden, the call should invoke the subclass's method.)
Re: Benchmarking Scala against Java
Hi James, I think there might be something wrong with your benchmarking method. On my mid-2011 MacBook Air with Scala 2.9.2, for 100000x100 I get very different microbenchmark results and a conclusion that is the opposite of yours. Namely, mine shows that the functional immutable style was faster than your procedural in-place style.
- 1. Java in-place style: 470 ns + array generation
- 2. Scala in-place style 460 ns + array generation
- 3. Scala functional style: 295 ns
- 4. My Scala functional style: 295 ns
- 5. Scala.List.sortWith(_<_): 3675 ns
- 6. java.util.Arrays.sort(_): 750 ns including array generation
I chose to sort a reversed list of integers (100000, 99999, 99998, etc) 100 times.
Key points being:
- The Scala functional style (#3) is over 35% faster than the Java in-place style (#1) NOT 20 times slower.
- I didn’t seem to get "tail recursion" optimisation with the in-place style so my Scala result (#2) is the same as Java (#1)
- My functional style (#4) is slightly different from yours (#3), but both perform equivalently after compiler optimisation (good compiler).
- In this specific case, the Scala functional styles were 2.5 times faster than java.util.Arrays.sort (#6) and over 12 times faster than the Scala.List.sortWith API (#5).
My Scala functional implementation was:
def qsort[Type <% Ordered[Type]](unsorted: List[Type]): List[Type] = data match {
case Nil => Nil
case pivot :: todo =>
val (low, high) = todo partition (_<=pivot)
qsort(low) ::: pivot :: qsort(high)
}
Re: Benchmarking Scala against Java
I don't know scale very well, but it sems that when you declare the pivot in java you didn't declared it as final and in the scala code you declared as val, that is the same. Since the JVM will not observe this object, because it's final in scala it, teorically have a better performance.
Please, do the test for us and let us know :)
Hi! My name is James Roper, and I am a software developer with a particular interest in open source development and trying new things. I program in Scala, Java, PHP, Python and Javascript, and I work for 

