Poor Groovy Performance

Stefan - April 14, 2007

I’m always looking for computer languages which can help me do my work more effectively. While I’m a big fan of dynamic languages and especially Perl, Java is still on my list of interesting languages. One reason is the simple deployment Java offers. Or, to be honest, people are used to the way Java applications are deployed.

Therefore I’ve been following the recent improvements of the JVM for dynamic languages with great interest. Groovy in particular seems very appealing to me: a true dynamic language with a REPL-like (read-evaluate-print-loop) console, extremely easy integration of the various Java libraries and the possibility to compile your Groovy programs to “real” Java classes.

GroovyConsole: A powerful console / shell / REPL

Having been a Perl programmer for a long time, I know that the kind of performance a typical micro-benchmark measures isn’t that important. Nonetheless I was surprised that there wasn’t any Groovy benchmark on the computer language shootout site.

So, I decided to do my own little benchmark. Calculating the sum of integers saved in a simple text file (called “sum-file” on the shootout site) is the kind of task which comes very close to my real-life work. I took the code for several languages from the shootout site and wrote the code for the Groovy benchmark myself.

Groovy performance chart

Of course, these results were a big disappointment. According to the shootout site, Scala and Nice, two languages also created for the JVM, come very close to the performance of Java. So I had hoped that Groovy could be as fast as those languages. But as you can see, Groovy, compiled to a Java class, was about ten times slower than native Java!

And there’s another thing to look at: On my MacBook (Core 2 Duo, 2 GHz) Perl outperformed Java by some 25 percent. According to the shootout site, Java should be about twice as fast as Perl. In fact, Perl was the fastest language in this micro-benchmark, even faster than Lua, a dynamic language well known in game programming for its performance.

Groovy wasn’t the slowest language in my little contest, by the way. There is one computer language missing from the chart above: Squeak Smalltalk. Squeak took about 30 seconds to calculate the sum of the integers. This time was so long, it would have distorted the chart.

For all the Groovy geeks out there, this is the code I used for my benchmark:
def fileName = "/Users/sf/Documents/source/groovy/benchmark/zahlen.txt"
def sum = 0
new File(fileName).eachLine {
    line -> sum += Integer.valueOf(line)
}
println "Summe: " + sum

I even tried to use static types for line and sum and I wrote one version using .eachLine and a closure, but the performance did not change.
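For comparison, a plain Java version of the same task could look roughly like the sketch below. It is not the program from the shootout site and not the Java code used for the chart; the class name `SumFile` and the method `sumLines` are made up for illustration, the input path is assumed to be passed as the first command-line argument, and it uses the newer `java.nio.file` API rather than what was available in 2007.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not the shootout program: sums one integer per line.
class SumFile {
    static long sumLines(Path file) throws IOException {
        long sum = 0;
        // The buffered reader fetches the file in chunks and hands out lines.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sum += Integer.parseInt(line.trim());
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the input file path is given as the first argument.
        System.out.println("Summe: " + sumLines(Paths.get(args[0])));
    }
}
```

The loop does the same work as the Groovy closure above: read a line, parse it as an integer, add it to the running total.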

What can we learn from this little experiment? Groovy is a nice dynamic language but very slow, while Perl can still compete with Java even when it comes to runtime performance.

Filed under: Perl

20 comments:

Well, I hope that Groovy performance has improved since I hope to use it in a client side project.

As for Squeak, it hardly seems fair to slam it as you did without showing us the code you used to perform the benchmark.

Well, I did not mean to be unfair. In fact I’m very interested in Squeak and thus included it in my benchmarks. Because my article focused on Groovy, I did not show you the Squeak code. So, here it is:

| file line sum |
file := CrLfFileStream fileNamed: 'numbers.txt'.
sum := 0.
[file atEnd] whileFalse:
[ line := file nextLine.
sum := sum + line.
line.
].
Transcript show: sum; cr.

Of course I’m happy to learn how to speed up this little chunk of code.

“Groovy… Nonetheless I’ve been surprised that there wasn’t any benchmark on the computer language shootout site.”

There used to be – I just got fed up of dealing with Groovy. You’ll find old Groovy programs in CVS –

http://alioth.debian.org/plugins/scmcvs/cvsweb.php/shootout/bench/sumcol/?cvsroot=shootout

We do show some Squeak programs –

http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=all&lang=squeak&lang2=perl

I just repeated your Groovy test and got radically better numbers than you report. The sumcol-input.txt file I found on the shootout website only had 1000 entries in it. When I ran your groovy program, unaltered, on this dataset it only took 1.25s. On the shootout site they say N=21,000 numbers, but I couldn’t find any file of that size to download, so I generated a random file of 21,000 numbers (half positive, half negative). Groovy time on that set of 21,000 was 1.35 seconds.

It seems obvious that most of this must be startup time since it varies so little over this range, so I tried a whole series of input file sizes, with these results:

// 100 1.149 s
// 1000 1.205 s
// 2000 1.204 s
// 4000 1.205 s
// 8000 1.255 s
// 10000 1.250 s
// 100000 1.903 s
// 500000 4.826 s
// 1000000 8.169 s

Startup time dominates until you get to around 100,000 numbers in the input. This is not surprising since Groovy compiles to java classes in the background each time you run it. As you can see, in my tests I had to get up to 500,000 numbers in my file before my running time got into the 5 second range you report.
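The measurements above can be split into a fixed part and a per-line part. The following is a rough sketch that assumes run time grows linearly with the number of lines and uses only the smallest and largest data points from the table; the class and method names are invented for this example.

```java
// Rough estimate of fixed startup cost and per-line cost from two of the
// measurements listed above (100 lines: 1.149 s, 1,000,000 lines: 8.169 s).
class StartupEstimate {
    // Seconds per input line, assuming run time grows linearly with line count.
    static double perLineSeconds(long n1, double t1, long n2, double t2) {
        return (t2 - t1) / (n2 - n1);
    }

    // Fixed cost in seconds (startup, compilation), under the same assumption.
    static double startupSeconds(long n1, double t1, double perLine) {
        return t1 - n1 * perLine;
    }

    public static void main(String[] args) {
        double perLine = perLineSeconds(100, 1.149, 1_000_000, 8.169);
        double startup = startupSeconds(100, 1.149, perLine);
        System.out.printf("startup ~ %.3f s, per line ~ %.2f microseconds%n",
                startup, perLine * 1e6);
        // Prediction for 500,000 lines, to compare with the measured 4.826 s.
        System.out.printf("predicted for 500000 lines: %.3f s%n",
                startup + 500_000 * perLine);
    }
}
```

Under that assumption the fixed cost comes out at roughly 1.15 seconds and the per-line cost at roughly 7 microseconds, which is in line with the observation that startup dominates for small inputs.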

All of my tests were on a 1.83 GHz Core Duo Macbook.
Groovy version: 1.1-BETA-1, JVM: 1.5.0_07-87

I wonder if you were using an older version of Groovy?

Oh, I should have stated somewhere that the input file I used consisted of 1,000,000 lines. So the results you got on your 1.83 GHz MacBook seem to confirm my little benchmark, which was run on a 2 GHz MacBook.

Groovy version: 1.0, JVM: 1.5.0_07-87

Obviously startup time is a factor. I tested the script against 1,200,000 records (i.e. 200k more than your test), and if you change the code to use Groovy’s for loop, which is more efficient, it completes in 5.124 seconds:

def fileName = 'sumcol-input.txt'
def sum = 0
def file = new File(fileName)
for (line in file) {
    sum += line.toInteger()
}
println "Summe: " + sum

I’m also on a 1.83 GHz MacBook. Nevertheless, we are always trying to improve Groovy performance, but if you do hit a bottleneck you can write the relevant code in Java and don’t have to fall back to C (fun!).

Graeme, thank you very much. Your code is indeed an improvement. Using the same file as in the original benchmarks, it now takes about 3.5 seconds.

Yup, groovy is slow. But the 1.0 release is less than half a year old. I asked on the mailing list and they assured me that they haven’t even begun to optimize it.

But please note that Groovy uses the same class library as Java. So the libraries, including collections, are already optimized.

Just give it some time– it will get better.

Glen
http://glenp.net

I tried Groovy for a simple regexp-foreachline and the script is very, very slow!

for(line in file)
{
    if (line ==~ regExp)
    {
        matcher = ( line =~ regExp );
        if (matcher.matches())
        {
            println "XXX " + (matcher[0]);
            final String key = matcher[0][1];
            final String line2 = matcher.replaceAll(">

First, if you are doing a complex calculation and actually care if it takes 2 seconds or 8 seconds, you should use plain old Java. Scripting languages are not intended for complicated numerical calculations or for operations that require the utmost speed.

Second, there is a startup time associated with Java, and it doesn’t end after 1.25 seconds. Java will dynamically recompile heavily used sections of code over time. So any measurement of Java would need to take this into consideration, perhaps by using the non-standard -Xcomp flag (which forces methods to be compiled on their first invocation), and by running the test multiple times within the same VM. Running a one-time test on Java will result in bad profiling information, no matter how big the data set is.
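Running the test multiple times within the same VM can be sketched as follows. This is only a minimal illustration under assumptions: the class `WarmupBench`, the generated input file, and the choice of ten rounds are all made up here, and it measures wall-clock time per round rather than anything more rigorous.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness: repeats the same workload inside one JVM so that
// later rounds run after the JIT compiler has had a chance to kick in.
class WarmupBench {
    static long sumLines(Path file) throws IOException {
        long sum = 0;
        for (String line : Files.readAllLines(file)) {
            sum += Integer.parseInt(line.trim());
        }
        return sum;
    }

    // Returns the elapsed wall-clock time of each round in milliseconds.
    static long[] timeRounds(Path file, int rounds) throws IOException {
        long[] millis = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            sumLines(file);
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    public static void main(String[] args) throws IOException {
        // Throwaway input file; 1,000,000 lines matches the size used in the post.
        Path file = Files.createTempFile("numbers", ".txt");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            lines.add(Integer.toString(i % 1000));
        }
        Files.write(file, lines);
        long[] millis = timeRounds(file, 10);
        for (int i = 0; i < millis.length; i++) {
            System.out.println("round " + (i + 1) + ": " + millis[i] + " ms");
        }
        Files.delete(file);
    }
}
```

If the first rounds turn out noticeably slower than the later ones, that difference is the warm-up effect described above; a single run mixes it into the result.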

Third, the reason for having script languages is to make the programmer’s job easier. How many programmers can write the test scripts used in this benchmark in 8 seconds? Okay, now what is the difference in time it takes to write the test script in Java, build an ANT script to build the class files, build a *.BAT file to pull together all the CLASSPATH information…etc., and the time it takes to run a couple of lines of script code from an interactive console???

Minutes in JavaScript, maybe. Minutes in Groovy, maybe. Half an hour in Java?

And if you want to use your existing custom Java libraries, I’d bet that I could get it running faster in Groovy than just about anything else. Groovy uses standard Java types natively. JavaScript can use many Java object types, but there is a painful mismatch.

Groovy will never be as fast as Java. Java will never be as dynamic as Groovy. They will always play nicely together. You can’t say that about any other scripting language that I know of.

Jason,
I’m not disappointed that Groovy is slower than Java. This is the price we pay for dynamic languages, because they make life easier for us programmers. I’m disappointed that Groovy is that much slower than Perl, Ruby and even PHP, all of them also dynamic languages with a start-up penalty.
I ran these benchmarks before deciding which technology to use for a new project. This project involves a lot of processing in the back-end, so runtime performance is important. And we wanted to use the same language for the front-end, too, for obvious reasons. Of course we could have used Groovy for the front-end and Java for the heavy back-end work, but we wanted a dynamic(!) language for the back-end.

This is not the price we pay for dynamic languages.
It is noted in the benchmark that Perl was faster than Java. Lisp code can be compiled to machine language (you can have a mash of interpreted and compiled functions), and good Lisp code will in general be less than 2x more expensive than C (like Java).
In conclusion, Groovy’s performance (or lack thereof) cannot be attributed to an intrinsic feature of dynamic languages.

Well, I just stumbled over a groovy book and thought: Hey, groovy has all the features I was missing in ruby and the tight java integration wouldn’t be a disadvantage too. So, I ran a stupid test — the typical prime calculation thingy. The test itself is pointless. But groovy took 12 to 15 seconds to startup, which is about the time ruby needs to finish the test. I think one of the problems lies in the startup script that scans all sorts of directories which is kind of slow with cygwin.

If you want to see a very different perspective on performance, this article (http://1060.org/upload/fibonacci.html) examines the performance of an address-space-based computational environment compared to a conventional algorithmic approach. Effectively, it shows an environment in which each computed result is located at an address (in a URI address space) and each result is cached using the URI as the cache key. The net result for working systems is that computations can be done with a scripting language (JavaScript in this paper) and the whole system performs very well because redundant computations are avoided. The system is called NetKernel, and it supports Groovy, JavaScript, Ruby, Python, Java and other languages.
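The idea of caching each result under its address can be illustrated with a small sketch. This is not NetKernel’s API: the `fib:` URI scheme, the class `UriCache` and the method `resolve` are invented here purely to show results being looked up and stored by URI.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Illustration only: this is NOT NetKernel's API. The "fib:" URI scheme and
// the class name are invented to show results cached under an address.
class UriCache {
    private final Map<String, BigInteger> cache = new HashMap<>();
    int computations = 0; // counts how often a result actually had to be computed

    BigInteger resolve(String uri) {
        BigInteger cached = cache.get(uri);
        if (cached != null) {
            return cached; // result already known at this address
        }
        int n = Integer.parseInt(uri.substring("fib:".length()));
        BigInteger result = (n < 2)
                ? BigInteger.valueOf(n)
                : resolve("fib:" + (n - 1)).add(resolve("fib:" + (n - 2)));
        computations++;
        cache.put(uri, result);
        return result;
    }

    public static void main(String[] args) {
        UriCache space = new UriCache();
        System.out.println(space.resolve("fib:30"));
        System.out.println("computed " + space.computations + " results");
    }
}
```

Because every sub-result is stored under its own address, each Fibonacci number is computed only once, even though the naive recursion would request it many times.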

Test 1) I added time1 and time2 and println time2-time1 lines to your test code. In groovy 1.5 command line your sum test takes 31 or *62* or 78 ms and in groovyshell 16 ms to execute.

Test 2) Removing time1 and time2 code, it is taking about 30 ms by hand time measuring, by a clock, when executed from groovy command line.

(My machine has a dual-core T7200 2 GHz processor with JDK 1.6)

Thomas, how big was the file you used to calculate the sums? Without knowing how many times the loop had to iterate, your times don’t give us much information.

I know this is an old post, but it still shows up on Google, so I thought I’d post.

I’m considering switching a project from Rails to Grails. After running across this post, I decided to test it out for myself.

I used a sum.txt file of 1,000,000 lines (the same contents listed on the site, repeated 1,000 times). To minimize the effects of startup time, I put a for loop counting 1..10 around the whole block of code. The code I used was exactly what was posted here and on the shootout site, with the text file being read each iteration of the loop.

All I’ve got running right now is a windows box (XP SP2, E6600 core2), so I’m not sure how to exactly measure CPU time on it. I just watched the “CPU Time” field in the task manager, and the results were:

Ruby 1.8.6: 27s
Ruby 1.9.0: 27s
Java 1.6.0_5: 1m 6s
JRuby 1.0.3: 43s
JRuby 1.1 RC2: java.lang.OutOfMemoryError: Java heap space
Groovy: 15s

Java definitely ran the fastest once the file was loaded, but something about file IO or the StreamTokenizer class with FileInputStream really slowed it down.
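One possible explanation, which is an assumption and not something measured in this comment, is that a StreamTokenizer pulls its input one character at a time, so reading straight from an unbuffered FileInputStream means a very large number of tiny reads. A sketch of the same kind of loop with a buffered reader in between might look like this; the class `TokenizerSum` and the command-line path are made up for the example.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;

// Sketch only; class and method names are invented for this example.
class TokenizerSum {
    static long sum(Reader in) throws IOException {
        StreamTokenizer tokens = new StreamTokenizer(in);
        long sum = 0;
        // Walk over all tokens and add up the numeric ones.
        while (tokens.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokens.ttype == StreamTokenizer.TT_NUMBER) {
                sum += (long) tokens.nval;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // The BufferedReader is the point of the sketch: it reads the file in large chunks.
        // Assumption: the input file path is given as the first argument.
        try (Reader in = new BufferedReader(new FileReader(args[0]))) {
            System.out.println(sum(in));
        }
    }
}
```

Whether buffering accounts for the difference seen in the task manager would have to be checked by timing both variants on the same file.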

Loading the file into an array of 1,000,000 strings once for all 10 iterations produced:
Ruby 1.8.6: 9s (179mb)
Ruby 1.9.0: 9s (179mb)
Java 1.6.0_5: 8s (67mb)
JRuby 1.0.3: 12s (190mb)
JRuby 1.1 RC2: 12s (190mb)
Groovy: 13s (77mb)
(RAM usage in parentheses)

For the hell of it, I used Groovy code to load the file and had it pass off an ArrayList to Java to sum up, and it finished in only 2.5 seconds. I’d imagine more optimized Java would be about as quick.
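The Java side of such a hand-off could be as small as the sketch below. The actual code used for this test was not posted, so the class `ListSummer` and its `sum` method are hypothetical; a Groovy script would load the lines into a list and call the static method on it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper a Groovy script could call after loading the lines itself.
class ListSummer {
    static long sum(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            total += Integer.parseInt(line.trim());
        }
        return total;
    }

    public static void main(String[] args) {
        // Small stand-in for the list that the Groovy script would pass in.
        System.out.println(sum(Arrays.asList("1", "-2", "40")));
    }
}
```

Since Groovy uses standard Java collection types, the list can be passed across without any conversion step.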

The difference in RAM usage seems really significant. I wonder how much RAM is used in equal size Grails vs. Rails apps?

[…] “little” problem with Groovy – low performance You can find tests and quick comparison here. Why it is so ? explanation you can find in headius (Charles Nutter) blog in this article. […]

I benchmarked compiled Groovy in an enterprise Java web application to be used as JSF backing beans and it is more than 100 times slower!!!
See, so much slower than expected…

groovy rocks
