Thursday, January 18, 2007

Hsqldb vs Derby


I'm not much into absurd comparisons, but I find this Derby thing funny in a Marx Brothers sort of way. A friend just sent me the slides of a Derby presentation from JavaOne 2006, and you can draw some conclusions from them.

Why Java needs an embedded database in the JDK is beyond me, especially one coming out of Apache. Back in the day Sun decided that, instead of adopting log4j, they should roll their own logging framework from scratch; now they must have learned their lesson, because instead of including any well-tested database, Derby will do.

What is Derby, anyway? Derby is Cloudscape in new clothes. It had so many bugs that changing the name could only help erase bad memories. Cloudscape has been included in the J2EE SDK since 1.2.1, so it was always there, consistently ignored.

I used to work with Cloudscape in 2001, but switched to hsqldb because it was faster, more reliable, and JBoss bundled it (reasons weighted in that order). Now that the IBM folks have convinced Sun to include Cloudscape in the JDK, I thought it might be time to switch sides again, or at least give it some thought.

First comes a feature comparison: both databases are very much alike. Derby seems to support named cursors but not in-memory operation (i.e. with no backing media), and hsqldb the other way around. As a result, with big databases you had better be careful, since hsqldb is likely to consume more memory (then again, if the database is big enough you should discard all these embedded rubber toys altogether and go with a serious database server).
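In day-to-day use, the in-memory difference shows up in the JDBC URL you connect with. The minimal Java sketch below assumes the documented URL forms `jdbc:hsqldb:mem:<name>`, `jdbc:hsqldb:file:<path>` and Derby's embedded `jdbc:derby:<path>;create=true`; the `EmbeddedUrls` class and its method names are hypothetical, made up only for illustration.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of either database's API) that builds
 * embedded JDBC URLs for HSQLDB and Derby.
 */
public class EmbeddedUrls {

    /** HSQLDB can run purely in memory ("mem:") or against files on disk ("file:"). */
    static String hsqldbUrl(String name, boolean inMemory) {
        Objects.requireNonNull(name, "name");
        return inMemory ? "jdbc:hsqldb:mem:" + name : "jdbc:hsqldb:file:" + name;
    }

    /**
     * Per the post, Derby offers no in-memory mode, so its embedded URL
     * always names a database directory on disk.
     */
    static String derbyUrl(String path) {
        Objects.requireNonNull(path, "path");
        // ";create=true" asks Derby to create the database if it does not exist yet
        return "jdbc:derby:" + path + ";create=true";
    }

    public static void main(String[] args) {
        System.out.println(hsqldbUrl("testdb", true));       // in-memory hsqldb
        System.out.println(hsqldbUrl("data/testdb", false)); // file-backed hsqldb
        System.out.println(derbyUrl("data/testdb"));         // disk-backed Derby
    }
}
```

Each string would then be handed to `java.sql.DriverManager.getConnection` with the matching driver jar on the classpath; the sketch only builds the strings, so it runs without either database installed.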

Size: ahem. Derby claims to be lightweight, with only a 2 MB footprint, compressible to 600 KB using the jar compression voodoo (Pack200) included in Java 5. Hsqldb is a 640 KB jar file that can be shrunk to about 450 KB using its Ant build file to strip out the administration tools and other extras. The same jar compression tools could then make it even smaller.

Size is not everything in an embedded database, but speed is, and Derby does not rank well in the available comparisons, third-party or not. A good part of hsqldb's advantage comes from being clearly oriented toward in-memory work, which seems like good design to me for embedded systems.

The slides include an example explaining the urgent need for a Java database and why your life has been dull and gray without one. So, be it known that you can send an email from a database trigger. Send an email from a trigger. Layer separation my ass. It's not '99 anymore; triggers still have some place in this world, but this kind expired like five years ago.

Four times bigger jar files, no support for in-memory databases, much slower, and it evangelizes that I should put business logic in the database. I don't like making absurd comparisons, so if someone removed those 2 MB from my JDK I wouldn't have to.

UPDATE: to add insult to injury, an independent study from the Norwegian University has concluded that as the number of processors increases, Derby's performance decreases (see points 4.1, 5.1 and 6.1).

4 comments:

  1. Also check out this one; it has even better performance: http://www.h2database.com/html/frame.html

    I haven't tried it, though.

  2. H2 is from the same guy who brought us hsqldb; it has better support for BLOB types and no other caveats that I know of, so I would definitely recommend switching.

  3. But HSQLDB is not ACID compliant (from http://hsqldb.org/web/hsqlFAQ.html: "HSQLDB support only READ UNCOMMITTED isolation level when queries are made from your Java program."), which makes it pretty useless except in a single-threaded, serial-access scenario.

  4. For that purpose you may use h2 instead. It was forked from the hsqldb implementation and, according to this link, includes transaction isolation. We have been using it for years and have never experienced any problems with transactions.
