I have a Java application that processes data of this type:

class MyData {
     Date date;
     double one;
     double two;
     String comment;
}

All data is stored in CSV format on the hard disk. The maximum size of such a data sequence is ~150 megabytes, and for the moment I simply load it fully into memory and work with it.

Now I have the task of increasing the maximum data sequence to hundreds of gigabytes. I guess I have to use a DB, but I haven't used one before.

My questions:

  1. Which DB is easiest to choose for my purposes (there will be only one table, with data as above)?
  2. Which library is easiest to use for connecting Java <-> DB?
  3. I suppose something like a cursor will be used? If so, is there any cursor implementation with good record caching for fast access?

Any other tips & tricks about Java <-> DB are welcome!

Your question is pretty unspecific. There is no "best of breed"; it depends on how much money you have and what kind of hardware.

Since your mapping between Java and the DB is pretty simple, JDBC should be enough. JDBC will create a cursor for you as necessary; you just loop over the rows in the ResultSet. Depending on the database, you may need to configure it to use cursors, though.
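A minimal sketch of that loop, using only the standard java.sql API. The table name my_data and the column names recorded_at, one, two and note are made up for illustration, and the fetch size of 1000 is an arbitrary starting point. Whether rows are really streamed depends on the driver: the PostgreSQL driver, for example, only uses a cursor when autocommit is off and the fetch size is positive, and MySQL Connector/J has its own settings for this, so check the documentation of the driver you pick.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

public class MyDataScan {
    // Hypothetical table and column names; adjust them to your schema.
    static final String QUERY =
        "SELECT recorded_at, one, two, note FROM my_data ORDER BY recorded_at";

    /** Walks the whole table row by row and returns the sum of column "one". */
    static double sumOfOne(Connection conn) throws SQLException {
        // Some drivers only use a server-side cursor when autocommit is off.
        conn.setAutoCommit(false);
        try (PreparedStatement stmt = conn.prepareStatement(
                QUERY, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // Hint to the driver: fetch rows in batches instead of all at once.
            stmt.setFetchSize(1000);
            double sum = 0.0;
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    Timestamp recordedAt = rs.getTimestamp("recorded_at");
                    double one = rs.getDouble("one");
                    // Process the row here; only the current batch is in memory.
                    sum += one;
                }
            }
            return sum;
        }
    }
}
```

You would obtain the Connection from DriverManager.getConnection(...) with the JDBC URL of whichever database you choose.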

Since you mention "hundreds of gigabytes", that rules out most of the "simple" databases. If you have money, try Oracle. If you don't have money, try MySQL or Postgres.

You can also try JavaDB (also known as Derby), but I'm not sure its performance will be what you need.

Note that all of them have their quirks and "features", so be prepared to spend a few days getting up to speed with whichever one you choose.

It depends entirely on what you want to do with the data. Do you need to index it to retrieve specific records, or are you stream processing the whole data set to produce some statistics (for instance)? Does the database need to be accessed concurrently by multiple clients/processes?

Don't rush straight towards SQL/JDBC. Relational databases are powerful, but they add a lot of complexity and are often entirely unnecessary for the task at hand.
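To illustrate the stream-processing case: if all you need is some statistics over the whole file, a single pass over the CSV keeps only a few numbers in memory, no matter how large the file is. The sketch below assumes a line layout of date,one,two,comment with no header row and no quoting in the first three fields; the class and method names are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

public class CsvStats {
    /** Running statistics for the "one" column. */
    static final class Stats {
        long count;
        double sum;
        double max = Double.NEGATIVE_INFINITY;

        double mean() {
            return count == 0 ? Double.NaN : sum / count;
        }
    }

    /** Reads the CSV line by line; memory use does not grow with input size. */
    static Stats scan(Reader source) {
        Stats stats = new Stats();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Limit of 4 keeps any commas inside the trailing comment field.
                String[] fields = line.split(",", 4);
                double one = Double.parseDouble(fields[1]);
                stats.count++;
                stats.sum += one;
                stats.max = Math.max(stats.max, one);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stats;
    }
}
```

For a real file you would pass in something like Files.newBufferedReader(Paths.get("data.csv")). This obviously does not help if you need random access to individual records.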

Again, depending on what you actually need to do, something like BerkeleyDB may suit you perfectly, or you might just need a more compact binary format: take a look at Protocol Buffers and Kryo.

If you really need to scale things up, take a look at Hadoop/HDFS for distributed processing (but that is getting rather complicated).

Oh, and generally speaking, JavaDB/Derby tends to suck somewhat.
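To give a rough idea of what a compact binary format buys you, here is a hand-rolled sketch using only java.io. It is not Protocol Buffers or Kryo, just a stand-in to show the principle: each record becomes 24 bytes of numbers plus a length-prefixed comment, and no text parsing is needed when reading it back. The class and method names are made up for the example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Date;
import java.util.function.Consumer;

public class MyDataCodec {
    static final class MyData {
        final Date date;
        final double one;
        final double two;
        final String comment;

        MyData(Date date, double one, double two, String comment) {
            this.date = date;
            this.one = one;
            this.two = two;
            this.comment = comment;
        }
    }

    /** Writes records back to back: long, double, double, UTF string. */
    static void writeAll(OutputStream sink, Iterable<MyData> records) {
        try (DataOutputStream out =
                 new DataOutputStream(new BufferedOutputStream(sink))) {
            for (MyData d : records) {
                out.writeLong(d.date.getTime()); // 8 bytes, millis since epoch
                out.writeDouble(d.one);          // 8 bytes
                out.writeDouble(d.two);          // 8 bytes
                out.writeUTF(d.comment);         // 2-byte length + encoded text
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads records one at a time and hands each to the handler. */
    static void readAll(InputStream source, Consumer<MyData> handler) {
        try (DataInputStream in =
                 new DataInputStream(new BufferedInputStream(source))) {
            while (true) {
                long millis;
                try {
                    millis = in.readLong();
                } catch (EOFException endOfData) {
                    break; // no further record starts here
                }
                double one = in.readDouble();
                double two = in.readDouble();
                String comment = in.readUTF();
                handler.accept(new MyData(new Date(millis), one, two, comment));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that writeUTF limits each comment to 65535 encoded bytes, and that this format has no versioning or schema evolution, which is one of the things the libraries mentioned above handle for you.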

I would recommend JavaDB. I have used it in a Point of Sale system and it works great. It is very easy to integrate into your Java application, and you can bundle it into the same .jar file if you want.

Using Java DB in Desktop Applications may be a useful article. You'll use JDBC to access the database from Java, which makes it easy to switch to another database if you decide not to use JavaDB.

You will want to evaluate several databases (you can get trials of nearly all of them if they are not free already). I'd recommend trying Oracle and MySQL/Postgres, and given the size of your data (and its apparent lack of complexity) you might want to consider a datagrid too (GridGain or similar).

Definitely prototype though.

I'd just like to add that the "fastest" database isn't always the best.

You should also consider:

  • reliability,
  • software license cost,
  • ease of use,
  • ease of administration,
  • availability of support,
  • and so forth.