I am creating an application that will need to put at most 32 GB of data into my database. I'm using B-tree indexing since the reads may include range queries (like from < time < from + 1hr).

At the start (database size = 0 GB), I get between 60 and 70 inserts per millisecond. After, say, 5 GB, the three databases I have examined (H2, Berkeley DB, Sybase SQL Anywhere) have REALLY slowed down, to under 5 inserts per millisecond.

Question: Is this typical? Would I still see this scalability problem if I basically REMOVED indexing? What causes this issue?

Thanks in advance.


Notes: Each record consists of a couple of ints.

Yes, indexing improves fetch times at the expense of insert times. Your numbers seem reasonable, without knowing more.

You can benchmark it. You'll need to have a reasonable amount of data stored. Decide whether you should index based on the queries: heavy fetch and light insert? Index everywhere a WHERE clause would use it. Light fetch, heavy inserts? Probably avoid indexes. Mixed workload? Benchmark it!

When benchmarking, you want data as real or as realistic as possible, in volume as well as in data domain (a realistic distribution of data: not all "henry cruz" but various names, for instance).
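To make the "benchmark it" advice concrete, here is a minimal sketch of timing inserts with and without an index. It uses Python's built-in sqlite3 as a stand-in for the engines you mentioned (H2, Berkeley DB, SQL Anywhere); the table name, column names, and row counts are made up for illustration, and absolute numbers will differ across engines:

```python
import sqlite3, time, random

def bench(with_index, n=20000):
    """Time n inserts into a fresh in-memory table, optionally with a time index."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (ts INTEGER, a INTEGER, b INTEGER)")
    if with_index:
        con.execute("CREATE INDEX idx_ts ON readings (ts)")
    rows = [(random.randrange(10**9), i, i * 2) for i in range(n)]
    start = time.perf_counter()
    con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    con.commit()
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed

print(f"no index: {bench(with_index=False):.3f}s")
print(f"indexed:  {bench(with_index=True):.3f}s")
```

The same harness lets you vary row width, key distribution, and total volume to mimic your real workload.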

It's typical for indexes to sacrifice insert speed for access speed. You'll even find database tables (and I have seen these in the wild) that index each and every column. There's nothing inherently wrong with that if the number of updates is small compared to the number of queries.

However, considering that:

1/ You seem to be concerned that your inserts slow down to 5/ms (that's still 5,000/second),

2/ You're only writing a couple of integers per record, and

3/ Your queries are only based on time ranges,

you may want to consider skipping a regular database and rolling your own sort-of-database (my guess is that you're collecting real-time data such as device readings).

If you're only ever writing sequentially-timed data, you can simply use a flat file and periodically write the 'index' information separately (say, at the start of every minute).

This will greatly speed up your writes but still allow a relatively efficient read process: worst case is that you'll have to find the start of the relevant period and do a scan from there.
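A minimal sketch of that flat-file-plus-sparse-index idea, assuming fixed-size records of a timestamp and two ints (the record layout, class, and checkpoint interval are all illustrative, not anything from the question):

```python
import bisect, io, struct

REC = struct.Struct("<qii")  # assumed layout: (timestamp, a, b)

class TimeLog:
    """Append-only log of time-ordered records plus a sparse in-memory index
    mapping checkpoint timestamps to file offsets (e.g. one per minute)."""
    def __init__(self, buf):
        self.buf = buf          # any seekable binary file object
        self.index = []         # sorted list of (timestamp, offset) checkpoints

    def append(self, ts, a, b, checkpoint=False):
        if checkpoint:          # caller checkpoints at the start of each period
            self.index.append((ts, self.buf.tell()))
        self.buf.write(REC.pack(ts, a, b))

    def scan(self, t_lo, t_hi):
        """Seek to the last checkpoint at or before t_lo, then scan forward."""
        i = bisect.bisect_right(self.index, (t_lo, float("inf"))) - 1
        self.buf.seek(self.index[i][1] if i >= 0 else 0)
        out = []
        while chunk := self.buf.read(REC.size):
            ts, a, b = REC.unpack(chunk)
            if ts > t_hi:
                break           # records are time-ordered, so we can stop
            if ts >= t_lo:
                out.append((ts, a, b))
        return out

log = TimeLog(io.BytesIO())
for ts in range(300):                        # five "minutes" of per-second records
    log.append(ts, ts, ts * 2, checkpoint=(ts % 60 == 0))
print(log.scan(70, 75))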

This of course depends on my assumptions about your storage being correct:

1/ You're writing records sequentially, ordered by time.

2/ You only need to query on time ranges.

Yes, indexes will normally slow inserts down, while considerably speeding up selects (queries).

Do bear in mind that not all inserts into a B-tree are equal. It's a tree: if all you do is insert into it, it has to keep growing. The data structure allows for some padding, but if you keep inserting sequentially increasing values into it, it has to keep adding new pages and/or shuffle things around to stay balanced. Make sure your tests insert values that are distributed the way they will be in real life (assuming that is how they'll arrive), and see whether there is any way to tell the B-tree how many items to expect from the start.
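The key-distribution point is easy to check yourself. A small sketch, again using sqlite3 purely as a stand-in B-tree engine (which pattern wins, sequential or shuffled keys, varies by engine and page-split strategy, so measure rather than assume):

```python
import random, sqlite3, time

def insert_keys(keys):
    """Time inserting the given primary-key values into a fresh B-tree-backed table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v INTEGER)")
    start = time.perf_counter()
    con.executemany("INSERT INTO t VALUES (?, ?)", ((k, 0) for k in keys))
    con.commit()
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed

n = 50000
sequential = list(range(n))
shuffled = sequential[:]
random.shuffle(shuffled)

print(f"sequential keys: {insert_keys(sequential):.3f}s")
print(f"shuffled keys:   {insert_keys(shuffled):.3f}s")
```

Run the same comparison against your actual engine with your actual key distribution before drawing conclusions.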

Totally agree with @Richard-t: it is quite common in offline/batch situations to remove indexes completely before bulk updates to a corpus, then re-apply them once the update is done.
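The drop-then-rebuild pattern looks like this in sketch form (sqlite3 again as an illustrative stand-in; table and index names are made up). The index is maintained zero times during the load and built exactly once at the end:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (ts INTEGER, a INTEGER, b INTEGER)")
con.execute("CREATE INDEX idx_ts ON readings (ts)")

rows = [(t, t, t * 2) for t in range(100000)]

# Bulk-load pattern: drop the index, insert everything, then rebuild it once.
con.execute("DROP INDEX idx_ts")
con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
con.execute("CREATE INDEX idx_ts ON readings (ts)")
con.commit()

count, = con.execute("SELECT COUNT(*) FROM readings").fetchone()
print(count)  # → 100000
```

Building an index over sorted, already-loaded data is typically much cheaper than updating it row by row during the load.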

The kind of indexes applied also influences insertion performance. For instance, with a SQL Server clustered index, update I/O is used for data placement as well as the index update, while nonclustered indexes are updated in separate (and therefore more costly) I/O operations.

As with any engineering project, the most sage advice is to measure with real datasets (key skew, page distribution, page tearing, etc.).

I believe somewhere in the BDB documentation they mention that page size influences this behavior in B-trees. Assuming you aren't doing much in terms of concurrency and you have fixed record sizes, you should try increasing your page size.