I'm evaluating options for efficient data storage in Java. The data set consists of time-stamped data values with a named primary key, e.g.
Name: A|B|C:D Value: 124 TimeStamp: 01/06/2009 08:24:39,223
This might be a stock price at a given time, so it is, I guess, a classic time series data pattern. However, I really need a generic RDBMS solution that will work with any reasonably JDBC-compatible database, since I have to use Hibernate. Consequently, time series extensions to databases like Oracle aren't really an option, as I'd like the implementor to be able to use their own JDBC/Hibernate-capable database.
The challenge here is simply the massive volume of data that can accumulate in a short time. So far, my implementations have centered around defining periodic rollup and purge schedules, where raw data is aggregated into DAY, WEEK, MONTH etc. tables. The downsides are the early loss of granularity and the slight inconvenience of period mismatches between periods stored in different aggregates.
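To make the rollup idea concrete, here is a minimal in-memory sketch of the kind of aggregation I mean, rolling raw points up into per-day averages. The `DataPoint` record and `rollUp` method are hypothetical names for illustration, not part of any existing API, and a real implementation would of course do this in SQL or via batch jobs against the raw table:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical raw row: named key, value, timestamp.
record DataPoint(String name, double value, Instant timestamp) {}

public class DailyRollup {
    // Aggregate raw points into per-day averages (UTC calendar days),
    // i.e. what a DAY rollup table would hold after a purge of the raw rows.
    static Map<LocalDate, Double> rollUp(List<DataPoint> raw) {
        return raw.stream().collect(Collectors.groupingBy(
                p -> p.timestamp().atZone(ZoneOffset.UTC).toLocalDate(),
                TreeMap::new,
                Collectors.averagingDouble(DataPoint::value)));
    }

    public static void main(String[] args) {
        List<DataPoint> raw = List.of(
                new DataPoint("A|B|C:D", 124, Instant.parse("2009-06-01T08:24:39.223Z")),
                new DataPoint("A|B|C:D", 126, Instant.parse("2009-06-01T09:00:00Z")),
                new DataPoint("A|B|C:D", 130, Instant.parse("2009-06-02T08:00:00Z")));
        System.out.println(rollUp(raw));
    }
}
```

The granularity loss is visible immediately: once the raw rows are purged, only the daily average survives.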
The options are limited, since there is a hard limit to how much the data can be physically compressed while retaining its original granularity, and that limit is amplified by the directive to use a relational database, and a generic JDBC-capable one at that.
Borrowing a notional concept from classic data compression algorithms, and exploiting the fact that many consecutive values for the same named key are likely to be identical, I'm wondering if there is a way I can easily reduce the number of stored records by conflating repeating values into one logical row while storing a counter that effectively says "the next n records have the same value". Implementing exactly that seems simple enough, but the trade-off is that the data model becomes hideously complicated to query against using standard SQL, particularly when using any kind of aggregate SQL function. This significantly reduces the usefulness of the data store, since only complex custom code can restore the data to its "decompressed" state, resulting in an impedance mismatch with hundreds of tools that won't be able to render the data properly.
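What I have in mind is essentially run-length encoding of the value column. A minimal sketch of the row layout and the compress/expand round trip (the `Run` record and method names are mine, purely for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical compressed layout: one logical row per run of identical
// values; 'count' says "the next n raw records share this value".
record Run(double value, int count) {}

public class RunLength {
    // Conflate consecutive identical values into (value, count) rows.
    static List<Run> compress(List<Double> values) {
        List<Run> runs = new ArrayList<>();
        for (double v : values) {
            int last = runs.size() - 1;
            if (last >= 0 && runs.get(last).value() == v) {
                runs.set(last, new Run(v, runs.get(last).count() + 1));
            } else {
                runs.add(new Run(v, 1));
            }
        }
        return runs;
    }

    // Restore the original granularity ("decompress").
    static List<Double> expand(List<Run> runs) {
        List<Double> out = new ArrayList<>();
        for (Run r : runs)
            for (int i = 0; i < r.count(); i++) out.add(r.value());
        return out;
    }
}
```

The compression is lossless and trivial to implement; the pain is entirely on the query side, since plain SQL over the run table no longer sees one row per raw record.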
I considered the possibility of defining custom Hibernate types that would basically "understand" the compressed data set, blow it back up, and return query results as dynamically generated synthetic rows. (The database will be read-only to all clients except the tightly controlled input stream.) Some of the tools I had in mind will integrate with Hibernate/POJOs in addition to raw JDBC (e.g. JasperReports), but this doesn't really address the aggregate functions problem and probably has plenty of other issues as well.
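For what it's worth, the common aggregates don't strictly require decompression: SUM, COUNT, and AVG can be computed directly on the run rows by weighting with the counter (in SQL terms, something like `SUM(value * cnt) / SUM(cnt)` for the average), and MIN/MAX ignore the counter entirely. A sketch of that weighting, again with illustrative names of my own (`Run`, `RunAggregates`); this still doesn't help generic reporting tools that emit plain `AVG(value)` against the table:

```java
import java.util.List;

// Same hypothetical run layout: a value plus the number of raw records it stands for.
record Run(double value, long count) {}

public class RunAggregates {
    // COUNT over the logical (decompressed) rows.
    static long count(List<Run> runs) {
        return runs.stream().mapToLong(Run::count).sum();
    }

    // AVG over the logical rows, weighting each run by its counter.
    static double avg(List<Run> runs) {
        double sum = 0;
        long n = 0;
        for (Run r : runs) {
            sum += r.value() * r.count();
            n += r.count();
        }
        return sum / n;
    }
}
```

So the impedance mismatch is not with the math but with the tooling: every off-the-shelf consumer would need to know about the weighting.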
So I am part way to resigning myself to possibly having to use a more proprietary (possibly non-SQL) data store (any suggestions appreciated), and then focusing on the perhaps less complex task of writing a pseudo JDBC driver to at least ease integration with external tools.
I've heard mention of something called a "bit packed file" as a mechanism for this kind of data compression, but I don't know of any databases that provide it, and the last thing I want to do (or can do, really...) is write my own database.
Any suggestions or insight?