I'm looking to distribute some read-only data to various machines for efficient and very fast access with no network overhead. The data is available in a relational schema, and it is essential to "join" on relations between entities, but there's no requirement to write to the database at all (it will be generated offline).
I had a lot of confidence that SQLite would deliver on performance, but an RDBMS appears to be unsuitable at a fundamental level: joins are very expensive due to the cost of index lookups, and in my read-only context they are an unnecessary overhead, since entities could store direct references to one another as file offsets. That way, an index lookup is turned into a file seek.
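To illustrate the kind of layout I mean, here is a minimal sketch (in Python for brevity, though my app is PHP; the record format and helper names are invented): each fixed-width record stores the byte offset of its related record, so following a relation is a single seek rather than an index lookup.

```python
import os
import struct
import tempfile

# Hypothetical fixed-width record: 4-byte offset of related record + 12-byte name.
REC = struct.Struct("<I12s")

def write_records(path, records):
    # records: list of (parent_index, name); the stored "reference" is a
    # byte offset derived directly from the parent's index.
    with open(path, "wb") as f:
        for parent_idx, name in records:
            f.write(REC.pack(parent_idx * REC.size, name.ljust(12).encode()))

def read_record(f, offset):
    f.seek(offset)  # direct seek -- no index lookup
    parent_off, name = REC.unpack(f.read(REC.size))
    return parent_off, name.decode().rstrip()

path = os.path.join(tempfile.mkdtemp(), "data.bin")
write_records(path, [(0, "root"), (0, "child-a"), (1, "grandchild")])
with open(path, "rb") as f:
    parent_off, name = read_record(f, 2 * REC.size)  # fetch record #2
    _, parent_name = read_record(f, parent_off)      # follow its stored offset
print(name, "->", parent_name)  # grandchild -> child-a
```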
What are my options here? "Database" doesn't really seem to describe what I'm looking for. I'm aware of Neo4j, but I can't embed Java in my application.
Edit, to answer the comments:
- The data will be up to 1 GB in size, and I'm using PHP, so keeping it all in memory isn't an option. I'll rely on the OS buffer cache to avoid constantly going to disk.
- An example would be a Product table with 15 fields of mixed type, and a query to list products with a certain make, joining on a Category table.
- The solution needs to be some kind of flat file. I'm wondering whether software that fits my needs already exists.
The performance issue is measured. Basically, it's unacceptable in my situation to replace a 2 ms IO-bound query to Memcache with a 5 ms CPU-bound call to SQLite... For example, the categories table has 500 records, containing parent and child categories. The following query takes ~8 ms, with no disk IO: `SELECT 1 FROM categories a INNER JOIN categories b ON b.id = a.parent_id`. Some simpler, join-less queries are very fast.
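For reference, the measurement setup can be reconstructed roughly like this (a Python/sqlite3 sketch rather than my actual PHP code; the column names and parent/child layout are assumptions):

```python
import sqlite3
import time

# Rebuild a 500-row categories table entirely in memory, so the timed
# query is CPU-bound with no disk IO, as in the original measurement.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE categories (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
con.executemany(
    "INSERT INTO categories VALUES (?, ?, ?)",
    [(i, (i - 1) // 2, f"cat-{i}") for i in range(500)],  # row 0 is the root
)

start = time.perf_counter()
rows = con.execute(
    "SELECT 1 FROM categories a INNER JOIN categories b ON b.id = a.parent_id"
).fetchall()
elapsed_ms = (time.perf_counter() - start) * 1000
print(len(rows), "joined rows in", round(elapsed_ms, 2), "ms")
```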
I may not be completely clear on your goals regarding the kinds of queries you need. But the part about storing file offsets to other data seems like it would be a very brittle solution that's difficult to maintain and debug. There might be some tool that would help with it, but my suspicion is that you would end up writing most of it yourself. And if someone else had to come along later to debug and figure out a homegrown file format, it would be even more work.
However, my first thought is to wonder whether the described performance issue is estimated at this point or actually measured. Have you run tests with the data in a relational format to see how fast it really is? It is true that a join will almost always involve more file reads (do the binary search as you mentioned, then get the associated record information, then look up that record). This could take four or five or more disk operations... at first. But the categories table (from the OP) could end up cached if it's commonly hit. This is a complete guess on my part, but in many situations the number of categories is relatively small. If that's the case here, the entire category table and its index may stay cached in memory by the OS and thus result in very fast joins.
If the performance really is a problem, another possibility might be to denormalize the data. In the categories example, just duplicate the category value/name and store it with each product record. The database size will grow as a result, but you could still use an embedded database (there are a number of options available). If done sensibly, it could be maintained reasonably well and would provide the ability to read a full object with one lookup/seek and one read.
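A sketch of that denormalization with SQLite (table and column names are my own invention): the category name is copied onto each product row at build time, so the common listing query needs no join at all.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Normalized source tables (illustrative schema).
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, make TEXT,
                           category_id INTEGER REFERENCES categories(id));
    INSERT INTO categories VALUES (1, 'Widgets'), (2, 'Gadgets');
    INSERT INTO products VALUES (1, 'Acme', 1), (2, 'Acme', 2), (3, 'Other', 1);

    -- Denormalized, read-only copy: category name duplicated per product,
    -- paid for once when the data set is generated offline.
    CREATE TABLE products_flat AS
        SELECT p.id, p.make, c.name AS category_name
        FROM products p JOIN categories c ON c.id = p.category_id;
""")

# The hot query now touches one row per product -- no join at read time.
rows = con.execute(
    "SELECT id, category_name FROM products_flat WHERE make = 'Acme' ORDER BY id"
).fetchall()
print(rows)  # [(1, 'Widgets'), (2, 'Gadgets')]
```

The trade-off is the usual one: larger records and redundant strings, in exchange for one seek and one read per object.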