Okay, so I've been building a system in main memory that has a variety of objects, and every object stores lists of other objects in the system. Now I want to move this to persistent storage. I'm not looking for the obvious answer of using a DBMS, because the thing is that I'm writing a custom database for my system.

To start with, each object is assigned an ID. The IDs can be looked up in a table to get the block and offset of the position of that object's data. Each object also has lists/sets that can point to other objects in the system, so in storage these will be lists of 8-byte IDs (using longs for the IDs) that can be used to fetch the other objects. Now, my question: I know the lists will be growing over time, so they need room to grow. My best thought so far for storing the lists, so that I won't need to move objects when they grow, is to assign each list an ID just like the objects, so that it can be looked up in a table, just like the objects, to find its location on disk.
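A minimal sketch of this indirection scheme, in Python for illustration (the question doesn't name a language; the `id_table` dict standing in for the on-disk lookup table and the helper names are my assumptions):

```python
import struct

# Hypothetical in-memory ID table: object or list id -> (block, offset).
# In the real system this table would itself be persisted.
id_table = {}

def register(obj_id, block, offset):
    """Record where an object's or list's data lives on disk."""
    id_table[obj_id] = (block, offset)

def locate(obj_id):
    """Look up the (block, offset) position for an id."""
    return id_table[obj_id]

# Reference lists are serialized as 8-byte big-endian longs,
# matching the "8 byte (using longs for the ids)" layout.
def encode_refs(ids):
    return struct.pack(f">{len(ids)}q", *ids)

def decode_refs(buf):
    return list(struct.unpack(f">{len(buf) // 8}q", buf))
```

The key property is that object data never has to move when a list grows: only the table entry for the list's chunks changes.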

Now, each list chunk would have a fixed allocation of space to store 10 entries, and at the end would be the ID of the next list chunk if the list contains more entries. This seems like a decent way to do it and to deal with constantly growing lists, but I'm wondering if there are better approaches. I'd keep the indexes in memory (space permitting), so given an object ID, the lookup is in memory and it takes 1 I/O to fetch its data and list IDs from disk. Then, to traverse each list, it would take another lookup and one I/O per 10 entries in the list, or fewer if the block is cached.
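The chunk layout described above can be sketched as follows. This is an illustrative Python encoding, not the asker's actual format; the `0` empty-slot sentinel and `-1` end-of-chain sentinel are assumptions (they presume `0` is never a valid object ID):

```python
import struct

CHUNK_CAPACITY = 10   # fixed number of entry slots per chunk, as in the question
NO_NEXT = -1          # sentinel meaning "no next chunk" (an assumption)
EMPTY = 0             # sentinel for an unused slot (assumes 0 is not a valid id)

# Layout: 10 x 8-byte entry slots, then one 8-byte next-chunk id.
CHUNK_FMT = f">{CHUNK_CAPACITY}qq"
CHUNK_SIZE = struct.calcsize(CHUNK_FMT)  # 88 bytes

def pack_chunk(entries, next_chunk_id=NO_NEXT):
    assert len(entries) <= CHUNK_CAPACITY
    padded = entries + [EMPTY] * (CHUNK_CAPACITY - len(entries))
    return struct.pack(CHUNK_FMT, *padded, next_chunk_id)

def unpack_chunk(buf):
    *slots, next_id = struct.unpack(CHUNK_FMT, buf)
    return [s for s in slots if s != EMPTY], next_id

def read_list(chunks, first_chunk_id):
    """Walk the chunk chain; `chunks` is a dict standing in for disk blocks,
    so each iteration here corresponds to one potential I/O."""
    ids, cur = [], first_chunk_id
    while cur != NO_NEXT:
        entries, cur = unpack_chunk(chunks[cur])
        ids.extend(entries)
    return ids
```

Reading a list of *n* entries costs about ⌈n/10⌉ chunk reads, which matches the "one I/O per 10 entries" estimate in the question.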

The number of I/Os isn't terrible, and I would keep list chunks close together on disk to eliminate unnecessary I/Os, but is there a better way of doing this? Am I right to store the lists separately from the object, or should I consider ways of storing them with the object's data? My worry about doing that is that as one list grows it will run into another list and then need to be fragmented, which could get messy. Any suggestions are appreciated, and thanks in advance.

Your idea of having these expanding lists is good. I think your explanation is missing some details (i.e.: are the lists ordered or not, what exactly do you mean by separating lists from objects; a diagram of these lists would help).

I'd keep a sorted index in memory for fast access. The index would hold the list ID and its location on disk. If you're interested in range queries, go with a B-tree approach; otherwise you could use a hashmap to store these indices.
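As a sketch of the sorted-index idea, here's a minimal Python version using a sorted array with binary search standing in for a B-tree (a plain dict would cover the hashmap alternative). The class and method names are illustrative, not from the question:

```python
import bisect

class SortedIndex:
    """In-memory index: sorted list ids with parallel disk locations.
    A sorted array + bisect approximates what a B-tree gives you for
    range queries, without the tree machinery."""

    def __init__(self):
        self.ids = []        # list ids, kept sorted
        self.locations = []  # (block, offset) tuples, parallel to ids

    def insert(self, list_id, location):
        i = bisect.bisect_left(self.ids, list_id)
        self.ids.insert(i, list_id)
        self.locations.insert(i, location)

    def lookup(self, list_id):
        """Exact-match lookup; O(log n) via binary search."""
        i = bisect.bisect_left(self.ids, list_id)
        if i < len(self.ids) and self.ids[i] == list_id:
            return self.locations[i]
        return None

    def range(self, lo, hi):
        """Range query: all (id, location) pairs with lo <= id <= hi."""
        i = bisect.bisect_left(self.ids, lo)
        j = bisect.bisect_right(self.ids, hi)
        return list(zip(self.ids[i:j], self.locations[i:j]))
```

The trade-off is the usual one: a hashmap gives O(1) point lookups but no ordered scans; the sorted structure pays O(log n) per lookup to make range queries cheap.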

A further improvement, if you're doing searches over the lists, is to keep them sorted... or at least semi-sorted, so that you can group similar values in the same chunk. This can speed up searching within the lists if you occasionally cache in memory the boundaries of each chunk (nodes with values between 1-9, 10-25, etc.). Merge sort is probably the best sort for lists. Better yet, when you insert nodes into the lists, insert them in the correct position so the list is always sorted. Pair this with binary search. If the data isn't indexed properly and isn't sorted, you'll go to disk multiple times for queries, and in that case any search will give you linear time because of the disk accesses.
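The "insert in the correct location, then use binary search" suggestion is a few lines with Python's `bisect` module; helper names here are illustrative:

```python
import bisect

def insert_sorted(lst, value):
    """Insert value so the list stays sorted, instead of sorting later."""
    bisect.insort(lst, value)

def contains(lst, value):
    """Binary-search membership test; requires lst to be sorted."""
    i = bisect.bisect_left(lst, value)
    return i < len(lst) and lst[i] == value
```

Within a single 10-entry chunk the difference is negligible; the payoff comes from the cached chunk boundaries, which let you skip reading chunks whose value range can't contain the query.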

You could also cache the data nodes of the 10% most frequently looked-up nodes/lists.

Depending on the size of these lists (and how many chunks you have for them), you could use some RAID configuration to get parallel reads/writes.