Typically, databases are designed as below to allow an entity to have multiple types.

Entity Title | Type | Additional Information

The entity title is something like an account number, and the type might be savings, current, etc., in a bank database for instance.

Mostly, the type will be some kind of string. There may be additional information associated with an entity type.

Normally queries will be posed like this: Find the accounts of a particular type? Find the accounts of type X with a balance of more than a million?

To answer these queries, the query analyzer will scan the index if an index exists on the particular column. Otherwise, it will do a full scan of all the rows.
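To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical, made up only for illustration, and SQLite stands in for whatever RDBMS you actually use. It asks the planner how it would run the same query before and after an index is created on the type column.

```python
import sqlite3

def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
# Hypothetical table following the layout described above.
conn.execute("CREATE TABLE accounts (account_no TEXT, type TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [("A1", "savings", 500), ("A2", "current", 2_000_000), ("A3", "savings", 3_000_000)],
)

query = "SELECT account_no FROM accounts WHERE type = ?"

# Without an index on "type", every row has to be examined.
plan_without_index = plan_details(conn, query, ("savings",))

# With an index on "type", the planner can search the index instead.
conn.execute("CREATE INDEX idx_accounts_type ON accounts (type)")
plan_with_index = plan_details(conn, query, ("savings",))

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan text differs between SQLite versions, so treat the printed strings as informational only.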

I am thinking about the optimization below. Why don't we keep a hash or integral value of each column's data in the actual table, chosen so that the ordering property is maintained, so that comparison becomes easy?

It has the advantages below.
1. The table size will be a lot smaller, because we store a small value for each column's data.
2. We can build a clustered B+ tree index on the hash values of each column, to retrieve the rows that match, or are greater or smaller than, some value.
3. The corresponding values can be retrieved easily by keeping the B+ tree index in main memory and looking up the corresponding values.
4. Infrequent values will never have to be retrieved.
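A rough sketch of what an order-preserving code could look like, in Python. Note the assumptions: an ordinary hash function does not preserve ordering, so this sketch uses the rank of each distinct value in sorted order as the "ordered hash", and a sorted list searched with bisect stands in for the B+ tree. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_ordered_codes(values):
    """Map each distinct value to its rank in sorted order (order-preserving)."""
    return {value: code for code, value in enumerate(sorted(set(values)))}

account_types = ["savings", "current", "fixed", "savings", "current"]
codes = build_ordered_codes(account_types)

# The table column stores only the small integer codes.
encoded_column = [codes[v] for v in account_types]

# Stand-in for a B+ tree index: (code, row_id) pairs kept in sorted order.
index = sorted((code, row_id) for row_id, code in enumerate(encoded_column))

def rows_in_code_range(index, low, high):
    """Return row ids whose code lies in [low, high], found by binary search."""
    start = bisect_left(index, (low, -1))
    end = bisect_right(index, (high, float("inf")))
    return [row_id for _, row_id in index[start:end]]

# Rows whose type sorts at or after "fixed", i.e. "fixed" and "savings".
matching_rows = rows_in_code_range(index, codes["fixed"], codes["savings"])
print(matching_rows)
```

One thing this sketch glosses over: because the codes are ranks, adding a new distinct value that sorts between two existing ones would force some codes to be reassigned.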

I still have more optimizations in mind. I will post those based on the feedback to this question.

I am not sure whether this is already implemented in databases; this is just a thought.

Thanks for reading this.

-- Bala


I'm not trying to emulate what the database does. Normally indexes are created by the database administrator. I'm trying to propose a physical schema that has indexes on all the fields in the database, so that the table size is reduced and a few kinds of queries are easy to answer.

Updates: (in reply to Joe's answer)

How does adding indexes to every field reduce the size of the database? You still have to store all of the true values in addition to the hash; we don't just want to query for existence but want to return the actual data.

In a typical table, all the physical data is stored. But by generating a hash value for each column's data, I am storing only the hash value in the actual table. I agree that this does not reduce the size of the database, but it does reduce the size of the table. It will be useful when you don't need to return all the column values.

Most RDBMSes answer most queries efficiently now (especially with key indices in place). I'm having a hard time coming up with scenarios where your database would be more efficient and save space.

There can be only one clustered index on a table, and all other indexes have to be unclustered indexes. With my approach I will have a clustered index on all the values in the database. It will improve query performance.

Putting indexes within the physical data doesn't really make sense. The key to an index's performance is that each index is stored in sorted order. How do you propose doing that across every possible field if the values are stored only once in their physical layout? Ultimately, the actual rows have to be sorted by something (in SQL Server, for example, this is the clustered index).

The basic idea is that instead of creating a separate table for each column for efficient access, we do it at the physical level.

The table will look like this.

Row1 - OrderedHash(Column1),OrderedHash(Column2),OrderedHash(Column3)
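As a sketch of that row layout, the Python below encodes every column of a small table into integer codes and keeps one sorted dictionary per column to map codes back to values. It assumes, as one possible reading of OrderedHash, that the code is the value's rank among the column's distinct values; the function names and sample rows are hypothetical.

```python
def build_dictionary(values):
    """Sorted list of distinct values; a value's position is its ordered code."""
    return sorted(set(values))

def encode_table(rows):
    """Return (per-column dictionaries, rows of integer codes)."""
    columns = list(zip(*rows))
    dictionaries = [build_dictionary(col) for col in columns]
    lookups = [{v: i for i, v in enumerate(d)} for d in dictionaries]
    encoded = [tuple(lookups[c][v] for c, v in enumerate(row)) for row in rows]
    return dictionaries, encoded

def decode_row(dictionaries, encoded_row):
    """Recover the actual values from a row of codes."""
    return tuple(dictionaries[c][code] for c, code in enumerate(encoded_row))

rows = [
    ("ACC-001", "savings", 500),
    ("ACC-002", "current", 2_000_000),
    ("ACC-003", "savings", 3_000_000),
]
dictionaries, encoded_rows = encode_table(rows)
print(encoded_rows)
print(decode_row(dictionaries, encoded_rows[1]))
```

The encoded rows are small, but the per-column dictionaries still hold every true value, which is the point Joe raises: the table shrinks, the database as a whole does not.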

Google for "hash index". For example, in SQL Server such an index is created and queried using the CHECKSUM function.

This is mainly useful when you need to index a column which contains long values, e.g. varchars which are on average more than 100 characters or something like that.
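The general idea of a hash index can be sketched in a few lines of Python. This is only an illustration under stated assumptions: zlib.crc32 stands in for a checksum function (it is not the algorithm SQL Server's CHECKSUM uses), and the HashIndex class is hypothetical. The index is keyed by the short checksum, and each lookup re-checks the stored value because different strings can share a checksum.

```python
import zlib
from collections import defaultdict

def checksum(text):
    """Small integer digest of a long string (CRC-32 used as a stand-in)."""
    return zlib.crc32(text.encode("utf-8"))

class HashIndex:
    """Hypothetical in-memory index keyed by checksum instead of the full value."""

    def __init__(self):
        self._rows = []                     # the actual long values
        self._buckets = defaultdict(list)   # checksum -> row ids

    def insert(self, value):
        row_id = len(self._rows)
        self._rows.append(value)
        self._buckets[checksum(value)].append(row_id)
        return row_id

    def find(self, value):
        # Different values can share a checksum, so compare the real values too.
        candidates = self._buckets.get(checksum(value), [])
        return [row_id for row_id in candidates if self._rows[row_id] == value]

index = HashIndex()
long_a = "customer note " * 20 + "A"
long_b = "customer note " * 20 + "B"
index.insert(long_a)
index.insert(long_b)
index.insert(long_a)
print(index.find(long_a))
```

Unlike the ordered codes discussed earlier, a checksum like this only helps equality lookups; it says nothing about which of two values is greater.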