I am a developer with fairly limited database understanding, trying to construct a scalable DB design for a new application. Any ideas anybody could provide about this problem would be appreciated.
Assume I currently have the following table:
Stuff
------------
ID      Integer
Attr1   Integer
Attr2   Integer
Attr3   Double
Attr4   TinyInt
Attr5   Varchar(250)
Looking forward, assume we'll have 500 million records in this table. However, at any given time only about 5,000 records will have anything in the Attr5 column; the other records will have an empty or NULL Attr5. The Attr5 column is populated with 100-200 characters whenever a record is inserted, and a nightly process clears the data out of it.
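To make the setup concrete, here is a minimal sketch of what I mean by the nightly process (how it gets scheduled is a separate matter):

```sql
-- Nightly cleanup: clear Attr5 on every row that was populated during the day.
UPDATE Stuff
   SET Attr5 = NULL
 WHERE Attr5 IS NOT NULL;
```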
My concern is that such a large varchar field in the middle of a table that otherwise consists mostly of small numeric fields will reduce the efficiency of reads from the table. As a result, I was wondering whether it might be better to alter the DB design to use two tables, like this:
Stuff
------------
ID      Integer
Attr1   Integer
Attr2   Integer
Attr3   Double
Attr4   TinyInt

Stuff_Text
------------
StuffID  Integer
Attr5    Varchar(250)
Then simply delete from Stuff_Text during the nightly process, keeping it at around 5,000 records and thus keeping the Stuff table minimal in size.
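A sketch of that split design in DDL (column names as above; the FOREIGN KEY clause is optional and assumes an engine that enforces it, such as InnoDB):

```sql
-- Side table holding only the rows that currently have text.
CREATE TABLE Stuff_Text (
    StuffID  INT          NOT NULL PRIMARY KEY,
    Attr5    VARCHAR(250) NOT NULL,
    FOREIGN KEY (StuffID) REFERENCES Stuff (ID)
);

-- The nightly process then just empties the side table:
DELETE FROM Stuff_Text;
```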
So my real question is this: must I break this table into two, or is the database engine intelligent enough to store and access the data efficiently? I could see the DB packing the data efficiently, storing records with nothing in Attr5 as though there were no varchar column at all. I could also see the DB leaving an open 250 bytes in every record in anticipation of data for Attr5. I tend to expect the former, since I believed that was the point of varchar over char, but my DB experience is limited, so I figured I'd better make sure.
I'm using MySQL 5.1, currently on Windows 2000 Advanced Server, eventually upgrading to the Windows Server 2008 family. The database is currently on a standard 7200 RPM magnetic disk, eventually to be moved to an SSD.
If you are using VARCHAR and allowing NULL values, then you shouldn't have problems, because storing that datatype is efficient. That is not the case with the CHAR datatype, but you are already using VARCHAR.
Anyway, splitting it into two tables isn't a bad idea. It could help keep the query cache alive, but it mostly depends on the usage these tables get.
One last thing I can say: try to benchmark it. Insert a bulk of data and try to simulate some real use.
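For example, a rough benchmark sketch along those lines (the row counts are illustrative, and `numbers` is an assumed helper table of sequential integers you'd need to create first):

```sql
-- Bulk-load synthetic rows: roughly 1 in 100,000 gets a 200-character Attr5,
-- mirroring the ~5,000-out-of-500-million ratio described above.
INSERT INTO Stuff (ID, Attr1, Attr2, Attr3, Attr4, Attr5)
SELECT n, n % 100, n % 1000, RAND(), n % 128,
       CASE WHEN n % 100000 = 0 THEN REPEAT('x', 200) ELSE NULL END
  FROM numbers;   -- assumed helper table containing n = 1..N

-- Then time typical reads, bypassing the query cache:
SELECT SQL_NO_CACHE COUNT(*) FROM Stuff WHERE Attr1 = 42;
```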
Stuff
------------
ID      Integer
Attr1   Integer
Attr2   Integer
Attr3   Double
Attr4   TinyInt
Attr5   Integer NOT NULL DEFAULT 0   (build an index on this)

Stuff_Text
------------
Attr5_id    Integer (primary key)
Attr5_text  Varchar(250)
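Possible DDL for that layout (a sketch; DEFAULT 0 is an assumption, matching the `Attr5 <> 0` queries used with this design):

```sql
CREATE TABLE Stuff (
    ID    INT     NOT NULL PRIMARY KEY,
    Attr1 INT,
    Attr2 INT,
    Attr3 DOUBLE,
    Attr4 TINYINT,
    Attr5 INT     NOT NULL DEFAULT 0,   -- 0 means "no text"; otherwise an FK into Stuff_Text
    KEY idx_attr5 (Attr5)
);

CREATE TABLE Stuff_Text (
    Attr5_id   INT          NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Attr5_text VARCHAR(250) NOT NULL
);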
desc select * from Stuff WHERE Attr5 <> 0;
desc select Stuff.*, Stuff_Text.Attr5_text
  from Stuff
  inner join Stuff_Text ON Stuff.Attr5 = Stuff_Text.Attr5_id;
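The write path for this design would look something like the following sketch (the literal values are placeholders):

```sql
-- Inserting a record that has text: create the text row first,
-- then point Attr5 at it.
INSERT INTO Stuff_Text (Attr5_text) VALUES ('some 100-200 character payload');
INSERT INTO Stuff (ID, Attr1, Attr2, Attr3, Attr4, Attr5)
VALUES (1, 10, 20, 3.14, 1, LAST_INSERT_ID());

-- Nightly cleanup: reset the pointers, then empty the text table.
UPDATE Stuff SET Attr5 = 0 WHERE Attr5 <> 0;
DELETE FROM Stuff_Text;
```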
- no NULLs stored
- an integer foreign key is cheap
- when pulling records, `WHERE Attr5 <> 0` scans only the ~5,000 populated rows
- much more compact index size
- do a benchmark yourself