I am a developer with limited database knowledge, trying to construct a scalable DB design for a new application. Any ideas anybody could provide about this problem would be appreciated.

Assume I currently have the following table:

Stuff
------------
ID  Integer
Attr1   Integer
Attr2   Integer
Attr3   Double
Attr4   TinyInt
Attr5   Varchar(250)

Looking forward, assume we'll have 500 million records in this table. However, at any given time only about 5,000 records will have anything in the Attr5 column; the other records will have an empty or NULL Attr5. The Attr5 column is populated with 100-200 characters whenever a record is inserted, and a nightly process clears the data out of it.

My concern is that such a large varchar field in the middle of a table that otherwise consists of mostly small numeric fields will reduce the efficiency of reads from the table. As a result, I was wondering if it might be better to alter the DB design to use two tables like this:

Stuff
------------
ID  Integer
Attr1   Integer
Attr2   Integer
Attr3   Double
Attr4   TinyInt

Stuff_Text
------------
StuffID Integer
Attr5   Varchar(250)

Then just delete from Stuff_Text during the nightly process, keeping it at 5,000 records and thus keeping the Stuff table minimal in size.
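To make the proposal concrete, here is a minimal sketch of the two-table design and the nightly cleanup, using Python's built-in `sqlite3` and an in-memory database as a stand-in for MySQL 5.1. Table and column names follow the question; the sample row values and the exact cleanup statement are my assumptions.

```python
import sqlite3

# Stand-in for MySQL: the proposed two-table layout in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stuff (
    ID    INTEGER PRIMARY KEY,
    Attr1 INTEGER,
    Attr2 INTEGER,
    Attr3 DOUBLE,
    Attr4 TINYINT
);
CREATE TABLE Stuff_Text (
    StuffID INTEGER PRIMARY KEY,
    Attr5   VARCHAR(250)
);
""")

# Inserting a record that has text means one row in each table.
conn.execute("INSERT INTO Stuff VALUES (1, 10, 20, 3.0, 4)")
conn.execute("INSERT INTO Stuff_Text VALUES (1, 'sample 100-200 character payload')")

# Reads that need Attr5 join to the side table; reads that don't
# never touch the varchar data at all.
row = conn.execute("""
    SELECT s.ID, t.Attr5
    FROM Stuff s JOIN Stuff_Text t ON t.StuffID = s.ID
""").fetchone()
print(row)  # (1, 'sample 100-200 character payload')

# The nightly process simply empties the side table, keeping it small
# while Stuff itself is untouched.
conn.execute("DELETE FROM Stuff_Text")
remaining = conn.execute("SELECT COUNT(*) FROM Stuff_Text").fetchone()[0]
print(remaining)  # 0
```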

So my real question is this: must I break this table into two, or is the database engine intelligent enough to store and access the data efficiently? I could see the DB handling this efficiently, storing records with no data in Attr5 as though there were no varchar column at all. I could also see the DB leaving an open 250 bytes in every record in anticipation of data for Attr5. I'm inclined to expect the former, as I believed that was the point of varchar over char, but my DB experience is limited so I figured I'd better make sure.

I'm using MySQL 5.1, currently on Windows 2000 Advanced Server, eventually upgrading to the Windows Server 2008 family. The database is currently on a standard 7200 RPM magnetic disk, eventually to be moved to an SSD.

If you're using VARCHAR and allowing NULL values, then you shouldn't have problems, because storing that datatype is efficient. That's not true of the CHAR datatype, but you already have VARCHAR.

Anyway, splitting it into two tables isn't a bad idea. It could be good for keeping the query cache alive, but it mostly depends on the use these tables get.

The last thing I can say: try to benchmark it. Insert a bulk of data and try to simulate some use.
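The benchmark advice above can be sketched as follows, again using `sqlite3` in memory as a stand-in for MySQL. Row counts are scaled way down (100,000 rows instead of 500 million, 1,000 populated Attr5 values instead of ~5,000); the point is the method, not the numbers, so treat the timings as illustrative only.

```python
import random
import sqlite3
import string
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Stuff (ID INTEGER PRIMARY KEY, Attr1 INTEGER, "
    "Attr2 INTEGER, Attr3 DOUBLE, Attr4 TINYINT, Attr5 VARCHAR(250))"
)

N = 100_000        # scaled-down stand-in for 500 million rows
WITH_TEXT = 1_000  # scaled-down stand-in for ~5,000 populated Attr5 values

rows = []
for i in range(N):
    attr5 = None
    if i < WITH_TEXT:
        # 150 characters, in the middle of the question's 100-200 range.
        attr5 = "".join(random.choices(string.ascii_letters, k=150))
    rows.append((i, i % 7, i % 11, i * 0.5, i % 128, attr5))

# Bulk insert, timed.
t0 = time.perf_counter()
conn.executemany("INSERT INTO Stuff VALUES (?, ?, ?, ?, ?, ?)", rows)
insert_secs = time.perf_counter() - t0

# Simulated read load: a full scan for rows that have text.
t0 = time.perf_counter()
hits = conn.execute(
    "SELECT COUNT(*) FROM Stuff WHERE Attr5 IS NOT NULL"
).fetchone()[0]
scan_secs = time.perf_counter() - t0

print(f"insert: {insert_secs:.2f}s, scan: {scan_secs:.2f}s, rows with text: {hits}")
```

Run the same script against both the single-table and the two-table layout (at row counts closer to production) to see which one your actual query mix favors.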

Stuff
------------
ID     Integer
Attr1  Integer
Attr2  Integer
Attr3  Double
Attr4  TinyInt
Attr5  Integer NOT NULL DEFAULT 0  (build an index on this)

Stuff_Text
------------
Attr5_id   Integer (primary key)
Attr5_text Varchar(250)

In action:

desc select * from Stuff WHERE Attr5<>0;

desc select Stuff.*, Stuff_text.Attr5_text 
from Stuff
inner join Stuff_text ON Stuff.Attr5=Stuff_text.Attr5_id;
  1. doesn't store NULLs
  2. uses an integer as the foreign key
  3. when pulling records where Attr5 <> 0, it only scans ~5,000 rows
  4. much more compact index size
  5. do a benchmark yourself
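A minimal sketch of this answer's variant, once more with `sqlite3` standing in for MySQL: Stuff carries an indexed integer Attr5 (0 meaning "no text"), and Stuff_Text holds the actual varchar. The index name and sample values are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stuff (
    ID    INTEGER PRIMARY KEY,
    Attr1 INTEGER,
    Attr2 INTEGER,
    Attr3 DOUBLE,
    Attr4 TINYINT,
    Attr5 INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_stuff_attr5 ON Stuff(Attr5);  -- "build an index on this"
CREATE TABLE Stuff_Text (
    Attr5_id   INTEGER PRIMARY KEY,
    Attr5_text VARCHAR(250)
);
""")

# One row with text (Attr5 points into Stuff_Text), one without (Attr5 = 0).
conn.execute("INSERT INTO Stuff_Text VALUES (1, 'payload')")
conn.execute("INSERT INTO Stuff VALUES (1, 0, 0, 0.0, 0, 1)")
conn.execute("INSERT INTO Stuff (ID) VALUES (2)")  # Attr5 defaults to 0

# The answer's join: the indexed Attr5 <> 0 filter touches only the
# handful of rows that actually have text.
rows = conn.execute("""
    SELECT Stuff.ID, Stuff_Text.Attr5_text
    FROM Stuff
    INNER JOIN Stuff_Text ON Stuff.Attr5 = Stuff_Text.Attr5_id
    WHERE Stuff.Attr5 <> 0
""").fetchall()
print(rows)  # [(1, 'payload')]
```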