I requested an identical question before (integer-vs-char-for-db-record-property) but happened upon something which is the opposite of all recommendations which i received during my previous publish. In Wordpress 3, typically the most popular and mature free blog script, publish status is saved as
VARCHAR(20) in db - 'publish', 'auto-draft', 'inherit', 'pending', etc. and never as
INT with research table or planned string constants, or
CHAR, or anything like this. This is applicable towards the area
post_type ('post', 'attachment', 'revision', etc.) plus some other fields.
To find all released posts I have to run something similar to
SELECT * FROM posts WHERE post_status = 'published' AND post_type = 'post'. Also, there's a multiple column index on publish_status, publish_type plus some other posts which certainly accelerates this type of search. Can someone explain why they managed to get by doing this and never another, and do you know the benefits and disadvantages of the approach?
Simply because some application established fact does not mean they'd a great database design. This would violate the normalization rules. Maybe they improve performance and perhaps they did not consider the other options once they chose that one becasue they didnt know better. Maybe they were aplication developers creating a database without understanding database theory perfectly or could it have been a deliberate denormailzation with performance stats to support it. Or possibly they did not think the likelihood of needing to update 100 million records whenever we made the decision we desired to alter the value from 'published' to another thing. Maybe they merely examined performance on chooses but this is not on updates. Maybe the values genuniely are unlikey to alter, therefore it is not this type of large deal to denormalize. We can not know came from here.
Normalization isn't about changing strings with amounts or "discussing" strings simply because they have a similar letters.
I'm not sure their design, however the following scenario is perfectly stabilized despite the fact that it uses strings as identifiers.
create table post_statuses( status varchar(20) not null ,primary key(status) ); insert into post_statuses values('publish'); insert into post_statuses values('inherit'); insert into post_statuses values('pending'); create table posts( post_id ... status varchar(20) not null ,primary key(post_id) ,foreign key(status) references post_statuses(status) );
The primary advantage of using natural secrets over surrogate secrets is it reduces the amount of joins needed as well as the likelyhood that entire classes of queries could be clarified from index only. The primary disadvantages are elevated storage and the potential of getting a hell if we have to alter the values.