There is a healthy debate out there between surrogate and natural keys:

SO Post 1

SO Post 2

My opinion, which seems to be in line with the majority (it is a slim majority), is that you should use surrogate keys unless a natural key is completely obvious and guaranteed not to change. You should then enforce uniqueness on the natural key. That means surrogate keys almost all of the time.

Example of the two approaches, starting with a Company table:

1: Surrogate key: The table has an ID field which is the PK (and an identity). Company names are required to be unique by state, so there is a unique constraint there.

2: Natural key: The table uses CompanyName and State as the PK -- this satisfies both the PK and uniqueness.
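To make the two options concrete, here is a rough sketch of what they might look like as DDL. It is written in MySQL syntax (the dialect of the benchmark further down the thread), so AUTO_INCREMENT stands in for an identity column. The table names CompanySurrogate, CompanyNatural, OfficeSurrogate and OfficeNatural, the column sizes, and the constraint name are all assumptions made up for illustration.

```sql
-- 1: Surrogate key. ID is the PK; uniqueness of the company name
--    within a state is enforced by a separate unique constraint.
CREATE TABLE CompanySurrogate (
    ID          INT NOT NULL AUTO_INCREMENT,
    CompanyName VARCHAR(100) NOT NULL,
    State       CHAR(2) NOT NULL,
    PRIMARY KEY (ID),
    UNIQUE KEY uq_company_name_state (CompanyName, State)
);

-- 2: Natural key. The composite PK doubles as the uniqueness rule.
CREATE TABLE CompanyNatural (
    CompanyName VARCHAR(100) NOT NULL,
    State       CHAR(2) NOT NULL,
    PRIMARY KEY (CompanyName, State)
);

-- A hypothetical child table for each design, to show what every
-- referencing table has to carry as its foreign key.
CREATE TABLE OfficeSurrogate (
    ID        INT NOT NULL AUTO_INCREMENT,
    CompanyID INT NOT NULL,
    PRIMARY KEY (ID),
    FOREIGN KEY (CompanyID) REFERENCES CompanySurrogate (ID)
);

CREATE TABLE OfficeNatural (
    ID          INT NOT NULL AUTO_INCREMENT,
    CompanyName VARCHAR(100) NOT NULL,
    State       CHAR(2) NOT NULL,
    PRIMARY KEY (ID),
    FOREIGN KEY (CompanyName, State)
        REFERENCES CompanyNatural (CompanyName, State)
);
```

With the natural key, each referencing table repeats both CompanyName and State, while the surrogate design repeats a single INT.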

Let us say that the Company PK is used in 10 other tables. My hypothesis, with no numbers to back it up, is that the surrogate key approach would be considerably faster here.

The only really convincing argument I have seen for a natural key is for a many-to-many table that uses the two foreign keys as a natural key. I think it makes sense in that case. But you can get into trouble if you need to refactor; that is out of scope for this post, I think.
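For the many-to-many case, the idea can be sketched like this. Student, Course and Enrollment are hypothetical names chosen only for the example, and the syntax is MySQL.

```sql
-- Two parent tables, each with its own surrogate key.
CREATE TABLE Student (
    ID   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

CREATE TABLE Course (
    ID    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Title VARCHAR(100) NOT NULL
);

-- The link table has no ID of its own: the pair of foreign keys
-- is the primary key, so a duplicate pairing cannot be inserted.
CREATE TABLE Enrollment (
    StudentID INT NOT NULL,
    CourseID  INT NOT NULL,
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Student (ID),
    FOREIGN KEY (CourseID)  REFERENCES Course (ID)
);
```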

Has anybody seen an article that compares performance differences on a set of tables that use surrogate keys vs. the same set of tables using natural keys? Searching around on SO and Google has not produced anything useful, just lots of theorycrafting.


Important Update: I have started building a set of test tables that answer this question. It looks like this:

  • PartNatural - parts table that uses the unique PartNumber as a PK
  • PartSurrogate - parts table that uses an ID (int, identity) as PK and has a unique index on the PartNumber
  • Plant - ID (int, identity) as PK
  • Engineer - ID (int, identity) as PK

Every part is joined to a plant, and every instance of a part at a plant is joined to an engineer. If anybody has an issue with this testbed, now is the time.
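One way the testbed could be laid out is sketched below, in MySQL syntax. Only the names PartNatural, PartSurrogate, Plant and Engineer come from the list above. The link tables PlantPartNatural and PlantPartSurrogate, the Name and Description columns, and the VARCHAR(30) type for PartNumber are assumptions.

```sql
CREATE TABLE Plant (
    ID   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

CREATE TABLE Engineer (
    ID   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

-- Natural-key variant: PartNumber itself is the PK.
CREATE TABLE PartNatural (
    PartNumber  VARCHAR(30) NOT NULL PRIMARY KEY,
    Description VARCHAR(100)
);

-- Surrogate-key variant: ID is the PK, PartNumber stays unique.
CREATE TABLE PartSurrogate (
    ID          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    PartNumber  VARCHAR(30) NOT NULL,
    Description VARCHAR(100),
    UNIQUE KEY uq_part_number (PartNumber)
);

-- Hypothetical link tables: one row per part at a plant,
-- each assigned to an engineer.
CREATE TABLE PlantPartNatural (
    PlantID    INT NOT NULL,
    PartNumber VARCHAR(30) NOT NULL,
    EngineerID INT NOT NULL,
    PRIMARY KEY (PlantID, PartNumber),
    FOREIGN KEY (PlantID)    REFERENCES Plant (ID),
    FOREIGN KEY (PartNumber) REFERENCES PartNatural (PartNumber),
    FOREIGN KEY (EngineerID) REFERENCES Engineer (ID)
);

CREATE TABLE PlantPartSurrogate (
    PlantID    INT NOT NULL,
    PartID     INT NOT NULL,
    EngineerID INT NOT NULL,
    PRIMARY KEY (PlantID, PartID),
    FOREIGN KEY (PlantID)    REFERENCES Plant (ID),
    FOREIGN KEY (PartID)     REFERENCES PartSurrogate (ID),
    FOREIGN KEY (EngineerID) REFERENCES Engineer (ID)
);
```

The two link tables differ only in the type of the part reference, so the same join can be timed against each.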

Natural keys differ from surrogate keys in value, not type.

Any type can be used for a surrogate key, such as a VARCHAR for a system-generated slug or something else.

However, the most used types for surrogate keys are INTEGER and RAW(16) (or whatever type your RDBMS uses for GUIDs).

Comparing surrogate integers and natural integers (like an SSN) takes exactly the same time.

Comparing VARCHARs may take collation into account, and they are generally longer than integers, which makes them less efficient.

Comparing a set of two INTEGERs is probably also less efficient than comparing a single INTEGER.

On datatypes small in size, this difference is probably percents of percents of the time required to fetch pages, traverse indexes, acquire database latches, etc.

And here are the numbers (in MySQL):

CREATE TABLE aint (id INT NOT NULL PRIMARY KEY, value VARCHAR(100));
CREATE TABLE adouble (id1 INT NOT NULL, id2 INT NOT NULL, value VARCHAR(100), PRIMARY KEY (id1, id2));
CREATE TABLE bint (id INT NOT NULL PRIMARY KEY, aid INT NOT NULL);
CREATE TABLE bdouble (id INT NOT NULL PRIMARY KEY, aid1 INT NOT NULL, aid2 INT NOT NULL);

INSERT
INTO    aint
SELECT  id, RPAD('', FLOOR(RAND(20090804) * 100), '*')
FROM    t_source;

INSERT
INTO    bint
SELECT  id, id
FROM    aint;

INSERT
INTO    adouble
SELECT  id, id, value
FROM    aint;

INSERT
INTO    bdouble
SELECT  id, id, id
FROM    aint;

SELECT  SUM(LENGTH(value))
FROM    bint b
JOIN    aint a
ON      a.id = b.aid;

SELECT  SUM(LENGTH(value))
FROM    bdouble b
JOIN    adouble a
ON      (a.id1, a.id2) = (b.aid1, b.aid2);

t_source is simply a dummy table with 1,000,000 rows.
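The answer does not show how t_source was built. If you want to reproduce the test, one possible way to create such a table is sketched below. The helper table digit and the cross-join approach are assumptions, not the original setup.

```sql
-- Helper table holding the digits 0..9 (hypothetical name).
CREATE TABLE digit (n INT NOT NULL PRIMARY KEY);
INSERT INTO digit (n)
VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);

CREATE TABLE t_source (id INT NOT NULL PRIMARY KEY);

-- Six-way cross join: 10^6 digit combinations, mapped to ids 1..1000000.
INSERT
INTO    t_source (id)
SELECT  1 + d0.n + 10 * d1.n + 100 * d2.n
          + 1000 * d3.n + 10000 * d4.n + 100000 * d5.n
FROM    digit d0
CROSS JOIN digit d1
CROSS JOIN digit d2
CROSS JOIN digit d3
CROSS JOIN digit d4
CROSS JOIN digit d5;
```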

aint and adouble, and likewise bint and bdouble, contain exactly the same data, except that aint has a single integer as its PRIMARY KEY, while adouble has a pair of two identical integers.

On my machine, both queries run for 14.5 seconds, +/- 0.1 second.

The performance difference, if any, is within the range of fluctuations.