I'm wondering what the best approach would be for the following situation:

I have an Orders table in a database that contains all orders. That really means ALL orders, including the completed ones, which are just flagged as 'complete'. For all open orders I want to calculate some figures (open amount, open products, etc.). Which would be better performance-wise:

Keep 1 Orders table with all orders, including the complete/archived ones, and do the calculations by filtering on the 'complete' flag?

Or should I create another table, e.g. 'Orders_Archive', so that the Orders table only contains the open orders that I use for the calculations?

Is there any (obvious) performance difference between these approaches?

(BTW, I'm on a PostgreSQL database.)

This is a common problem in database design: the question of whether to separate or "archive" records that are no longer "active".

The most common approaches are:

  • Everything in one table, marking orders as "complete" as appropriate. Pros: simplest solution (both code- and structure-wise), good flexibility (e.g. easy to "resurrect" orders). Cons: tables can get quite large, which is a problem both for queries and for e.g. backups.
  • Archive old stuff to a separate table. Solves the problems of the first approach, at the cost of greater complexity.
  • Use a table with value-based partitioning. That means that logically (to the application) everything is in one table, but behind the scenes the DBMS puts rows into separate areas depending on the value(s) in some column(s). You'd probably use the "complete" column, or the "order completion date", for the partitioning.

The last approach kind of combines the good parts of the first two, but it needs support in the DBMS and is more complex to set up.
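As a rough sketch of the partitioning approach, assuming PostgreSQL 11 or later with declarative partitioning (older versions would need inheritance-based partitioning instead). The columns order_id and amount and the partition names orders_open and orders_done are made up for illustration; only the orders table and the orderdate and completed columns come from the thread.

```sql
-- Hypothetical schema: order_id and amount are illustrative columns.
CREATE TABLE orders (
    order_id   bigint         NOT NULL,
    orderdate  date           NOT NULL,
    amount     numeric(12,2)  NOT NULL,
    completed  boolean        NOT NULL DEFAULT false,
    -- On a partitioned table the primary key must include the partition key.
    PRIMARY KEY (order_id, completed)
) PARTITION BY LIST (completed);

-- One partition for open orders, one for completed orders.
CREATE TABLE orders_open PARTITION OF orders FOR VALUES IN (false);
CREATE TABLE orders_done PARTITION OF orders FOR VALUES IN (true);

-- The application still talks to "orders" only. With the filter below,
-- the planner should be able to skip the orders_done partition.
SELECT sum(amount) AS open_amount
FROM orders
WHERE completed = false;

-- Completing an order moves the row into orders_done.
UPDATE orders SET completed = true WHERE order_id = 42;
```

Running EXPLAIN on the SELECT is a quick way to check which partitions are actually scanned on your version.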


Tables that only store "old" data are commonly referred to as "archive tables". Some DBMSs even provide special storage engines for these tables (e.g. MySQL), which are optimized to allow quick retrieval and good storage efficiency, at the cost of slow changes/inserts.

Or should I create another table, e.g. 'Orders_Archive', so that the Orders table only contains the open orders that I use for the calculations?

Yes. They call that data warehousing. People do this because getting rid of the rarely used history speeds up the transaction system. First, tables are physically smaller and process faster. Second, a long-running history report doesn't interfere with transactional processing.
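A minimal sketch of moving completed orders out of the active table, assuming a plain (non-partitioned) orders table with a boolean completed column and PostgreSQL 9.1 or later for the data-modifying WITH clause. The name orders_archive follows the question; how often to run this is up to you.

```sql
-- Create the archive table once, copying column definitions,
-- defaults, constraints and indexes from the active table.
CREATE TABLE orders_archive (LIKE orders INCLUDING ALL);

-- Move completed rows in a single statement, so a row is never
-- missing from both tables or present in both.
WITH moved AS (
    DELETE FROM orders
    WHERE completed          -- rows where completed is NULL stay put
    RETURNING *
)
INSERT INTO orders_archive
SELECT * FROM moved;
```

Because the delete and the insert are one statement, they succeed or fail together.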

Is there any (obvious) performance difference between these approaches?

Yes. Bonus: you can restructure your history so that it's no longer in 3NF (for updating) but in a star schema (for reporting). The benefits are huge.

Buy Kimball's The Data Warehouse Toolkit to learn more about star schema design and about migrating history out of active tables into warehouse tables.
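To give a rough idea of what a star schema for order history might look like: one fact table in the middle, with small dimension tables around it. Every table and column name below is invented for illustration; a real design depends on which reports you need.

```sql
-- Dimension tables: descriptive attributes used to slice reports.
CREATE TABLE dim_date (
    date_key   integer   PRIMARY KEY,      -- e.g. 20110315
    full_date  date      NOT NULL UNIQUE,
    year       smallint  NOT NULL,
    month      smallint  NOT NULL
);

CREATE TABLE dim_product (
    product_key   integer  PRIMARY KEY,
    product_name  text     NOT NULL
);

-- Fact table: one row per archived order line, holding only
-- dimension keys and numeric measures.
CREATE TABLE fact_order_line (
    date_key     integer        NOT NULL REFERENCES dim_date,
    product_key  integer        NOT NULL REFERENCES dim_product,
    quantity     integer        NOT NULL,
    amount       numeric(12,2)  NOT NULL
);

-- A typical reporting query: revenue per product per month.
SELECT d.year, d.month, p.product_name, sum(f.amount) AS revenue
FROM fact_order_line f
JOIN dim_date d USING (date_key)
JOIN dim_product p USING (product_key)
GROUP BY d.year, d.month, p.product_name;
```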

Since you are using PostgreSQL, you can take advantage of a partial index. Suppose that for incomplete orders you frequently query on orderdate; you can define an index like this:

create index order_orderdate_unfinished_ix on orders ( orderdate )
  where completed is null or completed = 'f';

When you add that condition, PostgreSQL won't index the completed orders, which saves disk space and makes the index much faster, since it contains only a small part of the data. So you get the benefit without the problems of table separation.
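One thing to keep in mind: the planner only considers a partial index when the query's WHERE clause implies the index condition, so it is safest to repeat the condition as written. A sketch of a matching query, where order_id and the date literal are just placeholders:

```sql
-- Repeats the index predicate so the partial index is applicable.
SELECT order_id, orderdate
FROM orders
WHERE (completed IS NULL OR completed = 'f')
  AND orderdate >= DATE '2011-01-01'
ORDER BY orderdate;

-- Check the plan to see whether the index is actually chosen.
EXPLAIN
SELECT order_id, orderdate
FROM orders
WHERE (completed IS NULL OR completed = 'f')
  AND orderdate >= DATE '2011-01-01';
```

On a small table the planner may still prefer a sequential scan, so test this against realistic data volumes.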

When you separate data into ORDERS and ORDERS_ARCHIVE, you'll have to adjust existing reports. If you have lots of reports, that can be painful.

See the full description of partial indexes on this page: http://www.postgresql.org/docs/9.0/static/indexes-partial.html

EDIT: for archiving, I prefer to create another database with an identical schema, then move the old data from the transaction db to this archive db.