If I have a table column with data and create an index on that column, will the index take about the same disk space as the column itself?

I'm asking because I'm trying to understand whether B-trees actually keep copies of the column data in their leaf nodes, or whether they reference it in some way.

Sorry if this is a "Will Java replace XML?" kind of question.

UPDATE:

created a table with a single GUID column and no index, inserted 1M rows - 26 MB

same table with a primary key (clustered index) - 25 MB (less!), index size - 176 KB

same table with a unique key (nonclustered index) - 26 MB, index size - 27 MB

So only nonclustered indexes take about as much space as the data itself.

All measurements were done in SQL Server 2005.
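The same experiment can be reproduced without SQL Server. The sketch below uses SQLite (via Python's standard `sqlite3` module) instead of SQL Server 2005 - an assumption, since the update above used SQL Server - but the idea is identical: fill a single-GUID-column table, measure the database size, create an index, and see how much the size grows.

```python
import sqlite3
import uuid

def measure_index_overhead(n_rows=100_000):
    """Create a single-column table of GUIDs, then compare the database
    size before and after adding an index on that column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (guid TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?)",
        ((str(uuid.uuid4()),) for _ in range(n_rows)),
    )
    conn.commit()

    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    before = conn.execute("PRAGMA page_count").fetchone()[0] * page_size

    conn.execute("CREATE INDEX idx_guid ON t (guid)")
    after = conn.execute("PRAGMA page_count").fetchone()[0] * page_size

    conn.close()
    return before, after - before  # approx. table bytes, index bytes

table_bytes, index_bytes = measure_index_overhead()
print(f"table: {table_bytes / 1e6:.1f} MB, index: {index_bytes / 1e6:.1f} MB")
```

On a plain (nonclustered-style) index over the whole column you should see the index come out in the same ballpark as the table, matching the unique-key measurement above.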

I'm almost sure the correct answer is DB-dependent, but in general - yes, they take additional space. That space is spent for two reasons:

  1. You can exploit the fact that the data in the B-tree leaves is sorted

  2. You get a lookup speed advantage, since it's not necessary to seek back and forth to fetch the necessary data.

PS: just checked our MySQL server: for one 20 GB table, the indexes take 10 GB of space :)

The B-tree references the row in the table, but the B-tree itself still takes some space on disk.

Some databases have a special kind of table that combines the primary index and the data. In Oracle, it's called an IOT - index-organized table.

Each row in a regular table can be identified by an internal ID (this is database-specific), which the B-tree uses to locate the row. In Oracle, it's called the rowid and looks like AAAAECAABAAAAgiAAA :)
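The key-plus-row-locator idea can be shown with a toy sketch. This is not any real DBMS's on-disk layout - the `table`, `index`, and `lookup` names are made up for illustration - but it shows how a secondary index stores (key, rowid) pairs and follows the rowid back to the table instead of copying whole rows:

```python
import bisect

table = {}   # rowid -> full row (the heap)
index = []   # sorted list of (key, rowid) pairs (the secondary index)

for rowid, row in enumerate([("alice", 30), ("bob", 25), ("carol", 41)]):
    table[rowid] = row
    bisect.insort(index, (row[0], rowid))   # index stores key + rowid only

def lookup(key):
    """Find rows by key via the index, then fetch them from the table."""
    i = bisect.bisect_left(index, (key,))
    results = []
    while i < len(index) and index[i][0] == key:
        results.append(table[index[i][1]])  # follow the row locator
        i += 1
    return results

print(lookup("bob"))  # -> [('bob', 25)]
```

The index still has to store one key per row, which is why its size ends up on the same order as the indexed column's data.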

If I have a table column with data and create an index on that column, will the index take about the same disk space as the column itself?

In a basic B-tree, you have the same number of nodes as the number of items in the column.

Consider 1,2,3,4:

      1
     /
    2
     \
      3
       \
        4
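The one-node-per-item point can be checked with a toy sketch. This uses a plain (possibly unbalanced) binary search tree rather than a real multi-way B-tree - an assumption for simplicity - but the node count works the same way:

```python
class Node:
    """One tree node per indexed value."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert a key, creating exactly one new node."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def count(root):
    """Count the nodes in the tree."""
    return 0 if root is None else 1 + count(root.left) + count(root.right)

root = None
for k in [1, 2, 3, 4]:
    root = insert(root, k)

print(count(root))  # -> 4: one node per item in the column
```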

The exact space can still be a bit different (the index is probably a bit bigger since it has to store links between nodes, it may not be perfectly balanced, etc.), and I guess databases may use optimizations to compress parts of the index. But the order of magnitude between the index and the column data should be the same.

Judging by this article, it will, in fact, take at least the same amount of space as the data in the column (in PostgreSQL, anyway). The article also goes on to suggest a way to reduce disk and memory usage.

A way to check for yourself would be to use e.g. Derby DB: create a table with a million rows and a single column, check its size, create an index on the column, and check its size again. If you take the ten to fifteen minutes to do this, let us know the results. :)