I am trying to understand the performance of database indexes in terms of Big-O notation. Without knowing much about it, I would guess that:

  • Querying on a primary key or unique index gives you O(1) lookup time.
  • Querying on a non-unique index also gives O(1) time, although maybe the '1' is slower than for the unique index (?)
  • Querying on a column with no index gives O(N) lookup time (full table scan).

Is this generally correct? Will querying on a primary key ever give worse performance than O(1)? My specific concern is SQLite, but I'd also be interested in knowing to what extent this varies between databases.

Most relational databases structure indices as B-trees.

If your table has a clustering index, the data pages are stored as the leaf nodes of the B-tree. Essentially, the clustering index becomes the table.

For tables without a clustering index, the data pages of the table are stored in a heap. Any non-clustered indexes are B-trees where a leaf node of the B-tree identifies a particular page in the heap.
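As a concrete illustration, here is roughly how that plays out in SQLite (a hedged sketch; the table and index names are invented). An ordinary rowid table is itself a B-tree keyed on the rowid, a WITHOUT ROWID table is clustered on its declared primary key, and a secondary index is a separate B-tree whose entries point back into the table's B-tree:

    -- Ordinary ("rowid") table: the table is a B-tree clustered on rowid.
    CREATE TABLE users(id INTEGER PRIMARY KEY, name TEXT);  -- id aliases the rowid

    -- WITHOUT ROWID table: the table is a B-tree clustered on the declared key.
    CREATE TABLE emails(addr TEXT PRIMARY KEY, user_id INTEGER) WITHOUT ROWID;

    -- Secondary index: a separate B-tree of (name, rowid) entries that point
    -- back into the users table's B-tree.
    CREATE INDEX users_name ON users(name);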

The worst-case height of the B-tree is O(log n), and since searching depends on the height, B-tree lookups run in something like (on average)

O(log_t n)

where t is the minimization factor (each node must have at least t-1 keys and at most 2t-1 keys, i.e., 2t children).
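To put numbers on that (the branching factor here is an assumption, purely for illustration): a B-tree with minimization factor t holds at least t-1 keys per node, which bounds its height by

h <= log_t((n + 1) / 2)

so with n = 1,000,000 rows and t = 100, h <= log_100(500,000.5) ≈ 2.85. A lookup therefore touches about 3 nodes, and in practice the upper levels are usually cached in memory anyway.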

That's the way I understand it, anyway.

And different database systems, of course, may use different data structures under the hood.

And if the query doesn't use an index, of course, then the search is an iteration over the heap or B-tree that contains the data pages.

Searches are a little cheaper if the index used covers the query; otherwise, a lookaside to fetch the corresponding data page into memory is needed.
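In SQLite you can see this distinction directly with EXPLAIN QUERY PLAN (a hedged sketch; the names are invented and the exact output wording varies by SQLite version):

    CREATE TABLE t(a INTEGER, b INTEGER, c TEXT);
    CREATE INDEX t_ab ON t(a, b);

    EXPLAIN QUERY PLAN SELECT b FROM t WHERE a = 1;
    -- SEARCH t USING COVERING INDEX t_ab (a=?)  <- the index alone satisfies the query

    EXPLAIN QUERY PLAN SELECT c FROM t WHERE a = 1;
    -- SEARCH t USING INDEX t_ab (a=?)           <- each hit also fetches the row from the table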

Indexed queries (unique or not) are more typically O(log n). Very simplistically, you can think of it like a binary search in a sorted array. More precisely, it depends on the index type, but a B-tree search, for instance, is still O(log n).

If there's no index, then, yes, it's O(N).
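Again in SQLite terms, EXPLAIN QUERY PLAN makes the difference visible (a hedged sketch with invented names; output wording varies by version):

    CREATE TABLE u(a INTEGER, c TEXT);
    CREATE INDEX u_a ON u(a);

    EXPLAIN QUERY PLAN SELECT * FROM u WHERE a = 1;
    -- SEARCH u USING INDEX u_a (a=?)  <- O(log n) B-tree search

    EXPLAIN QUERY PLAN SELECT * FROM u WHERE c = 'x';
    -- SCAN u                          <- O(N) full table scan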

Other answers provide a good starting point, but I would just add that to get O(1), the primary index itself would have to be hash-based (which is typically not the default choice), so more generally it's logarithmic (B-tree).
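SQLite has no hash indexes at all, but some databases do offer them for equality lookups. PostgreSQL, for example, supports them explicitly (a sketch with invented names; a hash index only helps Column = Value conditions, never range queries):

    CREATE TABLE sessions(token TEXT, data TEXT);
    CREATE INDEX sessions_token_hash ON sessions USING HASH (token);
    -- WHERE token = '...' can now be answered in O(1) on average.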

You're correct in that secondary indexes typically have the same complexity, but worse actual performance -- this is because index and data aren't clustered, so the constant factor (number of disk seeks) is larger.

If you SELECT the same columns you search on, then

  • primary or unique will be O(log n): it's a B-tree search
  • a non-unique index will also be O(log n) + a bit: it's a B-tree search
  • no index = O(N)

If you require data from another "source" (index intersection, bookmark/key lookup etc.) because the index is non-covering, then you may end up with O(n + log n) or O(log n + log n + log n) because of multiple index hits + intermediate sorting.

If statistics indicate that you need a high % of rows (e.g. a not very selective index), then the index may be ignored and the query becomes a scan = O(n).
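In SQLite those statistics come from ANALYZE (documented behaviour; the query below merely inspects what was gathered):

    ANALYZE;                      -- gather index statistics into sqlite_stat1
    SELECT * FROM sqlite_stat1;   -- roughly one row per index with its selectivity stats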

It depends on what your query is.

  • A condition of the form Column = Value allows using a hash-based index, which has O(1) lookup time. However, many databases, including SQLite, do not support them.
  • A condition using relational operators (<, >, <=, >=) can take advantage of an ordered index, typically implemented with a B-tree, which has O(log n) lookup time.
  • More complex expressions that cannot use an index require O(n) time (see the sketch after this list).
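Here is a small SQLite sketch of the last two cases (invented names; output wording varies by version): a range condition seeks into the ordered index and walks it, while an expression the optimizer cannot map to an index forces a scan:

    CREATE TABLE items(price REAL, descr TEXT);
    CREATE INDEX items_price ON items(price);

    EXPLAIN QUERY PLAN SELECT * FROM items WHERE price < 10;
    -- SEARCH items USING INDEX items_price (price<?)  <- O(log n) seek, then a range walk

    EXPLAIN QUERY PLAN SELECT * FROM items WHERE descr LIKE '%red%';
    -- SCAN items                                      <- O(n): no index helps a leading-wildcard LIKE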

Since you are mainly interested in SQLite, you might want to read its Query Optimizer Overview, which describes in more detail how indexes are chosen.