This is meant to be a list of databases, and their architectures, used by the major websites. It should be a good reference for anyone thinking about scaling their site to the size of Twitter, Facebook or Google.

Please keep answers short and be sure to cite any sources used.


Also, please bold both the site name and the database for easier scanning.

Facebook:

  • Hive (data warehouse for Hadoop; supports tables and a variant of SQL called HiveQL). Used for "simple summarization jobs, business intelligence and machine learning and many other applications".
  • Cassandra (multi-dimensional, distributed key-value store). Currently used for Facebook's private messaging.
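Cassandra's "multi-dimensional" key-value model can be pictured as nested maps: a row key, then a column name, then a timestamped value. The following is a toy, single-node Python sketch under that assumption, resolving conflicting writes by last-write-wins on the timestamp. The `ColumnFamily` class and the inbox-style keys are hypothetical; real Cassandra adds partitioning, replication and on-disk storage on top of this model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    value: str
    timestamp: int  # write timestamp supplied by the client


class ColumnFamily:
    """Toy, in-memory model (hypothetical): row key -> column name -> Cell."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Cell]] = {}

    def insert(self, row_key: str, column: str, value: str, timestamp: int) -> None:
        row = self.rows.setdefault(row_key, {})
        current = row.get(column)
        # Last-write-wins: only a strictly newer timestamp replaces the cell.
        # (Ties keep the existing value here, which is a simplification.)
        if current is None or timestamp > current.timestamp:
            row[column] = Cell(value, timestamp)

    def get(self, row_key: str, column: str) -> Optional[str]:
        cell = self.rows.get(row_key, {}).get(column)
        return cell.value if cell is not None else None

    def get_slice(self, row_key: str, start: str, end: str) -> Dict[str, str]:
        """Return columns whose names fall in [start, end], in sorted order."""
        row = self.rows.get(row_key, {})
        return {name: row[name].value for name in sorted(row) if start <= name <= end}
```

Keeping column names sorted within a row is what makes range reads such as "all messages for this user between two IDs" a single lookup rather than a scan of the whole store.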

Currently running 610 (potentially 1,000) Hadoop nodes in a single cluster with the Hive datastore. Facebook has open-sourced both Hive and Cassandra.
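In its original design Hive turned HiveQL queries into Hadoop MapReduce jobs. To give a feel for what a "simple summarization job" boils down to, here is a minimal single-process Python sketch of the map, shuffle and reduce steps behind a `GROUP BY` count. The `users` rows, the country column and the function names are hypothetical; on a real cluster each phase runs in parallel across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input rows: (user_id, country), standing in for a Hive table.
Row = Tuple[int, str]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (country, 1) for every row."""
    for _user_id, country in rows:
        yield country, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_country(rows: Iterable[Row]) -> Dict[str, int]:
    # Roughly: SELECT country, COUNT(*) FROM users GROUP BY country
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    users = [(1, "US"), (2, "UK"), (3, "US"), (4, "DE")]
    print(count_by_country(users))
```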

Facebook stats:

  • More than 200 million active users
  • More than 100 million users log on to Facebook at least once each day
  • More than 30 million users update their status at least once each day
  • The average user has 120 friends on the site

LinkedIn:

  • Oracle (Relational Database)
  • MySQL (Relational Database)

Databases are replicated across multiple servers for high availability. Each individual service uses its own domain-specific database.
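The answer doesn't describe LinkedIn's replication mechanism, so this is only a generic Python sketch of the two ideas it mentions: each service owns its own database, and each database has several replicas so a request can fail over when one server is down. The service names, `Replica`, `ServiceDatabase` and `db_for` are all hypothetical, and the sketch covers only choosing a healthy replica, not copying data between replicas.

```python
import itertools
from typing import Dict, List


class Replica:
    """Stand-in for one database server; `up` simulates its health."""

    def __init__(self, name: str, up: bool = True) -> None:
        self.name = name
        self.up = up


class ServiceDatabase:
    """One domain-specific database, replicated across several servers."""

    def __init__(self, replicas: List[Replica]) -> None:
        if not replicas:
            raise ValueError("need at least one replica")
        self.replicas = replicas
        self._order = itertools.cycle(range(len(replicas)))

    def pick_replica(self) -> Replica:
        # Round-robin over the replicas, skipping any that are down.
        for _ in range(len(self.replicas)):
            replica = self.replicas[next(self._order)]
            if replica.up:
                return replica
        raise RuntimeError("no replica available")


# Hypothetical services, each owning its own database.
SERVICE_DBS: Dict[str, ServiceDatabase] = {
    "profiles": ServiceDatabase([Replica("profiles-a"), Replica("profiles-b")]),
    "connections": ServiceDatabase([Replica("conn-a"), Replica("conn-b")]),
}


def db_for(service: str) -> Replica:
    """Route a request to a healthy replica of that service's database."""
    return SERVICE_DBS[service].pick_replica()
```

Because no service reaches into another service's database, each one can be replicated, resized or even switched between Oracle and MySQL independently.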

LinkedIn stats:

  • 22 million members
  • 4+ million unique visitors/month
  • 40 million page views/day
  • 2 million searches/day


Stack Overflow - SQL Server.

Digg:

  • MySQL (Relational Database) for scaling out reads
  • MemcacheDB (Key-Value Store) for scaling out writes

Both data stores are distributed across multiple servers.
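This read/write split can be sketched as a small data-access layer: reads are spread round-robin over MySQL read replicas, while a high-volume write path goes to a persistent key-value store. The source doesn't say which data Digg kept in each store, so the classes, the per-user "digg" flag and its `digg:<user>:<story>` key layout are hypothetical, and in-memory dicts stand in for the real servers.

```python
import itertools
from typing import Dict, List, Optional


class ReadReplica:
    """Stand-in for a MySQL read replica holding a copy of relational data."""

    def __init__(self, name: str, rows: Dict[int, str]) -> None:
        self.name = name
        self.rows = rows

    def query(self, story_id: int) -> Optional[str]:
        return self.rows.get(story_id)


class KeyValueStore:
    """Stand-in for a persistent key-value store such as MemcacheDB."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)


class DataLayer:
    """Spreads reads over replicas; sends high-volume writes to the KV store."""

    def __init__(self, replicas: List[ReadReplica], kv: KeyValueStore) -> None:
        self._replicas = itertools.cycle(replicas)
        self.kv = kv

    def get_story_title(self, story_id: int) -> Optional[str]:
        replica = next(self._replicas)  # round-robin read scaling
        return replica.query(story_id)

    def record_digg(self, user_id: int, story_id: int) -> None:
        # Hypothetical key layout: one small key per (user, story) pair.
        self.kv.set(f"digg:{user_id}:{story_id}", "1")

    def has_dugg(self, user_id: int, story_id: int) -> bool:
        return self.kv.get(f"digg:{user_id}:{story_id}") is not None
```

Adding a replica adds read capacity without touching the write path, which is the point of the split; keeping the replicas in sync with the primary MySQL server is left out here.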

Digg stats:

  • 30M users
  • 26M unique visitors per month
  • 2 billion requests per month
  • 13,000 requests per second, peaking at 27,000 requests per second