I want to understand what specific problems, solutions, advice and best practices (don't punish me for that word) come up when dealing with huge databases.

By huge I mean databases that have tables with an enormous number of rows and/or databases holding petabytes of information.

Platform-specific solutions would be great too.


  • Learn the particulars of your specific database engine and how it operates

  • How to optimize queries (hints, execution plans)

  • How to tune the database (not just indexes, but physical storage and representation, OS integration)

  • Query "techniques" such as temporary tables to hold intermediate results that can be reused

  • How to evaluate whether denormalization is worth it for a performance improvement

  • Using the profiling tools for that database to identify the bottlenecks
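
To make the execution plan point concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (it is obviously not a "huge database" engine), and the `orders` table, its columns and the index name are made up for illustration. The idea carries over: ask the engine how it intends to run a query, change something (here, add an index), and ask again.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

def plan(sql, params=()):
    """Return the engine's plan for a query as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM orders WHERE customer_id = ?"

plan_before = plan(query, (42,))   # no usable index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query, (42,))    # the planner can now use the new index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording differs between SQLite versions, and other engines have their own tools for this (MS SQL has graphical and text execution plans, for instance), so treat the output format as engine-specific.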

A few bits of advice from a production DBA (my experience is with MS SQL, but these should apply to other platforms):

  • Maintenance becomes a significant problem (nightly backups, DBCCs, weekly reindex/optimisation jobs, etc). It is very easy to start exceeding an acceptable nightly or weekend maintenance window. This is not only a technical problem, it is also a business problem ("what do you mean, it'll take 4 hours to restore the database from the last good backup?")

  • Designers need to understand that they're going to have to work differently. "You mean I can't just DELETE (500m rows) FROM MassiveTable and expect it to work?"
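
On the DELETE point, the usual way of "working differently" is to delete in small batches, each in its own transaction, so that no single statement holds locks or fills the log for hours. Here is a rough sketch of the pattern using Python's sqlite3 as a stand-in engine. The `massive_table` name, the `created` column and the batch size are assumptions for the example, and on MS SQL you would write the loop in T-SQL instead.

```python
import sqlite3

def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff` a batch at a time; return rows deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM massive_table WHERE rowid IN ("
            "  SELECT rowid FROM massive_table WHERE created < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # short transaction per batch, locks are released here
        if cur.rowcount == 0:
            break      # nothing left to delete
        total += cur.rowcount
    return total

# Small demo: 10,000 rows with "created" values 0..9999 (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE massive_table (created INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO massive_table VALUES (?, ?)",
    [(i, "x") for i in range(10_000)],
)
conn.commit()

deleted = delete_in_batches(conn, cutoff=7_500, batch_size=1_000)
remaining = conn.execute("SELECT COUNT(*) FROM massive_table").fetchone()[0]
print(deleted, remaining)
```

The right batch size depends on the engine, the indexes and what else is running, so it is something to measure rather than guess.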

I'm sure I'll think of more...

My first piece of advice is to bring in someone who knows what they're doing and not to depend on SO, otherwise you may be in for some very costly mistakes. My second is to choose the best platform, both software and hardware. The particulars will depend greatly on your needs.

I recommend you look at this presentation about SQL Antipatterns: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back

The presentation can help (yes, it helped us a lot) you find a way out of a seemingly deadlocked situation.

Any RDBMS can suffer from poor performance once it gets large, particularly when complex join conditions are being used. Database schemas need to be designed to scale for large amounts of traffic, too. Most systems are very good at handling load, but you may also run into issues if you have a single database that has to be distributed across multiple machines.
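
As a rough illustration of what distributing one database across machines involves, here is a sketch of the simplest routing scheme: hash a key and take it modulo the number of shards. The host names and the key format are hypothetical, and this is only one of several possible approaches, not a recommendation.

```python
import hashlib

# Hypothetical shard hosts; in practice these would come from configuration.
SHARDS = [
    "db-0.example.internal",
    "db-1.example.internal",
    "db-2.example.internal",
]

def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash."""
    # hashlib gives the same result in every process, unlike Python's
    # built-in hash(), which is randomized per run for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def host_for(key: str) -> str:
    """Return the host that should store the row identified by `key`."""
    return SHARDS[shard_for(key, len(SHARDS))]

print(host_for("customer:42"))
```

The catch with plain modulo hashing is that changing the number of shards moves most keys to a different machine, which is one reason consistent hashing is a common alternative. Cross-shard joins and transactions are where the real pain starts.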

Lots of new tools are appearing to deal with database scalability. One of the most promising is Memcached, which stores lots of data in memory; this allows for considerably faster access and helps with synchronization between multiple database servers. There are also the NoSQL solutions, which augment traditional SQL systems with architectures that don't enforce schemas.

Some examples of NoSQL technology are Cassandra, CouchDB, Google BigTable and MongoDB. Many people swear these systems will become crucial in managing "the coming data explosion".
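
The usual way to put an in-memory cache like Memcached in front of a database is the "cache-aside" pattern: look in the cache first, fall back to the database on a miss, then store the result for next time. The sketch below shows the idea only. `DictCache` is a hypothetical stand-in backed by a plain dict, not a real Memcached client, and the `users` table is made up for the example.

```python
import sqlite3

class DictCache:
    """Hypothetical stand-in for a cache client; only get/set, no expiry."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice')")
db.commit()

cache = DictCache()
db_hits = 0  # counts how often we actually had to query the database

def get_user_name(user_id):
    """Cache-aside read: try the cache, fall back to the database."""
    global db_hits
    key = f"user:{user_id}:name"
    cached = cache.get(key)
    if cached is not None:
        return cached              # cache hit, database not touched
    db_hits += 1
    row = db.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None                # unknown user, nothing to cache
    cache.set(key, row[0])         # populate the cache for later reads
    return row[0]

first = get_user_name(1)    # miss: goes to the database
second = get_user_name(1)   # hit: served from the cache
print(first, second, db_hits)
```

What the sketch leaves out is the hard part: invalidating or expiring cached entries when the underlying row changes.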