There are some very helpful documents explaining the server architectures of sites like LinkedIn, Bebo, Amazon.com, etc.

Looking at Bebo, I was really surprised that they use 500+ database servers for their application.

I want to understand how they can maintain SQL transactions, joins, and lookups when the data spans multiple database servers.

High Scalability also has an article on Bebo. It's well worth a read.

I think the key point is that the databases are federated rather than distributed. All the data for a given user lives in one physical database, which takes care of most of the problems with joins, transactionality, etc.
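To make the federation idea concrete, here is a minimal sketch, assuming each "physical database" is just an in-memory SQLite connection and that users are placed by a simple `user_id % N` rule. The schema, `shard_for`, `create_user_with_post` and `posts_with_author` are hypothetical names for illustration, not anything Bebo is documented to use. The point is that once every row for a user lives on one server, both the transaction and the join stay local to that server.

```python
import sqlite3

# Hypothetical federation: four independent databases (in-memory SQLite here).
SHARDS = [sqlite3.connect(":memory:") for _ in range(4)]

for conn in SHARDS:
    conn.execute("CREATE TABLE profile (user_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)")


def shard_for(user_id: int) -> sqlite3.Connection:
    # Assumed placement rule for this sketch: simple modulo on the user id.
    return SHARDS[user_id % len(SHARDS)]


def create_user_with_post(user_id: int, name: str, body: str) -> None:
    conn = shard_for(user_id)
    # Both inserts go to the same database, so one ordinary local transaction
    # covers them: "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute("INSERT INTO profile VALUES (?, ?)", (user_id, name))
        conn.execute("INSERT INTO post (user_id, body) VALUES (?, ?)", (user_id, body))


def posts_with_author(user_id: int) -> list:
    # The join never crosses servers because this user's rows are co-located.
    conn = shard_for(user_id)
    return conn.execute(
        "SELECT p.name, t.body FROM profile p JOIN post t ON t.user_id = p.user_id "
        "WHERE p.user_id = ?",
        (user_id,),
    ).fetchall()
```

For example, `create_user_with_post(7, "alice", "hello")` writes only to `SHARDS[3]`, and `posts_with_author(7)` reads back from that same database. What this does not give you is a cheap transaction or join across two different users on different shards; that is the trade-off federation accepts.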

High Scalability doesn't mention it, but I presume there has to be some centralised database serving as a registry: the data for user #217873828 is in database Profile42. There's probably a similar centralisation for reference data, although most of that is likely to be cached rather than read from the database.

There is a great article on High Scalability about eBay. They go to the extreme of having the application do everything and using the databases only for "dumb" storage. The application does joins, referential integrity, etc. It's almost strange to think about things that way, given most people's experience and the role databases usually play in our programs. Apparently it works, though. :)
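Roughly what "the application does joins and referential integrity" means, as a minimal sketch: assume two separate key-value stores (plain dicts here) that know nothing about each other, so there is no FOREIGN KEY and no SQL JOIN. `users`, `orders`, `add_order` and `orders_with_user_names` are hypothetical names for illustration and are not taken from the eBay article.

```python
# Two "dumb" stores standing in for separate databases (hypothetical data).
users = {1: {"name": "alice"}, 2: {"name": "bob"}}
orders = {}  # order_id -> {"user_id": ..., "item": ...}


def add_order(order_id: int, user_id: int, item: str) -> None:
    # Referential integrity is checked by the application, not by a FOREIGN KEY.
    if user_id not in users:
        raise ValueError(f"unknown user_id {user_id}")
    orders[order_id] = {"user_id": user_id, "item": item}


def orders_with_user_names() -> list:
    # Step 1: read the rows from the orders store.
    rows = list(orders.items())
    # Step 2: fetch only the referenced users from the other store, in one batch.
    wanted = {o["user_id"] for _, o in rows}
    by_id = {uid: users[uid] for uid in wanted}
    # Step 3: join in memory (a hash join keyed on user_id).
    return sorted((oid, by_id[o["user_id"]]["name"], o["item"]) for oid, o in rows)
```

The batch fetch in step 2 matters: doing one lookup per order against a remote store would turn a single join into N round trips. The catch is that the check in `add_order` is only as good as the application's discipline; nothing in the storage layer stops another code path from writing an order for a user that doesn't exist.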

http://highscalability.com/ebay-architecture