* A user uploads an image and enters some information about the image
* The information and image get submitted (to all servers)
* The user gets confirmation that the image was submitted
* A large number of servers, distributed around the globe
* The image should end up on disk, since it will be served from there
* The information should end up in a database
* Images are small, no larger than 5 MB
We have considered various architectural solutions and technologies (git, murder, rsync, to name a few), but we are still not 100% sure how to approach this. Our current solution is far too slow and we are looking to improve it (we push files to all servers from the "upload" server).
Any ideas? Thanks in advance.
First, let's assume for simplicity that the information is written to a file and that both files are zipped up together. So below I am going to assume there is just one file (the zip file). This is just a detail (and is actually completely unnecessary for BitTorrent!).
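A rough sketch of that bundling step, using only the Python standard library. The function names and the `info.json` entry name are made up for illustration; nothing in the question prescribes them.

```python
import io
import json
import zipfile

INFO_NAME = "info.json"  # hypothetical name for the metadata entry


def bundle_upload(image_name, image_bytes, info):
    """Return zip bytes holding the image plus its information as JSON."""
    buffer = io.BytesIO()
    # ZIP_STORED: formats such as JPEG/PNG are already compressed,
    # so recompressing them mostly costs CPU time.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr(image_name, image_bytes)
        archive.writestr(INFO_NAME, json.dumps(info))
    return buffer.getvalue()


def unpack_upload(bundle):
    """Inverse of bundle_upload: return (image_name, image_bytes, info)."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
        info = json.loads(archive.read(INFO_NAME))
        # The only other entry in the archive is the image itself.
        image_name = next(n for n in archive.namelist() if n != INFO_NAME)
        return image_name, archive.read(image_name), info
```

Each receiving server would call `unpack_upload`, write the image to disk and insert the information into its database.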
BitTorrent (or something that works similarly) is basically the fastest way of doing this for large files. As soon as a server has downloaded a piece of the file, it will start trying to upload it to other servers that need it. You can modify BitTorrent to prefer geographically closer IPs in order to minimise inter-LAN bandwidth usage.
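To illustrate only the "prefer closer peers" idea (this is not BitTorrent client code, and how you hook it into a real client's peer selection depends on the client), here is a small sketch that ranks candidate peers by great-circle distance. It assumes you already know an approximate latitude/longitude for each server; the names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rank_peers(my_location, peers):
    """peers maps a server name to (lat, lon); return names, nearest first."""
    return sorted(peers, key=lambda name: haversine_km(my_location, peers[name]))


# Hypothetical servers, seen from a node near London.
peers = {
    "sydney": (-33.9, 151.2),
    "paris": (48.9, 2.4),
    "new_york": (40.7, -74.0),
}
print(rank_peers((51.5, -0.1), peers))
```

In practice network distance (latency, or simply "same data centre or not") may be a better ranking key than geography, but the sorting step looks the same.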
If you don't want to use BitTorrent, or if the files are so small that it wouldn't make sense, just make one server upload to two others, then have those two upload to two others each, and so on. Or you could use a fan-out factor greater than 2. Experiment to see what works for you.
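The fan-out scheme can be sketched as a tree laid out over a flat list of servers. This is only one possible layout, and it assumes the first entry of the list is the "upload" server; the function names are mine.

```python
def fanout_plan(servers, fanout=2):
    """Map each server to the servers it should push the file to.

    servers[0] is the upload server. The server at index i pushes to the
    servers at indices i*fanout + 1 through i*fanout + fanout.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    plan = {}
    for i, server in enumerate(servers):
        first_child = i * fanout + 1
        children = servers[first_child:first_child + fanout]
        if children:  # servers at the edge of the tree push to nobody
            plan[server] = children
    return plan


def hops_from_upload(index, fanout=2):
    """Number of transfers before the server at `index` has the file."""
    hops = 0
    while index > 0:
        index = (index - 1) // fanout  # step up to the parent
        hops += 1
    return hops


servers = ["upload", "a", "b", "c", "d", "e", "f"]
print(fanout_plan(servers))
```

With 1,000 servers, the last one in the list is 9 hops from the upload server at a fan-out of 2 and 3 hops at a fan-out of 10. A higher fan-out means fewer hops, but each server spends longer uploading, which is why it is worth experimenting.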
Have a look at Riak. It has excellent support for massive distribution and data replication. We have used it successfully for some time now and it has proven very resilient.
I'd model it on Riak with the image and metadata stored separately, with a link between them. That way both end up in a "database" and on disk, with an easy way to get from one to the other, and both are accessible via a URL.
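As a hedged sketch of that model: the code below only builds the two HTTP requests (it does not send them), based on my understanding of Riak's HTTP interface, where an object is stored with a `PUT` to `/buckets/<bucket>/keys/<key>` and links are carried in a `Link` header with a `riaktag`. The bucket names, the tag name, the content type and the node address are assumptions for illustration; check the paths and headers against the Riak version you run.

```python
import json
import urllib.request
from urllib.parse import quote

RIAK = "http://127.0.0.1:8098"  # assumption: local node, default HTTP port


def build_requests(key, image_bytes, info, base=RIAK):
    """Build PUT requests for the image and for its metadata.

    The metadata object carries a Link header pointing at the image.
    Bucket names "images" and "image_info" are hypothetical.
    """
    safe_key = quote(key, safe="")
    image_path = f"/buckets/images/keys/{safe_key}"
    image_req = urllib.request.Request(
        base + image_path,
        data=image_bytes,
        method="PUT",
        headers={"Content-Type": "image/jpeg"},  # assumes a JPEG upload
    )
    info_req = urllib.request.Request(
        f"{base}/buckets/image_info/keys/{safe_key}",
        data=json.dumps(info).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Link": f'<{image_path}>; riaktag="image"',
        },
    )
    return image_req, info_req
```

Sending each request with `urllib.request.urlopen(request)` would store the objects; the image would then be readable with a plain `GET` on the same URL.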
Note: for replication over WAN you will need the enterprise version, which is not free.