I am working on a system that will have to store lots of documents (e-books, Word files, etc.). I am using Solr/Lucene to search for relevant information extracted from those documents, but I also need a place to keep the original files so that they can be opened/downloaded by the users.

I thought about several options:

  • file system - probably not a very wise decision for keeping 1M documents
  • SQL database - but I won't need most of its relational features, as I only have to store the binary document and its ID, so this might not be the fastest solution
  • NoSQL database - I don't have any experience with them, so I'm not sure whether they are any good; there are also many of them, so I don't know which one to pick

The storage I am looking for should be:

  • fast
  • scalable
  • open-source (not crucial but nice to have)

Can you recommend the best way of storing those files, in your opinion?

A filesystem -- as the name suggests -- is designed and optimized to store large numbers of files in an efficient and scalable way.

For me...

I'd store the files compressed on disk (file system) and use a database to keep track of them,

possibly SQLite if that is its only job.
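
A rough sketch of that idea in Python, just to show the shape of it. The table name `documents`, its columns, and the helpers `open_index`, `store_document` and `load_document` are all made up for illustration, and gzip is only one possible choice of compression:

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def open_index(db_path):
    """Open (or create) the SQLite file that keeps track of stored documents."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY,"
        " original_name TEXT NOT NULL,"
        " stored_path TEXT NOT NULL)"
    )
    return conn


def store_document(conn, source, storage_dir):
    """Compress `source` into `storage_dir` and record it; returns the new id."""
    source = Path(source)
    storage_dir = Path(storage_dir)
    storage_dir.mkdir(parents=True, exist_ok=True)

    # Insert first so the database assigns the id used as the file name.
    cur = conn.execute(
        "INSERT INTO documents (original_name, stored_path) VALUES (?, '')",
        (source.name,),
    )
    doc_id = cur.lastrowid
    target = storage_dir / f"{doc_id}.gz"

    # Stream the file through gzip so large documents are not loaded into memory.
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)

    conn.execute(
        "UPDATE documents SET stored_path = ? WHERE id = ?",
        (str(target), doc_id),
    )
    conn.commit()
    return doc_id


def load_document(conn, doc_id):
    """Return (original_name, raw_bytes) for a stored document."""
    row = conn.execute(
        "SELECT original_name, stored_path FROM documents WHERE id = ?",
        (doc_id,),
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    name, path = row
    with gzip.open(path, "rb") as f:
        return name, f.read()
```

The database only holds the id, the original name and the path; the bytes themselves stay on the file system. The same id could also be what you index in Solr/Lucene, so a search hit leads straight to the file.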

File system: looking at the big picture, the DBMS uses the file system anyway. And the file system is dedicated to storing files, so you get the benefit of its optimizations (as LukeH pointed out).

You can follow Facebook's example, as it stores lots of files (15 billion photos):

  • They initially started with an NFS share served by commercial storage appliances.
  • They moved to their own implementation of an HTTP file server called Haystack.

Here is a Facebook note if you want to learn more: http://www.facebook.com/note.php?note_id=76191543919

Regarding the NFS share: keep in mind that NFS shares usually limit the number of files in a single folder for performance reasons. (This may be a bit counterintuitive if you assume that all recent file systems use B-trees to store their structure.) So if you are using commercial NFS shares (like NetApp), you will likely need to keep files in multiple folders.

You can do that if you have any kind of ID for your files. Just split its ASCII representation into groups of a few characters and make a folder for each group. For example, we use integers for IDs, so the file with ID 1234567891 is stored as storage/0012/3456/7891.
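
In Python the mapping could look roughly like this. The function name `sharded_path` and the padding to 12 digits in groups of 4 are my own assumptions, chosen so that the result matches the example path above:

```python
from pathlib import PurePosixPath


def sharded_path(file_id, root="storage", width=12, group=4):
    """Map an integer id to a nested path, e.g. storage/0012/3456/7891."""
    if file_id < 0:
        raise ValueError("file_id must be non-negative")
    if width % group != 0:
        raise ValueError("width must be a multiple of group")

    # Zero-pad so every id yields the same number of path components.
    digits = str(file_id).zfill(width)
    if len(digits) > width:
        raise ValueError("file_id does not fit in the configured width")

    # Cut the padded string into fixed-size chunks, one per path component.
    parts = [digits[i:i + group] for i in range(0, width, group)]
    return PurePosixPath(root, *parts)


print(sharded_path(1234567891))  # storage/0012/3456/7891
```

With groups of 4 digits, no folder ever holds more than 10,000 entries, which keeps each directory well within the kind of per-folder limit mentioned above.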

Hope that helps.