I have a web application that stores a lot of user-generated files. Currently they are all stored on the server filesystem, which has several downsides for me.

  • When we move "folders" (as defined by our application) we have to move the files on disk (although this is more due to strange design decisions by the original developers than a necessary consequence of storing things on the filesystem).
  • It's hard to write tests for file system actions. I have a mock filesystem class that logs actions like move, delete, etc. without performing them, which more or less does the job, but I don't have 100% confidence in the tests.
  • I'm going to be adding other jobs that need to access the files from other services to perform additional tasks (e.g. indexing in Solr, generating images, movie format conversion), so I need to get at the files remotely. Doing this over network shares seems dodgy...
  • Dealing with permissions on the filesystem has sometimes given us problems in the past, although since we've moved to a pure Linux environment this should be less of an issue.

So, my main questions are:

  • What are the disadvantages of storing files as BLOBs in MySQL?
  • Do the same problems exist with NoSQL systems like Cassandra?
  • Does anyone have other suggestions that might be appropriate, e.g. MogileFS, etc.?

Not really a direct answer, but some pointers to very interesting and somewhat similar questions (yeah, they are about blobs and images, but this is IMO comparable).

What are the disadvantages of storing files as BLOBs in MySQL?

Do the same problems exist with NoSQL systems like Cassandra?

PS: I don't want to be the killjoy, but I don't think that any NoSQL solution is going to solve your problem (NoSQL is just irrelevant for most businesses).

Maybe a hybrid solution.

Use a database to store metadata about each file - and use the file system to actually store the file.

Any restructuring of 'folders' could be modelled in the DB and dereferenced from the actual OS location.
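The hybrid idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: SQLite stands in for MySQL so the example runs without a server, and the table layout, the `storage_key` column and the helper names (`save_file`, `move_folder`, `read_file`) are all made up for the example. The point it shows is that moving a "folder" becomes a metadata update, while the bytes on disk never move.

```python
import sqlite3
import tempfile
import uuid
from pathlib import Path

# Hypothetical schema: the logical location lives in the DB, the bytes live
# on disk under an opaque key that never changes.
SCHEMA = """
CREATE TABLE files (
    id          INTEGER PRIMARY KEY,
    folder      TEXT NOT NULL,        -- logical folder shown to users
    name        TEXT NOT NULL,        -- logical file name
    storage_key TEXT NOT NULL UNIQUE  -- file name used on disk
)
"""


def save_file(db, root, folder, name, data):
    """Write the bytes under a random key and record where they logically live."""
    key = uuid.uuid4().hex
    (root / key).write_bytes(data)
    db.execute(
        "INSERT INTO files (folder, name, storage_key) VALUES (?, ?, ?)",
        (folder, name, key),
    )
    return key


def move_folder(db, old, new):
    """Rename a logical folder and its subfolders; nothing on disk is touched."""
    prefix = old + "/"
    db.execute(
        "UPDATE files SET folder = ? || substr(folder, ?) "
        "WHERE folder = ? OR substr(folder, 1, ?) = ?",
        (new, len(old) + 1, old, len(prefix), prefix),
    )


def read_file(db, root, folder, name):
    """Resolve the logical path to the on-disk key, then read the bytes."""
    row = db.execute(
        "SELECT storage_key FROM files WHERE folder = ? AND name = ?",
        (folder, name),
    ).fetchone()
    if row is None:
        raise FileNotFoundError(f"{folder}/{name}")
    return (root / row[0]).read_bytes()


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
root = Path(tempfile.mkdtemp())  # stand-in for the real storage directory

key = save_file(db, root, "/projects/alpha", "report.txt", b"hello")
move_folder(db, "/projects", "/archive")
moved = read_file(db, root, "/archive/alpha", "report.txt")
```

Because the on-disk name is an opaque key, other jobs only need the key (or a lookup against the metadata table) to find a file, regardless of how the folders are arranged in the application.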

You can store files up to 2GB easily in Cassandra by splitting them into columns of 1MB or so. This is pretty common.

You could store it as one big column too, but then you'd have to read the whole thing into memory when accessing it.
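As a rough illustration of the chunking approach described above, here is a minimal sketch. It does not use any Cassandra client API: a plain dict keyed by `(file_id, chunk_index)` stands in for the column store, and `write_chunked`, `read_chunked` and `CHUNK_SIZE` are names invented for the example. It only shows why chunking keeps memory use bounded to about one chunk at a time.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB per chunk, as suggested above


def write_chunked(store, file_id, stream, chunk_size=CHUNK_SIZE):
    """Read the stream piece by piece and store each piece under its index."""
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # In a real column store this would be one column (or row) per chunk.
        store[(file_id, index)] = chunk
        index += 1
    return index  # number of chunks written


def read_chunked(store, file_id):
    """Yield the chunks in order, so a caller never holds the whole file."""
    index = 0
    while (file_id, index) in store:
        yield store[(file_id, index)]
        index += 1


store = {}  # hypothetical stand-in for the database
payload = b"x" * (2 * CHUNK_SIZE + 123)
count = write_chunked(store, "video-1", io.BytesIO(payload))
restored = b"".join(read_chunked(store, "video-1"))
```

A consumer such as a format converter could iterate over `read_chunked` and process each piece as it arrives, instead of joining everything as the last line does for the demonstration.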

If the OS or application doesn't need direct access to the files, then there's no real need to store them on the file system. If you want to back up the files at the same time you back up the database, then there's less benefit to storing them outside the database. Therefore, storing the files in the database might be a valid solution.

One additional downside is that processing files in the DB has more overhead than processing them at the file system level. However, as long as the advantages outweigh the downsides, and it seems they might in your case, you might give it a try.

My main concern would be managing disk storage. As the database files get large, managing your entire database gets more difficult. You don't want to jump out of the frying pan and into the fire.