In the tables of the schema I'm working on, I have to deal with a couple of thousand "data-sheets" that are mostly PDF documents, and sometimes graphic-image files like PNG, JPEG etc. The schema models an electronics distributor's portal, where new items get added to the portfolio frequently.

These documents (data-sheets) are added when a new item is introduced, but they need updates every so often (e.g. because of a more recent version of the document, not of the product itself), so I'd expect the update to be an asynchronous process.

With all this, should I keep just the file name/path of the data-sheets (& similar documents) in my table, with the actual file residing on the filesystem, or should I take the BLOB approach? I'm almost sure it should be the former, but I still wanted to take the community's advice and find out if there are any issues to watch out for.
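To make the two options concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names and helper functions are hypothetical and chosen only for illustration; a real portal would use its own schema and database.

```python
import sqlite3
from pathlib import Path


def create_schema(conn):
    # Option 1: the row holds only a path; the bytes live on the filesystem.
    conn.execute(
        "CREATE TABLE datasheet_by_path ("
        "part_no TEXT PRIMARY KEY, file_path TEXT NOT NULL)"
    )
    # Option 2: the row holds the document itself as a BLOB.
    conn.execute(
        "CREATE TABLE datasheet_by_blob ("
        "part_no TEXT PRIMARY KEY, content BLOB NOT NULL)"
    )


def save_as_path(conn, storage_dir, part_no, data):
    # Two separate writes (file, then row): they are NOT covered by one
    # transaction, which is the main integrity concern with this option.
    path = Path(storage_dir) / f"{part_no}.pdf"
    path.write_bytes(data)
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO datasheet_by_path VALUES (?, ?)",
            (part_no, str(path)),
        )


def load_from_path(conn, part_no):
    row = conn.execute(
        "SELECT file_path FROM datasheet_by_path WHERE part_no = ?", (part_no,)
    ).fetchone()
    return Path(row[0]).read_bytes() if row else None


def save_as_blob(conn, part_no, data):
    # A single transactional write: row and document change together.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO datasheet_by_blob VALUES (?, ?)",
            (part_no, data),
        )


def load_from_blob(conn, part_no):
    row = conn.execute(
        "SELECT content FROM datasheet_by_blob WHERE part_no = ?", (part_no,)
    ).fetchone()
    return bytes(row[0]) if row else None
```

With the path option, an update to a data-sheet means replacing the file and possibly the row in two steps; with the BLOB option it is one statement inside one transaction.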

For completeness, let me just point out that some databases offer a "hybrid" of these two approaches, for instance Oracle BFILE or MS SQL Server FILESTREAM.

There's also an interesting discussion at Ask Tom on storing files in Oracle BLOBs (the bottom line is: "BLOBs are superior to files").


BTW, you don't always have to choose one over the other... If you can afford the storage overhead and you're operating in a read-mostly environment, you can keep the "master" data in the BLOB for integrity but "cache" that same data in a file for quick read-only access. Some considerations:

  • You'd have to make sure the file is updated/deleted whenever the BLOB is updated/deleted.
  • Consider creating/updating the file on demand.
  • Consider evicting old files from the "cache" even when the corresponding BLOBs remain.
  • Consider using several "caches" (e.g. if you have a middle tier that is distributed across multiple physical machines, each machine could have its own file cache).
  • And finally, you'd have to make sure all of this works robustly in a concurrent environment.

So, this isn't the simplest approach but, depending on your requirements, it may be a good tradeoff between integrity, performance and implementation effort.
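As a rough illustration of the "BLOB as master, file as cache" idea, here is a minimal sketch in Python with `sqlite3`. The `DatasheetStore` class, the `datasheet` table, the `version` column and the file naming scheme are all assumptions made for this example, not part of any particular database product. It assumes part numbers are plain identifiers without glob characters.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


class DatasheetStore:
    """The BLOB in the database is the master; cache files are disposable."""

    def __init__(self, conn, cache_dir):
        self.conn = conn
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS datasheet ("
            "part_no TEXT PRIMARY KEY, version INTEGER NOT NULL, "
            "content BLOB NOT NULL)"
        )

    def _cache_name(self, part_no, version):
        # The version is part of the name, so a stale file is never
        # mistaken for the current one.
        return f"{part_no}.v{version}.bin"

    def _evict(self, part_no, keep_version=None):
        keep = None if keep_version is None else self._cache_name(part_no, keep_version)
        for f in self.cache_dir.glob(f"{part_no}.v*.bin"):
            if f.name != keep:
                f.unlink(missing_ok=True)  # requires Python 3.8+

    def put(self, part_no, data):
        # Simplification: with several concurrent writers, this read and the
        # write below would need to share one transaction or a lock.
        row = self.conn.execute(
            "SELECT version FROM datasheet WHERE part_no = ?", (part_no,)
        ).fetchone()
        version = row[0] + 1 if row else 1
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO datasheet VALUES (?, ?, ?)",
                (part_no, version, data),
            )
        self._evict(part_no, keep_version=version)

    def delete(self, part_no):
        with self.conn:
            self.conn.execute("DELETE FROM datasheet WHERE part_no = ?", (part_no,))
        self._evict(part_no)

    def cached_file(self, part_no):
        """Return a path to a cache file, creating it on demand."""
        row = self.conn.execute(
            "SELECT version FROM datasheet WHERE part_no = ?", (part_no,)
        ).fetchone()
        if row is None:
            return None
        path = self.cache_dir / self._cache_name(part_no, row[0])
        if not path.exists():
            blob = self.conn.execute(
                "SELECT content FROM datasheet WHERE part_no = ? AND version = ?",
                (part_no, row[0]),
            ).fetchone()
            if blob is None:  # changed or deleted in the meantime
                return None
            # Write to a temp file, then rename atomically, so readers
            # never observe a half-written cache file.
            fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
            with os.fdopen(fd, "wb") as f:
                f.write(blob[0])
            os.replace(tmp, path)
        return path
```

Because the cache can always be rebuilt from the BLOBs, each middle-tier machine could run its own `cache_dir`, and files can be evicted at any time without losing data.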