The company I work for is trying to move a product that uses a flat file format to a database format. We are handling fairly large data files (e.g. 25 GB per file), and they get updated very quickly. We need to run queries that access the data randomly as well as sequentially. I'm trying to convince them of the benefits of using a database, but many of my coworkers seem reluctant. So I'm wondering if anyone can help me out with some reasons, or links to posts, on why we should use databases, or at least explain why flat files are better (if they are).

  1. Databases are designed for querying, so you don't have to walk through the file by hand. Databases can handle very complicated queries.
  2. Databases are designed for indexing, so tasks like "get the record with id = x" can be very fast.
  3. Databases are designed for multi-process/multi-threaded access.
  4. Databases are designed for access over the network.
  5. Databases can check data integrity.
  6. Databases can update data easily (see 1).
  7. Databases are reliable.
  8. Databases are designed for transactions and concurrent access.
  9. Databases plus ORMs let you manipulate data in a very programmer-friendly way.

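Points 5 and 8 (integrity and transactions) are easy to see in a small sketch. The example below uses Python's built-in `sqlite3` module; the `accounts` table and the `transfer` helper are hypothetical. A `CHECK` constraint rejects a negative balance, and the connection's context manager rolls back the whole transfer when that happens, so the half-finished first `UPDATE` never sticks. Getting the same guarantee with flat files means writing your own journaling.

```python
import sqlite3

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity rule enforced by the DB
)
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return False if the DB rejects it."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

# Account 2 only has 50, so the second UPDATE violates the CHECK constraint
# and the credit already applied to account 1 is rolled back as well.
ok = transfer(conn, 2, 1, 80)
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the failed call, `balances` is still `{1: 100, 2: 50}`; a valid call such as `transfer(conn, 1, 2, 30)` succeeds and commits both updates together.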
This is an answer I already gave a while ago:

It depends on the domain-specific application needs. A lot of the time, direct text/binary file access can be extremely fast and efficient, and it gives you all the file access capabilities of your OS's file system.

In addition, your programming language most likely already has a built-in module (or one is easy to write) for specific parsing.
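As an illustration of that built-in parsing support, Python ships a `csv` module. The sketch below assumes a hypothetical comma-separated flat file with `id,name,score` columns (held in memory here so it runs standalone). Parsing is handled for you, but any "query" is a hand-written linear scan.

```python
import csv
import io

# Hypothetical flat-file content; in practice this would be open("data.csv", newline="").
flat = io.StringIO("id,name,score\n1,alpha,3.5\n2,beta,4.0\n3,gamma,2.5\n")

# csv.DictReader handles delimiters and quoting; each row becomes a dict of strings.
rows = list(csv.DictReader(flat))

# Filtering is a manual pass over every row, with type conversion done by hand.
high = [r["name"] for r in rows if float(r["score"]) >= 3.5]
```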

If what you need is many appends (INSERTs?) and sequential or infrequent access with little or no concurrency, files are the way to go.
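The append-and-scan case really is simple with plain files. A minimal sketch, assuming a hypothetical fixed-width binary record (an 8-byte integer id plus an 8-byte float): appending never rewrites existing data, a sequential scan just reads record after record, and fixed-width records even allow jumping to the n-th record with `seek`. Finding a record by *id* rather than by position, however, would need a full scan or a hand-built index, which is exactly where a database starts to pay off.

```python
import os
import struct
import tempfile

# Hypothetical record layout: little-endian int64 id followed by a float64 value.
RECORD = struct.Struct("<qd")

def append_records(path, records):
    """Append records to the end of the file without touching existing data."""
    with open(path, "ab") as f:
        for rec_id, value in records:
            f.write(RECORD.pack(rec_id, value))

def scan(path):
    """Sequentially yield every record from the start of the file."""
    with open(path, "rb") as f:
        while chunk := f.read(RECORD.size):
            yield RECORD.unpack(chunk)

def read_nth(path, n):
    """Jump straight to record n; possible only because records are fixed-width."""
    with open(path, "rb") as f:
        f.seek(n * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))

path = os.path.join(tempfile.mkdtemp(), "data.bin")
append_records(path, [(1, 0.5), (2, 1.5)])
append_records(path, [(3, 2.5)])
```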

However, when your needs involve concurrency, non-sequential reading/writing, atomicity, atomic permissions, data that is relational by nature, etc., you will be better off with a relational or OO database.

There is a lot that can be accomplished with SQLite3, which is extremely light (under 300 KB), ACID compliant, written in C, and highly ubiquitous (if it is not already included in your programming language, as it is in Python, there is surely a binding available). It can be useful even on database files as large as 1 GB, possibly more.
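To make the SQLite suggestion concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `readings` table, its columns, and the generated data are hypothetical. It shows a batched insert inside one transaction, a random-access lookup by key, a contiguous range scan, and an aggregate, which covers both access patterns from the question.

```python
import sqlite3

# ":memory:" keeps the demo self-contained; pass a path such as "data.db" for an on-disk file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)")

# Hypothetical data: 10,000 rows spread across three sensors.
rows = ((i, f"s{i % 3}", i * 0.5) for i in range(10_000))
with conn:  # one transaction for the whole batch is far faster than one commit per row
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Random access: fetch a single row by its key.
one = conn.execute("SELECT sensor, value FROM readings WHERE id = ?", (4242,)).fetchone()

# Contiguous access: a range scan in key order.
span = conn.execute(
    "SELECT id FROM readings WHERE id BETWEEN ? AND ? ORDER BY id", (100, 104)
).fetchall()

# An aggregate that would otherwise need a hand-written pass over the whole file.
per_sensor = conn.execute(
    "SELECT sensor, COUNT(*) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
```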

If your needs were bigger, there wouldn't even be a question: get a full-blown RDBMS.

Since you say in a comment that "the system is basically a bunch of scripts", you should take a look at pgbash.

Don't build it if you can buy it.

I heard this quote recently, and it really seems fitting as a guideline. Ask yourself this: how much time was spent working on the file-handling portion of your application? I suspect a fair amount of time was spent optimizing this code for performance. Had you used a relational database all along, you would have spent considerably less time on this part of the application, and you would have had more time for the true "business" side of your application.

Databases, all the way.

However, if you still need to store files and can't take on a new RDBMS (like Oracle, SQL Server, etc.), then consider XML.

XML is a structured file format that lets you store data as a file but gives you query power over the file and the data in it. XML files are easier to read than flat files and can easily be transformed with XSLT for better human readability. XML is also a great way to transport data around if you must.
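As a rough idea of what "query power within the file" looks like, the sketch below uses Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The `<records>`/`<record>` layout and its attributes are hypothetical. Attribute filters work directly in the path expression; anything beyond that subset, such as summing a field, falls back to plain code.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; element and attribute names are illustrative only.
doc = """
<records>
  <record id="1" status="open"><amount>10</amount></record>
  <record id="2" status="closed"><amount>25</amount></record>
  <record id="3" status="open"><amount>40</amount></record>
</records>
"""

root = ET.fromstring(doc)

# ElementTree's XPath subset supports attribute predicates like this one.
open_records = root.findall("./record[@status='open']")
open_ids = [r.get("id") for r in open_records]

# Numeric work (sums, comparisons) is done in Python, not in the path expression.
total_open = sum(int(r.findtext("amount")) for r in open_records)
```

Note that `ET.fromstring` builds the whole tree in memory; for files in the 25 GB range, `ET.iterparse` streams elements instead.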

I highly recommend a DB, but if you can't go down that route, XML is an OK second choice.

What about a non-relational (NoSQL) database such as Amazon's SimpleDB, Tokyo Cabinet, etc.? I've heard that Google, Facebook, and LinkedIn use those to store their huge datasets.
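Tokyo Cabinet is a key/value store in the DBM tradition. It is not in Python's standard library, so the sketch below swaps in Python's generic `dbm` module purely to show the access pattern: values are looked up by key only, and any structure inside a value (JSON here) is up to the application. The `user:<id>` key scheme and the fields are hypothetical.

```python
import dbm
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "kv")

# "c" opens the store for reading and writing, creating it if it does not exist.
with dbm.open(path, "c") as db:
    # Keys and values are stored as bytes; str values are encoded automatically.
    db["user:42"] = json.dumps({"name": "Ada", "visits": 3})
    db["user:7"] = json.dumps({"name": "Bob", "visits": 1})

with dbm.open(path, "r") as db:
    user = json.loads(db["user:42"])  # direct lookup by key
    missing = db.get("user:99")       # absent keys return None via get()
```

The trade-off is visible here: key lookups are trivial, but a question like "all users with more than two visits" has no query language and means iterating over every key, which ties back to whether your data is structured and how you need to access it.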

Can you tell us whether your data is structured, whether your schema is fixed, whether you need easy replication, whether access times are critical, etc.?

SQL's ad hoc query capabilities are reason enough for me. With a good schema and proper indexing on the tables, this is fast and efficient and will perform well.
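You can watch indexing pay off by asking SQLite how it plans to run a query. In this sketch (Python's built-in `sqlite3`; the `events` table, its data, and the index name are hypothetical), `EXPLAIN QUERY PLAN` reports a full table scan before the index exists and an index search afterwards, with no change to the query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events (customer, amount) VALUES (?, ?)",
    [(f"c{i % 100}", float(i)) for i in range(1000)],  # hypothetical data
)

query = "SELECT amount FROM events WHERE customer = ?"

def plan(conn, sql, params):
    """Return SQLite's human-readable plan for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)  # the last column holds the plan detail text

before = plan(conn, query, ("c7",))  # no index yet: expect a SCAN of the table
conn.execute("CREATE INDEX idx_events_customer ON events (customer)")
after = plan(conn, query, ("c7",))   # expect a SEARCH using idx_events_customer
```

The exact wording of the plan text varies between SQLite versions, but the switch from scanning to searching via the index is the point.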