When confronted with small projects, what do you feel is the break-even point for storing data in simple text files, hash tables, etc., versus using a real database? For small projects with simple data management needs, a real database is unnecessary complexity and violates YAGNI. However, at some point the complexity of a database clearly pays for itself. What are some signs that a problem is too complex for simple ad-hoc techniques and requires a real database?

Note: To people used to enterprise environments, this will probably seem like an odd question. However, my problem domain is bioinformatics. Most of my programming is prototypes, not production code. I am primarily a scientist and secondarily a programmer. Most of my code is algorithm-centric, not data-management-centric. The purpose of this question is to figure out how much work I would save in the long run if I learned to use proper databases in my code instead of the more ad-hoc techniques I usually use.

I think you will eventually miss a database's querying abilities, but you could look at some lightweight database options:

For me, the line is crossed once I need to query my data in ways that involve more than a single relationship. Relating two flat data structures on disk is fairly simple, but once you go beyond that, a set-based language like SQL and formal database relationships really reduce complexity.
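As a rough illustration of a query that spans two relationships, here is a minimal sketch using Python's standard-library `sqlite3` module with an in-memory database. The schema (`genes`, `samples`, `expression`), the data, and the helper `genes_expressed_in` are all hypothetical, made up for the example. Doing the same with flat files would mean writing the two lookups and the filtering by hand.

```python
import sqlite3

# Hypothetical schema: genes and samples, related through an expression table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE samples (id INTEGER PRIMARY KEY, tissue TEXT NOT NULL);
    CREATE TABLE expression (
        gene_id   INTEGER REFERENCES genes(id),
        sample_id INTEGER REFERENCES samples(id),
        level     REAL NOT NULL
    );
""")
conn.executemany("INSERT INTO genes VALUES (?, ?)",
                 [(1, "geneA"), (2, "geneB"), (3, "geneC")])
conn.executemany("INSERT INTO samples VALUES (?, ?)",
                 [(10, "liver"), (11, "brain")])
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)",
                 [(1, 10, 5.2), (2, 10, 0.3), (3, 11, 7.9), (3, 10, 2.1)])


def genes_expressed_in(conn, tissue, min_level):
    """Return names of genes expressed above min_level in the given tissue."""
    rows = conn.execute("""
        SELECT DISTINCT g.name
        FROM expression AS e
        JOIN genes   AS g ON g.id = e.gene_id    -- relationship 1
        JOIN samples AS s ON s.id = e.sample_id  -- relationship 2
        WHERE s.tissue = ? AND e.level > ?
        ORDER BY g.name
    """, (tissue, min_level))
    return [name for (name,) in rows]


print(genes_expressed_in(conn, "liver", 1.0))  # ['geneA', 'geneC']
```

Adding a third relationship (say, a hypothetical `annotations` table) is one more `JOIN` line rather than another hand-written lookup loop.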

1) Concurrency. Do you have multiple people accessing the same dataset? Then brokering all the different readers and writers in a scalable way gets pretty involved if you roll your own system.

2) Format and relationships: Is the data something that doesn't fit nicely into a table structure? Long nucleotide sequences and the like? That isn't really tabular data.

Another example: Nobody would think of implementing software like Photoshop to store PSDs in a relational format, since the data structures don't really lend themselves to that kind of storage or query pattern.

3) ACID (kind of a corollary to #1): If atomicity, consistency, isolation, and durability aren't challenges with a flat file, go with a flat file.
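On point 1 (concurrency), a small sketch of what a database brokers for you, using Python's `sqlite3`: two connections to the same file stand in for two users. While the writer's transaction is still open, the reader sees only committed data; after the commit it sees the new row. The file name and `reads` table are hypothetical, and this only demonstrates isolation between readers and a writer, not a full multi-user setup.

```python
import os
import sqlite3
import tempfile

# Two connections to one on-disk file stand in for two users/processes.
path = os.path.join(tempfile.mkdtemp(), "shared.db")  # hypothetical file name
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE reads (id INTEGER PRIMARY KEY, seq TEXT)")
writer.commit()


def count_rows(conn):
    """Count rows as seen by this connection."""
    return conn.execute("SELECT COUNT(*) FROM reads").fetchone()[0]


# The INSERT implicitly opens a transaction that is not yet committed.
writer.execute("INSERT INTO reads (seq) VALUES ('ACGT')")
before = count_rows(reader)  # the uncommitted row is invisible to the reader

writer.commit()
after = count_rows(reader)   # now the reader sees it

print(before, after)  # 0 1
writer.close()
reader.close()
```

SQLite handles the file locking behind this; a hand-rolled flat-file format would need its own scheme to get the same guarantee.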
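On point 2, sequence data like this is often kept in a flat text format and loaded into a plain dictionary. A minimal sketch, assuming FASTA input (a format the answer doesn't name); the function `read_fasta` and the sample records are hypothetical.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA records into {identifier: sequence}.

    Minimal sketch: the identifier is the first word after '>', sequence
    lines are concatenated, blank lines are skipped, and any sequence
    lines before the first header are ignored.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:  # flush the previous record
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:  # flush the last record
        records[name] = "".join(chunks)
    return records


example = io.StringIO(">seq1 hypothetical contig\nACGTACGT\nTTGA\n>seq2\nGGCC\n")
seqs = read_fasta(example)
print({k: len(v) for k, v in seqs.items()})  # {'seq1': 12, 'seq2': 4}
```

For data of this shape, a text file plus an in-memory dict may be all a prototype needs.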
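On point 3, atomicity is the easiest ACID property to see in a few lines. The sketch below uses the `sqlite3` connection as a context manager, which commits if the block succeeds and rolls back if an exception escapes it. The `samples` table and `mark_processed` helper are hypothetical; the point is that a half-finished update never reaches the stored data, which a hand-rolled flat file would have to guarantee on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO samples VALUES ('s1', 'raw')")
conn.commit()


def mark_processed(conn, ids):
    """Mark every id as processed, or none of them if any id is unknown."""
    with conn:  # commits on success, rolls back if an exception escapes
        for sample_id in ids:
            cur = conn.execute(
                "UPDATE samples SET status = 'processed' WHERE id = ?",
                (sample_id,))
            if cur.rowcount == 0:
                raise KeyError(sample_id)


try:
    mark_processed(conn, ["s1", "missing"])  # fails partway through
except KeyError:
    pass

status = conn.execute("SELECT status FROM samples WHERE id = 's1'").fetchone()[0]
print(status)  # raw: the update to s1 was rolled back
```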

I'd only write my own on-disk format under special circumstances. Reusing someone else's code is almost always faster.

For relational data, I'd use SQLite. For key/value pairs, I'd use BerkeleyDB (possibly via KiokuDB). For simple objects, I'd use JSON or YAML, but only if I had just a few of them.
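A rough Python counterpart to the key/value and simple-object choices, with one stated substitution: Python 3 no longer ships a BerkeleyDB binding in its standard library, so the key/value part uses the built-in `dbm` module as a stand-in (its backend varies by platform). KiokuDB is a Perl library and has no counterpart here. The keys, file names, and records are hypothetical.

```python
import dbm
import json
import os
import tempfile

workdir = tempfile.mkdtemp()

# Key/value pairs: the stdlib dbm module stands in for BerkeleyDB here.
kv_path = os.path.join(workdir, "sequences")  # hypothetical file name
with dbm.open(kv_path, "c") as db:  # "c": create the database if missing
    db["read_001"] = "ACGTTGCA"     # str keys/values are stored as UTF-8 bytes
    db["read_002"] = "GGGCCCAA"

with dbm.open(kv_path, "r") as db:
    seq = db["read_001"].decode("utf-8")  # values come back as bytes

# A handful of simple objects: plain JSON is enough.
json_path = os.path.join(workdir, "runs.json")  # hypothetical file name
runs = [{"run": 1, "kmer": 21}, {"run": 2, "kmer": 31}]
with open(json_path, "w") as fh:
    json.dump(runs, fh)
with open(json_path) as fh:
    loaded = json.load(fh)

print(seq, loaded == runs)  # ACGTTGCA True
```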

With SQLite and BDB, "a real database" is generally two lines of code away. It's hard to beat that.
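To make the "two lines" concrete with Python's `sqlite3` (the file name and `hits` table are hypothetical): one line opens or creates the database file, one line defines a table, and ordinary inserts and queries follow.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "project.db")  # hypothetical file

# The "two lines": open (or create) a database file and define a table.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS hits (query TEXT, target TEXT, score REAL)")

# Using it afterwards is just as short.
with conn:  # commit the insert
    conn.execute("INSERT INTO hits VALUES (?, ?, ?)", ("q1", "t7", 98.5))
best = conn.execute(
    "SELECT target, score FROM hits ORDER BY score DESC LIMIT 1").fetchone()
print(best)  # ('t7', 98.5)
```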