I have just begun designing the 'schema' for a distributed store database.

I keep debating with myself how much to denormalize. I understand how to do it, and why it improves performance when the denormalization matches the queries well: it reduces the need to gather data from multiple places...

...But it's frequently said that premature optimization is a bad idea. The benefits of a relational design, with references rather than embedded duplicate data, are obvious: it's elegant and flexible, there's no need to worry about keeping duplicate data consistent, etc.

So I am now wondering whether it's a reasonable approach to design the schema in a highly relational way, use the application layer to gather the data as needed, and just change this later as required.
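As a rough sketch of what I mean, assuming a plain key-value store (modeled here by a Python dict; the key layout and the `get_order_view` helper are hypothetical names for illustration, not any particular product's API):

```python
# Hypothetical in-memory stand-in for a distributed key-value store.
# Records hold references (ids) to each other instead of embedded copies.
store = {
    "user:1": {"name": "Ann"},
    "order:10": {"user_id": 1, "item_ids": [100, 101]},
    "item:100": {"title": "Keyboard", "price": 40},
    "item:101": {"title": "Mouse", "price": 15},
}


def get_order_view(store, order_id):
    """Assemble an order view in the application layer by following references."""
    order = store[f"order:{order_id}"]
    user = store[f"user:{order['user_id']}"]  # one extra lookup
    items = [store[f"item:{i}"] for i in order["item_ids"]]  # one lookup per item
    return {
        "customer": user["name"],
        "items": [item["title"] for item in items],
        "total": sum(item["price"] for item in items),
    }


view = get_order_view(store, 10)
```

Denormalizing later would mean copying the customer name and item titles into the order record so the view needs a single lookup, at the cost of keeping those copies in sync.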

If traffic becomes a problem, I'm already on a technology that can scale horizontally with a few design changes (isolation, denormalization).

This seems like it may be the best option among:

  • begin with RDBMS, proceed to distributed store as needed
  • begin with distributed store, with fully denormalized design (scale-ready)
  • begin with distributed store with relational design, denormalize + isolate as needed

Ideas?

Thanks

Have you considered scaling a properly normalized relational database the old-fashioned way? NoSQL has gained popularity by letting simple or poorly designed PHP/light applications scale by replacing the bottleneck with something crude but effective. If you have an elegant design, you don't need NoSQL to scale out.

I think the last approach sounds very reasonable, but I have some comments. You no longer have a relational DBMS, so I recommend using OO design rather than relational design. For instance, when we have a one-to-many relation with "owning" semantics, we can put both sides into a single object and store it as one object. I believe this approach can be considered perfectly "normalized" in the NoSQL world.
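A minimal sketch of that idea, assuming a store that maps one key to one serialized value (a dict stands in for it here; `Post`, `Comment`, `save`, and `load` are hypothetical names chosen for illustration):

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class Post:
    post_id: int
    title: str
    # The post owns its comments, so they live inside the same object.
    comments: List[Comment] = field(default_factory=list)


def save(store, post):
    """Serialize the whole aggregate and write it under a single key."""
    store[f"post:{post.post_id}"] = json.dumps(asdict(post))


def load(store, post_id):
    """Read one key and rebuild the post together with its comments."""
    raw = json.loads(store[f"post:{post_id}"])
    raw["comments"] = [Comment(**c) for c in raw["comments"]]
    return Post(**raw)


store = {}
original = Post(1, "Schema design", [Comment("ann", "Nice"), Comment("bob", "Agreed")])
save(store, original)
restored = load(store, 1)
```

The one-to-many relation costs a single read and a single write, and nothing is duplicated, because comments exist only inside the post that owns them.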

I'd go for option 2, assuming there are no serious disadvantages to the scalable schema.

Knowing that you'll most likely want to use a distributed store later, you might as well go for something resembling the final system from the start rather than having to deal with multiple schema versions. NoSQL designs aren't necessarily more complicated than relational designs, but they are different. Most of the NoSQL platforms are now mature enough that they're as easy to use for initial development as SQL is.

You may even find that you don't have to do anything too complex to get horizontal scaling: many of the joins in a typical relational database exist only because it doesn't support multiple values or hierarchical structures.
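To illustrate with made-up data (the article and tag records below are hypothetical, and plain Python structures stand in for both kinds of storage):

```python
# Relational shape: tags need their own table and a join,
# only because a column can hold a single value.
articles = [(1, "Scaling notes")]            # (article_id, title)
article_tags = [(1, "nosql"), (1, "schema")]  # (article_id, tag)


def tags_via_join(article_id):
    """Emulate the join: scan the tag table for rows matching the article."""
    return [tag for (aid, tag) in article_tags if aid == article_id]


# Document shape: the multi-valued field and the nested structure
# are stored directly inside the record, so no join is needed.
article_doc = {
    "id": 1,
    "title": "Scaling notes",
    "tags": ["nosql", "schema"],
    "author": {"name": "Ann", "contact": {"email": "ann@example.com"}},
}

joined_tags = tags_via_join(1)
embedded_tags = article_doc["tags"]
```

Both shapes hold the same information without duplicating anything; the second simply has no join to distribute across nodes.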