I am currently building the foundations of a credit card application and looking for ways to optimize performance. My setup is based on the CakePHP framework, but I believe my question is relevant to any technology stack, as it concerns data caching.

Let's take a typical post-author relation, which is represented by two tables in my database. When I query the database for a specific blog post, the built-in ORM functionality in CakePHP also fetches the author of the post, the comments on the post, etc. All of this is returned as one large-ass nested array, which I store in the cache under a unique identifier for the blog post concerned.

When the blog post itself is updated, it is child's play to destroy the cache entry for that post and have it regenerated on the next request.

But what happens when it is not the primary entity (in this case the blog post) that gets updated, but some of the related data? For instance, a comment might be deleted, or the author could update his avatar. Are there any approaches (patterns) I could consider for tracking updates to related data and applying them to my cache accordingly?

I am curious to hear whether you have run into similar challenges and how you managed to overcome them. Feel free to offer an abstract perspective if you are using another stack on your end. Your views are much appreciated in any case, thank you!

It is fairly simple. Cache entries can be

  • added
  • destroyed

You have to take care of destroying cache entries when related data changes. In the application layer, in addition to updating the data, you destroy certain types of cache entries whenever you update certain tables; you keep track of the dependencies by hard-coding them.
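A minimal sketch of such hard-coded dependencies, in Python for illustration only. The `DEPENDENCIES` map, the `post:<id>` key format, and the row fields are assumptions made up for this example, and a plain dict stands in for the real cache backend.

```python
# Hypothetical hard-coded map: table name -> function that returns the
# cache keys to destroy when a row in that table changes.
DEPENDENCIES = {
    "comments": lambda row: [f"post:{row['post_id']}"],
    "authors": lambda row: [f"post:{pid}" for pid in row["post_ids"]],
}

cache = {}  # stand-in for memcached, APC, etc.

def invalidate_for(table, row):
    """Destroy every cache entry that depends on the changed row."""
    keys_for = DEPENDENCIES.get(table)
    if keys_for is None:
        return  # nothing cached depends on this table
    for key in keys_for(row):
        cache.pop(key, None)

cache["post:1"] = {"title": "Hello", "comments": ["first"]}
cache["post:2"] = {"title": "Other", "comments": []}

# A comment on post 1 is deleted: only the entry for post 1 is dropped.
invalidate_for("comments", {"id": 10, "post_id": 1})
```

The application would call `invalidate_for` right after each write to a tracked table, so the next read of the post rebuilds the entry.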

If you want to be smart about it, you can have your cache objects declare their dependencies, and cache the last update times of the DB tables as well.

Then you can

  • fetch the cached data and examine its dependencies,
  • get the update times for the relevant DB tables, and
  • if the entry is stale (the update time of a table that the large-ass cache entry depends on is later than the time of the cache entry), drop it and get fresh data from the database.
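The steps above could be sketched roughly as follows. This is an illustration in Python under stated assumptions: `touch`, `get`, and the `load` callback are hypothetical names, both stores are plain dicts, and the `now` parameter exists only so the example is deterministic.

```python
import time

table_updated_at = {}  # table name -> time of the last write to that table
cache = {}             # key -> (created_at, depends_on, value)

def touch(table, now=None):
    """Record that a table was just written to."""
    table_updated_at[table] = time.time() if now is None else now

def get(key, load, now=None):
    """Return a fresh cached value, reloading it if any dependency changed."""
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is not None:
        created_at, depends_on, value = entry
        # Fresh only if no table it depends on was updated after it was cached.
        if all(table_updated_at.get(t, 0) <= created_at for t in depends_on):
            return value
        del cache[key]  # stale: drop it and fall through to the database
    value, depends_on = load()  # load() returns (value, tables it depends on)
    cache[key] = (now, tuple(depends_on), value)
    return value

get("post:1", lambda: ("v1", ["posts", "comments"]), now=100)  # miss, loads v1
touch("comments", now=150)                                     # a comment changes
get("post:1", lambda: ("v2", ["posts", "comments"]), now=200)  # stale, reloads
```

With a coarse clock, an update and a cache write can land on the same timestamp, so a per-table version counter may be a safer comparison than wall-clock time.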

You can even integrate the above into your persistence layer.

Of course, the above is for when you want a consistent cache. Sometimes, and for some data, you can relax the consistency requirements, and there are cases where a simple TTL is good enough. As a trivial example, with a TTL of one second you should mostly stay out of trouble with users, and the cache can still take load off your data systems. With longer times you may still be fine: if you are caching the list of country ISO codes, say, your application might be perfectly OK caching it for 86400 seconds.
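For the TTL-only case, a minimal in-process sketch might look like this. `TTLCache` is a hypothetical class written for this example; real backends such as memcached take an expiry time on their own set operation, so you would normally not write this yourself.

```python
import time

class TTLCache:
    """Tiny in-memory cache where every entry expires after its TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}      # key -> (expires_at, value)

    def set(self, key, value, ttl):
        self._data[key] = (self._clock() + ttl, value)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return default
        return value

countries = TTLCache()
countries.set("iso_codes", ["DE", "FR", "US"], ttl=86400)  # cache for one day
```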

In addition, you could track the times of the data presented to the user, for example

  • say the user has seen data A from the cache, and we know this data was created/modified at some time t1
  • the user makes changes to data A (turning it into data B) and commits the change
  • the application layer can then check whether data A is still as it is in the DB (that is, whether the cached data on which the user based decisions and/or changes was indeed fresh)
  • if it was not fresh, there is a conflict and the user should review the changes

This has the cost of an additional read of data A from the DB, but it occurs only on writes. Also, the conflict can arise not only because of the cache, but also because of multiple users trying to change the data (i.e. it is related to locking strategies).
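A rough sketch of that check on write, in Python. The `db` dict, the `modified_at` field, the `commit` function, and the `Conflict` exception are all assumptions for illustration; in a real database the read and the write would need to happen atomically, for instance inside a transaction or as a conditional update.

```python
class Conflict(Exception):
    """Raised when the data the user edited was no longer current."""

# Stand-in for the database; modified_at plays the role of t1.
db = {"post:1": {"body": "A", "modified_at": 1}}

def commit(key, new_body, seen_modified_at, now):
    """Write the change only if the row is unchanged since the user saw it."""
    current = db[key]  # the extra read, paid only on writes
    if current["modified_at"] != seen_modified_at:
        raise Conflict(f"{key} changed since it was shown to the user")
    db[key] = {"body": new_body, "modified_at": now}

commit("post:1", "B", seen_modified_at=1, now=2)  # user saw fresh data: accepted

try:
    # A second user still holds the version from t1 = 1 and tries to save.
    commit("post:1", "C", seen_modified_at=1, now=3)
except Conflict:
    conflict_detected = True
```

The same check catches both a stale cache and two users editing concurrently, since in either case the timestamp the user saw no longer matches the stored one.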

One approach for memcached is to use tags (http://code.google.com/p/memcached-tag/). For example, take your post "large-ass nested array": let's say it includes the author's information and the post itself, and it is shown on the front page and in some box in the sidebar. So it gets the tags frontpage, a tag identifying the author, sidebar, and post-id. If someone changes the author information, you flush every cache entry with that author's tag. But that's just one solution, and only for cache backends that support tags, which, for example, does not include APC (afaik). Hope that gave you an example.
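To show the idea independently of any backend, here is a small in-memory sketch in Python. It is not the memcached-tag API; `TaggedCache`, `flush_tag`, and the tag names such as `author:7` are hypothetical and chosen only for this example.

```python
from collections import defaultdict

class TaggedCache:
    """In-memory cache whose entries can be flushed by tag."""

    def __init__(self):
        self._data = {}
        self._keys_by_tag = defaultdict(set)  # tag -> keys carrying that tag

    def set(self, key, value, tags=()):
        self._data[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def flush_tag(self, tag):
        """Destroy every entry that was stored with this tag."""
        for key in self._keys_by_tag.pop(tag, set()):
            self._data.pop(key, None)

cache = TaggedCache()
cache.set("post:1", {"title": "Hello", "author": "Ann"},
          tags=["frontpage", "author:7", "sidebar", "post:1"])
cache.set("post:2", {"title": "Other", "author": "Bob"},
          tags=["author:8", "post:2"])

# Author 7 changes their avatar: flush everything tagged with that author.
cache.flush_tag("author:7")
```

After the flush, `post:1` is gone and is rebuilt on the next request, while `post:2` is untouched.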