We have developed a PaaS solution for PHP. As part of it, we let developers view the Apache error_log and access_log files through our API.

Currently we write the logs to files on disk, separated per deployment (vhost).

Because this does not scale well with a larger number of nodes and deployments, even though the files are on a distributed filesystem (GlusterFS), we want to switch to something better.

Especially for billing and record-keeping purposes, we would prefer not to parse the log files every time.
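To illustrate what "not parsing every time" could mean: a minimal sketch, assuming the access_log uses Apache's combined/common layout, that parses each line once at write time into a structured record, so that billing totals can later be computed from the stored fields. The names `parse_access_line` and `bytes_per_deployment` are hypothetical, and the deployment name is passed in separately because it is not part of the default log format.

```python
import re
from collections import defaultdict

# Leading fields of Apache's common/combined log format (assumed layout).
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_access_line(line, deployment):
    """Turn one access_log line into a dict, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Apache logs "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    record["deployment"] = deployment
    return record

def bytes_per_deployment(records):
    """Sum response sizes per deployment from already-parsed records."""
    totals = defaultdict(int)
    for record in records:
        totals[record["deployment"]] += record["size"]
    return dict(totals)
```

Records shaped like this could be stored as documents or rows in whichever database ends up being chosen; the aggregation here is done in plain Python only to keep the sketch self-contained.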

MongoDB's capped collections look great for logging, so we went with them for this. But it turns out they do not seem to work with auto-sharding, which rather defeats the purpose for us, since we expect many more writes than reads.
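For readers unfamiliar with the idea: a capped collection has a fixed size, keeps insertion order, and discards the oldest entries once it is full. The following is only an in-memory model of that behaviour in plain Python, not MongoDB's API; the `CappedLog` class is hypothetical, and it caps by entry count, whereas MongoDB caps primarily by byte size.

```python
from collections import deque

class CappedLog:
    """In-memory model of a capped collection (illustration only)."""

    def __init__(self, max_entries):
        # A deque with maxlen silently drops the oldest item when full.
        self._entries = deque(maxlen=max_entries)

    def append(self, entry):
        self._entries.append(entry)

    def tail(self, n):
        """Return the n most recent entries, oldest first."""
        if n <= 0:
            return []
        return list(self._entries)[-n:]

    def __len__(self):
        return len(self._entries)
```

This is what makes the feature attractive for logs: old entries expire on their own, with no separate cleanup job.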

The other option was Cassandra, which we like for its every-node-is-equal approach, but it does not have anything like capped collections.

It turns out neither of the two solutions offers a distinct feature that helps me make a decision, or I am not seeing it.

What I'd like to know is: has anyone used either of the two systems for logging before? What are your experiences, and can you offer some advice? Or are there other solutions that suit our needs better?

Honestly, we are working on this very test right now with some serious log data (by right now, I mean a couple of people were up late last night running these tests).

In my experience, here are the two distinguishing features: ease of use and proven scaling.

Ease of use

  • MongoDB was easy. In a few hours I went from a blank computer to an active Mongo instance with data imported from MySQL and a couple of completed map-reduces.
  • In the same time frame, team Cassandra was sitting around recompiling Java files, trying to get the Hadoop setup to run on top of an existing Cassandra installation so that they could even run map-reduces.

Proven Scaling

  • MongoDB sharding is still in beta. It's slated for release within the next couple of days. That's pretty tight.
  • Cassandra sharding is proven on some large instances.

So I think the answer is really going to be specific to your needs. I honestly believe that Cassandra is probably the more stable and proven product, but I also know from experience that the learning and setup curve is steeper. So it may be worth trying both.

You should check out this article from Cloudkick if you're interested in using Cassandra: 4 Months with Cassandra, a love story.

They are using Cassandra to store various metrics for their system, which is somewhat similar to storing log files.


If you have not yet decided what to use, here is a nice solution using MongoDB as a backend:

Graylog2 is an open source syslog implementation that stores your logs in MongoDB. It consists of a server written in Java that accepts your syslog messages via TCP or UDP and stores them in the database. The second part is a Ruby on Rails web interface that lets you view the log messages.
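As a rough idea of what sending a message to a syslog receiver over UDP looks like, here is a minimal sketch using only Python's standard library. The function name `send_syslog_udp`, the host, and the port are placeholders, and the payload is a bare BSD-style `<PRI>message` string; real senders usually also include a timestamp and hostname, and whether a given Graylog2 setup accepts this minimal form is an assumption.

```python
import socket

def send_syslog_udp(message, host, port, facility=16, severity=6):
    """Send one minimal syslog datagram and return the bytes that were sent.

    facility 16 is local0 and severity 6 is informational, so the
    default priority value is 16 * 8 + 6 = 134.
    """
    priority = facility * 8 + severity
    payload = f"<{priority}>{message}".encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

A call such as `send_syslog_udp("deployment app1: PHP Fatal error ...", "logs.example.internal", 514)` would then emit one datagram; the hostname here is made up, and 514 is merely the conventional syslog UDP port.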