I have been coding php for some time now and also have a pretty firm grip onto it, MySQL, well, allows just say I makes it work.
Let me create a stats script to trace the stats of other websites like the apparent statcounter, google statistics, mint, etc.
I, obviously, want to code this correctly and that i aren't seeing MySQL liking 20,000,000 to 80,000,000 card inserts ( 925 card inserts per second "roughly**" ) daily.
I have been doing a bit of research also it appears like I ought to store each visit, "entry", right into a csv as well as other type of flat file after which import the information I want from this.
Shall We Be Held on course here? I simply require a push within the right direction, the direction being a method to inhale 1,000 psuedo "MySQL" card inserts per second and the right way to do it.
Example Place: IP, time(), http_referer, etc.
I have to collect this data during the day, after which in the finish during the day, or perhaps in certain times, update ONE row within the database with, for instance, the number of extra unique hits we've got. I understand how to achieve that obviously, just attempting to provide a visualization since I am horrible at explaining things.
If anybody might help me, I am a great coder, I'd be willing to come back the favor.
We handled this in the place I have been working the this past year so over summer time. We did not require much granularity within the information, what exactly labored perfectly for all of us was coalescing data by different cycles. For instance, we'd possess a single day's price of real-time stats, next it would be pressed into some daily sums, after which off right into a monthly table.
This clearly has some huge disadvantages, namely a loss of revenue of granularity. We considered lot of different approaches at that time. For instance, while you stated, CSV or some similar format may potentially serve in an effort to handle per month of information at any given time. The large issue is card inserts however.
Begin by aiming some sample schema when it comes to Precisely what important information to help keep, and by doing this, you'll guide yourself (through revisions) as to the is useful for you.
Another note for that huge quantity of card inserts: we'd potentially spoken through the thought of dumping realtime statistics right into a little daemon which may actually store as much as an hrs price of data, then non-realtime, inject that in to the database prior to the next hour was up. Only a thought.
For that type of activity you are searching at, you have to consider the problem from the new perspective: decoupling. That's, you have to learn how to decouple the information-recording steps to ensure that delays and problems don't propogate support the road.
You will find the right idea in logging hits to some database table, insofar as that guarantees in-order, non-contended access. This really is something the database provides. Regrettably, it comes down in a cost, such as the database completes the
INSERT before returning to you. Thus it from the hit is combined using the invocation from the hit. Any delay in recording the hit will slow the invocation.
MySQL offers a method to decouple it's known as
INSERT DELAYED. Essentially, you know the database "place this row, however i can't hang in there while you're doing soInch and also the database states "okay, I acquired your row, I'll place it when I've got a minute". It's imaginable this reduces securing issues since it allows one thread in MySQL perform the place, not whichever you connect with. Unfortuantely, it is only for MyISAM tables.
Another solution, the industry more general means to fix the issue, is to possess a logging daemon that accepts your logging information and merely en-queues it to wherever it needs to go. The secret to creating this fast may be the en-queueing step. This the kind of solution syslogd provides.
For me it is a positive thing to stay with MySQL for signing up the visits, since it provides tools to evaluate your computer data. To lower the burden I'd possess the following suggestions.
- Create a fast collecting table, without any indixes except primary key, myisam, one row per hit
- Create a stabilized data structure for that hits and slowly move the records daily to that particular database.
This provides a more compact performance hit for logging along with a well indexed stabilized structure for querying/examining.