The situation is as follows:

A number of remote workstations collect field data and send it to a server via FTP. The data is delivered as a CSV file, which is stored in a unique directory for each workstation on the FTP server.

Each workstation sends a new update every 10 minutes, causing the previous data to be overwritten. We would like to somehow concatenate or archive this data automatically. The workstation's processing is limited and cannot be extended, as it is an embedded system.

One suggestion was to run a cron job on the FTP server, but there is a TOS restriction that only allows cron jobs at 30-minute intervals, since it is shared hosting. Given the number of workstations uploading and the 10-minute interval between uploads, it looks like the cron job's 30-minute limit between runs could be a problem.

Is there any other approach that could be recommended? The available server-side scripting languages are Perl, PHP, and Python.

Upgrading to a dedicated server may be necessary, but I'd still like input on how to solve this problem in the most elegant way.

You might consider a persistent daemon that keeps polling the target directories:

grab_lockfile() or exit();

while (1) {
    # check each workstation's directory and process any new files
    sleep(60);
}

Your cron job can just attempt to start the daemon every 30 minutes. If the daemon can't grab the lockfile, it simply dies, so there's no worry about multiple daemons running.
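As a rough illustration, here is a minimal Python sketch of that pattern; the lockfile path and the archiving step are placeholders:

    import fcntl, sys, time

    lock = open('/tmp/ftp-archiver.lock', 'w')
    try:
        # Non-blocking exclusive lock; if another instance already holds it, exit quietly.
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit(0)

    while True:
        # scan each workstation's directory and archive any new CSV files here
        time.sleep(60)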

Another approach to consider would be to submit the files via HTTP POST and then process them with a CGI script. That way, you guarantee that they have been dealt with properly at the time of submission.
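If the workstations could be pointed at an HTTP endpoint instead of FTP (the question suggests they may not be modifiable), the receiving script might look roughly like this sketch, using Python's standard cgi module; the field names and archive path are assumptions:

    #!/usr/bin/env python
    import cgi, os, time

    form = cgi.FieldStorage()
    station = form.getfirst('station', 'unknown')   # workstation identifier field (assumed)
    upload = form['datafile']                       # uploaded CSV field (assumed)

    dest_dir = os.path.join('/home/archive', station)
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    dest = os.path.join(dest_dir, time.strftime('%Y%m%d-%H%M%S') + '.csv')
    with open(dest, 'wb') as out:
        out.write(upload.file.read())               # store each submission under a timestamped name

    print('Content-Type: text/plain')
    print('')
    print('OK')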

Most modern Linux distributions support inotify, which lets your process know when the contents of a directory have changed, so you don't even need to poll.

Edit: Regarding the comment below from Mark Baker:

"Be cautious though, as you will be informed the moment the file is produced, not when it is closed. So you will need a way to make certain you do not get partial files."

That will happen with the inotify watch you set at the directory level. The way to make sure you don't pick up a partial file is to set a further inotify watch on the new file and look for the IN_CLOSE event, so that you know the file has been written to completely.

Once your process has seen this, you can remove the inotify watch on the new file and process it at your leisure.
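For what it's worth, a simpler variant of the same idea is to watch the directories themselves for IN_CLOSE_WRITE, which fires when a file that was open for writing is closed. A minimal pyinotify sketch along those lines (the FTP root path is a placeholder):

    import pyinotify

    class Handler(pyinotify.ProcessEvent):
        def process_IN_CLOSE_WRITE(self, event):
            # The uploader has closed the file, so it is safe to archive it now.
            print('complete file:', event.pathname)

    wm = pyinotify.WatchManager()
    notifier = pyinotify.Notifier(wm, Handler())
    # Watch every workstation directory under the FTP root (path is hypothetical).
    wm.add_watch('/home/ftp', pyinotify.IN_CLOSE_WRITE, rec=True, auto_add=True)
    notifier.loop()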

The half-hour limitation is pretty silly, really. Starting processes in Linux is not an expensive operation, so if all you're doing is checking for new files there's no good reason not to do it more often than that. We have cron jobs running every minute and they have no noticeable effect on performance. However, I realize it's not your rule, and if you're going to stick with that hosting provider you don't have a choice.

You'll need a long-running daemon of some kind. The easy way is to just poll regularly, and that's probably what I'd do. Inotify, so you get notified as soon as a file is created, is a better option.

You can use inotify from Perl with Linux::Inotify, or from Python with pyinotify.

Be careful though, as you'll be notified as soon as the file is created, not when it's closed. So you'll need some way of making sure you don't pick up partial files.

With polling it's less likely you'll see partial files, but it will happen eventually, and it will be a nasty, hard-to-reproduce bug when it does, so better to deal with the problem now.
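One way to deal with it in a polling daemon is to treat a CSV as complete only once its size has stopped changing, and to copy it out under a timestamped name before the next upload overwrites it. A rough Python sketch, with hypothetical paths and a simple size-stability heuristic:

    import glob, os, shutil, time

    INCOMING = '/home/ftp'        # root of the per-workstation upload directories (assumed)
    ARCHIVE = '/home/archive'     # where timestamped copies are kept (assumed)
    SETTLE = 5                    # seconds the file size must stay unchanged

    seen = {}                     # path -> mtime of the last archived version

    def is_stable(path):
        # Guard against archiving a CSV that is still being uploaded.
        size = os.path.getsize(path)
        time.sleep(SETTLE)
        return os.path.getsize(path) == size

    def archive_once():
        for csv in glob.glob(os.path.join(INCOMING, '*', '*.csv')):
            if seen.get(csv) == os.path.getmtime(csv):
                continue          # this version has already been archived
            if not is_stable(csv):
                continue          # still being written; catch it on the next poll
            station = os.path.basename(os.path.dirname(csv))
            dest_dir = os.path.join(ARCHIVE, station)
            if not os.path.isdir(dest_dir):
                os.makedirs(dest_dir)
            stamp = time.strftime('%Y%m%d-%H%M%S')
            shutil.copy2(csv, os.path.join(dest_dir, stamp + '-' + os.path.basename(csv)))
            seen[csv] = os.path.getmtime(csv)

    while True:
        archive_once()
        time.sleep(60)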