I have a network hard drive that holds a couple of hundred thousand MP3 files, organized in an [artist]/[album] hierarchy. I need to identify newly added artist folders and/or newly added album folders programmatically, on demand (not by monitoring, but by request).

Our dev server is Windows-based; the production server will be FreeBSD. A cross-platform solution would be optimal, since the production server might not always be *nix, and I'd like to spend as little time as possible on reconciling the (inevitable) differences between the dev and production servers.

I have a working proof-of-concept that is Windows platform-dependent: using a Scripting.FileSystemObject COM object, I iterate through all top-level (artist) directories and check the size of each directory. If the size has changed, the directory is investigated further to find new album folders. As the directories are iterated, each path and size is collected into an array, which I serialize and write to a file for next time. That array is used on the subsequent call, both to identify changed artist directories (new album added) and to identify brand-new artist directories.

This feels convoluted, and as I mentioned, it's platform-dependent. To boil it down, my goals are:

  • Identify new top-tier directories
  • Identify new second-tier directories
  • Identify new loose files within the top-tier directories

Execution time is not a concern here, and security is not an issue: this is an internal-only project using only intranet resources, so we can do whatever has to be done to achieve the desired end result.

Here's my working proof-of-concept:

    // read the cached list of artist folders
    // (the cache file does not exist yet on the first run, so check before reading)
    $folder_list_cache_file = 'seartistfolderlist.pctf';
    $folder_list_cache = array();
    if (is_readable($folder_list_cache_file)) {
        $cached = unserialize(file_get_contents($folder_list_cache_file));
        if (is_array($cached))
            $folder_list_cache = $cached;
    }

    // container arrays
    $found_artist_folders = array();
    $newly_found_artist_folders = array();
    $changed_artist_folders = array();

    $filesystem = new COM('Scripting.FileSystemObject');

    $dir = "//network_path_to_folders/";
    if ($handle = opendir($dir)) {
        // loop the directories
        while (false !== ($file = readdir($handle))) {
            // skip non-entities
            if ($file == '.' || $file == '..')
                continue;

            // make a key-friendly version of the artist name, skip invalids
            // ie 10000-maniacs
            $file_t = trim(post_slug($file));
            if (strlen($file_t) < 1)
                continue;

            // build the full path
            $pth = $dir.$file;

            // skip loose top-level files
            if (!is_dir($pth))
                continue;

            // attempt to get the size of the directory
            $size = 'ERR';
            try {
                $f = $filesystem->getfolder($pth);
                $size = $f->Size();
            } catch (Exception $e) {
                /* failed to get size */
            }

            // if the artist is not known, they are newly added
            if (!array_key_exists($file_t, $folder_list_cache)) {
                $newly_found_artist_folders[$file_t] = $file;
            } elseif ($size != $folder_list_cache[$file_t]['size']) {
                // if the artist is known but the size is different, a new album is added
                $changed_artist_folders[] = $file;
            }

            // build a list of everything, along with file size to write into the cache file
            $found_artist_folders[$file_t] = array (
                'path'=>$file,
                'size'=>$size
            );
        }
        closedir($handle);
    }

    // write the list to a file for next time
    $fh = fopen($folder_list_cache_file, 'w') or die("can't open file");
    fwrite($fh, serialize($found_artist_folders));
    fclose($fh);

    // deal with discovered additions and changes....
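
For comparison, a platform-independent variant of the same idea might compare directory listings rather than sizes, which would avoid the COM object entirely. This is only a minimal sketch under assumptions, not something I have run against the real share: `snapshot_tree` and `diff_snapshots` are hypothetical helper names, only two levels are scanned, and `scandir` is assumed to succeed on the network path.

```php
<?php
// Hypothetical helper: map each artist folder to the album folders and
// loose files directly inside it, using only portable PHP functions.
function snapshot_tree(string $root): array
{
    $snapshot = array();
    foreach (scandir($root) as $artist) {
        $artist_path = $root . DIRECTORY_SEPARATOR . $artist;
        // skip dot entries and loose top-level files
        if ($artist === '.' || $artist === '..' || !is_dir($artist_path))
            continue;

        $snapshot[$artist] = array('albums' => array(), 'files' => array());
        foreach (scandir($artist_path) as $entry) {
            if ($entry === '.' || $entry === '..')
                continue;
            $kind = is_dir($artist_path . DIRECTORY_SEPARATOR . $entry) ? 'albums' : 'files';
            $snapshot[$artist][$kind][] = $entry;
        }
    }
    return $snapshot;
}

// Hypothetical helper: report what is present in $new but absent from $old.
function diff_snapshots(array $old, array $new): array
{
    $result = array('new_artists' => array(), 'new_albums' => array(), 'new_files' => array());
    foreach ($new as $artist => $contents) {
        if (!isset($old[$artist])) {
            $result['new_artists'][] = $artist;
            continue;
        }
        $albums = array_values(array_diff($contents['albums'], $old[$artist]['albums']));
        $files = array_values(array_diff($contents['files'], $old[$artist]['files']));
        if ($albums)
            $result['new_albums'][$artist] = $albums;
        if ($files)
            $result['new_files'][$artist] = $files;
    }
    return $result;
}
```

The snapshot array could be serialized to the same cache file as above, so each request would diff the fresh snapshot against the previous one.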

Another thing to mention: since these are MP3s, the sizes I'm dealing with are large. So large, in fact, that I have to watch out for PHP's limitation on integer size (it has no unsigned integers). The drive is currently at 90% usage of 1.7TB (yes, SATA in RAID), and a new set of multi-TB drives will be added soon, only to be filled up shortly.
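
One way the integer concern might be sidestepped, offered as an assumption rather than a tested fix: PHP floats represent whole numbers exactly up to 2^53 (roughly 9 petabytes when counting bytes), so a byte total can be accumulated as a float instead of an integer. `dir_size_bytes` is a hypothetical helper name, and the sketch assumes no single file is larger than what `getSize()` can report on the platform.

```php
<?php
// Hypothetical helper: total size in bytes of all files under $path,
// accumulated as a float so the sum cannot overflow a 32-bit integer.
function dir_size_bytes(string $path): float
{
    $total = 0.0;
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile())
            $total += (float) $file->getSize();
    }
    return $total;
}
```

This walks the tree with the SPL iterators, so it does not depend on the COM object, but it is likely to be slower than asking the file system for a folder size directly.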

EDIT

I didn't mention the database because I thought it would be a pointless detail, but there is a database. This script looks for new additions to the digital portion of our music library. At the end of the code, where it says "deal with discovered additions and changes", it reads ID3 tags and does Amazon lookups, then adds the new material to a database table. Someone will come along, review the new additions and screen the data, and then it will be added to the "official" database of albums available for play. Many of the songs we're dealing with are by local artists, so the ID3 tags and Amazon lookups don't yield the track titles, album title, etc. In that case, human intervention is critical to fill in the missing data.