I am going to write some example programs and accompanying documents evaluating methods for accessing data stored in relational databases. To reflect real-world needs, I need to include a realistic dataset of hundreds of thousands of items.

Is anybody aware of publicly available, free datasets of this size: either datasets of human names with realistic, human-level variation, or hierarchical datasets such as large company hierarchies or large, categorized product catalogues?

Please point me in the right direction, if you are.


Part 1, human names: http://timecenter.cs.aau.dk/software.htm

Part 2, hierarchical data: no answer yet

The Wikipedia dump is fairly massive: obligatory Wikipedia link.

Your own PC's directory tree is a large hierarchical structure with a lot of detail. You most likely have a few thousand "items", each with a file name, modification date, size, extra OS info, etc., etc.
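As a rough sketch of how that could be turned into relational test data, the snippet below walks a directory tree and stores it in SQLite as an adjacency list (each row points at its parent). The table name `node`, its columns, and the helper `load_tree` are made up for illustration; adapt them to whatever schema you are actually evaluating.

```python
import os
import sqlite3


def load_tree(root, conn):
    """Store the directory tree under `root` as an adjacency list in SQLite.

    Hypothetical helper: the `node` table and its columns are assumptions,
    not part of any existing schema.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS node (
               id        INTEGER PRIMARY KEY,
               parent_id INTEGER REFERENCES node(id),  -- NULL for the root
               name      TEXT NOT NULL,
               is_dir    INTEGER NOT NULL,
               size      INTEGER,
               mtime     REAL
           )"""
    )

    def insert(parent_id, path, name, is_dir):
        # lstat so that symlinks are recorded, not followed.
        try:
            st = os.lstat(path)
            size, mtime = st.st_size, st.st_mtime
        except OSError:
            size = mtime = None  # unreadable entry: keep the row anyway
        cur = conn.execute(
            "INSERT INTO node (parent_id, name, is_dir, size, mtime) "
            "VALUES (?, ?, ?, ?, ?)",
            (parent_id, name, int(is_dir), size, mtime),
        )
        return cur.lastrowid

    root = os.path.abspath(root)
    ids = {root: insert(None, root, root, True)}  # directory path -> row id
    for dirpath, dirnames, filenames in os.walk(root):
        parent_id = ids[dirpath]
        for d in dirnames:
            full = os.path.join(dirpath, d)
            ids[full] = insert(parent_id, full, d, True)
        for f in filenames:
            insert(parent_id, os.path.join(dirpath, f), f, False)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect("tree.db")
    load_tree(os.path.expanduser("~"), conn)
    print(conn.execute("SELECT COUNT(*) FROM node").fetchone()[0], "rows")
```

Once loaded, the table can be queried like any other hierarchy, for example with a recursive common table expression (`WITH RECURSIVE`) to rebuild full paths or count descendants.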

If that is not big enough, find a server that you can log in to. That'll be bigger.

Not big enough? Get a web crawler and start crawling a large site. That can be as big as you have the patience to crawl.
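If you would rather not install a crawler, a minimal one can be sketched with the Python standard library alone. The version below does a breadth-first crawl of a single host and returns (parent page, child page) pairs, which is already a hierarchy you can load into a table. The names `LinkParser`, `fetch`, and `crawl` are invented for this sketch, and it deliberately skips things a real crawler needs, such as honouring robots.txt and rate limiting, so only point it at a site you are allowed to crawl.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its text (default fetcher)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=100, fetch=fetch):
    """Breadth-first crawl of one host.

    Returns a list of (parent_url, child_url) pairs, one per page,
    recording the page on which each URL was first discovered.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    edges = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # unreachable or malformed URL: skip it
        fetched += 1
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            child = urldefrag(urljoin(url, href)).url
            if urlparse(child).netloc != host or child in seen:
                continue  # stay on the same host, visit each page once
            seen.add(child)
            edges.append((url, child))
            queue.append(child)
    return edges
```

Because each URL is recorded only the first time it is seen, the pairs form a tree rooted at the start page, so they fit the same parent/child table layout as any other hierarchy. Raise `max_pages` as far as your patience allows.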