I have two large lists (possibly a hundred million items each). Each list may come from either a database table or a flat file. Both lists are of comparable size and both are unsorted. I need to find the difference between them, so I have three cases:
1. List1 is a database table (assume each row has only one item, a key, that is a string) and List2 is a large file.
2. Both lists come from two db tables.
3. Both lists come from two files.
In case 2, I plan to use:
select a.item from MyTable a where a.item not in (select b.item from MyTable b)
This is clearly inefficient. Is there a better way?
Another approach is:
I plan to sort each list, and then walk down both of them to get the diff. If a list comes from a file, I have to read it into a db table first, then use db sorting to output the list. Is the run time complexity still O(n log n) with db sorting?
Either approach is a pain and seems like it would be very slow when the lists involved have hundreds of millions of items. Any suggestions?
- Get both sets into the database in all cases... this kind of sorting and matching is exactly what databases are for. Anything else would be reinventing the wheel.
The following will probably be faster than a NOT IN (but test it to be sure):
select a.item from MyTable a LEFT JOIN MyTable b ON a.JoinColumn = b.JoinColumn where b.JoinColumn IS NULL
Make sure that the join columns are indexed. The indexing will make the whole question of sorting go poof.
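To try the LEFT JOIN ... IS NULL pattern on a small scale, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names list_a and list_b, the index names, and the function name are made up for illustration; a real setup would bulk-load the files into your own database and you should measure the plan there.

```python
import sqlite3


def diff_with_anti_join(items_a, items_b):
    """Return the items of items_a that do not appear in items_b, sorted.

    list_a / list_b are hypothetical table names used only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE list_a (item TEXT)")
        conn.execute("CREATE TABLE list_b (item TEXT)")
        conn.executemany("INSERT INTO list_a VALUES (?)", ((i,) for i in items_a))
        conn.executemany("INSERT INTO list_b VALUES (?)", ((i,) for i in items_b))
        # Index the join columns after loading, as the answer recommends.
        conn.execute("CREATE INDEX idx_list_a_item ON list_a (item)")
        conn.execute("CREATE INDEX idx_list_b_item ON list_b (item)")
        rows = conn.execute(
            "SELECT a.item FROM list_a a "
            "LEFT JOIN list_b b ON a.item = b.item "
            "WHERE b.item IS NULL "
            "ORDER BY a.item"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    print(diff_with_anti_join(["x", "y", "z"], ["y"]))
```

This only finds rows in the first list that are missing from the second; swap the arguments to get the other direction.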
This isn't really a database question.
Step One. Get both lists sorted. Maybe the db list is already sorted, but if not, then either export it in sorted order, or create an index if the same list is going to be needed in sorted order multiple times.
Step Two. Use a sort utility to create a sorted copy of a list that is in a text file. If these lists are beyond the capacity of the UNIX sort utility, break them up, sort each piece, and include the merging of the pieces in your application.
Step Three. Write the application to apply a merge algorithm to the two lists and identify the differences that way. Note that if the text file is in many pieces, you would need a secondary merge algorithm to feed the main algorithm in sorted order.
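The merge in Step Three could be sketched roughly as follows in Python. The function name sorted_diff and the "-"/"+" markers are made up for illustration. It assumes both inputs are already sorted in the same order that Python's < uses for the items, so if the files were sorted by an external tool, the tool's collation has to match.

```python
def sorted_diff(left, right):
    """Walk two sorted iterables in step and yield their differences.

    Yields ("-", item) for items only in `left` and ("+", item) for items
    only in `right`. Both inputs must already be sorted in the same order.
    """
    left_iter, right_iter = iter(left), iter(right)
    done = object()  # sentinel marking an exhausted input
    a = next(left_iter, done)
    b = next(right_iter, done)

    while a is not done and b is not done:
        if a == b:
            # Present in both lists: skip it on both sides.
            a = next(left_iter, done)
            b = next(right_iter, done)
        elif a < b:
            yield ("-", a)
            a = next(left_iter, done)
        else:
            yield ("+", b)
            b = next(right_iter, done)

    # Whatever remains on either side has no counterpart.
    while a is not done:
        yield ("-", a)
        a = next(left_iter, done)
    while b is not done:
        yield ("+", b)
        b = next(right_iter, done)


if __name__ == "__main__":
    for marker, item in sorted_diff(["a", "b", "d"], ["b", "c", "d", "e"]):
        print(marker, item)
```

Because it only holds one item from each side at a time, the inputs can be file objects or database cursors rather than in-memory lists. The walk is a single linear pass, so the sorting dominates the total cost.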
Note that if you can't use UNIX or Linux to sort the text files, then get the source code of the UNIX sort command and port it to your O/S. This article explains why.
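If the sort has to happen inside the application, the "break them up, sort each piece, merge the pieces" idea from Step Two might look something like this Python sketch. It is not a port of the UNIX sort command. The function name external_sort and the chunk_size parameter are made up for illustration, and it assumes items are strings without embedded newlines.

```python
import heapq
import itertools
import tempfile


def external_sort(items, chunk_size=1_000_000):
    """Yield string items in sorted order while holding one chunk in memory.

    Each chunk is sorted in memory and spilled to a temporary file; the
    sorted chunk files are then merged lazily with heapq.merge.
    Assumes no item contains a newline character.
    """
    chunk_files = []
    iterator = iter(items)
    try:
        while True:
            chunk = sorted(itertools.islice(iterator, chunk_size))
            if not chunk:
                break
            spill = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
            spill.writelines(item + "\n" for item in chunk)
            spill.seek(0)
            chunk_files.append(spill)

        # One lazy line stream per chunk file, merged in sorted order.
        streams = [(line.rstrip("\n") for line in f) for f in chunk_files]
        yield from heapq.merge(*streams)
    finally:
        for f in chunk_files:
            f.close()


if __name__ == "__main__":
    print(list(external_sort(["d", "a", "c", "b", "e"], chunk_size=2)))
```

Every chunk keeps a file handle open during the merge, so a very large input with a small chunk_size can hit the operating system's open-file limit. A real implementation would merge in several passes in that case.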