I'm looking for dedupe software that works with MS SQL Server. I've got a rather large and messy table containing addresses from around the world in many different languages. The table is set up to handle dupes as parent/child records, so some functionality for dealing with a match is needed (i.e. not just deleting a dupe).
Edit: Here's the structure
ParentID MasterID PropertyName Address1 Address2 PostalCode City StateProvinceCode CountryCode PhoneNumber
The MasterID is unique for every record.
ParentID contains the MasterID of the parent record for each entry, and the parent record is the one where MasterID = ParentID.
CountryCode is the two-letter ISO country code (not the telephone code).
Address duplicates are notoriously hard to find. There are about 10 valid ways to write a single address, which causes problems.
The fact that your business rules allow duplicates some of the time makes me think you may be best off rolling your own application to find unacceptable dupes and remove them.
In the past I've done this with addresses by running each address through a free geocoding service (Google's mapping API, for example) and looking for points that are within a certain threshold of one another (10 ft or so). You can then decide whether a match qualifies as an "unacceptable duplicate" and remove it.
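As a rough illustration of the threshold check, here is a minimal Python sketch. It assumes each record has already been geocoded and carries hypothetical `lat` and `lon` fields alongside `MasterID`; those field names, the `assign_parents` helper, and the greedy "first match wins" rule are my own assumptions, not something a particular tool provides. Instead of deleting matches it fills in ParentID, since the table in the question keeps dupes as parent/child records. The distance function is a flat-earth approximation that is only reasonable for points a few metres apart.

```python
import math

THRESHOLD_M = 3.048  # roughly 10 ft, expressed in metres


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation; only sensible for very close points."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return 6371000.0 * math.hypot(dx, dy)


def assign_parents(records, threshold_m=THRESHOLD_M):
    """Return {MasterID: ParentID}; a parent record has MasterID == ParentID.

    Greedy sketch: the lowest MasterID in each cluster becomes the parent.
    """
    parents = []  # records chosen as parents so far
    result = {}
    for rec in sorted(records, key=lambda r: r["MasterID"]):
        match = next(
            (
                p
                for p in parents
                if approx_distance_m(rec["lat"], rec["lon"], p["lat"], p["lon"])
                <= threshold_m
            ),
            None,
        )
        if match is None:
            parents.append(rec)  # no nearby parent: this record is its own parent
            result[rec["MasterID"]] = rec["MasterID"]
        else:
            result[rec["MasterID"]] = match["MasterID"]
    return result


# Hypothetical geocoded rows: 1 and 2 are about 0.1 m apart, 3 is far away.
rows = [
    {"MasterID": 1, "lat": 51.5, "lon": -0.12},
    {"MasterID": 2, "lat": 51.500001, "lon": -0.12},
    {"MasterID": 3, "lat": 48.85, "lon": 2.35},
]
parent_map = assign_parents(rows)
```

This compares every record against every parent found so far, which is fine for a sketch but would need some form of spatial bucketing on a large table. Whether a match within the threshold really is an unacceptable duplicate is still a business-rule decision.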
To find the distance between coordinates I would suggest calculating the Great Circle Distance. Good luck!
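For reference, one common way to compute a great-circle distance is the haversine formula. The sketch below is in Python and assumes a spherical Earth with a mean radius of 6371008.8 m; the function name `great_circle_distance_m` is just an illustrative choice.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical-Earth assumption


def great_circle_distance_m(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


# Two geocoded points are candidate duplicates if they fall within the threshold.
is_candidate = great_circle_distance_m(51.5, -0.12, 51.500001, -0.12) <= 3.048
```

At a 10 ft threshold the accuracy of the geocoder itself will usually matter far more than the choice of distance formula.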