How do you find duplicate addresses in a database, or better, stop people already when they fill out the form? I suppose the sooner the better?

Is there a good way of abstracting street, postal code, etc. so that typos and simple attempts to get two sign-ups can be detected? Like:

Quellenstrasse 66/11 
Quellenstr. 66a-11

I am talking about German addresses... Thanks!

You could use the Google Geocoding API.

It actually gives correct results for both of your examples; I just tried it. This way you get structured results that you can store in your database. When the lookup fails, ask the user to enter the address in a different way.
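A rough Python sketch of that flow: send the address to the Geocoding web service, keep the structured parts, and treat anything other than a successful lookup as "ask the user again". The endpoint, the `address`/`components`/`key` parameters and the `status`/`address_components` response fields are taken from Google's Geocoding API documentation as I know it, so check the current docs before relying on them; an API key is required, and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Google's Geocoding web service, JSON output (check the current docs).
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

# Component types we want to keep as separate db fields.
WANTED_TYPES = ("route", "street_number", "postal_code", "locality")


def geocode(address: str, api_key: str) -> dict:
    """Send one lookup request, restricted to Germany, and return the decoded JSON."""
    query = urllib.parse.urlencode(
        {"address": address, "components": "country:DE", "key": api_key}
    )
    with urllib.request.urlopen(f"{GEOCODE_URL}?{query}") as resp:
        return json.load(resp)


def structured_address(response: dict) -> Optional[dict]:
    """Map the first result to {type: long_name}; None means 'ask the user again'."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    parts = {}
    for component in response["results"][0]["address_components"]:
        for t in component["types"]:
            if t in WANTED_TYPES:
                parts[t] = component["long_name"]
    return parts
```

Call `structured_address(geocode(user_input, your_api_key))` when the form is submitted; a `None` result is the point where you ask the user to rephrase the address.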

The sooner you can stop people, the easier it will be in the long run!

Without knowing your db schema or data entry form, I'd suggest an approach something like the following:

  • have separate fields in your db for each address "part", e.g. street, city, postal code, Länder, etc.

  • have your data entry form split up the same way, e.g. street, city, etc.

The reasoning behind this is that each part will probably have its own specific "rules" for recognising slightly transformed addresses ("Quellenstrasse"->"Quellenstr.", "66/11"->"66a-11" above), so your validation code can check whether the values as entered for each field already appear in the corresponding db field. If they don't, you can have a class that applies the transformation rules for each field (e.g. "strasse" converted to "str") and checks again for duplicates.
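A minimal Python sketch of the per-field rules idea, using the two examples from the question. The rules (only `strasse`/`Straße`/`str.` for streets, digit groups only for house numbers), the function names and the postal code are hypothetical; a real rule set would need many more cases.

```python
import re


def normalize_street(street: str) -> str:
    """Hypothetical rules for the street field: 'Quellenstrasse' -> 'quellenstr'."""
    s = street.strip().lower().replace("ß", "ss")  # Straße -> strasse
    s = re.sub(r"strasse\b", "str", s)             # ...strasse -> ...str
    s = re.sub(r"str\.", "str", s)                 # ...str. -> ...str
    return re.sub(r"[\s\-]+", "", s)               # drop spaces and hyphens


def normalize_house_number(number: str) -> str:
    """Hypothetical rule for the number field: keep only the digit groups.

    '66/11' and '66a-11' both become '66-11'. This deliberately also merges
    '66' and '66a', which is acceptable for flagging *possible* duplicates.
    """
    return "-".join(re.findall(r"\d+", number))


def address_key(street: str, number: str, postal_code: str) -> tuple:
    """Combine the normalized fields into one comparable key."""
    return (normalize_street(street), normalize_house_number(number), postal_code.strip())


def find_duplicates(new: tuple, existing: list) -> list:
    """Return the existing (street, number, postal_code) rows that match `new`."""
    key = address_key(*new)
    return [row for row in existing if address_key(*row) == key]


existing = [("Quellenstrasse", "66/11", "10115")]  # example postal code
print(find_duplicates(("Quellenstr.", "66a-11", "10115"), existing))
# [('Quellenstrasse', '66/11', '10115')]
```

Each field gets its own rule function, which is what keeps the rules simple: the street rules never have to worry about house numbers and vice versa.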

Obviously this method has its drawbacks:

  • it may be slow, depending on your data set, leaving the user waiting

  • users may try to get around it by putting address "parts" in the wrong fields (appending the postal code to the city, etc.). But from experience we have found that introducing even simple checking like this stops a lot of users from entering existing addresses.
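Some of the "wrong field" tricks can be caught with cheap per-field sanity checks before the duplicate check runs. A small Python sketch, assuming German postal codes (exactly five digits); the function name and the specific checks are illustrative only.

```python
import re


def field_problems(street: str, postal_code: str, city: str) -> list:
    """Return human-readable warnings for address parts that look misplaced.

    Assumes German postal codes (exactly 5 digits); the checks are examples only.
    """
    problems = []
    if not re.fullmatch(r"\d{5}", postal_code.strip()):
        problems.append("postal code must be exactly 5 digits")
    if re.search(r"\d", city):
        problems.append("city contains digits (postal code in the wrong field?)")
    if re.search(r"\b\d{5}\b", street):
        problems.append("street contains a 5-digit number (postal code in the wrong field?)")
    return problems


print(field_problems("Quellenstr.", "", "10115 Berlin"))
# ['postal code must be exactly 5 digits', 'city contains digits (postal code in the wrong field?)']
```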

Once you have the basic checking in place, you can try optimising the required db accesses, refining the rules, etc. to suit your particular schema. You could also take a look at MySQL's match() function for matching similar text.
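One way to make the db access cheap is to compute a normalized key once, when the address is saved, and index it, so the duplicate check becomes a single indexed lookup instead of applying the rules to every row. The sketch below uses Python's built-in `sqlite3` as a stand-in for your real database; the table layout, the `match_key` column and the simplified normalizer are all hypothetical.

```python
import re
import sqlite3


def match_key(street: str, number: str, postal_code: str) -> str:
    """Hypothetical, simplified normalizer producing one string per address."""
    s = street.lower().replace("ß", "ss")
    s = re.sub(r"strasse\b|str\.", "str", s)  # Quellenstrasse / Quellenstr. -> quellenstr
    s = re.sub(r"[\s\-]+", "", s)
    digits = "-".join(re.findall(r"\d+", number))  # 66/11 and 66a-11 -> 66-11
    return f"{s}|{digits}|{postal_code.strip()}"


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute(
    "CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT, number TEXT,"
    " postal_code TEXT, match_key TEXT NOT NULL)"
)
# The index turns the duplicate check into a single indexed lookup.
conn.execute("CREATE INDEX idx_address_match_key ON address (match_key)")


def add_address(street: str, number: str, postal_code: str) -> bool:
    """Insert the address and return True, or return False if the key already exists."""
    key = match_key(street, number, postal_code)
    if conn.execute("SELECT 1 FROM address WHERE match_key = ?", (key,)).fetchone():
        return False
    conn.execute(
        "INSERT INTO address (street, number, postal_code, match_key) VALUES (?, ?, ?, ?)",
        (street, number, postal_code, key),
    )
    return True


print(add_address("Quellenstrasse", "66/11", "10115"))  # True
print(add_address("Quellenstr.", "66a-11", "10115"))    # False
```

The explicit SELECT lets the form show a friendly message; with several users signing up at the same moment, a UNIQUE index on the key column would additionally guard against two inserts slipping through between the check and the insert.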

Before you start trying to find duplicate addresses in your database, you should first make sure you store addresses in a standard format.

Most countries have a standard way of formatting addresses; in the US it is the USPS CASS system: http://www.usps.com/ncsc/addressservices/certprograms/cass.htm

Many other countries have a similar service/standard. Try this site for more international formats: http://bitboost.com/ref/worldwide-address-formats.html

This not only helps with finding duplicates, but also saves you money when mailing your customers (the postal service charges less when the address is in a standard format).

Depending on the application, in some cases you may want to store a "vanity" address record alongside the standard address record. This keeps your VIP customers happy. A "vanity" address might be something like:

62 West Ninety First Street
Apartment 4D
Manhattan, New York, NY 10001

whereas the standard address might look like this:

62 W 91ST ST APT 4D
NEW YORK NY 10024-1414
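A minimal sketch of keeping both records side by side, where duplicate checks only ever look at the standard form. The class and field names are hypothetical, and the standard string is assumed to come from a postal standardization service rather than being computed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerAddress:
    vanity: str    # printed on mail exactly as the customer likes it
    standard: str  # postal-standard form, e.g. from a CASS-certified service

    def is_duplicate_of(self, other: "CustomerAddress") -> bool:
        # Only the standardized form is compared; vanity spellings may differ freely.
        return self.standard == other.standard


standard = "62 W 91ST ST APT 4D\nNEW YORK NY 10024-1414"
a = CustomerAddress(vanity="62 West 91st Street, Apartment 4D, Manhattan", standard=standard)
b = CustomerAddress(vanity="62 W. 91st St., Apt. 4D", standard=standard)
print(a.is_duplicate_of(b))  # True
```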

One thing you might want to look at is Soundex searches, which are quite helpful for misspellings and contractions.

However, this isn't an in-database validation, so it may or may not be what you are looking for.
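For reference, a Python sketch of the classic American Soundex code (first letter plus three digits). Soundex was designed for English names; for German spellings a variant such as Kölner Phonetik (Cologne phonetics) may fit better, so treat this as an illustration of the technique, not a recommendation for your data.

```python
def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, e.g. 'Robert' -> 'R163'."""
    digit_for = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            digit_for[ch] = digit

    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = digit_for.get(letters[0], "")  # the first letter's digit also blocks repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = digit_for.get(ch, "")  # vowels (and letters such as Ä) map to ""
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit after it counts again
    return code.ljust(4, "0")


print(soundex("Quellenstrasse"), soundex("Quellenstr."))  # Q452 Q452
```

Because only the first few consonants are encoded, unrelated names can collide (for example, "Quellenweg" also gives Q452), so a Soundex match is a candidate for review rather than proof of a duplicate.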