I am trying to avoid reinventing the wheel when it comes to storing each street address in a table only once. Uniqueness constraints won't work in some common situations:
100 W 5th Ave
100 West 5th Ave
100 W 5th
200 N 6th Ave Suite 405
200 N 6th Ave #405
I could implement some business logic or a trigger to normalize all fields before inserting and use uniqueness constraints across several fields in the table, but it would be easy to miss some cases with something that varies as much as street addresses.
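To make the normalize-then-constrain idea concrete, here is a minimal sketch. The function name normalize_address and the abbreviation tables are made up for illustration and are deliberately incomplete (they are not the full USPS lists). Collapsing every unit word to "#" is also an assumption of this sketch, and it loses the APT/STE distinction.

```python
import re

# Illustrative, incomplete lookup tables (assumptions, not the USPS lists).
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}
STREET_TYPES = {"AVENUE": "AVE", "STREET": "ST", "ROAD": "RD", "BOULEVARD": "BLVD"}
# Lossy on purpose: every unit designator becomes "#".
UNIT_WORDS = {"SUITE": "#", "STE": "#", "APARTMENT": "#", "APT": "#", "UNIT": "#"}


def normalize_address(raw: str) -> str:
    """Return a canonical string suitable for a uniqueness constraint."""
    text = raw.upper()
    text = re.sub(r"[.,]", " ", text)     # drop simple punctuation
    text = re.sub(r"#\s*", " # ", text)   # split "#405" into "#" and "405"
    tokens = []
    for tok in text.split():
        tok = DIRECTIONS.get(tok, tok)
        tok = STREET_TYPES.get(tok, tok)
        tok = UNIT_WORDS.get(tok, tok)
        tokens.append(tok)
    return " ".join(tokens)


if __name__ == "__main__":
    for addr in ("100 W 5th Ave", "100 West 5th Ave", "100 W 5th",
                 "200 N 6th Ave Suite 405", "200 N 6th Ave #405"):
        print(repr(addr), "->", repr(normalize_address(addr)))
```

With these tables the first two examples collapse to the same string, and so do the two suite examples, but "100 W 5th" still comes out different because the street type is missing. That is exactly the kind of case that is easy to miss.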
What would be best is a universal identifier for each address, perhaps based on GPS coordinates. Before storing a new address, look up its GUID and check whether that GUID already exists in the Address table.
A company like MapQuest, the Postal Service, FedEx, or the federal government probably has a system like this.
Has anyone found a good solution for this?
Here's my Address table now (generated by JPA):
CREATE TABLE address (
  id bigint NOT NULL,
  "number" character varying(255),
  dir character varying(255),
  street character varying(255),
  "type" character varying(255),
  trailingdir character varying(255),
  unit character varying(255),
  city character varying(255),
  state character varying(255),
  zip integer,
  zip4 integer,
  CONSTRAINT address_pkey PRIMARY KEY (id)
)
Look up the address in Google Maps and use the spelling they use.
First, carefully reconsider why you feel compelled to store addresses only once and identify them with a unique ID. It adds complexity, fights the changing nature and wide variety of addresses, and may not even properly handle the actual problem you are really trying to solve. See http://semaphorecorp.com/mpdd/mpdd.html regarding duplicate address issues.
The following produces a 35-character identifier that uniquely identifies U.S. addresses THAT RECEIVE MAIL:
House or PO box number [USPS 10 characters maximum]
Optional unit abbreviation [APT/STE/etc, USPS 4 characters maximum]
Optional unit number [USPS 8 characters maximum]
ZIP code [USPS 5 digits]
+4 code [USPS 4 digits]
MMYY [month and year, 4 digits]
The unit abbreviation is needed to distinguish (rare) cases like STE 1 and APT 1 in the same building (see Publication 28 at usps.com for the list of all unit types). +4 codes don't exist for addresses (typically rural) that don't receive mail delivery (e.g., they use a PO box at the post office), so you can't create an ID for those addresses. If you want to distinguish private mailbox (PMB) drops at places like a UPS store, you'll need to add the PMB number, but since PMBs are controlled by the stores and not the Postal Service, the number of digits needed is unpredictable (although 4 to 5 characters should be enough). The identifier is guaranteed unique only within the given month/year edition of the USPS ZIP+4 database, because the same address might have a different +4 code or other component in a different month.
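As a rough sketch of how those six components could be packed into one fixed-width key: the function name usps_address_id, the "_" padding character, and the validation rules below are my own assumptions, not anything published by USPS. ZIP and +4 are passed as strings so leading zeros survive (storing them as integers, as in the table above, would drop them).

```python
# (field name, width, must be exactly `width` digits)
FIELDS = (
    ("number", 10, False),
    ("unit_abbr", 4, False),
    ("unit_number", 8, False),
    ("zip5", 5, True),
    ("plus4", 4, True),
    ("mmyy", 4, True),
)


def usps_address_id(number, zip5, plus4, mmyy,
                    unit_abbr="", unit_number="", pad="_"):
    """Build a 35-character key from already-standardized components."""
    values = {
        "number": number, "unit_abbr": unit_abbr, "unit_number": unit_number,
        "zip5": zip5, "plus4": plus4, "mmyy": mmyy,
    }
    parts = []
    for name, width, exact_digits in FIELDS:
        value = str(values[name]).strip().upper()
        if exact_digits:
            if len(value) != width or not value.isdigit():
                raise ValueError(f"{name} must be exactly {width} digits")
        elif len(value) > width:
            raise ValueError(f"{name} is longer than {width} characters")
        # Pad short optional fields so every key has the same layout.
        parts.append(value.ljust(width, pad))
    return "".join(parts)


if __name__ == "__main__":
    key = usps_address_id("200", "97401", "1234", "0611",
                          unit_abbr="STE", unit_number="405")
    print(key, len(key))
```

The components still have to come from a lookup against the ZIP+4 database edition you name in MMYY. The function only packs them; it does not standardize or validate the address itself.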
You'll need support for regular-expression-like syntax. You could develop some kind of automaton function that parses tokens, tries to match them, and then expands or contracts them into abbreviations. I'd consider glob()-like functions that support * and ? on Unix as a quick and dirty fix.
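For the quick and dirty glob route, Python's standard fnmatch module gives the same * and ? wildcards. A small sketch follows; the helper name address_matches and the example pattern are only for illustration.

```python
from fnmatch import fnmatchcase


def address_matches(pattern: str, address: str) -> bool:
    """Case-insensitive glob match after collapsing runs of whitespace."""
    cleaned = " ".join(address.upper().split())
    return fnmatchcase(cleaned, pattern.upper())


if __name__ == "__main__":
    pattern = "100 W* 5th*"
    for addr in ("100 W 5th Ave", "100 West 5th Ave", "100 W 5th",
                 "100 Wilson 5th St", "200 N 6th Ave #405"):
        print(addr, "->", address_matches(pattern, addr))
```

The pattern catches all three spellings of the first address, but it also matches "100 Wilson 5th St", which shows why this is only a stopgap compared with real token parsing.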
This guy provides a full implementation of the code to validate an address with USPS online.
I wasn't looking for address validation or normalization, although address validation is a good idea. I need a unique identifier for each street address to prevent duplicate records.
It looks like geocoding can provide a solution. With geocoding, the input is a street address and the output is latitude and longitude coordinates with enough precision to resolve a specific building.
There is a more serious problem with street address ambiguity than I thought. This is from the Wikipedia page on geocoding:
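Here is a minimal sketch of turning geocoder output into a lookup key. It assumes you already have latitude and longitude from some geocoding service (no real API is called here), and the function name coordinate_key, the 5-decimal rounding (roughly one metre of latitude), and the "|unit" suffix are all my own choices for illustration.

```python
def coordinate_key(lat: float, lon: float, unit: str = "", places: int = 5) -> str:
    """Build a string key from coordinates plus an optional unit designator."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    # Adding 0.0 turns a rounded -0.0 into 0.0 so both format identically.
    lat = round(lat, places) + 0.0
    lon = round(lon, places) + 0.0
    key = f"{lat:.{places}f},{lon:.{places}f}"
    unit = " ".join(unit.upper().split())
    # Units in one building share coordinates, so keep the unit in the key.
    return f"{key}|{unit}" if unit else key


if __name__ == "__main__":
    print(coordinate_key(44.052071, -123.086754))
    print(coordinate_key(44.052071, -123.086754, unit="Ste 405"))
```

Two caveats: different geocoders (or the same one on different days) can return slightly different points for the same building, and two points very close together can still round into different cells, so a key like this is a candidate filter for duplicates rather than a guarantee.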
"...there are multiple 100 Washington Streets in Boston, Massachusetts because several cities have been annexed without changing street names."
The Wikipedia page on geocoding has a list of resources (many free) for doing geocoding.