Geocoding Basics


What is geocoding? Simply, it’s the process of matching a location such as:

  • an address (“street-level geocoding”)
  • a postal (ZIP) code (“ZIP-level” or “postal-level” geocoding)
  • a city name (“city-level” geocoding, not used as much anymore)
  • county name, etc…

…to latitude and longitude coordinates on the Earth.  The resulting coordinates are sometimes called a “geocode.”

Why would you want to geocode?

When you look for the closest pizza restaurants on your phone or computer, all of the pizza places have already been geocoded.  If you don’t allow your phone or computer to auto-locate you (by your GPS location, IP address, WiFi location, etc.), then you have to enter where you are, and that entry has to be translated into something the software can use to find those closest pizza restaurants. The process of translating what you typed into something the software can use (coordinates) is called geocoding.

There are many, many examples of where geocoding is used.  Here are a few:

  • An insurance company will look at the risk of insuring a home or building based on where the property is located in relation to various perils – such as flooding, fires, wind-damage, hurricanes, storm surge, tornadoes, etc. The insurance company needs to know exactly where the property is located.
  • Logistics companies rely on accurate geocoding to pick up, move and deliver products and services.
  • Emergency response organizations (911 and E911) rely on accurate geocoding to assign resources, whether it’s a fire, crime or larger issue.
  • Marketers look at customers, prospects, etc. on maps – sometimes millions of them – and mine for relationships with other customers, store locations, etc.
  • Utilities use geocoding to provide service, asset management, call-before-you-dig, etc.
  • Governments use geocoding for providing services, mapping crime, optimizing polling locations, dispatching snow plows, asset maintenance, etc.
  • Healthcare companies and researchers use geocoding to help site hospitals and clinics, analyze disease spread, etc.
  • Any company or organization collecting and/or paying certain taxes needs to determine what the tax rate is and the rules are for a location.

In the past few years, data scientists have been a driving force in the use of geocoding because it can help identify relationships that would otherwise be lost. Additionally, one of the biggest issues with Big Data/Machine Learning/Artificial Intelligence systems is insufficient data quality, and bad geocodes can be a big part of that.  Also, an organization may have data from multiple systems, sometimes with no good way to handle duplicates. Or the data in these systems may have been geocoded using different geocoders. (Which one is right?) Clean, geocoded data can make a big difference.

Digging in….

One of the most common “levels” to geocode at is the ZIP Code, because ZIP Codes are prevalent in databases.  If you have both ZIP Codes and city information (city and state, city and province, etc.), you would want to geocode using the ZIP Code because it is more precise: most cities have more than one ZIP Code, so a ZIP Code usually covers a smaller area than the city.  The city can then serve as a “check” on the ZIP Code.  If the ZIP Code isn’t inside the city boundary and doesn’t overlap it, there may be a problem with the data.
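As a rough illustration of using the city as a “check” on the ZIP Code, here is a minimal Python sketch. The ZIP_TO_CITIES table and the city_matches_zip function are hypothetical names invented for this example, and a name lookup stands in for the boundary-overlap test described above; a real system would use a full reference dataset.

```python
# Hypothetical reference table: ZIP Code -> city names associated with it.
# A tiny illustrative sample, not a complete or authoritative dataset.
ZIP_TO_CITIES = {
    "10001": {"NEW YORK"},
    "60601": {"CHICAGO"},
    "94105": {"SAN FRANCISCO"},
}

def city_matches_zip(zip_code, city):
    """Return True or False when the ZIP is in the table, None when it is not."""
    cities = ZIP_TO_CITIES.get(zip_code.strip())
    if cities is None:
        return None  # cannot check: the ZIP is not in our reference table
    return city.strip().upper() in cities

records = [
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Boston"},   # city disagrees with the ZIP
    {"zip": "99999", "city": "Nowhere"},  # ZIP not in the table
]

# Keep only the records where the city clearly contradicts the ZIP Code.
suspect = [r for r in records if city_matches_zip(r["zip"], r["city"]) is False]
print(suspect)
```

Records whose ZIP Code is missing from the table come back as None rather than False, so they can be handled separately from records that actually disagree.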

Geocoding ZIP Codes is a relatively easy process because it’s a relatively simple match against a database of ZIP Codes with coordinates.  If a match can’t be made, either the input ZIP Code is invalid or the database doesn’t contain it.
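The match described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ZIP_COORDS, normalize_zip, geocode_zip and the status strings are hypothetical, and the coordinates are rough illustrative values rather than data from any real geocoder.

```python
import re

# Hypothetical lookup table: 5-digit ZIP Code -> (latitude, longitude).
# Rough illustrative values only.
ZIP_COORDS = {
    "10001": (40.75, -73.99),
    "60601": (41.89, -87.62),
}

def normalize_zip(raw):
    """Reduce input such as ' 10001-1234 ' to '10001'; None if it is not ZIP-shaped."""
    m = re.fullmatch(r"\s*(\d{5})(?:-?\d{4})?\s*", raw)
    return m.group(1) if m else None

def geocode_zip(raw):
    """Return (lat, lon, status) for a raw ZIP Code string."""
    zip5 = normalize_zip(raw)
    if zip5 is None:
        return None, None, "INVALID_INPUT"    # malformed input
    coords = ZIP_COORDS.get(zip5)
    if coords is None:
        return None, None, "NOT_IN_DATABASE"  # well-formed but no match
    return coords[0], coords[1], "MATCHED"

print(geocode_zip("10001-1234"))
print(geocode_zip("1000"))
print(geocode_zip("99999"))
```

Note that a simple lookup like this can only tell malformed input apart from a well-formed ZIP Code that has no match. It cannot tell whether an unmatched ZIP Code is truly invalid or just missing from the table.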

Geocoding at the ZIP Code level is sufficient for many research use cases and for analyses that cover a larger area, but many applications require address-level geocoding.  Almost every case in the list of examples above requires address-level geocoding.

In general, the best geocodes that can be calculated from an address are at the parcel or building level. Here are three examples showing the quality of the results from some geocoders.

High Quality Geocode at Building/Parcel Level.
The best geocoders have this accuracy.

Low Quality Geocode - Along A Street
The geocoder could use any one of those X's - really any location along the street.
The pushpin points to the correct location.

Lower Quality Geocode - Anywhere Within A ZIP Code
The geocoder could use any location within the ZIP Code - using whatever set coordinates it has for the ZIP Code. In this case, every one of your records would be put in the same location - stacked on top of each other.

Garbage in, Clean Data and Geocodes Out

A good geocoder, which means one with well-written software and with accurate and complete underlying map data, can do wonders with incomplete and/or partially incorrect data.  At the same time, a good geocoder will provide information about what it was able to find and what it was able to do with your records.  In other words, given an address that is somewhat ambiguous, what did the geocoder do to fix the address and to what level and confidence did it find coordinates?  Geocoding systems have varying levels of diagnostic and result codes.
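To make the idea of diagnostic and result codes concrete, here is a hedged Python sketch of how you might triage geocoder output. The field names, the match levels (PARCEL, STREET, ZIP, NONE) and the 0 to 1 confidence score are all assumptions made for this example; every geocoding system defines its own codes, so check your geocoder's documentation.

```python
from dataclasses import dataclass

# Hypothetical match levels, ranked from most precise (0) to least precise.
LEVEL_RANK = {"PARCEL": 0, "STREET": 1, "ZIP": 2, "NONE": 3}

@dataclass
class GeocodeResult:
    input_address: str
    match_level: str   # one of the keys in LEVEL_RANK (assumed scheme)
    confidence: float  # assumed score between 0.0 and 1.0

def needs_review(result, worst_level="STREET", min_confidence=0.8):
    """Flag results that are coarser than worst_level or below min_confidence."""
    too_coarse = LEVEL_RANK[result.match_level] > LEVEL_RANK[worst_level]
    return too_coarse or result.confidence < min_confidence

results = [
    GeocodeResult("123 Main St", "PARCEL", 0.97),
    GeocodeResult("Main St", "STREET", 0.60),   # low confidence
    GeocodeResult("PO Box 12", "ZIP", 0.90),    # only matched at ZIP level
]

flagged = [r.input_address for r in results if needs_review(r)]
print(flagged)
```

The thresholds are arbitrary here. The point is that the level and confidence reported by the geocoder let you decide which records to trust and which to send back for correction.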

For some applications, specific software for address validation and correction may be useful.  You would use this before feeding your data into the geocoder.

International Geocoding

Geocoding in non-US countries shares some of these issues and has some differences, too.  We’ll talk about international issues in another post.  FYI, here are the countries supported.

What is reverse geocoding?

Reverse geocoding is the process of determining what is at or near a set of coordinates.  As GPS has become more prevalent in recent years, the need for reverse geocoding has increased dramatically, because GPS systems work with coordinates. The typical result is the nearest street address, but many other pieces of information can be calculated: the nearest intersection, the nearest Point of Interest (POI), the enveloping ZIP Code or Census Block, etc.
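A minimal Python sketch of the “nearest street address” case follows. It assumes a small, made-up list of already-geocoded addresses and uses the haversine great-circle formula with a brute-force search. The ADDRESSES data and the reverse_geocode function are hypothetical; production systems typically use spatial indexes over far larger datasets.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical geocoded addresses: (address, latitude, longitude).
ADDRESSES = [
    ("100 First St", 40.0000, -75.0000),
    ("200 Second Ave", 40.0100, -75.0000),
    ("300 Third Blvd", 40.0000, -75.0200),
]

def reverse_geocode(lat, lon, candidates=ADDRESSES):
    """Return (nearest address, distance in km) by checking every candidate."""
    name, clat, clon = min(
        candidates, key=lambda c: haversine_km(lat, lon, c[1], c[2])
    )
    return name, haversine_km(lat, lon, clat, clon)

print(reverse_geocode(40.009, -75.0))
```

The same pattern works for the nearest intersection or POI by swapping the candidate list. Finding the enveloping ZIP Code or Census Block is a different problem (point-in-polygon) and is not shown here.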

Finally,

In other blog posts, we’ll look at the issues with address-level geocoding. It’s much more complicated than ZIP Code geocoding, and it depends on the underlying map data and on the parsing and matching algorithms used on your data. If you can’t wait and have questions, please contact us.

