Rajdeep Sengupta
Blake Marsh
Jacob Dice

Address Validation Techniques with GIS Data

It is often difficult to track entities by address over time because addresses are reported inconsistently. We propose a two-step process to clean and verify address data using Python-based tools. First, we standardize address components and convert each address to a coordinate set using open-source APIs. Second, using scikit-learn, we build clusters of points within a given distance tolerance around each coordinate set to compensate for variations over time and identify records that refer to the same entity. We apply these methods to the Summary of Deposits data, a database of bank branch locations commonly used in banking research. The process improves identification of branch openings, closings, and relocations over time, which is of interest to policymakers and researchers.
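
The two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or the geocoding service, so everything below is an assumption made for illustration. The abbreviation table, the function names (`standardize`, `haversine_km`, `cluster_points`), the 0.1 km tolerance, and the sample coordinates are all hypothetical. The geocoding call is left out because it depends on an external service, and the clustering uses only the Python standard library in place of scikit-learn, linking any two points that lie within the tolerance of each other.

```python
import math
import re

# Hypothetical abbreviation table; a real pipeline would need a fuller list.
ABBREVIATIONS = {
    "street": "st", "avenue": "ave", "road": "rd", "suite": "ste",
    "north": "n", "south": "s", "east": "e", "west": "w",
}

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def standardize(address):
    """Lowercase, drop punctuation, and abbreviate common address components."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_points(points, tolerance_km):
    """Label points so that points within tolerance_km share a cluster.

    Links are transitive: if A is near B and B is near C, all three get
    the same label. Labels are assigned in order of first appearance.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is O(n^2); fine for a sketch, slow for large files.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= tolerance_km:
                parent[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


# Two reporting variants of one address standardize to the same string.
print(standardize("123 North Main Street, Suite 4"))
print(standardize("123 N. Main St., Ste 4"))

# Made-up coordinates: the first two are a few metres apart, the third is far away.
coords = [(39.0997, -94.5786), (39.0998, -94.5787), (38.6270, -90.1994)]
print(cluster_points(coords, tolerance_km=0.1))
```

In scikit-learn, a comparable grouping could likely be obtained with `DBSCAN` using the haversine metric on coordinates in radians and `min_samples=1`, which also treats every point as linkable to any neighbor within the tolerance. Whether that matches the authors' choice is not stated in the abstract.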