maxx-mill/osm-address-cleaner

OSM Address Cleaner

A Python tool to download and clean OpenStreetMap (OSM) building data with address normalization capabilities.

Features

  • Download OSM building data by:
    • Place name (e.g., "Manhattan, New York")
    • Bounding box coordinates (north, south, east, west)
  • Comprehensive address cleaning:
    • Street name normalization
      • Expands common abbreviations (St → Street, Rd → Road, etc.)
      • Handles directional prefixes/suffixes (N, S, E, W, NE, NW, SE, SW)
      • Special handling for Mc/Mac names (e.g., "Mc Donald" → "McDonald")
      • Proper handling of ordinal numbers (1st, 2nd, 3rd)
      • Smart title casing with preserved articles
    • City name cleaning
      • Expands common abbreviations (St. → Saint, Ft. → Fort, Mt. → Mount)
      • Proper capitalization for complex names
    • US ZIP code validation
  • Generates cleaning reports with statistics
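
As an illustration of the ZIP code validation feature listed above, here is a minimal sketch. It assumes that a valid US ZIP code is either five digits or the ZIP+4 form (five digits, a hyphen, four digits). The function name `is_valid_us_zip` and the pattern are hypothetical and are not taken from `cleaner.py`, whose actual rule may differ.

```python
import re

# Assumption: accept "12345" or "12345-6789" (ZIP+4), ASCII digits only.
ZIP_PATTERN = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")


def is_valid_us_zip(value):
    """Return True if value looks like a 5-digit ZIP or a ZIP+4 code."""
    if value is None:
        return False
    # Surrounding whitespace is tolerated; anything else must match fully.
    return ZIP_PATTERN.fullmatch(str(value).strip()) is not None
```

A format check like this only confirms the shape of the value; it does not confirm that the ZIP code is actually assigned to a delivery area.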

Installation

  1. Clone this repository:
git clone https://github.com/yourusername/osm_address_cleaner.git
cd osm_address_cleaner
  2. Install required dependencies:
pip install -r requirements.txt

Usage

Basic Usage

Run the script with default settings (downloads Manhattan buildings as an example):

python cleaner.py

Python API

from cleaner import download_osm_buildings, clean_osm_addresses

# Download buildings by place name
download_osm_buildings(
    place_name="Chicago, Illinois",
    save_path="data/raw_osm_buildings.geojson"
)

# Or download by bounding box (north, south, east, west)
download_osm_buildings(
    bbox=(42.0, 41.8, -87.5, -87.8),
    save_path="data/raw_osm_buildings.geojson"
)

# Clean the addresses
clean_osm_addresses(
    input_path="data/raw_osm_buildings.geojson",
    output_path="data/cleaned/cleaned_osm_buildings.gpkg",
    report_path="reports/cleaning_report.csv"
)
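
Because the bounding box is given in (north, south, east, west) order, it is easy to swap values by accident. The helper below is a sketch of a sanity check you could run before calling `download_osm_buildings`. The name `validate_bbox` is hypothetical and not part of this project, and the check assumes decimal degrees and a box that does not cross the antimeridian.

```python
def validate_bbox(bbox):
    """Check a (north, south, east, west) tuple given in decimal degrees.

    Hypothetical helper: returns the tuple unchanged if it is plausible,
    otherwise raises ValueError.
    """
    north, south, east, west = bbox
    # Latitudes must lie in [-90, 90] with north strictly above south.
    if not (-90 <= south < north <= 90):
        raise ValueError(f"invalid latitudes: north={north}, south={south}")
    # Longitudes must lie in [-180, 180] with east strictly right of west.
    # Assumption: boxes that cross the antimeridian are not supported here.
    if not (-180 <= west < east <= 180):
        raise ValueError(f"invalid longitudes: east={east}, west={west}")
    return bbox
```

For example, `validate_bbox((42.0, 41.8, -87.5, -87.8))` passes the Chicago box used above through unchanged, while swapping north and south raises `ValueError`.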

Address Cleaning Examples

The tool normalizes street and city names. For example:

# Street Names
"MC DONALD   RD" → "McDonald Road"
"N.  MAIN ST." → "North Main Street"
"123RD   AVE" → "123rd Avenue"

# City Names
"st. LOUIS" → "Saint Louis"
"FT. WORTH" → "Fort Worth"
"MT VERNON" → "Mount Vernon"

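The transformations above can be approximated with a few lookup tables and token-level rules. The following is a minimal sketch, not the code in `cleaner.py`: the function names `normalize_street` and `normalize_city`, the abbreviation tables, and the rule that suffixes and directionals are only expanded at the start or end of a name are all assumptions made for illustration. Article-preserving title casing and `Mac` names are left out.

```python
import re

# Assumption: small illustrative tables; a real tool would need many more entries.
STREET_SUFFIXES = {"st": "Street", "rd": "Road", "ave": "Avenue"}
DIRECTIONS = {
    "n": "North", "s": "South", "e": "East", "w": "West",
    "ne": "Northeast", "nw": "Northwest", "se": "Southeast", "sw": "Southwest",
}
CITY_PREFIXES = {"st": "Saint", "ft": "Fort", "mt": "Mount"}
ORDINAL = re.compile(r"[0-9]+(?:st|nd|rd|th)")


def _tokens(text):
    """Split on any whitespace run and drop trailing periods ("ST." -> "ST")."""
    return [t.rstrip(".") for t in text.split() if t.strip(".")]


def _join_mc(words):
    """Merge a standalone "Mc" with the next word: ["Mc", "Donald"] -> ["McDonald"]."""
    out, i = [], 0
    while i < len(words):
        if words[i].lower() == "mc" and i + 1 < len(words):
            out.append("Mc" + words[i + 1].capitalize())
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out


def normalize_street(name):
    tokens = _tokens(name)
    last = len(tokens) - 1
    words = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if ORDINAL.fullmatch(low):
            words.append(low)  # keep ordinals lowercase: "123RD" -> "123rd"
        elif low in DIRECTIONS and i in (0, last) and last > 0:
            words.append(DIRECTIONS[low])  # leading or trailing directional
        elif low in STREET_SUFFIXES and i == last and last > 0:
            words.append(STREET_SUFFIXES[low])  # trailing suffix only
        else:
            words.append(low.capitalize())
    return " ".join(_join_mc(words))


def normalize_city(name):
    tokens = _tokens(name)
    words = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if i == 0 and low in CITY_PREFIXES and len(tokens) > 1:
            words.append(CITY_PREFIXES[low])  # "st." -> "Saint" at the start only
        else:
            words.append(low.capitalize())
    return " ".join(_join_mc(words))


print(normalize_street("MC DONALD   RD"))
print(normalize_city("st. LOUIS"))
```

Restricting expansion by position is what lets `St` become `Street` at the end of a street name but `Saint` at the start of a city name. It is a heuristic, and names that break the pattern would need extra rules.
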
Project Structure

osm_address_cleaner/
├── cleaner.py          # Main script
├── requirements.txt    # Python dependencies
├── data/              # Data directory
│   └── cleaned/       # Cleaned output files
└── reports/           # Cleaning reports

Dependencies

  • osmnx: OpenStreetMap data download
  • geopandas: Geospatial data handling
  • pandas: Data processing
  • shapely: Geometric operations

License

This project is licensed under the MIT License - see the LICENSE file for details.
