A Python tool to download and clean OpenStreetMap (OSM) building data with address normalization capabilities.
- Download OSM building data by:
  - Place name (e.g., "Manhattan, New York")
  - Bounding box coordinates (north, south, east, west)
- Comprehensive address cleaning:
  - Street name normalization
    - Expands common abbreviations (St → Street, Rd → Road, etc.)
    - Handles directional prefixes/suffixes (N, S, E, W, NE, NW, SE, SW)
    - Special handling for Mc/Mac names (e.g., "Mc Donald" → "McDonald")
    - Proper handling of ordinal numbers (1st, 2nd, 3rd)
    - Smart title casing with preserved articles
  - City name cleaning
    - Expands common abbreviations (St. → Saint, Ft. → Fort, Mt. → Mount)
    - Proper capitalization for complex names
  - US ZIP code validation
- Generates cleaning reports with statistics
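
The cleaning rules listed above can be illustrated with a small standalone sketch. This is not the code in `cleaner.py`: the function names (`normalize_street`, `clean_city`, `is_valid_us_zip`), the lookup tables, and the choice to accept ZIP+4 codes are all assumptions made for illustration. It handles detached "Mc" prefixes only (not "Mac"), and the ZIP check covers format, not whether the code actually exists.

```python
import re

# Illustrative lookup tables (assumptions); the tool's own tables may differ.
STREET_TYPES = {"st": "Street", "rd": "Road", "ave": "Avenue", "blvd": "Boulevard"}
DIRECTIONS = {"n": "North", "s": "South", "e": "East", "w": "West",
              "ne": "Northeast", "nw": "Northwest", "se": "Southeast", "sw": "Southwest"}
CITY_PREFIXES = {"st": "Saint", "ft": "Fort", "mt": "Mount"}
SMALL_WORDS = {"of", "the", "and"}
ORDINAL = re.compile(r"\d+(st|nd|rd|th)", re.IGNORECASE)
ZIP_CODE = re.compile(r"\d{5}(-\d{4})?")


def _tokens(raw):
    """Split on whitespace and strip periods/commas from the ends of each token."""
    return [t for t in (part.strip(".,") for part in raw.split()) if t]


def _title(key, position):
    """Title-case one lowercase token, preserving ordinals, small words, and Mc names."""
    if ORDINAL.fullmatch(key):
        return key  # "123rd" stays lowercase
    if key in SMALL_WORDS and position > 0:
        return key  # "of", "the", "and" stay lowercase unless they come first
    if key.startswith("mc") and len(key) > 2:
        return "Mc" + key[2:].capitalize()
    return key.capitalize()


def normalize_street(raw):
    """Expand directions and the street type, then title-case the rest."""
    # Join a detached "Mc" to the following word: "MC DONALD" -> "MCDONALD".
    raw = re.sub(r"\b(mc)\s+(?=[a-z])", r"\1", raw, flags=re.IGNORECASE)
    tokens = _tokens(raw)
    last = len(tokens) - 1
    # The street type sits at the end, or just before a trailing direction.
    type_idx = last - 1 if last > 0 and tokens[last].lower() in DIRECTIONS else last
    out = []
    for i, token in enumerate(tokens):
        key = token.lower()
        if key in DIRECTIONS and last > 0 and i in (0, last):
            out.append(DIRECTIONS[key])
        elif key in STREET_TYPES and i == type_idx and i > 0:
            out.append(STREET_TYPES[key])
        else:
            out.append(_title(key, i))
    return " ".join(out)


def clean_city(raw):
    """Expand a leading St./Ft./Mt. and title-case the remaining words."""
    tokens = _tokens(raw)
    out = []
    for i, token in enumerate(tokens):
        key = token.lower()
        if i == 0 and key in CITY_PREFIXES and len(tokens) > 1:
            out.append(CITY_PREFIXES[key])
        else:
            out.append(_title(key, i))
    return " ".join(out)


def is_valid_us_zip(value):
    """Return True for 5-digit ZIP codes and ZIP+4 codes (format check only)."""
    return ZIP_CODE.fullmatch(str(value).strip()) is not None
```

Expanding "St" only in the street-type position (and "St." only as the first word of a city) is one way to keep "Street" and "Saint" from colliding; a production cleaner would likely need larger tables and more careful disambiguation.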
- Clone this repository:
git clone https://github.com/yourusername/osm_address_cleaner.git
cd osm_address_cleaner
- Install required dependencies:
pip install -r requirements.txt

Run the script with default settings (downloads Manhattan buildings as an example):
python cleaner.py

from cleaner import download_osm_buildings, clean_osm_addresses
# Download buildings by place name
download_osm_buildings(
place_name="Chicago, Illinois",
save_path="data/raw_osm_buildings.geojson"
)
# Or download by bounding box (north, south, east, west)
download_osm_buildings(
bbox=(42.0, 41.8, -87.5, -87.8),
save_path="data/raw_osm_buildings.geojson"
)
# Clean the addresses
clean_osm_addresses(
input_path="data/raw_osm_buildings.geojson",
output_path="data/cleaned/cleaned_osm_buildings.gpkg",
report_path="reports/cleaning_report.csv"
)

The tool performs comprehensive cleaning of address components:
# Street Names
"MC DONALD RD" → "McDonald Road"
"N. MAIN ST." → "North Main Street"
"123RD AVE" → "123rd Avenue"
# City Names
"st. LOUIS" → "Saint Louis"
"FT. WORTH" → "Fort Worth"
"MT VERNON" → "Mount Vernon"

osm_address_cleaner/
├── cleaner.py # Main script
├── requirements.txt # Python dependencies
├── data/ # Data directory
│ └── cleaned/ # Cleaned output files
└── reports/ # Cleaning reports
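
The `reports/` directory receives the CSV written to `report_path`. This README does not document the report's columns, so the following is only a minimal sketch of what per-field statistics could look like. The helper names (`summarize_changes`, `write_report`) and the column names (`field`, `total`, `changed`, `pct_changed`) are hypothetical and may not match the tool's actual output.

```python
import csv
import io

# Hypothetical column layout; the real report may use different fields.
REPORT_COLUMNS = ["field", "total", "changed", "pct_changed"]


def summarize_changes(field, before, after):
    """Count how many values of one address field were altered by cleaning."""
    total = len(before)
    changed = sum(1 for old, new in zip(before, after) if old != new)
    pct = round(100 * changed / total, 1) if total else 0.0
    return {"field": field, "total": total, "changed": changed, "pct_changed": pct}


def write_report(rows, handle):
    """Write summary rows as CSV to any open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=REPORT_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Example: two of three street names changed during cleaning.
streets_raw = ["N. MAIN ST.", "Oak Street", "123RD AVE"]
streets_clean = ["North Main Street", "Oak Street", "123rd Avenue"]
row = summarize_changes("street", streets_raw, streets_clean)

buffer = io.StringIO()  # stands in for open("reports/cleaning_report.csv", "w", newline="")
write_report([row], buffer)
print(buffer.getvalue())
```

Writing to a file handle rather than a path keeps the sketch easy to try without touching the filesystem.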
- osmnx: OpenStreetMap data download
- geopandas: Geospatial data handling
- pandas: Data processing
- shapely: Geometric operations
This project is licensed under the MIT License - see the LICENSE file for details.