This repository demonstrates how large language models (LLMs) can be used to harmonize geospatial datasets from user-provided URLs.
Instead of writing custom scripts for each dataset, the workflow is:
- A user provides dataset URLs
- The LLM inspects the datasets
- The LLM decides how to harmonize them (CRS, extent, resolution)
- Shared Python functions perform the harmonization
- Outputs and maps are generated
The harmonization workflow supports:
- downloading datasets from URLs (direct download, ZIP extraction, OPeNDAP streaming)
- extracting archives (e.g., ZIP files)
- identifying raster and vector inputs
- reprojecting to a common CRS
- clipping to a shared extent
- aligning raster resolution
- optionally rasterizing vector data
- saving harmonized outputs
- generating a static PNG visualization and an interactive HTML map
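As a rough illustration of the "identifying raster and vector inputs" step, the sketch below guesses a dataset's type from its file suffix. The function name `guess_data_type` and the two extension tables are hypothetical and are not taken from src/geospatial_harmonizer.py; the library may well inspect file contents instead.

```python
from pathlib import PurePosixPath

# Hypothetical extension tables for illustration; not taken from the library.
RASTER_EXTENSIONS = {".tif", ".tiff", ".nc", ".img"}
VECTOR_EXTENSIONS = {".shp", ".geojson", ".gpkg", ".kml"}


def guess_data_type(path_or_url: str) -> str:
    """Return "raster", "vector", or "unknown" based on the file suffix."""
    # Drop any query string or fragment so "data.tif?token=abc" still resolves.
    cleaned = path_or_url.split("?", 1)[0].split("#", 1)[0]
    suffix = PurePosixPath(cleaned).suffix.lower()
    if suffix in RASTER_EXTENSIONS:
        return "raster"
    if suffix in VECTOR_EXTENSIONS:
        return "vector"
    return "unknown"


print(guess_data_type("https://example.com/data.tif"))
print(guess_data_type("perimeters.GeoJSON"))
print(guess_data_type("https://example.com/data.zip"))
```

A ZIP archive comes back as "unknown" here because its type only becomes clear after extraction, which is one reason a suffix check alone is probably not sufficient in practice.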
Install dependencies:
    pip install -r requirements.txt

Run the example:

    python examples/colorado_fire_risk/colorado_harmonization.py

This repository includes a worked example that harmonizes:
- FBFM40 Fire Behavior Fuel Models (raster) — Landfire 2024 Scott and Burgan 40-class model
- MACAv2 Winter Precipitation (raster) — CCSM4 RCP8.5 Dec–Mar mean 2006–2099, streamed via OPeNDAP
- MTBS Burned Area Boundaries (vector) — USGS fire perimeters, kept as vector
- Microsoft Building Footprints (vector, rasterized) — Colorado buildings at ~270 m
All datasets are harmonized to:
- CRS: EPSG:4326
- Extent: Colorado bounding box (-109.05, 36.99, -102.04, 41.01)
- Resolution: ~270 m (0.00243°)
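To make the target grid concrete, the sketch below works out how many cells the Colorado extent needs at 0.00243°. It reads the extent tuple as (west, south, east, north), which matches the values given, and it rounds partial cells up; both the helper name `grid_shape` and the rounding choice are assumptions, not the library's documented behavior.

```python
import math


def grid_shape(extent, resolution):
    """Return (rows, cols) for a north-up grid covering extent at the given cell size."""
    west, south, east, north = extent
    # Round up so the grid fully covers the extent (an assumption of this sketch).
    cols = math.ceil((east - west) / resolution)
    rows = math.ceil((north - south) / resolution)
    return rows, cols


COLORADO_EXTENT = (-109.05, 36.99, -102.04, 41.01)
RESOLUTION_DEG = 0.00243

rows, cols = grid_shape(COLORADO_EXTENT, RESOLUTION_DEG)
print(rows, cols)  # 1655 2885

# Approximate north-south size of one cell, assuming ~111.32 km per degree.
METERS_PER_DEGREE = 111_320
print(round(RESOLUTION_DEG * METERS_PER_DEGREE, 1))  # 270.5
```

The ~270 m figure holds in the north-south direction. Because EPSG:4326 cells are defined in degrees, their east-west width shrinks with latitude, so cells over Colorado are narrower than 270 m across.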
Goal:
Visualize fire behavior fuel models, projected winter precipitation, past burned areas, and human infrastructure together to understand fire risk patterns across Colorado.
See examples/colorado_fire_risk/colorado_harmonization.py.
You can prompt an LLM with something like:
"Download these datasets, harmonize them to EPSG:4326 over Colorado, and generate a map."
The LLM should:
- download and inspect the datasets
- determine raster vs vector inputs
- ask about resolution mismatches if needed
- reproject and clip datasets
- optionally rasterize vector data
- generate harmonized outputs and a visualization
The expected behavior is defined in AGENTS.md.
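The "ask about resolution mismatches" step could be sketched as a simple tolerance check. This is only an illustration: the function name, the 5% tolerance, and the native cell sizes below are hypothetical, and the actual rules the agent follows are the ones in AGENTS.md.

```python
import math


def find_resolution_mismatches(native_resolutions, target_resolution, rel_tol=0.05):
    """Return names of datasets whose native cell size is not within rel_tol of the target."""
    mismatched = []
    for name, resolution in native_resolutions.items():
        if not math.isclose(resolution, target_resolution, rel_tol=rel_tol):
            mismatched.append(name)
    return mismatched


# Hypothetical native cell sizes in degrees, for illustration only.
native = {"fuel_models": 0.00027, "winter_precip": 0.04167}

for name in find_resolution_mismatches(native, target_resolution=0.00243):
    print(f"{name}: native resolution differs from the target; confirm resampling with the user")
```

Returning the names instead of raising an error leaves the decision with the caller, which fits a workflow where the LLM asks the user before resampling.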

    src/
        geospatial_harmonizer.py      # core harmonization library, import from here
    examples/
        colorado_fire_risk/           # reference example: learn from here, don't modify
            colorado_harmonization.py
            output/                   # generated outputs (data gitignored, viz tracked)
    workflows/                        # your analyses go here
        my_project/                   # one folder per project
            my_script.py
            output/                   # generated outputs co-located with the script
    docs/                             # website source (MkDocs)
    AGENTS.md                         # LLM behavior and workflow rules
    requirements.txt

If you are a scientist using this as a template:
- Read examples/colorado_fire_risk/colorado_harmonization.py to understand the pattern
- Create a new folder in workflows/ for each analysis
- Outputs land in your project's own output/ folder, next to the script
If you are an LLM agent:
- New analyses go in workflows/<project_name>/, not in examples/
- Set output_dir=Path(__file__).parent / "output"; never hardcode paths
- Core library is src/geospatial_harmonizer.py; read it before writing harmonization code
- Full rules are in AGENTS.md
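A minimal sketch of the output_dir rule, using only pathlib. The helper name `project_output_dir` is hypothetical; inside a real workflow script the argument would be `__file__`.

```python
from pathlib import Path


def project_output_dir(script_file: str) -> Path:
    """Return the output/ folder that sits next to the given script."""
    return Path(script_file).resolve().parent / "output"


# In a workflow script this would be project_output_dir(__file__).
example = project_output_dir("workflows/my_project/my_script.py")
print(example.name, example.parent.name)  # output my_project
```

Deriving the folder from the script's own location keeps outputs beside the script regardless of the directory the script is launched from.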
Run a harmonization workflow directly:

    from pathlib import Path
    from src.geospatial_harmonizer import DatasetSpec, ExampleWorkflow, run_harmonization_example

    workflow = ExampleWorkflow(
        name="my_workflow",
        datasets=[
            DatasetSpec(
                name="my_raster",
                url="https://example.com/data.tif",
                data_type="raster",
            ),
            DatasetSpec(
                name="my_vector",
                url="https://example.com/data.zip",
                data_type="vector",
                rasterize=True,
            ),
        ],
        target_crs="EPSG:4326",
        target_extent=(-109.05, 36.99, -102.04, 41.01),
        target_resolution=0.00243,
        output_dir=Path("./output/my_run"),
        create_visualization=True,
        verbose=True,
    )

    output_files, interactive_map = run_harmonization_example(workflow)

- The LLM handles decision-making and orchestration
- The Python code handles geospatial processing
- Examples demonstrate real workflows
- The system is reusable across datasets
This repository includes a documentation site built with MkDocs.
To preview locally:

    pip install mkdocs mkdocs-material
    mkdocs serve

Then open http://127.0.0.1:8000.
This repository is designed as a teaching and demonstration tool.
It shows how LLMs can:
- reason about geospatial data
- make harmonization decisions
- orchestrate reusable processing code
rather than requiring custom scripts for each dataset.