Skip to content

A project that aims to compress the UK Postcode database into a tiny binary form factor what can be used at runtime for super fast postcode to lat/long lookup

License

Notifications You must be signed in to change notification settings

robertpitt/postcode-db

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

16 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

UK Postcode Binary Encoder

A high-performance, ultra-compact binary format for UK postcode geolocation data. Compresses 1.79M postcodes from ~50MB CSV to just 6.2MB binary (88% compression) with 920K+ lookups/second and O(1) exact lookups.

🌐 Try the Live Demo β†’

Architecture

The implementation is split into focused, reusable classes:

Core Classes

  • PostcodeEncoder - Main orchestrator class that coordinates the encoding process
  • PostcodeNormalizer - Handles postcode validation, normalization, and inward ID encoding
  • BitmapManager - Manages 4000-bit presence bitmaps and rank directories for each outward
  • BinaryFileWriter - Handles the binary file format layout and writing
  • BitWriter - Utility for bit-packing coordinate deltas

Key Features

  • Ultra-Compact: 88% compression ratio (50MB β†’ 6.2MB) with 7.8/8.0 entropy efficiency
  • Blazing Fast: 920,000+ lookups/second with <1ms per lookup latency
  • O(1) Lookups: Constant-time exact postcode lookup via bitmap + rank directory + bit extraction
  • Efficient Queries: Fast prefix enumeration of all postcodes in an outward (e.g., "ST6")
  • Memory Optimized: <10MB memory footprint, designed for direct memory mapping
  • High Precision: 1e-4Β° coordinate precision with <1m average error (0.35m typical)
  • Type Safe: Full TypeScript implementation with comprehensive error handling

Usage

Encoding (CSV to Binary)

import { PostcodeEncoder } from "./src/PostcodeEncoder";

const encoder = new PostcodeEncoder("postcodes.csv");
encoder.build("postcodes.bin");

Querying (Binary Database)

import * as fs from "fs";
import { PostcodeClient } from "./src/PostcodeClient";

// Load the binary database
const buffer = fs.readFileSync("postcodes.pcod");
const client = new PostcodeClient(buffer);

// Exact postcode lookup (O(1))
const result = client.lookup("ST6 5AA");
if (result) {
  console.log(`${result.postcode}: ${result.lat}, ${result.lon}`);
}

// Enumerate all postcodes in an outward (prefix query)
const postcodes = client.enumerateOutward("ST6");
console.log(`Found ${postcodes.length} postcodes in ST6`);

// Get database statistics
const stats = client.getStats();
console.log(
  `Database: ${stats.totalPostcodes} postcodes, ${stats.fileSize} bytes`
);

// Check if postcode exists
const isValid = client.isValidPostcode("M1 1AA");

// Find outwards by prefix
const manchesterOutwards = client.findNearbyOutwards("M");

Dataset

This project uses the ONS Postcode Database that can be downlaod from https://geoportal.statistics.gov.uk/datasets/e14b1475ecf74b58804cf667b6740706, Once you have downloaded the dataset you can put the full ONS_{month}_{year}_UK.csv file in the data/ directory and run the following command to extract just the valid postcodes, lat and long values into a file called postcodes.csv, this file is what you can then use to build the binary database

Building the Database

  1. Ensure that you have a file called postcodes.csv in the root directory, it should contain 3 columns only
  2. Execute the yarn run build command, you should see an output like { totalOutwards: 2943, totalPostcodes: 1790884, fileSize: 6509848 } and a new file called postcodes.pcod apear in the root of the directory.

Performance & Compression

Compression Statistics

Metric Value Details
Input Size ~50MB Raw CSV with 1.79M postcodes
Output Size 6.2MB Binary format (.pcod file)
Compression Ratio 88% 8:1 compression ratio
Entropy Efficiency 7.8/8.0 Near-optimal bit utilization
Coordinate Precision 1e-4Β° ~11m resolution at UK latitudes
Average Error 0.35m Quantization error vs source

Lookup Performance

Operation Performance Details
Exact Lookup 920K+ ops/sec O(1) constant time
Latency <1ms Per lookup on M3 Mac
Memory Usage <10MB Total runtime footprint
Prefix Query Variable Depends on outward size
Build Time 1.65s For 1.79M postcodes

Real-World Benchmark

Below is the output from yarn ts-node scripts/build-and-test.ts on a Mac Pro M3, which builds the database and validates every postcode lookup:

πŸ—οΈ  Building postcode database...
βœ… Build completed in 1.65s

πŸ§ͺ Loading database and starting tests...

πŸ” Testing postcode lookups...

============================================================
           POSTCODE DATABASE TEST REPORT
============================================================

πŸ“Š PROCESSING STATISTICS:
   Total rows processed:     1,790,884
   Valid rows:               1,790,884
   Invalid rows:             0
   Invalid rate:             0.00%

πŸ” LOOKUP STATISTICS:
   Successful lookups:       1,790,884
   Failed lookups:           0
   Lookup success rate:      100.00%

πŸ“ COORDINATE ACCURACY:
   Coordinate mismatches:    0
   Accuracy rate:            100.00%
   Max coordinate error:     0.66m
   Average coordinate error: 0.35m

⏱️  PERFORMANCE:
   Build time:               1.65s
   Test time:                1.95s
   Total time:               3.59s
   Lookup rate:              920,763 lookups/sec

Compression Analysis

The binary format achieves exceptional compression through:

  • Bit-packed coordinate deltas: Variable-width encoding based on sector ranges
  • Adaptive storage modes: Bitmap vs list encoding based on density
  • Quantized coordinates: 1e-4Β° precision (sufficient for postcode accuracy)
  • Entropy optimization: 7.8/8.0 bits/byte efficiency (97.5% of theoretical maximum)

Entropy Distribution:

  • 95.8% of data chunks achieve >7.5 bits/byte entropy
  • Coordinate data sections: 7.8 bits/byte (highly compressed)
  • Index sections: 6.3 bits/byte (structured but efficient)

Comparison with Traditional Approaches

Approach File Size Lookup Speed Memory Usage Precision
This Binary Format 6.2MB 920K ops/sec <10MB 0.35m avg
Raw CSV 50MB ~1K ops/sec* 50MB+ Source precision
SQLite Database ~50MB ~10K ops/sec* 50MB+ Source precision
JSON Format ~120MB ~5K ops/sec* 120MB+ Source precision
Compressed CSV (gzip) ~30MB ~500 ops/sec* 50MB+ Source precision

*Estimated performance for linear/indexed searches without specialized optimization

Key Advantages:

  • 8Γ— smaller than raw CSV
  • 92Γ— faster than typical database lookups
  • 5Γ— smaller than compressed alternatives
  • Zero dependencies - pure binary format
  • Memory mappable - no loading overhead
  • Deterministic performance - O(1) guaranteed

File Format

+======================================================================================+
|                                    PCDB v3 FILE                                      |
+======================================================================================+
|                                   HEADER (32 bytes)                                  |
| 0x00  magic: 'PCDB' (char[4])                                                        |
| 0x04  version: u8 (=3)                                                               |
| 0x05  flags:   u8 (=0)                                                               |
| 0x06  outwardCount: u16                                                              |
| 0x08  totalUnitCount: u32                                                            |
| 0x0C  latOffset: i32  (global min latInt)                                            |
| 0x10  lonOffset: i32  (global min lonInt)                                            |
| 0x14  reserved[12] (zeros)                                                           |
+======================================================================================+
|                           OUTWARD INDEX TABLE (9 bytes x outwardCount)               |
|                                                                                      |
|  Entry i (9 bytes):                                                                  |
|    +--------------------------------------+                                          |
|    | outwardCode: char[4] (ASCII, NUL-pad) |                                          |
|    | sectorCount: u8                      |                                          |
|    | sectorIndexOffset: u32 (ABSOLUTE) ---+-----> points to start of Outward Block i |
|    +--------------------------------------+                                          |
|                                                                                      |
+======================================================================================+
|                                   OUTWARD BLOCK i (variable)                         |
|                        (begins at index[i].sectorIndexOffset, ABSOLUTE)              |
|                                                                                      |
|  +--------------------------------------------------------------------------------+  |
|  | SECTOR TABLE (14 bytes x sectorCount)                                          |  |
|  |                                                                                |  |
|  |  SectorEntry s (14 bytes):                                                     |  |
|  |    +-----------------------------------------------------------------------+   |  |
|  |    | sectorNumber : u8        (0..9)                                       |   |  |
|  |    | unitCount    : u16                                                   |   |  |
|  |    | unitsRelOff  : u24 (LE)  -------------------------------+------------+   |  |
|  |    |                               (RELATIVE to Outward Block|start)          |  |
|  |    | baseLatStored: i24 (LE) = sector.latMin - header.latOffset            |   |  |
|  |    | baseLonStored: i24 (LE) = sector.lonMin - header.lonOffset            |   |  |
|  |    | flagsAndBits : u16 (LE)                                              |   |  |
|  |    |    bit0      = 1  (coordinates are bit-packed)                       |   |  |
|  |    |    bit1      = 1:list mode | 0:bitmap mode                           |   |  |
|  |    |    bits2..6  = bitsLat (5 bits)                                      |   |  |
|  |    |    bits7..15 = bitsLon (9 bits)                                      |   |  |
|  |    +-----------------------------------------------------------------------+   |  |
|  +--------------------------------------------------------------------------------+  |
|                                                                                      |
|  +--------------------------------------------------------------------------------+  |
|  | UNIT DATA BLOBS (concatenated, one per sector, same order as sector table)     |  |
|  |                                                                                |  |
|  |  Blob for SectorEntry s at: (OutwardBlockStart + unitsRelOff[s])               |  |
|  |    +-----------------------------+   immediately followed by   +-------------+ |  |
|  |    | UNIT PRESENCE               |---------------------------->| COORD STREAM | |  |
|  |    +-----------------------------+                              +-------------+ |  |
|  |                                                                                |  |
|  |    UNIT PRESENCE (choose ONE):                                                 |  |
|  |      * Bitmap mode: EXACTLY 85 bytes (680 bits). Bit set => unitIndex present. |  |
|  |        byte = unitIndex/8, bit = unitIndex%8.                                  |  |
|  |      * List mode: Varint sequence of unit indices using delta coding:          |  |
|  |          first value = absolute first unitIndex (varint)                       |  |
|  |          subsequent values = delta to previous index (varint)                  |  |
|  |        (Encoder picks list if total varint bytes < 85)                         |  |
|  |                                                                                |  |
|  |    COORD STREAM (bit-packed, no per-value padding):                            |  |
|  |      For each of unitCount units (in ascending unitIndex order):               |  |
|  |        write latDelta : unsigned, width = bitsLat                              |  |
|  |        write lonDelta : unsigned, width = bitsLon                              |  |
|  |      Total length = ceil(unitCount x (bitsLat + bitsLon) / 8) bytes            |  |
|  |                                                                                |  |
|  |    Reconstruct per unit:                                                       |  |
|  |      latInt = header.latOffset + baseLatStored + latDelta                      |  |
|  |      lonInt = header.lonOffset + baseLonStored + lonDelta                      |  |
|  |      lat = latInt / 100000.0                                                   |  |
|  |      lon = lonInt / 100000.0                                                   |  |
|  +--------------------------------------------------------------------------------+  |
|                                                                                      |
+======================================================================================+
|                                   OUTWARD BLOCK i+1 ... (repeat)                     |
+======================================================================================+

LEGEND / NOTES

  • All multi-byte integers are LITTLE-ENDIAN.
  • Coordinates are quantized: lat/lon x 100000 -> integer (latInt/lonInt).
  • Ordering:
    • Outwards sorted lexicographically
    • Sectors sorted by sectorNumber (0-9)
    • Units sorted by unitIndex (ascending)
  • Offsets:
    • sectorIndexOffset is ABSOLUTE (file offset).
    • unitsRelOff is RELATIVE to the start of its Outward Block (i.e., sector table start).
  • Capacity:
    • Bitmap presence uses 85 bytes = 680 bits (code implements 680 bits).
  • bitsLat/bitsLon are minimal widths for sector-local deltas:
    • latDelta = unit.latInt - sector.latMin
    • lonDelta = unit.lonInt - sector.lonMin

Web Interface

πŸš€ Live Demo: https://robertpitt.github.io/postcode-db/

A modern web interface is available in the playground/ directory. This provides a user-friendly way to search postcodes through a web browser.

Quick Start

# Install dependencies
yarn install

# Start development server
yarn dev

# Build for production
yarn build:web

Visit http://localhost:3000 to use the web interface.

Features

  • Real-time search: Type partial or full postcodes for instant results
  • Responsive design: Works on desktop and mobile
  • Modern UI: Built with Tailwind CSS and Font Awesome icons
  • Map integration: Direct links to Google Maps
  • GitHub Pages ready: Automatic deployment workflow included

Deployment

The project includes a GitHub Actions workflow that automatically:

  1. Builds the postcode database
  2. Compiles the web interface
  3. Deploys to GitHub Pages

Simply push to the main branch to trigger deployment.

Inspirations

About

A project that aims to compress the UK Postcode database into a tiny binary form factor what can be used at runtime for super fast postcode to lat/long lookup

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages