A comprehensive machine learning pipeline for predicting earthquake magnitudes from early-stage 3-component seismic waveforms using 1D CNNs with data augmentation and weighted loss strategies.
- End-to-end pipeline from raw seismic data to magnitude prediction
- Multi-component analysis using Z, N, E seismic components
- Physics-informed data augmentation for rare high-magnitude events
- Weighted loss function to handle severe class imbalance
- Comprehensive evaluation across different sampling rates and configurations
| Configuration | Sampling Rate | Augmentation | MAE | R² | Accuracy (±0.2 mag) |
|---|---|---|---|---|---|
| Best Model | 100 Hz | ✅ Yes | 0.276 | 0.166 | 41.6% |
| Baseline | 50 Hz | ❌ No | 0.313 | -0.007 | 32.7% |
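The accuracy column above counts predictions within ±0.2 magnitude units of the catalog value. A small sketch of that metric (`tolerance_accuracy` is a hypothetical helper, not a name from this repo):

```python
import numpy as np

def tolerance_accuracy(y_true, y_pred, tol=0.2):
    """Fraction of predictions within +/- tol magnitude units of the truth."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) <= tol))

# Example: 3 of 4 predictions fall within +/-0.2 magnitude units
acc = tolerance_accuracy([4.0, 4.5, 5.1, 6.0], [4.1, 4.4, 5.4, 6.1])
print(acc)  # 0.75
```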
```bash
pip install torch==2.2.2 torchvision==0.17.2
pip install numpy==1.26.4 pandas==2.2.3 scikit-learn==1.7.1
pip install obspy==1.4.2 matplotlib==3.10.0
pip install scipy==1.16.0 tqdm==4.67.1
```

Or install everything from the requirements file:

```bash
pip install -r requirements.txt
```

```python
from seismic_magnitude_prediction import run_experiment

# Run best configuration
results = run_experiment(
    seconds=5,
    sample_rate=100,
    data_file="seismic_data_5_seconds_sampling_rate_100.pkl",
    target_length=500,
    batch_size=32,
    num_epochs=100,
    learning_rate=0.001,
    is_augmented=True
)
```

## 📁 Project Structure
```
├── seismic_magnitude_prediction.py   # Main ML pipeline
├── data_preprocessing.py             # Data extraction & validation
├── requirements.txt
├── README.md
└── data/
    ├── Turkey_data/
    │   ├── Catalog/                  # Text catalog files
    │   └── DATA/                     # MiniSEED waveform files
    └── processed/
        └── *.pkl                     # Processed datasets
```
1. **Catalog Parsing**: Extract event metadata from Turkish seismic catalogs
2. **Waveform Extraction**: Extract 5-second snippets around P-wave arrivals
3. **Quality Control**: 7-level validation system for data integrity
4. **Standardization**: Resample to 100 Hz and normalize signals
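The cut/resample/normalize part of the pipeline can be sketched library-agnostically, assuming the three components have already been loaded from MiniSEED (the repo itself uses ObsPy for I/O; `standardize_snippet` and its parameters are illustrative, not the repo's API):

```python
import numpy as np
from scipy.signal import resample

def standardize_snippet(traces, orig_rate, p_index, seconds=5, target_rate=100):
    """Cut `seconds` of signal after the P-wave sample index, resample to
    target_rate, and peak-normalize each of the Z/N/E channels.

    traces: dict mapping component name ("Z", "N", "E") to a 1-D array.
    Returns a (3, seconds * target_rate) float array, or None when a channel
    is missing or too short (a stand-in for the quality-control step).
    """
    n_in = int(seconds * orig_rate)
    n_out = int(seconds * target_rate)
    out = []
    for comp in ("Z", "N", "E"):
        data = traces.get(comp)
        if data is None or len(data) < p_index + n_in:
            return None  # reject incomplete recordings
        window = np.asarray(data[p_index:p_index + n_in], dtype=float)
        window = resample(window, n_out)      # Fourier-domain resampling
        peak = np.max(np.abs(window))
        out.append(window / peak if peak > 0 else window)
    return np.stack(out)
```

For 50 Hz input and the paper's 5 s / 100 Hz target, each channel goes from 250 to 500 samples.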
- **Input**: 3-channel waveforms (Z, N, E components) × 500 time samples
- **Architecture**: 4-layer 1D CNN with batch normalization and global pooling
- **Output**: Single regression value (earthquake magnitude)
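A minimal PyTorch sketch matching this description; the channel widths and kernel sizes are assumptions, not the repo's exact values:

```python
import torch
import torch.nn as nn

class MagnitudeCNN(nn.Module):
    """4 conv blocks (Conv1d -> BatchNorm -> ReLU -> MaxPool), then global
    average pooling and a linear head producing one magnitude per waveform."""
    def __init__(self, in_channels=3):
        super().__init__()
        layers, channels = [], [in_channels, 32, 64, 128, 256]
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [
                nn.Conv1d(c_in, c_out, kernel_size=7, padding=3),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
                nn.MaxPool1d(2),
            ]
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool1d(1)      # global average pooling
        self.head = nn.Linear(channels[-1], 1)   # single regression value

    def forward(self, x):
        x = self.pool(self.features(x)).squeeze(-1)  # (batch, 256)
        return self.head(x).squeeze(-1)              # (batch,)

model = MagnitudeCNN()
out = model(torch.randn(8, 3, 500))  # batch of 8 three-component waveforms
print(out.shape)  # torch.Size([8])
```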
- **Data Augmentation**: Physics-informed noise, time shift, amplitude scaling
- **Weighted Loss**: Higher penalties for rare high-magnitude events (10x weight for mag 6.5+)
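A sketch of such a weighted MSE; only the 10x weight for magnitude 6.5+ comes from the text above, the other bin edges and weights are illustrative:

```python
import torch

def magnitude_weighted_mse(pred, target):
    """MSE with larger weights for rarer, higher-magnitude events.

    Weights: 1x below mag 4.5, 2x for 4.5+, 5x for 5.5+, 10x for 6.5+
    (only the last factor is stated in the README; the rest are assumed).
    """
    weights = torch.ones_like(target)
    weights = torch.where(target >= 4.5, torch.full_like(target, 2.0), weights)
    weights = torch.where(target >= 5.5, torch.full_like(target, 5.0), weights)
    weights = torch.where(target >= 6.5, torch.full_like(target, 10.0), weights)
    return torch.mean(weights * (pred - target) ** 2)

pred = torch.tensor([4.0, 6.8])
target = torch.tensor([4.0, 7.0])
loss = magnitude_weighted_mse(pred, target)
# only the mag-7.0 sample has error (0.2^2 = 0.04), weighted 10x:
print(loss)  # approximately tensor(0.2000)
```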
- **Source**: Turkish Seismic Network
- **Total Samples**: 17,773 (50 Hz) / 12,671 (100 Hz)
- **Magnitude Range**: 3.5-7.5
- **Components**: 3-channel seismic recordings (Z/N/E)
- **Duration**: 5 seconds post P-wave arrival
- Magnitude 3.5-4.5: ~90% of samples (dominant)
- Magnitude 4.5-5.5: ~8% of samples
- Magnitude 5.5-6.5: ~1% of samples
- Magnitude 6.5-7.5: <0.1% of samples (critical but rare)
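These rare high bins are what the augmentation strategy targets. A generic sketch of the three transforms named earlier (noise, time shift, amplitude scaling), with illustrative parameter values rather than the repo's physics-informed ones:

```python
import numpy as np

def augment_waveform(x, rng, noise_std=0.02, max_shift=25, scale_range=(0.8, 1.2)):
    """Augment a (3, T) waveform: random amplitude scaling, a random
    circular time shift, and additive Gaussian noise."""
    x = x * rng.uniform(*scale_range)                  # amplitude scaling
    shift = rng.integers(-max_shift, max_shift + 1)
    x = np.roll(x, shift, axis=-1)                     # time shift
    x = x + rng.normal(0.0, noise_std, size=x.shape)   # additive noise
    return x

rng = np.random.default_rng(42)
wave = np.repeat(np.sin(np.linspace(0, 20, 500))[None, :], 3, axis=0)
aug = augment_waveform(wave, rng)
print(aug.shape)  # (3, 500)
```

Applying several such augmented copies per rare high-magnitude event is one common way to rebalance a skewed regression target.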
- **100 Hz + augmentation** significantly outperforms the other configurations
- Data augmentation is crucial for high-resolution data (it prevents overfitting)
- Class imbalance remains the primary challenge for high-magnitude prediction
- Mag 3.5-4.5: MAE = 0.224 ✅ (excellent)
- Mag 4.5-5.5: MAE = 0.766 ⚠️ (acceptable)
- Mag 5.5+: MAE > 1.3 ❌ (poor — insufficient data)
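Per-bin errors like these can be tabulated with a small helper (`mae_by_bin` is a hypothetical name, not part of the repo):

```python
import numpy as np

def mae_by_bin(y_true, y_pred, edges=(3.5, 4.5, 5.5, 7.5)):
    """Mean absolute error per magnitude bin, keyed by 'lo-hi' labels."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = np.abs(y_true - y_pred)
    out = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y_true >= lo) & (y_true < hi)
        if mask.any():
            out[f"{lo}-{hi}"] = float(err[mask].mean())
    return out

res = mae_by_bin([3.6, 4.0, 5.0, 6.0], [3.8, 4.1, 5.5, 7.0])
print(res)  # MAE per bin, e.g. roughly 0.15 for 3.5-4.5
```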
## 📋 Requirements

### System Requirements

- Python 3.11+
- CUDA 12.1+ (optional, for GPU acceleration)
- 8GB+ RAM (for large seismic datasets)
### Tested Environment

This project was developed and tested with:

- Python 3.11.13
- PyTorch 2.2.2 with CUDA 12.1 support
- ObsPy 1.4.2 for seismic data processing
- NumPy 1.26.4 (ObsPy compatibility requirement)
### Core Dependencies

```text
# Core ML & Data Processing
torch==2.2.2
torchvision==0.17.2
numpy==1.26.4
pandas==2.2.3
scikit-learn==1.7.1

# Seismic Data Processing
obspy==1.4.2

# Visualization
matplotlib==3.10.0

# Data Utilities
scipy==1.16.0
tqdm==4.67.1

# Development Environment (optional)
jupyter==1.1.1
ipython==9.4.0
```
MIT License - see LICENSE file for details.
```bibtex
@misc{seismic2025,
  title={Seismic Magnitude Prediction using Deep Learning},
  author={Sam84723},
  year={2025},
  url={https://github.com/sam84723/seismic_magnitude_predictor}
}
```

## 🙏 Acknowledgments
- Dr. Itzhak Lior of the Hebrew University of Jerusalem (HUJI)
- Turkish Seismic Network for providing the dataset
- ObsPy community for seismic data processing tools
- PyTorch team for the deep learning framework