AWS EC2 Network Latency Benchmark for Trading Applications

This repository contains a comprehensive network latency benchmarking solution designed specifically for trading applications running on AWS EC2. The benchmark suite measures round-trip latency between trading clients and servers, providing valuable insights for latency-sensitive financial applications.

Project Overview

Purpose

The primary goal of this project is to measure and analyze network latency in a simulated trading environment on AWS EC2 instances. In high-frequency trading (HFT) applications, even microseconds of latency can significantly impact trading outcomes and profitability. This benchmark helps:

Evaluate the network performance of different EC2 instance types for trading workloads
Measure the impact of various system and JVM optimizations on latency
Provide data-driven insights for architecting low-latency trading systems on AWS
Compare performance across different AWS regions and placement groups
Identify bottlenecks and optimization opportunities in trading infrastructure

Why This Matters

Financial markets operate at extremely high speeds, where being just a few microseconds faster than competitors can mean the difference between profit and loss. This benchmark suite allows you to:

Make informed decisions about EC2 instance selection for trading applications
Understand the real-world impact of system-level tuning on latency
Quantify the performance benefits of various optimization techniques
Establish baseline performance metrics for your trading infrastructure
Test the impact of network conditions on trading application performance

Components

The benchmark suite consists of:

Java Trading Client: A high-performance client that sends limit and cancel orders and measures round-trip times
Rust Mock Trading Server: A lightweight server that simulates a trading exchange by responding to client orders
CDK Infrastructure: AWS CDK code to deploy the required EC2 instances and networking components
Ansible Playbooks: Scripts to provision instances, run tests, and collect results
OS-Tuned AMI Builder: Automated pipeline to create pre-optimized Amazon Machine Images with performance tuning baked in
Analysis Tools: Utilities to process and visualize latency data using HDR Histograms

Sequence Diagram

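The benchmark contains a simple HFT client and a mock matching engine (the Java trading client and the mock trading server listed under Components) that simulate a basic order flow sequence for latency measurements.

[Sequence diagram: order flow between the HFT client and the matching engine]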

Prerequisites

Before using this benchmark suite, ensure you have the following prerequisites:

AWS CLI: Configured with appropriate credentials and default region
AWS CDK: Installed and bootstrapped in your AWS account
Ansible: Version 2.9+ installed on your local machine
SSH Key Pair: Generated and registered with AWS for EC2 instance access (e.g., ~/.ssh/virginia.pem)

Getting Started

Option A: Quick Start with Pre-Built OS-Tuned AMI (Recommended)

For fastest deployment and optimal performance, build a pre-tuned AMI first:

cd deployment
./build-tuned-ami.sh --key-file ~/.ssh/virginia.pem

This creates an AMI with all OS-level optimizations pre-applied (CPU isolation, network tuning, hugepages, etc.). The process takes ~20-30 minutes but eliminates the need to run OS tuning on every deployment.

Important - CPU Allocation Considerations:

CPU isolation settings are baked into the AMI at build time based on the builder instance's vCPU count
For optimal performance, build the AMI on the same instance type you plan to deploy to (or within the same size class)
Example: Building on c7i.4xlarge (16 vCPUs) then deploying on c7i.48xlarge (192 vCPUs) will only isolate 12 cores instead of 176
See the AMI Builder README for recommended build strategies
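The mismatch in the example above can be checked with a few lines of arithmetic. This is a minimal sketch: the housekeeping-core counts (4 on 16 vCPUs, 16 on 192 vCPUs) are inferred from the figures in the example, and the function name is hypothetical. The real allocation logic lives in the tuning playbook and may differ.

```python
def isolated_cores(vcpus: int, housekeeping: int) -> int:
    """Cores left for the trading application after reserving housekeeping cores."""
    if not 0 < housekeeping < vcpus:
        raise ValueError("housekeeping must be between 1 and vcpus - 1")
    return vcpus - housekeeping


# Figures inferred from the example above (assumption, not read from the playbook).
baked_into_ami = isolated_cores(vcpus=16, housekeeping=4)       # built on c7i.4xlarge
wanted_on_target = isolated_cores(vcpus=192, housekeeping=16)   # deployed on c7i.48xlarge

# Cores that stay un-isolated because the AMI was built on a smaller instance.
left_unisolated = wanted_on_target - baked_into_ami

print(f"isolated by AMI: {baked_into_ami}, "
      f"expected on target: {wanted_on_target}, "
      f"not isolated: {left_unisolated}")
```

Building the AMI on the target instance type removes the gap, which is why the same size class is recommended.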

Then deploy using the tuned AMI:

cd cdk
cdk deploy --context deploymentType=cluster --context baseAmi=ami-xxxxxxxxx

See deployment/AMI_BUILDER_README.md for detailed AMI builder documentation.

Option B: Standard Deployment with Manual OS Tuning

1. Deploy Infrastructure with CDK

Deploy the required AWS infrastructure using CDK. You have several deployment options:

Default Single Instance Deployment

Create an SSH key pair manually, give it a name (for example, frankfurt), then run:

cd deployment/cdk
npm install
cdk deploy --context region=eu-central-1 --context availabilityZone=eu-central-1a --context keyPairName=frankfurt --context instanceType1=c7i.4xlarge --context instanceType2=c6in.4xlarge

Client-Server Architecture with Cluster Placement Group

For optimal network performance between client and server, deploy them in a Cluster Placement Group:

cd deployment/cdk
npm install
cdk deploy --context deploymentType=cluster

You can also specify instance types for client and server:

cdk deploy --context deploymentType=cluster --context clientInstanceType=c7i.4xlarge --context serverInstanceType=c6in.4xlarge

Multi-AZ Deployment

To test latency across multiple availability zones:

cd deployment/cdk
npm install
cdk deploy --context deploymentType=multi-az

AMI Builder Deployment

To build an OS-tuned AMI for reuse across deployments:

cd deployment/cdk
npm install
cdk deploy --context deploymentType=ami-builder --context instanceType=c7i.4xlarge

Or use the automated build script (recommended):

cd deployment
./build-tuned-ami.sh --instance-type c7i.4xlarge --key-file ~/.ssh/virginia.pem

2. Run the Benchmark Tests

After deploying the infrastructure, use the following Ansible playbooks to run the benchmark tests:

cd ../ansible

# Provision EC2 instances and deploy both client and server applications
ansible-playbook provision_ec2.yaml --key-file ~/.ssh/virginia.pem -i ./inventory/inventory.aws_ec2.yml

# Stop any existing tests
ansible-playbook stop_latency_test.yaml --key-file ~/.ssh/virginia.pem -i ./inventory/inventory.aws_ec2.yml

# Apply OS-level performance tuning
ansible-playbook tune_os.yaml --key-file ~/.ssh/virginia.pem -i ./inventory/inventory.aws_ec2.yml

# Start the mock trading server
ansible-playbook restart_mock_trading_server.yaml --key-file ~/.ssh/virginia.pem -i ./inventory/inventory.aws_ec2.yml

# Start the HFT client
ansible-playbook restart_hft_client.yaml --key-file ~/.ssh/virginia.pem -i ./inventory/inventory.aws_ec2.yml

# Start the latency test
ansible-playbook start_latency_test.yaml --key-file ~/.ssh/virginia.pem -i ./inventory/inventory.aws_ec2.yml

# Let the test run for desired duration, then stop it
ansible-playbook stop_latency_test.yaml --key-file ~/.ssh/virginia.pem -i ./inventory/inventory.aws_ec2.yml
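
The playbook sequence above can also be generated programmatically, which helps when the same run is repeated across several inventories. The following is a sketch only: it builds the command lines shown above and prints them, leaving execution to the caller. The helper names are hypothetical; the playbook names, `--key-file`, and `-i` come from the commands above.

```python
import os
import shlex

INVENTORY = "./inventory/inventory.aws_ec2.yml"

# Playbooks in the order used above. The final stop_latency_test.yaml run
# is left out because it happens after the chosen test duration.
PLAYBOOKS = [
    "provision_ec2.yaml",
    "stop_latency_test.yaml",
    "tune_os.yaml",
    "restart_mock_trading_server.yaml",
    "restart_hft_client.yaml",
    "start_latency_test.yaml",
]


def playbook_command(playbook, key_file, inventory=INVENTORY):
    """Build the argument list for one ansible-playbook invocation."""
    return ["ansible-playbook", playbook, "--key-file", key_file, "-i", inventory]


def benchmark_commands(key_file, inventory=INVENTORY):
    """Return every command needed to start a benchmark run, in order."""
    return [playbook_command(p, key_file, inventory) for p in PLAYBOOKS]


if __name__ == "__main__":
    # Expand "~" here because an argument list is not processed by a shell.
    key = os.path.expanduser("~/.ssh/virginia.pem")
    for cmd in benchmark_commands(key):
        print(shlex.join(cmd))
```

To execute rather than print, each list can be passed to `subprocess.run(cmd, check=True)` from the ansible directory.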

3. Collect and Analyze Results

After running the tests, collect and analyze the latency results:

cd ..
./show_latency_reports.sh --inventory $(pwd)/ansible/inventory/inventory.aws_ec2.yml --key ~/.ssh/virginia.pem

This script will:

Fetch histogram logs from the EC2 instances
Process the logs to generate latency reports
Create a summary report with key latency metrics

Understanding the Results

The latency reports include several important metrics:

Min/Max/Mean Latency: Basic statistics about the observed latencies
Percentile Latencies: Values at key percentiles (50th, 90th, 99th, 99.9th, etc.)
Coordinated Omission Free: Adjusted metrics that account for coordinated omission
Histogram Distribution: Visual representation of the latency distribution

These metrics help identify not just average performance but also worst-case scenarios that are critical for trading applications.
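
To make the percentile and coordinated-omission metrics concrete, here is a small standalone sketch. It does not use the HdrHistogram library and is not the benchmark's own reporting code. The correction follows the expected-interval approach used by HdrHistogram (back-filling the samples a blocked sender could not issue). Whether the reports apply exactly this correction is an assumption, and the sample values and microsecond units are invented for illustration.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a list sorted in ascending order."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def correct_coordinated_omission(samples, expected_interval):
    """Back-fill samples that were never sent while the sender was stalled.

    For every sample longer than the expected send interval, add the
    latencies the skipped requests would have observed.
    """
    corrected = []
    for value in samples:
        corrected.append(value)
        missing = value - expected_interval
        while missing >= expected_interval:
            corrected.append(missing)
            missing -= expected_interval
    return corrected


# 99 fast round trips (100 us) and one 10 ms stall; one request expected every 1 ms.
raw = sorted([100] * 99 + [10_000])
fixed = sorted(correct_coordinated_omission(raw, expected_interval=1_000))

print("p99 raw:      ", percentile(raw, 99))
print("p99 corrected:", percentile(fixed, 99))
```

In this example the raw 99th percentile hides the stall entirely, while the corrected value exposes it. That is the reason tail percentiles should be read from the coordinated-omission-free figures.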

Advanced Configuration

Pre-Built OS-Tuned AMIs

For production deployments, we recommend using pre-built OS-tuned AMIs:

Benefits:

Faster Deployments: Skip the 10-15 minute OS tuning process on every deployment
Consistency: Guaranteed identical OS optimizations across all instances
Immutable Infrastructure: Version-controlled tuning configurations via AMI tags
Dynamic Scaling: CPU isolation is sized automatically from the vCPU count of the instance the AMI is built on (2-192 vCPUs supported)

Build Strategy: Build separate AMIs for different instance size classes for optimal performance:

Small (4-8 vCPUs): Build on c7i.2xlarge
Medium (16-32 vCPUs): Build on c7i.4xlarge
Large (48-96 vCPUs): Build on c7i.24xlarge
X-Large (128-192 vCPUs): Build on c7i.48xlarge
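
The size classes above can be expressed as a lookup that picks the builder instance for a given deployment target. This is an illustrative sketch with hypothetical names; vCPU counts that fall outside the listed ranges return `None` rather than guessing a class.

```python
# (class name, supported vCPU range, instance type to build the AMI on)
SIZE_CLASSES = [
    ("Small", range(4, 9), "c7i.2xlarge"),
    ("Medium", range(16, 33), "c7i.4xlarge"),
    ("Large", range(48, 97), "c7i.24xlarge"),
    ("X-Large", range(128, 193), "c7i.48xlarge"),
]


def builder_instance_for(target_vcpus):
    """Return (size class, builder instance type) for a target vCPU count."""
    for name, vcpu_range, builder in SIZE_CLASSES:
        if target_vcpus in vcpu_range:
            return name, builder
    return None  # not covered by the documented size classes


for vcpus in (8, 16, 96, 192, 2):
    print(vcpus, "->", builder_instance_for(vcpus))
```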

See deployment/AMI_BUILDER_README.md for complete documentation.

Tuning OS Parameters

The tune_os.yaml playbook applies various system-level optimizations with dynamic CPU core allocation:

CPU Optimizations:

Automatically detects vCPU count and scales housekeeping cores
Disables hyperthreading, C-states, and P-states
Sets CPU governor to performance
Isolates cores for trading applications (scales from 1 to 176 cores)
Moves IRQs and kernel workqueues to housekeeping cores

Other Optimizations:

Network stack parameters (busy polling, TSO/GSO disabled)
Memory settings (hugepages, THP disabled, NUMA)
I/O scheduler configuration
Kernel parameters

You can customize these settings in the playbook based on your specific requirements.
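
After tuning, it can be useful to confirm that the settings took effect on an instance. The sketch below reads three standard Linux sysfs files (isolated CPUs, transparent hugepages, and the CPU frequency governor) and parses their contents. It is not part of the playbook; the helper names are hypothetical, and some files may be absent depending on kernel and instance type, in which case the value is reported as `None`.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '4-15' or '0,2-3' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file means no CPUs in the list
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def selected_option(text):
    """Return the bracketed choice from a line like 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_or_none(path):
    """Read a sysfs file, returning None when it does not exist."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    isolated = read_or_none("/sys/devices/system/cpu/isolated")
    thp = read_or_none("/sys/kernel/mm/transparent_hugepage/enabled")
    governor = read_or_none("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")

    print("isolated CPUs:", parse_cpu_list(isolated) if isolated is not None else None)
    print("THP mode:     ", selected_option(thp) if thp is not None else None)
    print("governor:     ", governor.strip() if governor is not None else None)
```

On a tuned instance, the isolated list should be non-empty, the THP mode should read `never`, and the governor should read `performance`, matching the optimizations listed above.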

JVM Tuning

The Java client is launched with specific JVM parameters to optimize performance. These parameters control:

Memory allocation and garbage collection
Thread affinity and scheduling
JIT compilation behavior
Memory pre-touch and large pages

Optimization Techniques

The benchmark implements several optimization techniques commonly used in high-frequency trading applications:

Dynamic CPU Isolation: Automatically scales isolated cores based on instance size (supports 2-192 vCPUs)
Thread Processor Affinity: Pins threads to specific CPU cores to prevent cache thrashing
Composite Buffers: Reduces unnecessary object allocations and copy operations
Separate Execution and IO Threads: Keeps network I/O threads dedicated to communication
HDR Histogram for Latency Recording: Efficiently records latency measurements with high precision
io_uring Transport: Uses Linux io_uring for zero-copy networking when available
OS-Level Tuning: Network stack, memory management, and I/O scheduler optimizations
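
As an illustration of thread processor affinity, the sketch below pins the calling thread to one CPU using the Python standard library on Linux. It shows the concept only: the Java client uses its own affinity mechanism, which is not reproduced here, and the function name is hypothetical.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling thread to a single CPU and return the new mask."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control is only available on Linux")
    allowed = os.sched_getaffinity(0)
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})  # pid 0 means the caller
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        target = min(os.sched_getaffinity(0))  # any currently allowed CPU
        print("now pinned to:", pin_to_cpu(target))
    else:
        print("affinity API not available on this platform")
```

Pinning a latency-critical thread to an isolated core keeps the scheduler from migrating it, so its working set stays in that core's caches.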

Contributing

See CONTRIBUTING for details on how to contribute to this project.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
