Skip to content

Detect AWS usage anomalies in near-real time using OpenSearch Anomaly Detection and CloudTrail for improved cost management and security

License

Notifications You must be signed in to change notification settings

NithinChandranR-AWS/near-realtime-aws-usage-anomaly-detection

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

27 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Enhanced Multi-Account AWS Usage Anomaly Detection System

A comprehensive solution for detecting usage anomalies across multiple AWS accounts with natural language insights powered by Amazon Q for Business.

🌟 Features

Multi-Account Support

  • Organization-wide CloudTrail: Centralized logging from all AWS accounts
  • Cross-account anomaly detection: Unified visibility across your entire organization
  • Account-aware insights: Context-rich alerts with account metadata

Enhanced Anomaly Detection

  • High-cardinality detection: Account ID and region-based categorization
  • Multiple service support: EC2, Lambda, and EBS anomaly detection
  • Intelligent thresholds: Account type-aware threshold configuration

Natural Language Insights

  • Amazon Q for Business integration: Query anomalies using natural language
  • Cost impact analysis: Automatic cost implications for detected anomalies
  • Security recommendations: Contextual security guidance for each anomaly type

Comprehensive Monitoring

  • Real-time dashboards: CloudWatch dashboards with system health metrics
  • Proactive alerting: SNS-based notifications with detailed context
  • System health monitoring: Automated health checks and custom metrics

πŸ—οΈ Architecture

graph TB
    subgraph "Organization Accounts"
        A1[Account 1] --> CT[Organization CloudTrail]
        A2[Account 2] --> CT
        A3[Account N] --> CT
    end
    
    CT --> CWL[CloudWatch Logs]
    CWL --> LAM[Multi-Account Logs Lambda]
    LAM --> OS[OpenSearch Domain]
    
    OS --> AD[Anomaly Detectors]
    AD --> AL[Alerting]
    AL --> SNS[SNS Topics]
    
    OS --> QC[Q Business Connector]
    QC --> QB[Q Business Application]
    QB --> IC[Identity Center]
    
    subgraph "Monitoring"
        SHM[System Health Monitor]
        CWD[CloudWatch Dashboard]
        DLQ[Dead Letter Queue]
    end
    
    subgraph "User Access"
        U1[Security Team] --> OSD[OpenSearch Dashboards]
        U1 --> QBI[Q Business Interface]
        U1 --> CWD
    end
Loading

πŸš€ Quick Start

Prerequisites

  1. AWS Account Setup:

    • AWS Organizations enabled
    • Management account access
    • CDK v2.110.0+ installed
  2. Local Environment:

    # Install required tools
    npm install -g aws-cdk
    pip install -r requirements.txt
  3. AWS Credentials:

    aws configure
    # Ensure you have admin permissions in the management account

Deployment

  1. Clone and Setup:

    git clone <repository-url>
    cd aws-usage-anomaly-detection
  2. Deploy Multi-Account System:

    ./deploy_multi_account_enhanced.sh
  3. Validate Deployment:

    python3 validate_enhanced_deployment.py

πŸ“‹ Deployment Options

Single Account Mode

cdk deploy UsageAnomalyDetectorStack

Multi-Account Mode

cdk deploy --context deployment-mode=multi-account --all

Manual Stack Deployment

# 1. Organization Trail (Management Account)
cdk deploy OrganizationTrailStack

# 2. Base OpenSearch Stack
cdk deploy EnhancedUsageAnomalyDetectorStack

# 3. Multi-Account Enhancements
cdk deploy MultiAccountAnomalyStack

# 4. Q Business Integration (Optional)
cdk deploy QBusinessInsightsStack

πŸ”§ Configuration

Environment Variables

Variable Description Default
DEPLOYMENT_MODE Deployment mode (single-account/multi-account) single-account
AWS_DEFAULT_REGION AWS region for deployment us-east-1
ENABLE_Q_BUSINESS Enable Q Business integration true
ENABLE_COST_ANALYSIS Enable cost impact analysis true

Account Type Configuration

Configure account types using AWS Organizations tags:

{
  "AccountType": "production|staging|development",
  "Environment": "prod|staging|dev",
  "CostCenter": "engineering|security|operations"
}

Anomaly Thresholds

Customize thresholds in lambdas/CrossAccountAnomalyProcessor/config.py:

THRESHOLDS = {
    'production': {'ec2': 10, 'lambda': 1000, 'ebs': 20},
    'staging': {'ec2': 5, 'lambda': 500, 'ebs': 10},
    'development': {'ec2': 2, 'lambda': 100, 'ebs': 5}
}

πŸ“Š Monitoring and Alerting

CloudWatch Dashboard

Access the monitoring dashboard:

  1. Go to CloudWatch Console
  2. Navigate to Dashboards
  3. Open "MultiAccountAnomalyDetection"

SNS Alerts

Subscribe to system alerts:

aws sns subscribe \
  --topic-arn <SystemAlertsTopicArn> \
  --protocol email \
  --notification-endpoint [email protected]

Custom Metrics

The system publishes custom metrics to the MultiAccountAnomalyDetection namespace:

  • OverallHealthScore: System health percentage (0-100)
  • ProcessingSuccessRate: Event processing success rate
  • LambdaErrorRate: Lambda function error rates
  • OpenSearchUnassignedShards: OpenSearch cluster health

πŸ€– Amazon Q for Business Integration

Setup

  1. Identity Center Configuration:

    • Automatic setup during deployment
    • Creates "QBusinessAdmins" group
    • Configures application assignments
  2. User Access:

    # Add users to Q Business admin group
    aws identitystore create-group-membership \
      --identity-store-id <IdentityStoreId> \
      --group-id <QBusinessAdminGroupId> \
      --member-id <UserId>

Natural Language Queries

Example queries you can ask Q Business:

  • "Show me EC2 anomalies from the last 24 hours"
  • "What accounts had the highest cost impact this week?"
  • "Are there any security concerns with recent Lambda anomalies?"
  • "Compare anomaly patterns between production and staging accounts"

πŸ” Troubleshooting

Common Issues

  1. CDK Version Compatibility:

    # Upgrade CDK
    npm install -g aws-cdk@latest
    pip install -r requirements.txt --upgrade
  2. Organization Permissions:

    # Verify organization access
    aws organizations list-accounts
  3. OpenSearch Access:

    # Check domain status
    aws opensearch describe-domain --domain-name <domain-name>

Validation Script

Run comprehensive validation:

python3 validate_enhanced_deployment.py

Log Analysis

Check Lambda function logs:

# Multi-account logs processor
aws logs tail /aws/lambda/MultiAccountAnomalyStack-MultiAccountLogsFunction --follow

# Q Business connector
aws logs tail /aws/lambda/MultiAccountAnomalyStack-QBusinessConnectorFunction --follow

# System health monitor
aws logs tail /aws/lambda/MultiAccountAnomalyStack-SystemHealthMonitorFunction --follow

πŸ”’ Security Considerations

IAM Permissions

The system follows the principle of least privilege:

  • Lambda Functions: Minimal permissions for their specific tasks
  • Cross-Account Access: Secure trust relationships
  • OpenSearch: Fine-grained access control
  • Q Business: Identity Center-based authentication

Data Encryption

  • In Transit: All API calls use TLS
  • At Rest: OpenSearch and S3 encryption enabled
  • CloudTrail: KMS encryption for log files

Network Security

  • VPC Deployment: Optional VPC deployment for OpenSearch
  • Security Groups: Restrictive security group rules
  • Private Endpoints: VPC endpoints for AWS services

πŸ“ˆ Performance and Scaling

Capacity Planning

Component Default Scaling
Lambda Concurrency 1000 Auto-scaling
OpenSearch Instances t3.small.search Manual scaling
CloudWatch Logs Unlimited Pay-per-use

Cost Optimization

  • Reserved Instances: Consider reserved OpenSearch instances
  • Log Retention: Configure appropriate log retention periods
  • Lambda Memory: Optimize Lambda memory allocation

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

Development Setup

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
python -m pytest tests/

# Run linting
flake8 lambdas/

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ†˜ Support

  • Documentation: Check this README and inline code comments
  • Issues: Create GitHub issues for bugs and feature requests
  • Validation: Use the validation script for deployment issues

πŸ”„ Updates and Maintenance

Regular Maintenance

  1. Update Dependencies:

    pip install -r requirements.txt --upgrade
    npm update
  2. Monitor System Health:

    • Check CloudWatch dashboards daily
    • Review SNS alerts
    • Run validation script weekly
  3. Review Anomaly Patterns:

    • Analyze false positives
    • Adjust thresholds as needed
    • Update account classifications

Version Updates

The system supports rolling updates:

# Update with zero downtime
cdk deploy --all --require-approval never

πŸ“Š System Metrics

After deployment, monitor these key metrics:

  • Processing Success Rate: >95%
  • Lambda Error Rate: <1%
  • OpenSearch Health: Green
  • Alert Response Time: <5 minutes

For detailed metrics, check the CloudWatch dashboard or run the validation script.

About

Detect AWS usage anomalies in near-real time using OpenSearch Anomaly Detection and CloudTrail for improved cost management and security

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 81.0%
  • JavaScript 16.1%
  • Shell 2.9%