A comprehensive solution for detecting usage anomalies across multiple AWS accounts with natural language insights powered by Amazon Q for Business.
- Organization-wide CloudTrail: Centralized logging from all AWS accounts
- Cross-account anomaly detection: Unified visibility across your entire organization
- Account-aware insights: Context-rich alerts with account metadata
- High-cardinality detection: Account ID and region-based categorization
- Multiple service support: EC2, Lambda, and EBS anomaly detection
- Intelligent thresholds: Account type-aware threshold configuration
- Amazon Q for Business integration: Query anomalies using natural language
- Cost impact analysis: Automatic cost implications for detected anomalies
- Security recommendations: Contextual security guidance for each anomaly type
- Real-time dashboards: CloudWatch dashboards with system health metrics
- Proactive alerting: SNS-based notifications with detailed context
- System health monitoring: Automated health checks and custom metrics
graph TB
subgraph "Organization Accounts"
A1[Account 1] --> CT[Organization CloudTrail]
A2[Account 2] --> CT
A3[Account N] --> CT
end
CT --> CWL[CloudWatch Logs]
CWL --> LAM[Multi-Account Logs Lambda]
LAM --> OS[OpenSearch Domain]
OS --> AD[Anomaly Detectors]
AD --> AL[Alerting]
AL --> SNS[SNS Topics]
OS --> QC[Q Business Connector]
QC --> QB[Q Business Application]
QB --> IC[Identity Center]
subgraph "Monitoring"
SHM[System Health Monitor]
CWD[CloudWatch Dashboard]
DLQ[Dead Letter Queue]
end
subgraph "User Access"
U1[Security Team] --> OSD[OpenSearch Dashboards]
U1 --> QBI[Q Business Interface]
U1 --> CWD
end
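As a rough illustration of the CloudWatch Logs to Lambda hop in the diagram above, the sketch below decodes a subscription-filter payload (delivered base64-encoded and gzip-compressed under `awslogs.data`) and keeps the account and region fields used for categorization. The function names and the output document shape are assumptions made for this example, not the repository's actual handler.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the base64 + gzip payload CloudWatch Logs delivers to Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def encode_subscription_event(payload):
    """Inverse of decode_subscription_event; handy for local testing only."""
    raw = gzip.compress(json.dumps(payload).encode("utf-8"))
    return {"awslogs": {"data": base64.b64encode(raw).decode("ascii")}}


def extract_documents(payload):
    """Flatten CloudTrail records into documents keyed by account and region.

    The output field names are hypothetical; the input field names
    (recipientAccountId, awsRegion, ...) are standard CloudTrail record fields.
    """
    documents = []
    for log_event in payload.get("logEvents", []):
        try:
            record = json.loads(log_event["message"])
        except (KeyError, ValueError):
            continue  # skip messages that are not JSON
        if not isinstance(record, dict):
            continue  # skip JSON values that are not CloudTrail records
        documents.append({
            "accountId": record.get("recipientAccountId"),
            "region": record.get("awsRegion"),
            "eventSource": record.get("eventSource"),
            "eventName": record.get("eventName"),
            "eventTime": record.get("eventTime"),
        })
    return documents
```

A handler built on this would call `extract_documents(decode_subscription_event(event))` and then index the result into OpenSearch; the indexing step is omitted here because it depends on the domain configuration.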
- AWS Account Setup:
  - AWS Organizations enabled
  - Management account access
  - CDK v2.110.0+ installed
- Local Environment:
  # Install required tools
  npm install -g aws-cdk
  pip install -r requirements.txt
- AWS Credentials:
  aws configure
  # Ensure you have admin permissions in the management account
- Clone and Setup:
  git clone <repository-url>
  cd aws-usage-anomaly-detection
- Deploy Multi-Account System:
  ./deploy_multi_account_enhanced.sh
- Validate Deployment:
  python3 validate_enhanced_deployment.py
# Single-account deployment
cdk deploy UsageAnomalyDetectorStack
# Multi-account deployment (all stacks)
cdk deploy --context deployment-mode=multi-account --all
# 1. Organization Trail (Management Account)
cdk deploy OrganizationTrailStack
# 2. Base OpenSearch Stack
cdk deploy EnhancedUsageAnomalyDetectorStack
# 3. Multi-Account Enhancements
cdk deploy MultiAccountAnomalyStack
# 4. Q Business Integration (Optional)
cdk deploy QBusinessInsightsStack
Variable | Description | Default |
---|---|---|
DEPLOYMENT_MODE | Deployment mode (single-account/multi-account) | single-account |
AWS_DEFAULT_REGION | AWS region for deployment | us-east-1 |
ENABLE_Q_BUSINESS | Enable Q Business integration | true |
ENABLE_COST_ANALYSIS | Enable cost impact analysis | true |
Configure account types using AWS Organizations tags:
{
"AccountType": "production|staging|development",
"Environment": "prod|staging|dev",
"CostCenter": "engineering|security|operations"
}
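To make the tag-based classification concrete, here is a minimal sketch of resolving an account type from Organizations tags. It assumes the tags arrive in the `[{"Key": ..., "Value": ...}]` shape that the Organizations `ListTagsForResource` API returns. The function names and the fallback to `development` for untagged accounts are assumptions made for this example, not behavior confirmed by the project.

```python
VALID_ACCOUNT_TYPES = ("production", "staging", "development")

# Assumption: untagged or mistagged accounts fall back to this type.
DEFAULT_ACCOUNT_TYPE = "development"


def tags_to_dict(tags):
    """Convert a list of {"Key": ..., "Value": ...} tags into a plain dict."""
    return {tag["Key"]: tag["Value"] for tag in tags}


def resolve_account_type(tags):
    """Return the normalized AccountType tag value, or the default if invalid."""
    value = tags_to_dict(tags).get("AccountType", "").strip().lower()
    if value in VALID_ACCOUNT_TYPES:
        return value
    return DEFAULT_ACCOUNT_TYPE


if __name__ == "__main__":
    sample_tags = [
        {"Key": "AccountType", "Value": "Production"},
        {"Key": "Environment", "Value": "prod"},
        {"Key": "CostCenter", "Value": "engineering"},
    ]
    print(resolve_account_type(sample_tags))
```

Normalizing case and whitespace before matching keeps a tag such as `Production` from silently dropping to the default.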
Customize thresholds in lambdas/CrossAccountAnomalyProcessor/config.py:
THRESHOLDS = {
'production': {'ec2': 10, 'lambda': 1000, 'ebs': 20},
'staging': {'ec2': 5, 'lambda': 500, 'ebs': 10},
'development': {'ec2': 2, 'lambda': 100, 'ebs': 5}
}
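One way such a table might be consulted is sketched below. The `THRESHOLDS` values are copied from the configuration above; the helper names, the fallback to `development` for unknown account types, and the reading of each value as an event-count limit are assumptions made for illustration.

```python
THRESHOLDS = {
    'production': {'ec2': 10, 'lambda': 1000, 'ebs': 20},
    'staging': {'ec2': 5, 'lambda': 500, 'ebs': 10},
    'development': {'ec2': 2, 'lambda': 100, 'ebs': 5},
}


def get_threshold(account_type, service, default_type='development'):
    """Look up the threshold for a service, falling back to default_type."""
    per_type = THRESHOLDS.get(account_type, THRESHOLDS[default_type])
    if service not in per_type:
        raise KeyError(f"no threshold configured for service {service!r}")
    return per_type[service]


def exceeds_threshold(account_type, service, observed_count):
    """Return True when the observed count is strictly above the threshold."""
    return observed_count > get_threshold(account_type, service)


if __name__ == "__main__":
    print(exceeds_threshold('staging', 'ec2', 7))
    print(exceeds_threshold('production', 'ec2', 7))
```

The same observed count of 7 EC2 events crosses the staging limit but not the production one, which is the point of keeping thresholds per account type.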
Access the monitoring dashboard:
- Go to CloudWatch Console
- Navigate to Dashboards
- Open "MultiAccountAnomalyDetection"
Subscribe to system alerts:
aws sns subscribe \
--topic-arn <SystemAlertsTopicArn> \
--protocol email \
--notification-endpoint <email-address>
The system publishes custom metrics to the MultiAccountAnomalyDetection namespace:
- OverallHealthScore: System health percentage (0-100)
- ProcessingSuccessRate: Event processing success rate
- LambdaErrorRate: Lambda function error rates
- OpenSearchUnassignedShards: OpenSearch cluster health
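For readers who want to emit comparable metrics from their own tooling, the sketch below builds a `PutMetricData` payload for this namespace. The metric names come from the list above; the helper names and the choice of `Percent` and `Count` units are assumptions, since the units the health monitor actually uses are not documented here.

```python
NAMESPACE = "MultiAccountAnomalyDetection"


def percent_metric(name, value):
    """Build a metric datum for a percentage in the range 0-100."""
    if not 0 <= value <= 100:
        raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return {"MetricName": name, "Value": float(value), "Unit": "Percent"}


def count_metric(name, value):
    """Build a metric datum for a non-negative count."""
    if value < 0:
        raise ValueError(f"{name} must be non-negative, got {value}")
    return {"MetricName": name, "Value": float(value), "Unit": "Count"}


def build_health_metrics(health_score, success_rate, error_rate, unassigned_shards):
    """Assemble the four health metrics into a MetricData list."""
    return [
        percent_metric("OverallHealthScore", health_score),
        percent_metric("ProcessingSuccessRate", success_rate),
        percent_metric("LambdaErrorRate", error_rate),
        count_metric("OpenSearchUnassignedShards", unassigned_shards),
    ]


def publish(metric_data, cloudwatch=None):
    """Send the metrics with CloudWatch put_metric_data."""
    if cloudwatch is None:
        import boto3  # imported lazily so the helpers above work without AWS access
        cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(Namespace=NAMESPACE, MetricData=metric_data)
```

Passing a client explicitly, for example `publish(build_health_metrics(98, 99.5, 0.2, 0), cloudwatch=my_client)`, keeps the payload logic testable without credentials.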
- Identity Center Configuration:
  - Automatic setup during deployment
  - Creates "QBusinessAdmins" group
  - Configures application assignments
- User Access:
  # Add users to Q Business admin group
  aws identitystore create-group-membership \
    --identity-store-id <IdentityStoreId> \
    --group-id <QBusinessAdminGroupId> \
    --member-id UserId=<UserId>
Example queries you can ask Q Business:
- "Show me EC2 anomalies from the last 24 hours"
- "What accounts had the highest cost impact this week?"
- "Are there any security concerns with recent Lambda anomalies?"
- "Compare anomaly patterns between production and staging accounts"
- CDK Version Compatibility:
  # Upgrade CDK
  npm install -g aws-cdk@latest
  pip install -r requirements.txt --upgrade
- Organization Permissions:
  # Verify organization access
  aws organizations list-accounts
- OpenSearch Access:
  # Check domain status
  aws opensearch describe-domain --domain-name <domain-name>
Run comprehensive validation:
python3 validate_enhanced_deployment.py
Check Lambda function logs:
# Multi-account logs processor
aws logs tail /aws/lambda/MultiAccountAnomalyStack-MultiAccountLogsFunction --follow
# Q Business connector
aws logs tail /aws/lambda/MultiAccountAnomalyStack-QBusinessConnectorFunction --follow
# System health monitor
aws logs tail /aws/lambda/MultiAccountAnomalyStack-SystemHealthMonitorFunction --follow
The system follows the principle of least privilege:
- Lambda Functions: Minimal permissions for their specific tasks
- Cross-Account Access: Secure trust relationships
- OpenSearch: Fine-grained access control
- Q Business: Identity Center-based authentication
- In Transit: All API calls use TLS
- At Rest: OpenSearch and S3 encryption enabled
- CloudTrail: KMS encryption for log files
- VPC Deployment: Optional VPC deployment for OpenSearch
- Security Groups: Restrictive security group rules
- Private Endpoints: VPC endpoints for AWS services
Component | Default | Scaling |
---|---|---|
Lambda Concurrency | 1000 | Auto-scaling |
OpenSearch Instances | t3.small.search | Manual scaling |
CloudWatch Logs | Unlimited | Pay-per-use |
- Reserved Instances: Consider reserved OpenSearch instances
- Log Retention: Configure appropriate log retention periods
- Lambda Memory: Optimize Lambda memory allocation
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests
python -m pytest tests/
# Run linting
flake8 lambdas/
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: Check this README and inline code comments
- Issues: Create GitHub issues for bugs and feature requests
- Validation: Use the validation script for deployment issues
- Update Dependencies:
  pip install -r requirements.txt --upgrade
  npm update
- Monitor System Health:
  - Check CloudWatch dashboards daily
  - Review SNS alerts
  - Run validation script weekly
- Review Anomaly Patterns:
  - Analyze false positives
  - Adjust thresholds as needed
  - Update account classifications
The system supports rolling updates:
# Update with zero downtime
cdk deploy --all --require-approval never
After deployment, monitor these key metrics:
- Processing Success Rate: >95%
- Lambda Error Rate: <1%
- OpenSearch Health: Green
- Alert Response Time: <5 minutes
For detailed metrics, check the CloudWatch dashboard or run the validation script.
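The targets listed above can also be checked programmatically. The following is a minimal sketch that compares a dictionary of observed values against those targets; the key names and the `evaluate_targets` function are hypothetical, and how the observed values are collected (CloudWatch, the validation script, or otherwise) is left out.

```python
import operator

# Targets taken from the list above; the key names are assumptions.
TARGETS = {
    "processing_success_rate_pct": (operator.gt, 95.0),
    "lambda_error_rate_pct": (operator.lt, 1.0),
    "alert_response_minutes": (operator.lt, 5.0),
}


def evaluate_targets(observed):
    """Return the names of metrics that are missing or outside their target."""
    failures = []
    for name, (compare, target) in TARGETS.items():
        value = observed.get(name)
        if value is None or not compare(value, target):
            failures.append(name)
    # OpenSearch health is a status string rather than a number.
    if str(observed.get("opensearch_health", "")).lower() != "green":
        failures.append("opensearch_health")
    return failures


if __name__ == "__main__":
    sample = {
        "processing_success_rate_pct": 97.2,
        "lambda_error_rate_pct": 1.4,
        "alert_response_minutes": 3.0,
        "opensearch_health": "Green",
    }
    print(evaluate_targets(sample))
```

An empty result means every target is met. Missing values are treated as failures so that a gap in data collection is not mistaken for a healthy system.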