AWS re:Invent 2025 Workshop | 300-Level Expert Session
Duration: 120 minutes | Level: 300 - Expert
Learn how to apply generative AI to PostgreSQL database management by combining incident detection and response (IDR) with the Model Context Protocol (MCP) for performance optimization. You will build a system on Amazon Aurora PostgreSQL-Compatible Edition with pgvector that creates a vector store from database documentation, runbooks, and incident records.
What You'll Build:
- AI-powered incident detection and response system with Mahavat Agent
- MCP-based agents for database performance optimization
- Vector-enabled knowledge base with runbooks and documentation
- Intelligent remediation recommendations using generative AI
- Real-time performance monitoring and automated scaling

Architecture (AWS Workshop Studio Environment):
- VS Code IDE (Code Editor): Mahavat Agent V1 (IDR), Mahavat Agent V2 (Unified), MCP Servers (Local STDIO), Streamlit UI, Workshop Repository, Load Testing Tools
- Authentication & Security: Amazon Cognito (User Pool), IAM Roles & Policies, Admin/Readonly Users, Workshop Studio Integration
- Database Infrastructure: Main Aurora PostgreSQL 17.x (pgvector enabled), IDR Aurora Serverless v2 (ACU scaling tests), IDR Provisioned Instance (IOPS testing), DynamoDB (Incident tracking)
- AI & Knowledge Management: Amazon Bedrock (Claude Sonnet 4, Titan Embed), Knowledge Base (S3 + pgvector), Vector Store (Runbooks, Documentation)
- Monitoring & Observability: CloudWatch Alarms & Metrics, Performance Insights, Automated Incident Creation

Repository structure:

    ├── mahavat_agent/
    │   ├── mahavat_agent_v1.py            # IDR Agent - Incident Detection & Response
    │   ├── mahavat_agent_v2.py            # Unified Agent with MCP integration
    │   ├── pi_mcp_server.py               # Performance Insights MCP server
    │   ├── idr_mcp_server.py              # IDR MCP server
    │   ├── postgres_query_provider.py     # PostgreSQL query provider
    │   └── requirements.txt               # Python dependencies
    ├── database-workload/
    │   ├── simulation-2.py                # Database workload simulation
    │   └── simulation-3.py                # Advanced workload patterns
    ├── load-test/
    │   ├── stress_test.py                 # Database stress testing
    │   ├── acu-test.sh                    # ACU scaling tests
    │   └── iops-test.sh                   # IOPS performance tests
    ├── runbooks/
    │   ├── acu_remediation.md             # Aurora Serverless ACU remediation
    │   └── iops_remediation.md            # IOPS optimization runbook
    └── scripts/
        ├── workshop-setup-complete-dynamic.sh   # Complete workshop setup
        ├── validate-environment.sh              # Environment validation
        └── database/                            # Database setup scripts
            ├── 01-extensions.sql
            ├── 02-roles.sql
            └── 03-tables.sql

- Access Workshop Studio environment
- Verify VS Code IDE access
- Validate infrastructure deployment
Hands-On Activities (Module 1, IDR Agent):
- Start IDR Agent - Launch Mahavat Agent V1 with Streamlit UI
- Configure CloudWatch Alarms - Set up IOPS monitoring and incident triggers
- Create Knowledge Base - Deploy Bedrock Knowledge Base with vector storage
- Add Runbooks - Upload and sync remediation runbooks to vector store
- Simulate IOPS Incident - Trigger performance issues and observe detection
- Get Runbook Recommendations - Experience AI-powered runbook retrieval
- Remediate IOPS Incident - Follow AI recommendations to resolve issues
Key Learning:
- Vector similarity search for incident matching
- Automated runbook recommendations using pgvector
- Integration with DynamoDB for incident tracking
- CloudWatch alarm integration with Lambda triggers
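
Incident matching by vector similarity can be illustrated without a database. The sketch below ranks runbook embeddings by cosine distance, the metric that pgvector's `<=>` operator computes. The three-dimensional vectors and the `runbooks` table named in the comment are made up for illustration and are not taken from the workshop code; real Titan embeddings have far more dimensions.

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), the metric behind pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def rank_runbooks(incident_vec, runbooks, top_k=2):
    """Return the top_k (name, distance) pairs, closest runbook first."""
    scored = [(name, cosine_distance(incident_vec, vec)) for name, vec in runbooks.items()]
    return sorted(scored, key=lambda pair: pair[1])[:top_k]

# Toy embeddings (hypothetical values, chosen only to make the ranking visible).
RUNBOOKS = {
    "iops_remediation.md": [0.9, 0.1, 0.0],
    "acu_remediation.md": [0.1, 0.9, 0.1],
}

# Roughly equivalent pgvector query (table and column names are assumptions):
#   SELECT title FROM runbooks ORDER BY embedding <=> %s::vector LIMIT 2;

if __name__ == "__main__":
    incident = [0.8, 0.2, 0.1]  # pretend embedding of an IOPS alarm description
    for name, dist in rank_runbooks(incident, RUNBOOKS):
        print(f"{name}: {dist:.3f}")
```

In the database, the same ordering is produced by the index-backed `ORDER BY ... <=> ...` clause, so the application only has to embed the incident text and pass the vector as a query parameter.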
Hands-On Activities (Module 2, Unified Agent):
- Start Unified Agent - Launch Mahavat Agent V2 with MCP integration
- Configure ACU Alarms - Set up Aurora Serverless v2 scaling monitoring
- Upload ACU Runbooks - Add serverless-specific remediation guides
- Simulate ACU Incident - Trigger capacity scaling scenarios
- Experience MCP Queries - Natural language database performance queries
- Remediate ACU Incident - Use MCP-powered recommendations
- Performance Analysis - Deep dive into Performance Insights data
Key Learning:
- Model Context Protocol implementation for database management
- Aurora Serverless v2 ACU scaling patterns
- Performance Insights integration through MCP
- Natural language to SQL translation with Claude Sonnet 4
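
MCP exchanges JSON-RPC 2.0 messages, and an agent invokes a server-side tool with a `tools/call` request. The sketch below shows only that message shape with a toy dispatcher; the tool name `get_top_sql` and its `minutes` argument are hypothetical, and a real server such as `pi_mcp_server.py` would normally rely on an MCP SDK that also handles initialization and tool listing.

```python
import json

def get_top_sql(minutes: int) -> str:
    """Hypothetical tool: a real server would query Performance Insights here."""
    return f"(placeholder) top SQL by load over the last {minutes} minutes"

# Registry of callable tools, keyed by the name the agent sends.
TOOLS = {"get_top_sql": get_top_sql}

def _error(req_id, code, message):
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})

def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 'tools/call' request to a registered tool."""
    req = json.loads(raw)
    if req.get("method") != "tools/call":
        return _error(req.get("id"), -32601, "Method not found")
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return _error(req.get("id"), -32602, f"Unknown tool: {params.get('name')}")
    text = tool(**params.get("arguments", {}))
    result = {"content": [{"type": "text", "text": text}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})

if __name__ == "__main__":
    request = json.dumps({
        "jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "get_top_sql", "arguments": {"minutes": 60}},
    })
    print(handle_request(request))
```

With a local STDIO transport, these messages travel over the server process's standard input and output, which is why no network configuration is needed inside the workshop IDE.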
Deep Dive:
- Agent code walkthrough and architecture patterns
- MCP server implementation details
- Vector store optimization techniques
- Customization strategies for production use
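
As one example of vector store optimization, pgvector supports HNSW indexes for approximate nearest-neighbor search. The helper below is a sketch that renders the index DDL for a cosine-distance index; the function name and the `runbooks` and `embedding` identifiers are assumptions, while `m = 16` and `ef_construction = 64` are pgvector's documented defaults.

```python
import re

# Accept only simple SQL identifiers, since they are interpolated into the DDL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def hnsw_index_ddl(table: str, column: str, m: int = 16, ef_construction: int = 64) -> str:
    """Render a CREATE INDEX statement for a pgvector HNSW index (cosine distance)."""
    for ident in (table, column):
        if not _IDENT.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    # pgvector rejects ef_construction values below 2 * m.
    if ef_construction < 2 * m:
        raise ValueError("ef_construction must be at least 2 * m")
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_hnsw_idx "
        f"ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )

if __name__ == "__main__":
    # Hypothetical table and column names.
    print(hnsw_index_ddl("runbooks", "embedding"))
```

Larger `m` and `ef_construction` values generally improve recall at the cost of build time and memory. At query time, recall can be adjusted per session with `SET hnsw.ef_search = ...`.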
This workshop is delivered through AWS Workshop Studio - no personal AWS account required!
- Access Workshop Portal - Use provided Workshop Studio URL
- Login - Use your registration credentials
- Launch Environment - Click "Open VS Code IDE"
- Verify Setup - All infrastructure is pre-deployed

Validate the workshop environment:

    ./scripts/validate-environment.sh

IDR Agent (Module 1):

    cd mahavat_agent
    ./mahavat_agent_v1.sh

Unified Agent (Module 2):

    cd mahavat_agent
    ./mahavat_agent_v2.sh

- Vector Similarity Search: Match incidents to historical patterns using pgvector
- Automated Runbook Retrieval: AI-powered remediation guide recommendations
- Context-Aware Responses: Leverage database state and CloudWatch metrics
- DynamoDB Integration: Track incident lifecycle and resolution status
- Natural Language Queries: "Show me slow queries from the last hour"
- Performance Insights Integration: Direct access to PI data through MCP
- Intelligent Analysis: AI-powered performance bottleneck identification
- Proactive Recommendations: Prevent issues before they impact users
- Vector Store: Searchable documentation and runbooks using pgvector
- Continuous Learning: Improve responses from incident history
- Multi-Modal Context: Combine metrics, logs, and documentation
- Bedrock Integration: Titan embeddings for semantic search
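
The incident lifecycle tracking listed above can be pictured as a small state machine whose current state is stored as a DynamoDB item. The sketch below is illustrative only: the status names, allowed transitions, and attribute names are assumptions, not the schema used by the workshop agents.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical lifecycle: which statuses may follow each status.
ALLOWED = {
    "OPEN": {"INVESTIGATING"},
    "INVESTIGATING": {"REMEDIATING", "RESOLVED"},
    "REMEDIATING": {"INVESTIGATING", "RESOLVED"},
    "RESOLVED": set(),
}

@dataclass
class Incident:
    incident_id: str
    alarm_name: str
    status: str = "OPEN"
    history: list = field(default_factory=list)

    def transition(self, new_status: str) -> None:
        """Move to new_status if the lifecycle allows it, recording the change."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status} -> {new_status} is not allowed")
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append({"from": self.status, "to": new_status, "at": stamp})
        self.status = new_status

    def to_item(self) -> dict:
        """Plain dict in a shape that could be passed as Item= to a DynamoDB put_item call."""
        return {
            "incident_id": self.incident_id,
            "alarm_name": self.alarm_name,
            "status": self.status,
            "history": self.history,
        }

if __name__ == "__main__":
    incident = Incident("inc-001", "iops-high")  # hypothetical identifiers
    incident.transition("INVESTIGATING")
    incident.transition("RESOLVED")
    print(incident.to_item()["status"])
```

Keeping the transition history on the item gives the agent a resolution record that can later be embedded and searched alongside the runbooks.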
| Service | Purpose | Configuration |
|---|---|---|
| Amazon Aurora PostgreSQL 17.x | Primary database with pgvector extension | r7g.xlarge, Multi-AZ |
| Aurora Serverless v2 | ACU scaling demonstration | 0.5-16 ACU range |
| Aurora Provisioned | IOPS testing and optimization | gp3 storage, configurable IOPS |
| Amazon Bedrock | Claude Sonnet 4, Titan Embed v2 | us-west-2 region |
| Amazon DynamoDB | Incident tracking and state management | On-demand billing |
| Amazon Cognito | User authentication (admin/readonly) | User pool with 2 users |
| Amazon CloudWatch | Performance metrics and alarming | Custom metrics, Lambda triggers |
| Amazon RDS Performance Insights | Database performance analysis | 7-day retention |
| Amazon S3 | Knowledge base document storage | Versioned bucket |
| AWS Lambda | Incident creation automation | Python 3.9 runtime |
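
The 0.5-16 ACU range in the table is the scale against which a Serverless v2 capacity alarm would be evaluated. The sketch below converts capacity readings to a percentage of the maximum and applies a simplified "M out of N" rule similar to CloudWatch's `DatapointsToAlarm` and `EvaluationPeriods` settings. The 90% threshold and the sample readings are assumptions, and real CloudWatch evaluation also has configurable handling for missing data.

```python
def acu_utilization(current_acu: float, max_acu: float = 16.0) -> float:
    """Capacity as a percentage of the configured maximum ACU."""
    return 100.0 * current_acu / max_acu

def alarm_state(datapoints, threshold, datapoints_to_alarm, evaluation_periods):
    """Simplified M-out-of-N evaluation over the most recent datapoints."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"

if __name__ == "__main__":
    # Hypothetical per-minute capacity readings during a load test.
    capacities = [2.0, 8.0, 14.5, 15.0, 15.5]
    utilization = [acu_utilization(acu) for acu in capacities]
    # Alarm when 3 of the last 5 datapoints exceed 90% of the maximum.
    print(alarm_state(utilization, threshold=90.0,
                      datapoints_to_alarm=3, evaluation_periods=5))
```

Requiring several breaching datapoints rather than one keeps short scaling spikes from opening an incident, while a sustained climb toward the maximum still does.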

Load tests:

    # Aurora Serverless ACU scaling test
    ./load-test/acu-test.sh
    # IOPS performance and scaling test
    ./load-test/iops-test.sh
    # Comprehensive database stress test
    python load-test/stress_test.py

Workload simulations:

    # Basic workload patterns
    python database-workload/simulation-2.py
    # Advanced performance scenarios
    python database-workload/simulation-3.py

- CloudWatch Dashboards: Pre-configured performance dashboards
- Performance Insights: Query-level performance analysis
- Custom Metrics: Application-specific monitoring
- Automated Alerting: Lambda-triggered incident creation
- Complex Multi-System Failures: Incidents requiring contextual analysis
- Knowledge Retention: Preserve tribal knowledge in searchable vector stores
- Rapid Response: Reduce MTTR with automated runbook retrieval
- Pattern Recognition: Learn from historical incident data
- Continuous Improvement: Evolve responses based on outcomes
- Structured Queries: Natural language to SQL translation with context
- Dynamic Tool Selection: Choose appropriate data sources per query intent
- Context Preservation: Maintain conversation state across multiple tools
- Security Integration: Row-level security with persona-based access
- Real-time Analysis: Direct access to live performance data
- Vector Index Optimization: HNSW indexes for large-scale similarity search
- Caching Strategy: Redis for frequently accessed runbooks and queries
- Monitoring Integration: Custom CloudWatch metrics for agent performance
- Security: IAM roles, Cognito integration, and data encryption
- Scalability: Aurora Serverless v2 for variable workloads
- Custom Runbooks: Add domain-specific remediation procedures
- Integration: Connect with existing monitoring and ticketing systems
- Custom MCP Servers: Build specialized tools for your environment
- Advanced Analytics: Implement predictive incident detection
- Model Context Protocol - Standardized AI tool protocol
- pgvector - Vector similarity search for PostgreSQL
- Aurora PostgreSQL - Managed PostgreSQL database
- Strands Agent Framework - MCP-compatible agent development
- Amazon Bedrock - Claude Sonnet 4 and Titan models
- Performance Insights - Database performance monitoring
- Aurora Serverless v2 - Auto-scaling database
- Workshop Studio - AWS workshop platform
- GitHub Repository: riv25-dat301 (reInvent-2025 branch)
- Workshop Guide: Available in VS Code IDE environment
- Sample Data: Pre-loaded incident scenarios and runbooks
This workshop is maintained by AWS and the community. For issues, improvements, or questions:
- Report issues through Workshop Studio feedback
- Suggest improvements via workshop evaluation
- Star the repository for updates
- Fork for your own customizations
This library is licensed under the MIT-0 License. See the LICENSE file.
AWS re:Invent 2025 | DAT301 - 300 Level Expert Session
AI powered PostgreSQL: Incident detection & MCP integration
Workshop Authors: Ramesh Kumar Venkatraman, Chirag Dave