From 2db281d3dac2c80b8cceb58ba79f399cb92eb0d9 Mon Sep 17 00:00:00 2001
From: Matt Dotson
Date: Wed, 15 Oct 2025 17:55:28 +0000
Subject: [PATCH 1/7] fix(docs): clarify networking requirements for multi-region Power Platform geographies

---
 docs/bcdr_considerations.md | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/docs/bcdr_considerations.md b/docs/bcdr_considerations.md
index 000a5600..17e1961f 100644
--- a/docs/bcdr_considerations.md
+++ b/docs/bcdr_considerations.md
@@ -4,7 +4,7 @@ This document explains the disaster recovery (DR) and regional resilience capabi
 
 ## Overview
 
-This template is designed to deploy an enterprise-grade integration between Microsoft Copilot Studio and Azure AI Search, following Azure Well-Architected Framework best practices for security and reliability. It provisions a complete primary-region footprint and lays down optional networking scaffolding for a secondary (failover) region. However, the template does **not** stand up duplicate workload resources or automate failover; those tasks remain with the adopter.
+This template is designed to deploy an enterprise-grade integration between Microsoft Copilot Studio and Azure AI Search, following Azure Well-Architected Framework best practices for security and reliability. It provisions a complete primary-region footprint and—where the selected Power Platform geography is backed by two Azure regions—provisions required dual-region networking scaffolding (virtual networks, subnets, private DNS integration) in both regions. This dual-networking is a compliance prerequisite for Enterprise Policy / virtual network delegation in multi-region geographies and is **not optional**, even if you have not yet implemented active regional failover. The template does **not** stand up duplicate workload resources or automate failover; those tasks remain with the adopter.
 
 ## Scenarios
 
@@ -35,7 +35,7 @@ This document discusses the considerations for deploying the template in three r
 
 **Networking:**
 
-- Manually disable secondary-region networking scaffolding to minimize cost and complexity. **GAP**: No template parameter currently toggles this scaffolding off.
+- In multi-region Power Platform geographies (for example: United States, Europe, Canada, Australia) dual-region virtual networks are **required** to create the Enterprise Policy virtual network delegation; even if secondary region is not needed. Do not remove secondary-region networking in these geographies. In a geography that is truly single-region (verify with [current Microsoft documentation](https://learn.microsoft.com/power-platform/admin/vnet-support-setup-configure)), a single VNet footprint would be sufficient;
 
 **Application Insights (optional):**
 
@@ -76,9 +76,7 @@ This document discusses the considerations for deploying the template in three r
 - Expect near-zero data loss (RPO) and sub-5-minute failover (RTO) for intra-region zone failures with production environments.
 - Defer enabling Power Platform self-service disaster recovery to the Regional failover ready tier (it is a cross-region capability, not zone-level).
 
-**Networking:**
-- Deploy VNets, subnets, NAT Gateways, and private endpoints in Azure regions that support availability zones (zone redundancy is enabled by default).
-- Manually disable secondary-network scaffolding to minimize cost and complexity. **GAP**: No template parameter currently toggles this scaffolding off
+- In multi-region Power Platform geographies (for example: United States, Europe, Canada, Australia) dual-region virtual networks are **required** to create the Enterprise Policy virtual network delegation; even if secondary region is not needed. Do not remove secondary-region networking in these geographies. In a geography that is truly single-region (verify with [current Microsoft documentation](https://learn.microsoft.com/power-platform/admin/vnet-support-setup-configure)), a single VNet footprint would be sufficient;
 
 **Application Insights:**
 
@@ -86,7 +84,10 @@ This document discusses the considerations for deploying the template in three r
 
 ### Regional failover ready
 
-**Regional failover ready** is for mission-critical workloads that need a backup environment in a paired Azure region to recover from regional incidents. This scenario builds on the Zone-redundant tier by adding cross-region data replication and networking scaffolding to support manual failover to a secondary region.
+**Regional failover ready** is for mission-critical workloads that need a backup environment in a paired Azure region to recover from regional incidents. This scenario builds on the Zone-redundant tier by adding cross-region data replication and networking scaffolding to support manual failover to a secondary region.
+
+> [!IMPORTANT]
+> The template does NOT implement automated cross‑region failover (secondary AI Search/OpenAI/Storage resources, data replication, DNS or traffic management, or orchestration). Use the provided dual-region networking plus the guidance below as the foundation for your own secondary provisioning, replication, runbooks, and traffic failover automation.
 
 #### Regional Failover Ready Recommendations
 
@@ -203,24 +204,16 @@ The CI/CD infrastructure (GitHub runners and supporting resources) deployed by t
 
 **Workaround:** Pre-build Infrastructure-as-Code overlays or scripts that instantiate secondary-region workload resources, configure data replication, and automate DNS failover when needed.
 
-**Status:** This is by design. The template provides the networking foundation for regional failover but does not automate the full disaster recovery workflow.
-
-### Secondary Region Networking Scaffolding Cannot Be Disabled
-
-**Impact:** Users deploying Basic (development) scenarios who want to minimize costs cannot disable the secondary-region networking scaffolding through template configuration.
-
-**Current Behavior:** The template always provisions secondary-region networking resources (VNets, subnets, NAT gateways, private-endpoint subnets) regardless of the desired resilience tier.
-
-**Workaround:** Manually delete the secondary-region networking resources after deployment, or modify the Terraform code to comment out the secondary region network modules.
-
-**Status:** No template parameter currently exists to toggle this scaffolding on/off. This is a known limitation that requires manual intervention for cost optimization in development scenarios.
+**Status:** Work towards full regional failover support depends on user feedback and customer needs.
 
 ## References
 
 - [Azure Well-Architected Framework: Reliability](https://learn.microsoft.com/azure/architecture/framework/resiliency/overview)
 - [Power Platform Disaster Recovery](https://learn.microsoft.com/power-platform/admin/business-continuity-disaster-recovery)
+- [Power Platform Virtual Network Setup (Enterprise Policy dual-VNet requirement)](https://learn.microsoft.com/power-platform/admin/vnet-support-setup-configure)
+- [Power Platform Virtual Network Overview & FAQ (Failover requires delegation in both regions)](https://learn.microsoft.com/power-platform/admin/vnet-support-overview#frequently-asked-questions)
 - [Azure AI Search Geo-Redundancy](https://learn.microsoft.com/azure/reliability/reliability-ai-search)
 
 ---
 
-**Last updated:** October 3, 2025
+**Last updated:** October 15, 2025

From 703cfd04b8652d6fb925dd23fbee72fda90ea053 Mon Sep 17 00:00:00 2001
From: Matt Dotson
Date: Wed, 15 Oct 2025 14:10:05 -0700
Subject: [PATCH 2/7] Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 docs/bcdr_considerations.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bcdr_considerations.md b/docs/bcdr_considerations.md
index 17e1961f..262d3af8 100644
--- a/docs/bcdr_considerations.md
+++ b/docs/bcdr_considerations.md
@@ -35,7 +35,7 @@ This document discusses the considerations for deploying the template in three r
 
 **Networking:**
 
-- In multi-region Power Platform geographies (for example: United States, Europe, Canada, Australia) dual-region virtual networks are **required** to create the Enterprise Policy virtual network delegation; even if secondary region is not needed. Do not remove secondary-region networking in these geographies. In a geography that is truly single-region (verify with [current Microsoft documentation](https://learn.microsoft.com/power-platform/admin/vnet-support-setup-configure)), a single VNet footprint would be sufficient;
+- In multi-region Power Platform geographies (for example: United States, Europe, Canada, Australia) dual-region virtual networks are **required** to create the Enterprise Policy virtual network delegation, even if the secondary region is not needed. Do not remove secondary-region networking in these geographies. In a geography that is truly single-region (verify with [current Microsoft documentation](https://learn.microsoft.com/power-platform/admin/vnet-support-setup-configure)), a single VNet footprint would be sufficient.
 
 **Application Insights (optional):**
 

From 3816d75328287d07281c159c34742d8e05db01e4 Mon Sep 17 00:00:00 2001
From: Matt Dotson
Date: Wed, 15 Oct 2025 14:10:11 -0700
Subject: [PATCH 3/7] Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 docs/bcdr_considerations.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bcdr_considerations.md b/docs/bcdr_considerations.md
index 262d3af8..d05c6ea9 100644
--- a/docs/bcdr_considerations.md
+++ b/docs/bcdr_considerations.md
@@ -76,7 +76,7 @@ This document discusses the considerations for deploying the template in three r
 - Expect near-zero data loss (RPO) and sub-5-minute failover (RTO) for intra-region zone failures with production environments.
 - Defer enabling Power Platform self-service disaster recovery to the Regional failover ready tier (it is a cross-region capability, not zone-level).
 
-- In multi-region Power Platform geographies (for example: United States, Europe, Canada, Australia) dual-region virtual networks are **required** to create the Enterprise Policy virtual network delegation; even if secondary region is not needed. Do not remove secondary-region networking in these geographies. In a geography that is truly single-region (verify with [current Microsoft documentation](https://learn.microsoft.com/power-platform/admin/vnet-support-setup-configure)), a single VNet footprint would be sufficient;
+- In multi-region Power Platform geographies (for example: United States, Europe, Canada, Australia) dual-region virtual networks are **required** to create the Enterprise Policy virtual network delegation, even if the secondary region is not needed. Do not remove secondary-region networking in these geographies. In a geography that is truly single-region (verify with [current Microsoft documentation](https://learn.microsoft.com/power-platform/admin/vnet-support-setup-configure)), a single VNet footprint would be sufficient.
 
 **Application Insights:**
 

From 21e252695d3800b59952f49b6ca9ef521063c2d8 Mon Sep 17 00:00:00 2001
From: Matt Dotson
Date: Wed, 15 Oct 2025 14:10:28 -0700
Subject: [PATCH 4/7] Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 docs/bcdr_considerations.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bcdr_considerations.md b/docs/bcdr_considerations.md
index d05c6ea9..a10af2ae 100644
--- a/docs/bcdr_considerations.md
+++ b/docs/bcdr_considerations.md
@@ -4,7 +4,7 @@ This document explains the disaster recovery (DR) and regional resilience capabi
 
 ## Overview
 
-This template is designed to deploy an enterprise-grade integration between Microsoft Copilot Studio and Azure AI Search, following Azure Well-Architected Framework best practices for security and reliability. It provisions a complete primary-region footprint and—where the selected Power Platform geography is backed by two Azure regions—provisions required dual-region networking scaffolding (virtual networks, subnets, private DNS integration) in both regions. This dual-networking is a compliance prerequisite for Enterprise Policy / virtual network delegation in multi-region geographies and is **not optional**, even if you have not yet implemented active regional failover. The template does **not** stand up duplicate workload resources or automate failover; those tasks remain with the adopter.
+This template is designed to deploy an enterprise-grade integration between Microsoft Copilot Studio and Azure AI Search, following Azure Well-Architected Framework best practices for security and reliability. It provisions a complete primary-region footprint and—where the selected Power Platform geography is backed by two Azure regions—provisions required dual-region networking scaffolding (virtual networks, subnets, private DNS integration) in both regions.
This dual-networking is a compliance prerequisite for Enterprise Policy virtual network delegation in multi-region geographies and is **not optional**, even if you have not yet implemented active regional failover. The template does **not** stand up duplicate workload resources or automate failover; those tasks remain with the adopter. ## Scenarios From a2682a4cf18b8a3fade13fe2b9b174c421bf34f1 Mon Sep 17 00:00:00 2001 From: Matt Dotson Date: Wed, 15 Oct 2025 23:56:11 +0000 Subject: [PATCH 5/7] feat(docs): add infrastructure resilience and testing guides for Copilot Studio integration --- README.md | 125 ++---------------- ...ations.md => infrastructure_resilience.md} | 80 ++--------- docs/testing.md | 99 ++++++++++++++ 3 files changed, 122 insertions(+), 182 deletions(-) rename docs/{bcdr_considerations.md => infrastructure_resilience.md} (67%) create mode 100644 docs/testing.md diff --git a/README.md b/README.md index f1616a79..51c5be2c 100644 --- a/README.md +++ b/README.md @@ -222,138 +222,37 @@ To clean up all the resources created by this sample: All the Azure and Power Platform resources will be deleted. -## Testing - -This solution includes tests that validate both Copilot Studio and Azure AI Search components after deployment. - -### Copilot Studio Agent Test - -Located in `tests/Copilot/`, this test validates: - -- **Conversation Flow**: End-to-end conversation test with the deployed agent -- **Integration**: Validation that Copilot Studio can successfully query Azure AI Search - -Currently, [the Copilot Studio Client in the Agent SDK does not support the use of Service Principals for authentication](https://github.com/microsoft/Agents/blob/main/samples/basic/copilotstudio-client/dotnet/README.md#create-an-application-registration-in-entra-id---service-principal-login), and testing requires a cloud-native app registration as well as a test account with MFA turned off. 
The test user account must have access to the Power Platform environment containing the agent as well as access to the agent itself. - -#### Running Tests After Local Deployment Execution - -After a successful local deployment execution, the local .env file contains most of the information needed to run the end-to-end Copilot Studio test. Alternatively, any test input can be set directly through environment variables. - -Run the commands below to execute the test after a deployment. - -```bash -# Navigate to the test directory -cd tests/Copilot - -export POWER_PLATFORM_USERNAME="test@username.here" -export POWER_PLATFORM_PASSWORD="passhere" -export TEST_CLIENT_ID="native-app-guid-here" - -# Run tests using azd environment outputs (recommended) -dotnet test --logger "console;verbosity=detailed" -``` - -#### Running Tests with Manual Environment Variable Configuration - -If you prefer to set environment variables manually or need to override specific values, you can configure all required variables explicitly: - -```bash -# Navigate to the test directory -cd tests/Copilot - -# Power Platform authentication -export POWER_PLATFORM_USERNAME="your-test-user@domain.com" -export POWER_PLATFORM_PASSWORD="your-test-password" -export POWER_PLATFORM_TENANT_ID="your-tenant-id" -export POWER_PLATFORM_ENVIRONMENT_ID="your-environment-id" - -# Native client application ID -export TEST_CLIENT_ID="your-native-app-client-id" - -# Copilot Studio configuration -export COPILOT_STUDIO_ENDPOINT="https://api.copilotstudio.microsoft.com" -export COPILOT_STUDIO_AGENT_ID="crfXX_agentName" - -# Run the test -dotnet test --logger "console;verbosity=detailed" -``` - -**Important Notes:** -- The test account must have **MFA disabled** for automated authentication -- The user must have access to the Power Platform environment and the Copilot Studio agent -- Environment variables take precedence over values from azd .env files - -### AI Search Test (Optional) - -Located in `tests/AISearch/`, this 
test validates: - -- **Resource Existence**: Verify all search resources (index, datasource, skillset, indexer) exist -- **Configuration Validation**: Check resource configurations match expected settings -- **Content Verification**: Validate index contains expected documents and supports search -- **Pipeline Integration**: End-to-end validation of the complete search pipeline - -Because the Copilot agent end-to-end test includes indirect validation of the AI Search functionality, this test does not need to be run unless direct validation and troubleshooting of the AI Search resources is required. - -#### Prerequisites for AI Search Tests - -Before running AI Search tests, you must complete the following configuration: - -1. **Make AI Search Endpoint Public**: Unless the test is run on the same virtual network as the AI Search resource, the AI Search service must be updated to be accessible to the test script. Configure network access in the Azure portal: - - Navigate to your AI Search service - - Go to **Networking** → **Firewalls and virtual networks** - - Select **All networks** or add the test runner's IP to **Selected IP addresses** - -2. 
**Assign RBAC Roles**: The user or service principal running the tests must have the following roles: - - Navigate to your AI Search service in the Azure portal - - Go to **Access control (IAM)** → **Add role assignment** - - Select **Search Index Data Contributor** role and assign to the user or service principal that will execute the tests - - Add another role assignment for **Search Service Contributor** role to the same user or service principal - -#### Running AI Search Tests Locally +## Advanced Scenarios -```bash -# Ensure you're authenticated and have an azd environment deployed -az login +### Security Considerations -# Run the test script -cd tests/AISearch -./run-tests.sh -``` +See the [Security Considerations](./docs/security_considerations.md) guide for a concise overview of baseline controls, mitigated risks, and recommended hardening steps for production. -The tests automatically discover configuration from your azd environment outputs. +### Infrastructure Resilience Considerations -## Advanced Scenarios +This guide provides three options for deploying this template: **Basic** (dev/test), **Zone‑redundant** (single‑region production), and Regional failover ready (manual cross‑region recovery). [Infrastructure Resilience Considerations](./docs/infrastructure_resilience.md) provides prescriptive guidance on identity, networking, resiliency, scaling, and cost trade‑offs. The template defaults to Basic which ensures you have full control of and responsibility for choosing the cost, sizing, and resilience for your production environments. ### GitHub Self-Hosted Runners For organizations requiring deployment through CI/CD pipelines, this solution supports secure GitHub self-hosted runners and includes a turnkey bootstrap that provisions private Terraform remote state and a runner in Azure. The configuration emphasizes private networking (private endpoints, no public IP) and least‑privilege access for enterprise environments. 
-For step‑by‑step setup—including OIDC authentication, running the bootstrap workflow, capturing backend outputs, and targeting jobs to the runner—see the [CI/CD guide](/docs/cicd.md). +For step‑by‑step setup—including OIDC authentication, running the bootstrap workflow, capturing backend outputs, and targeting jobs to the runner—see the [CI/CD guide](./docs/cicd.md). + +### Testing + +Refer to the [Testing Guide](./docs/testing.md) in docs/testing.md for end-to-end instructions covering Copilot Studio agent functional tests and optional Azure AI Search integration tests. It explains required environment variables, two execution paths (auto-populated after azd up or manual configuration), and commands for validating search connectivity, index population, and bot responses before production hardening. ### Bring Your Own Networking If your organization needs to deploy into existing virtual networks and enforce corporate routing, egress, and inspection controls, this template supports bring‑your‑own networking. You can wire services to your VNet/subnets, use private endpoints and private DNS, and keep public exposure disabled while meeting enterprise policies. -For supported topologies, prerequisites, and step‑by‑step wiring (subnet requirements, private endpoints for Azure AI Search and Storage, DNS zones, NAT/firewall egress), see the [Bring Your Own Networking guide](/docs/custom_networking.md). +For supported topologies, prerequisites, and step‑by‑step wiring (subnet requirements, private endpoints for Azure AI Search and Storage, DNS zones, NAT/firewall egress), see the [Bring Your Own Networking guide](./docs/custom_networking.md). ### Custom Resource Group If you need to deploy into a pre-created or centrally managed Azure resource group (to align with enterprise naming, policy, or billing), the template can target an existing resource group rather than creating a new one. 
This is especially useful when developers don’t have subscription-level permissions—allowing deployments to proceed with resource group–scoped access. -For prerequisites and configuration flags, see the [Custom Resource Group guide](/docs/custom_resource_group.md). - -## Additional Considerations - -### Security Considerations - -See the [Security Considerations](./docs/security_considerations.md) guide for a concise overview of baseline controls, mitigated risks, and recommended hardening steps for production. - -### Production Readiness - -To avoid cost issues when validating the architecture, the default setting of the AI Search resource -is to use one partition and one replica, which is not a production-caliber configuration. If you use -this architecture in a production scenario, update the `ai_search_config` Terraform variable to configure -at least 3 partitions and replicas. +For prerequisites and configuration flags, see the [Custom Resource Group guide](./docs/custom_resource_group.md). ## Resources diff --git a/docs/bcdr_considerations.md b/docs/infrastructure_resilience.md similarity index 67% rename from docs/bcdr_considerations.md rename to docs/infrastructure_resilience.md index a10af2ae..b6162385 100644 --- a/docs/bcdr_considerations.md +++ b/docs/infrastructure_resilience.md @@ -1,20 +1,12 @@ -# Business Continuity and Disaster Recovery Considerations +# Infrastructure Resilience Considerations -This document explains the disaster recovery (DR) and regional resilience capabilities provided by this template, and highlights areas where additional user action is required for full business continuity. 
+This guide defines three resilience tiers for the Copilot Studio and Azure AI Search integration: **Basic** – single‑region, low-cost experimentation with no redundancy; **Zone‑redundant** – production within one region, surviving an availability zone loss via multi‑replica/service settings; **Regional failover ready** – adds cross‑region networking and geo-capable platform features so you can manually fail over (you still provision secondary resources, replicate data, and script orchestration). The template ships Basic defaults plus the networking/identity foundation to grow without redesign and intentionally leaves sizing, secondary region build-out, and failover automation to you. Regular DR rehearsal is strongly recommended for zone‑redundant and regional tiers. -## Overview - -This template is designed to deploy an enterprise-grade integration between Microsoft Copilot Studio and Azure AI Search, following Azure Well-Architected Framework best practices for security and reliability. It provisions a complete primary-region footprint and—where the selected Power Platform geography is backed by two Azure regions—provisions required dual-region networking scaffolding (virtual networks, subnets, private DNS integration) in both regions. This dual-networking is a compliance prerequisite for Enterprise Policy virtual network delegation in multi-region geographies and is **not optional**, even if you have not yet implemented active regional failover. The template does **not** stand up duplicate workload resources or automate failover; those tasks remain with the adopter. - -## Scenarios - -This document discusses the considerations for deploying the template in three resilience scenarios: **Basic**, **Zone-redundant**, and **Regional failover ready**. - -### Basic (default) +## Basic (default) **Basic** targets the lowest-cost setup for non-critical experimentation where downtime and data loss are acceptable. 
Deploys Azure AI Search, Azure Storage, Azure OpenAI, Networking and supporting resources in a single primary region using Terraform. Often suitable for development and test environments, this scenario intentionally uses single-instance resources and therefore does **not** satisfy production SLA commitments for Azure AI Search or other services. -#### Basic Recommendations +### Basic Recommendations **Azure Storage:** @@ -43,11 +35,11 @@ This document discusses the considerations for deploying the template in three r - Consider enabling via `include_app_insights = true` for debugging and telemetry during active development. - Can be disabled if not needed or if costs are a concern. -### Zone-redundant +## Zone-redundant **Zone-redundant** is suitable for production workloads that must stay available during a single datacenter or availability-zone outage inside one Azure region. This scenario keeps the primary-region footprint from the Basic tier while adding zone-aware configuration so the workload remains healthy during intraregional failures and meets Azure AI Search SLA minimums and higher performance expectations typical of production environments. -#### Zone-redundant Recommendations +### Zone-redundant Recommendations **Azure Storage:** @@ -82,14 +74,14 @@ This document discusses the considerations for deploying the template in three r - Enable Application Insights via `include_app_insights = true` for production monitoring and telemetry (zone redundancy is automatic by default in supported regions). -### Regional failover ready +## Regional failover ready -**Regional failover ready** is for mission-critical workloads that need a backup environment in a paired Azure region to recover from regional incidents. This scenario builds on the Zone-redundant tier by adding cross-region data replication and networking scaffolding to support manual failover to a secondary region. 
+**Regional failover ready** is for mission-critical workloads that need a backup environment in a paired Azure region to recover from regional incidents. This scenario builds on the Zone-redundant tier by adding manual failover to a secondary region. > [!IMPORTANT] -> The template does NOT implement automated cross‑region failover (secondary AI Search/OpenAI/Storage resources, data replication, DNS or traffic management, or orchestration). Use the provided dual-region networking plus the guidance below as the foundation for your own secondary provisioning, replication, runbooks, and traffic failover automation. +> This template intentionally stops short of full cross‑region automation. It does not create secondary AI Search, OpenAI, or Storage resources, replicate data, configure DNS / traffic failover, or orchestrate runbooks. Instead, it gives you the dual‑region networking and identity baseline so you can add those pieces when they’re truly needed. GAP items below call out the manual steps; future automation will be guided by your feedback. -#### Regional Failover Ready Recommendations +### Regional Failover Ready Recommendations **Azure Storage:** @@ -130,40 +122,7 @@ This document discusses the considerations for deploying the template in three r - Enable Application Insights via `include_app_insights = true` for production monitoring and telemetry. - **GAP:** For multi-region monitoring, manually deploy additional Application Insights instances in the secondary region or configure cross-region telemetry collection. -## Disaster Recovery Testing and Rehearsal - -This template does not provide automated disaster recovery testing or failover orchestration. For production environments implementing zone-redundant or regional failover configurations, we strongly recommend establishing a regular DR testing practice. 
- -### Recommended Testing Practices - -**Failover Planning and Documentation:** - -- Document complete failover runbooks with step-by-step procedures for declaring a disaster, activating secondary resources, and switching traffic. -- Identify roles and responsibilities for executing failover, including on-call contacts and escalation paths. -- Define clear success criteria and rollback procedures for each failover scenario. - -**Infrastructure Preparation:** - -- Pre-build Infrastructure-as-Code overlays, Terraform modules, or deployment scripts that can quickly instantiate secondary-region workload resources (AI Search, Storage, OpenAI) when failover is declared. -- Automate data replication and synchronization processes to ensure secondary resources have current data. -- Configure DNS failover automation using Azure Traffic Manager, Front Door, or scripted DNS updates. - -**Regular Failover Drills:** - -- Schedule and execute regular failover drills (quarterly or semi-annually for production environments) to validate your disaster recovery plan. -- Test the complete failover workflow: flip traffic, rehydrate data, validate application functionality, and verify monitoring/alerting. -- Validate connectivity, data freshness, authentication flows, and end-to-end application behavior during simulated regional or zone-level outages. -- Document lessons learned and update runbooks based on drill findings. - -**Post-Failover Validation:** - -- Develop automated smoke tests that verify critical functionality after failover (AI Search queries, Power Platform agent responses, data retrieval). -- Configure monitoring and alerting in both primary and secondary regions to detect performance degradation or service unavailability. -- Establish metrics for Recovery Time Objective (RTO) and Recovery Point Objective (RPO) and measure actual performance during drills. 
- -**Note:** Disaster recovery testing is a critical operational practice that extends beyond infrastructure provisioning. Organizations should invest in regular rehearsal to ensure confidence in their ability to recover from incidents. - -## CI/CD Infrastructure Considerations +## CI/CD Infrastructure Resilience The CI/CD infrastructure (GitHub runners and supporting resources) deployed by the `cicd/` Terraform configuration is independent of the main workload infrastructure and can be deployed in any Azure region. @@ -189,23 +148,6 @@ The CI/CD infrastructure (GitHub runners and supporting resources) deployed by t **Recommendation:** For simplicity, deploy CI/CD runners in a single, cost-effective region unless you have specific requirements for multi-region build infrastructure. The runner region does not impact disaster recovery capabilities of the workload itself. -## Known Issues - -### Regional Failover Requires Manual Resource Provisioning - -**Impact:** The template provisions only the primary-region resources and secondary networking scaffolding. Users implementing regional failover must manually provision and configure multiple critical components. - -**Current Behavior:** The following items require manual action for regional failover: - -- **Secondary workload resources**: You must manually provision Azure AI Search, Storage, and Azure OpenAI resources in the secondary region when executing a failover plan. -- **Data replication**: AI Search data, Storage blobs (beyond GZRS), and custom indexes must be manually re-ingested or replicated. -- **DNS and traffic management**: Automated failover requires manual configuration of Traffic Manager, Front Door, or DNS automation scripts. -- **Monitoring and validation**: Post-failover smoke tests, monitoring configuration, and runbook maintenance are entirely user-owned. 
-
-**Workaround:** Pre-build Infrastructure-as-Code overlays or scripts that instantiate secondary-region workload resources, configure data replication, and automate DNS failover when needed.
-
-**Status:** Work towards full regional failover support depends on user feedback and customer needs.
-
 ## References
 
 - [Azure Well-Architected Framework: Reliability](https://learn.microsoft.com/azure/architecture/framework/resiliency/overview)
diff --git a/docs/testing.md b/docs/testing.md
new file mode 100644
index 00000000..2ca0f299
--- /dev/null
+++ b/docs/testing.md
@@ -0,0 +1,99 @@
+# Testing
+
+This solution includes tests that validate both Copilot Studio and Azure AI Search components after deployment.
+
+## Copilot Studio Agent Test
+
+Located in `tests/Copilot/`, this test validates:
+
+- **Conversation Flow**: End-to-end conversation test with the deployed agent
+- **Integration**: Validation that Copilot Studio can successfully query Azure AI Search
+
+Currently, [the Copilot Studio Client in the Agent SDK does not support the use of Service Principals for authentication](https://github.com/microsoft/Agents/blob/main/samples/basic/copilotstudio-client/dotnet/README.md#create-an-application-registration-in-entra-id---service-principal-login), and testing requires a cloud-native app registration as well as a test account with MFA turned off. The test user account must have access to the Power Platform environment containing the agent as well as access to the agent itself.
+
+### Running Tests After Local Deployment Execution
+
+After a successful local deployment execution, the local .env file contains most of the information needed to run the end-to-end Copilot Studio test. Alternatively, any test input can be set directly through environment variables.
+
+Run the commands below to execute the test after a deployment.
+
+```bash
+# Navigate to the test directory
+cd tests/Copilot
+
+export POWER_PLATFORM_USERNAME="test@username.here"
+export POWER_PLATFORM_PASSWORD="passhere"
+export TEST_CLIENT_ID="native-app-guid-here"
+
+# Run tests using azd environment outputs (recommended)
+dotnet test --logger "console;verbosity=detailed"
+```
+
+### Running Tests with Manual Environment Variable Configuration
+
+If you prefer to set environment variables manually or need to override specific values, you can configure all required variables explicitly:
+
+```bash
+# Navigate to the test directory
+cd tests/Copilot
+
+# Power Platform authentication
+export POWER_PLATFORM_USERNAME="your-test-user@domain.com"
+export POWER_PLATFORM_PASSWORD="your-test-password"
+export POWER_PLATFORM_TENANT_ID="your-tenant-id"
+export POWER_PLATFORM_ENVIRONMENT_ID="your-environment-id"
+
+# Native client application ID
+export TEST_CLIENT_ID="your-native-app-client-id"
+
+# Copilot Studio configuration
+export COPILOT_STUDIO_ENDPOINT="https://api.copilotstudio.microsoft.com"
+export COPILOT_STUDIO_AGENT_ID="crfXX_agentName"
+
+# Run the test
+dotnet test --logger "console;verbosity=detailed"
+```
+
+**Important Notes:**
+- The test account must have **MFA disabled** for automated authentication
+- The user must have access to the Power Platform environment and the Copilot Studio agent
+- Environment variables take precedence over values from azd .env files
+
+## AI Search Test (Optional)
+
+Located in `tests/AISearch/`, this test validates:
+
+- **Resource Existence**: Verify all search resources (index, datasource, skillset, indexer) exist
+- **Configuration Validation**: Check resource configurations match expected settings
+- **Content Verification**: Validate index contains expected documents and supports search
+- **Pipeline Integration**: End-to-end validation of the complete search pipeline
+
+Because the Copilot agent end-to-end test includes indirect validation of the AI Search functionality, this test does not need to be run unless direct validation and troubleshooting of the AI Search resources is required.
+
+### Prerequisites for AI Search Tests
+
+Before running AI Search tests, you must complete the following configuration:
+
+1. **Make AI Search Endpoint Public**: Unless the test is run on the same virtual network as the AI Search resource, the AI Search service must be updated to be accessible to the test script. Configure network access in the Azure portal:
+   - Navigate to your AI Search service
+   - Go to **Networking** → **Firewalls and virtual networks**
+   - Select **All networks** or add the test runner's IP to **Selected IP addresses**
+
+2. **Assign RBAC Roles**: The user or service principal running the tests must have the following roles:
+   - Navigate to your AI Search service in the Azure portal
+   - Go to **Access control (IAM)** → **Add role assignment**
+   - Select **Search Index Data Contributor** role and assign to the user or service principal that will execute the tests
+   - Add another role assignment for **Search Service Contributor** role to the same user or service principal
+
+### Running AI Search Tests Locally
+
+```bash
+# Ensure you're authenticated and have an azd environment deployed
+az login
+
+# Run the test script
+cd tests/AISearch
+./run-tests.sh
+```
+
+The tests automatically discover configuration from your azd environment outputs.
From 044a2df904535af0ea376e76f0552a07fcc5ae0d Mon Sep 17 00:00:00 2001
From: Matt Dotson
Date: Thu, 16 Oct 2025 06:47:56 -0700
Subject: [PATCH 6/7] Update README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8d452440..cdec6a5a 100644
--- a/README.md
+++ b/README.md
@@ -230,7 +230,7 @@ See the [Security Considerations](./docs/security_considerations.md) guide for a
 
 ### Infrastructure Resilience Considerations
 
-This guide provides three options for deploying this template: **Basic** (dev/test), **Zone‑redundant** (single‑region production), and Regional failover ready (manual cross‑region recovery). [Infrastructure Resilience Considerations](./docs/infrastructure_resilience.md) provides prescriptive guidance on identity, networking, resiliency, scaling, and cost trade‑offs. The template defaults to Basic which ensures you have full control of and responsibility for choosing the cost, sizing, and resilience for your production environments.
+This guide provides three options for deploying this template: **Basic** (dev/test), **Zone‑redundant** (single‑region production), and **Regional failover ready** (manual cross‑region recovery). [Infrastructure Resilience Considerations](./docs/infrastructure_resilience.md) provides prescriptive guidance on identity, networking, resiliency, scaling, and cost trade‑offs. The template defaults to Basic which ensures you have full control of and responsibility for choosing the cost, sizing, and resilience for your production environments.
 
 ### GitHub Self-Hosted Runners
 
From ad62e362c3f7ab62e7223986b9da5b750c0412f4 Mon Sep 17 00:00:00 2001
From: Matt Dotson
Date: Thu, 16 Oct 2025 06:50:34 -0700
Subject: [PATCH 7/7] Update README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index cdec6a5a..725215d6 100644
--- a/README.md
+++ b/README.md
@@ -240,7 +240,7 @@ For step‑by‑step setup—including OIDC authentication, running the bootstra
 
 ### Testing
 
-Refer to the [Testing Guide](./docs/testing.md) in docs/testing.md for end-to-end instructions covering Copilot Studio agent functional tests and optional Azure AI Search integration tests. It explains required environment variables, two execution paths (auto-populated after azd up or manual configuration), and commands for validating search connectivity, index population, and bot responses before production hardening.
+Refer to the [Testing Guide](./docs/testing.md) for end-to-end instructions covering Copilot Studio agent functional tests and optional Azure AI Search integration tests. It explains required environment variables, two execution paths (auto-populated after azd up or manual configuration), and commands for validating search connectivity, index population, and bot responses before production hardening.
 
 ### Bring Your Own Networking