Azure DevOps Server 2020 with Azure Application Proxy - Troubleshooting Guide

Overview

This guide provides comprehensive troubleshooting steps for common issues when using Azure DevOps Server 2020 with Azure Application Proxy.

Quick Diagnostic Checklist

Before diving into specific issues, run through this quick checklist:

  • Azure Application Proxy connector is showing as "Active" in Azure portal
  • Internal URL is accessible from the connector server
  • External URL resolves to the Azure Application Proxy service (for a custom domain, through a CNAME to the application's *.msappproxy.net address)
  • SSL certificate is valid and properly configured
  • Users are assigned to the Application Proxy application
  • Firewall rules allow required traffic
  • Azure DevOps Server services are running

Connectivity Issues

External URL Not Accessible

Symptoms:

  • Users cannot access Azure DevOps via external URL
  • Timeout errors when connecting externally
  • DNS resolution failures

Diagnostic Steps:

# Test DNS resolution
nslookup devops-external.yourdomain.com

# Test connectivity to Azure Application Proxy
Test-NetConnection -ComputerName devops-external.yourdomain.com -Port 443

# Check certificate validity
Invoke-WebRequest -Uri "https://devops-external.yourdomain.com" -UseBasicParsing

Solutions:

  1. DNS Issues:

    • Verify the custom domain's CNAME record points to the application's *.msappproxy.net address
    • Check DNS propagation using online tools
    • Flush DNS cache: ipconfig /flushdns
  2. Certificate Issues:

    • Verify certificate matches external domain
    • Check certificate expiry date
    • Validate certificate chain
  3. Azure Configuration:

    • Verify Application Proxy application configuration
    • Check connector status in Azure portal
    • Validate external URL configuration
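The CNAME check can be scripted. Below is a minimal Python sketch. It assumes the external URL is a custom domain whose CNAME chain should lead to an `*.msappproxy.net` name. It inspects the names returned by `socket.gethostbyname_ex`, whose alias list normally includes the CNAME chain on common resolvers, although the exact contents depend on the platform. The helper name `points_to_app_proxy` is hypothetical. The lookup function is injectable so the logic can be tested offline.

```python
import socket
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyname_ex: host -> (canonical_name, aliases, addresses)
Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def points_to_app_proxy(host: str, lookup: Lookup = socket.gethostbyname_ex) -> bool:
    """Return True if any name in the resolved chain is an msappproxy.net host."""
    canonical, aliases, _addresses = lookup(host)
    for name in [canonical, *aliases]:
        if name.rstrip(".").lower().endswith(".msappproxy.net"):
            return True
    return False
```

On a client, `points_to_app_proxy("devops-external.yourdomain.com")` returns `False` when the record still points at an on-premises address or a stale target. It raises `socket.gaierror` when the name does not resolve at all.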

Internal URL Not Accessible from Connector

Symptoms:

  • Connector shows as active but external access fails
  • 502 Bad Gateway errors
  • Internal connectivity test failures

Diagnostic Steps:

# Test from connector server to Azure DevOps Server
Test-NetConnection -ComputerName devops-server.local -Port 8080

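# Test HTTP response (use http:// instead if port 8080 only has an HTTP binding in IIS)
Invoke-WebRequest -Uri "https://devops-server.local:8080/tfs/" -UseBasicParsing -UseDefaultCredentials

# List IIS sites with their state and bindings (IISAdministration module)
Get-IISSite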

Solutions:

  1. Network Connectivity:

    • Verify firewall rules between connector and DevOps server
    • Check routing table for correct paths
    • Test with telnet: telnet devops-server.local 8080
  2. IIS Configuration:

    • Verify IIS site is running
    • Check port bindings (80, 443, 8080)
    • Validate SSL certificate binding
  3. Azure DevOps Server:

    • Check application tier services status
    • Verify web service configuration
    • Review event logs for errors
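The same port checks can be run with Python from the connector server or any other host. This sketch only confirms that a TCP handshake completes. It says nothing about the HTTP response or the certificate. `port_open` and `check_ports` are hypothetical helper names.

```python
import socket
from typing import Dict, Iterable


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses
        return False


def check_ports(host: str, ports: Iterable[int], timeout: float = 5.0) -> Dict[int, bool]:
    """Check each port in turn, e.g. 80, 443 and 8080 on the Azure DevOps Server."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, `check_ports("devops-server.local", [80, 443, 8080])` shows at a glance which bindings are reachable from the connector.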

Authentication Issues

Azure AD Authentication Failures

Symptoms:

  • Users prompted repeatedly for credentials
  • "Access Denied" errors after authentication
  • Users redirected to wrong login page

Diagnostic Steps:

# Check Azure AD sign-in logs (Microsoft Graph PowerShell; sign-in logs require Azure AD Premium)
Connect-MgGraph -Scopes "AuditLog.Read.All", "Directory.Read.All"
Get-MgAuditLogSignIn -Filter "appDisplayName eq 'Azure DevOps Server 2020'" -Top 50

# Test token acquisition for the signed-in Az account
Connect-AzAccount
$token = Get-AzAccessToken -ResourceUrl "https://graph.microsoft.com/"
$token.ExpiresOn

Solutions:

  1. User Assignment:

    • Verify users are assigned to the Application Proxy application
    • Check group membership if using groups
    • Validate user licenses (Application Proxy requires Azure AD Premium P1 or P2)
  2. Application Configuration:

    • Verify pre-authentication is set to Azure Active Directory
    • Check single sign-on configuration
    • Validate redirect URIs
  3. Conditional Access:

    • Review conditional access policies
    • Check device compliance requirements
    • Verify location-based restrictions
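Sometimes a token is issued but access is still denied. The token's claims usually explain why: `aud` shows which resource it was issued for, `tid` shows the tenant, and `exp` gives the expiry as a Unix timestamp. Below is a minimal Python sketch. It assumes you have the raw token string, and it decodes the payload locally without verifying the signature, so it is for diagnosis only. The helper names are hypothetical.

```python
import base64
import json
import time
from typing import Any, Dict, List, Optional


def decode_jwt_claims(token: str) -> Dict[str, Any]:
    """Decode a JWT's payload segment WITHOUT verifying the signature (diagnostics only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT of the form header.payload.signature")
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def token_problems(claims: Dict[str, Any], expected_aud: str, expected_tid: str,
                   now: Optional[float] = None) -> List[str]:
    """List mismatches between the token's claims and what the target expects."""
    now = time.time() if now is None else now
    problems = []
    if claims.get("aud") != expected_aud:
        problems.append(f"audience is {claims.get('aud')!r}, expected {expected_aud!r}")
    if claims.get("tid") != expected_tid:
        problems.append(f"tenant is {claims.get('tid')!r}, expected {expected_tid!r}")
    if claims.get("exp", 0) <= now:
        problems.append("token has expired")
    return problems
```

An empty list from `token_problems` points away from the token itself. Look instead at user assignment or Conditional Access.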

Service Principal Authentication Issues

Symptoms:

  • Build agents cannot connect
  • API calls return 401 Unauthorized
  • Service account authentication failures

Diagnostic Steps:

# Test service principal authentication
$clientId = "your-client-id"
$clientSecret = "your-client-secret"
$tenantId = "your-tenant-id"

$body = @{
    client_id = $clientId
    client_secret = $clientSecret
    scope = "https://graph.microsoft.com/.default"
    grant_type = "client_credentials"
}

$response = Invoke-RestMethod -Uri "https://login.microsoftonline.com/$tenantId/oauth2/v2.0/token" -Method Post -Body $body

Solutions:

  1. Service Principal Configuration:

    • Verify service principal exists in Azure AD
    • Check client secret expiry
    • Validate permissions granted to service principal
  2. Azure DevOps Configuration:

    • Update service connections to use external URLs
    • Configure proper authentication methods
    • Test service connections
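When the token request fails, Azure AD returns a JSON body whose `error_description` begins with an AADSTS code. In Windows PowerShell, the body of a failed `Invoke-RestMethod` call is available as `$_.ErrorDetails.Message` inside a `catch` block. The Python sketch below maps a few codes commonly seen with client-credentials requests to likely causes. The mapping is deliberately small and the helper name is hypothetical. For any code not listed, consult Microsoft's AADSTS error code reference.

```python
import json
import re

# A few AADSTS codes commonly returned by the v2.0 token endpoint for client-credentials requests
AADSTS_HINTS = {
    "AADSTS7000215": "invalid client secret (often the secret ID was used instead of its value)",
    "AADSTS7000222": "client secret has expired; create a new one and update its consumers",
    "AADSTS700016": "application (client) ID not found in this tenant; check client_id and tenant_id",
    "AADSTS90002": "tenant not found; check tenant_id",
}


def diagnose_token_error(response_body: str) -> str:
    """Turn an Azure AD token error response into a short hint."""
    try:
        data = json.loads(response_body)
    except ValueError:
        return "response is not JSON; the request may not have reached the token endpoint"
    match = re.search(r"AADSTS\d+", data.get("error_description", ""))
    if match is None:
        return f"no AADSTS code in response (error: {data.get('error', 'unknown')})"
    code = match.group(0)
    return f"{code}: {AADSTS_HINTS.get(code, 'not in this list; see the AADSTS error code reference')}"
```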

Performance Issues

Slow Response Times

Symptoms:

  • Pages load slowly when accessed externally
  • Timeouts during large operations
  • Git operations taking excessive time

Diagnostic Steps:

# Measure response times
Measure-Command { Invoke-WebRequest -Uri "https://devops-external.yourdomain.com" }

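# Check connector performance counters (set names vary by connector version, so discover them first)
$sets = Get-Counter -ListSet "*Proxy*"
$sets | Select-Object CounterSetName
Get-Counter -Counter $sets.Paths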

# Network latency test
Test-NetConnection -ComputerName devops-external.yourdomain.com -TraceRoute

Solutions:

  1. Network Optimization:

    • Check bandwidth utilization
    • Optimize network routes
    • Consider additional connector locations
  2. Application Proxy Settings:

    • Set Backend Application Timeout to Long (180 seconds) if operations exceed the Default of 85 seconds
    • Enable session affinity if needed
    • Review connector group assignments
  3. Azure DevOps Server Optimization:

    • Check server resource utilization
    • Optimize database performance
    • Review IIS performance settings
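A single `Measure-Command` run says little. Several samples give a fairer comparison between external and internal response times. This Python sketch assumes you supply a `fetch` callable, for example one wrapping `urllib.request.urlopen`. It reports the median, 95th percentile and maximum, and counts samples that exceed the 85-second Default backend timeout. With Azure AD pre-authentication enabled, an unauthenticated request only times the redirect to the sign-in page, not Azure DevOps itself. The helper names are hypothetical.

```python
import statistics
import time
from typing import Callable, Dict, List

DEFAULT_BACKEND_TIMEOUT_S = 85.0  # Application Proxy "Default" backend timeout


def measure(fetch: Callable[[], object], samples: int = 10) -> List[float]:
    """Call fetch() repeatedly and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float], timeout_s: float = DEFAULT_BACKEND_TIMEOUT_S) -> Dict[str, float]:
    """Median, 95th percentile (inclusive method), max, and count of samples over the timeout."""
    if len(durations) < 2:
        raise ValueError("need at least two samples")
    return {
        "median_s": statistics.median(durations),
        "p95_s": statistics.quantiles(durations, n=20, method="inclusive")[-1],
        "max_s": max(durations),
        "over_timeout": sum(1 for d in durations if d > timeout_s),
    }
```

For example, `summarize(measure(lambda: urllib.request.urlopen(url, timeout=180).read()))` can be run once against the external URL and once against the internal URL.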

High Memory/CPU Usage on Connector

Symptoms:

  • Connector server running out of memory
  • High CPU utilization
  • Connector becomes unresponsive

Diagnostic Steps:

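# Check connector resource usage (the connector service runs as ApplicationProxyConnectorService.exe)
Get-Process -Name "ApplicationProxyConnectorService" | Select-Object ProcessName, CPU, @{Name = "WorkingSetMB"; Expression = { [math]::Round($_.WorkingSet64 / 1MB, 1) }}

# Monitor performance over time (12 samples, 5 seconds apart)
Get-Counter -Counter "\Process(ApplicationProxyConnectorService)\*" -SampleInterval 5 -MaxSamples 12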

Solutions:

  1. Resource Allocation:

    • Increase server memory/CPU
    • Optimize connector server configuration
    • Consider dedicated connector servers
  2. Connector Configuration:

    • Install multiple connectors for load distribution
    • Create dedicated connector groups
    • Monitor connector health regularly
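A single snapshot cannot distinguish a busy connector from one whose memory keeps growing. A steady upward trend across samples is more telling. This Python sketch assumes you have collected working-set samples in bytes (for example, `WorkingSet64` values) at a fixed interval. It fits a least-squares line and reports growth in MB per minute. A short window can reflect normal load swings, so treat a positive slope as a prompt for longer monitoring rather than proof of a leak. `growth_mb_per_min` is a hypothetical name.

```python
from typing import Sequence

BYTES_PER_MB = 1024 * 1024


def growth_mb_per_min(samples_bytes: Sequence[int], interval_s: float) -> float:
    """Least-squares slope of evenly spaced working-set samples, in MB per minute."""
    n = len(samples_bytes)
    if n < 2 or interval_s <= 0:
        raise ValueError("need at least two samples and a positive interval")
    xs = [i * interval_s for i in range(n)]          # seconds since the first sample
    ys = [b / BYTES_PER_MB for b in samples_bytes]    # working set in MB
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den * 60  # MB per second -> MB per minute
```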

SSL/Certificate Issues

Certificate Validation Failures

Symptoms:

  • "Certificate not trusted" errors
  • SSL handshake failures
  • Mixed content warnings

Diagnostic Steps:

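# Check certificate details (Windows PowerShell 5.1: the request populates the ServicePoint certificate)
$uri = [Uri]"https://devops-external.yourdomain.com"
Invoke-WebRequest -Uri $uri -UseBasicParsing | Out-Null
$servicePoint = [System.Net.ServicePointManager]::FindServicePoint($uri)
$cert = [System.Security.Cryptography.X509Certificates.X509Certificate2]$servicePoint.Certificate
$cert | Format-List Subject, Issuer, NotBefore, NotAfter, Thumbprint

# Validate certificate chain (explicit -servername ensures SNI is sent on older OpenSSL versions)
openssl s_client -connect devops-external.yourdomain.com:443 -servername devops-external.yourdomain.com -showcerts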

Solutions:

  1. Certificate Installation:

    • Install root and intermediate certificates
    • Verify certificate chain completeness
    • Update certificate trust stores
  2. Certificate Configuration:

    • Verify certificate Subject Alternative Names (SAN)
    • Check certificate binding in IIS
    • Validate private key permissions
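The chain and SAN checks can also be done from Python. This sketch performs a TLS handshake with SNI against the default trust store, so an incomplete or untrusted chain surfaces as `ssl.SSLCertVerificationError` at connect time. Hostname checking is turned off during the handshake so that a name mismatch is reported by `san_matches` instead of raised. The matcher allows a wildcard only as the entire leftmost label. It is a simplified illustration, not a full RFC 6125 implementation, and the helper names are hypothetical.

```python
import socket
import ssl
from typing import Any, Dict


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 10.0) -> Dict[str, Any]:
    """TLS handshake with SNI; raises ssl.SSLCertVerificationError if the chain is untrusted."""
    context = ssl.create_default_context()
    context.check_hostname = False  # chain is still verified (CERT_REQUIRED); name checked separately
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def san_matches(cert: Dict[str, Any], hostname: str) -> bool:
    """True if hostname matches a DNS SAN entry (wildcard allowed only as the leftmost label)."""
    host_labels = hostname.lower().rstrip(".").split(".")
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "DNS":
            continue
        labels = value.lower().rstrip(".").split(".")
        if len(labels) != len(host_labels):
            continue
        if labels == host_labels or (labels[0] == "*" and labels[1:] == host_labels[1:]):
            return True
    return False
```

Usage: `san_matches(fetch_peer_cert("devops-external.yourdomain.com"), "devops-external.yourdomain.com")`.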

Certificate Expiry Issues

Symptoms:

  • Sudden authentication failures
  • Certificate expiry warnings
  • SSL connection errors

Solutions:

  1. Immediate Actions:

    • Install the renewed certificate on the IIS server and upload it to the Application Proxy application if it uses a custom domain
    • Update certificate bindings
    • Restart IIS and connector services
  2. Prevention:

    • Set up certificate expiry monitoring
    • Implement automated renewal processes
    • Maintain certificate inventory
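Expiry monitoring can start as a scheduled script. This Python sketch assumes a `notAfter` string in the format returned by Python's `SSLSocket.getpeercert()`, for example `Jan 15 00:00:00 2030 GMT`. It uses `ssl.cert_time_to_seconds` to compute the days remaining and classifies the result against a warning threshold. The 30-day default and the helper names are hypothetical choices.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between now (Unix time) and a notAfter string such as 'Jan 15 00:00:00 2030 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / SECONDS_PER_DAY


def expiry_status(not_after: str, warn_days: int = 30, now: Optional[float] = None) -> str:
    """Classify a certificate as OK, due for renewal, or expired."""
    days = days_until_expiry(not_after, now)
    if days <= 0:
        return "EXPIRED"
    if days <= warn_days:
        return f"RENEW SOON ({days:.0f} days left)"
    return f"OK ({days:.0f} days left)"
```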

Git Operations Issues

Git Clone/Push Failures

Symptoms:

  • "Repository not found" errors
  • Authentication prompts during Git operations
  • Slow Git operations over external connection

Diagnostic Steps:

# Test Git connectivity
git ls-remote https://devops-external.yourdomain.com/DefaultCollection/_git/ProjectName

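# Check Git configuration (remote URLs and any url.<base>.insteadOf rewrites)
git config --get-regexp url

# Test with verbose output (bash syntax; in PowerShell, run $env:GIT_CURL_VERBOSE = 1 before the clone)
GIT_CURL_VERBOSE=1 git clone https://devops-external.yourdomain.com/DefaultCollection/_git/ProjectName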

Solutions:

  1. Git Configuration:

    • Update remote URLs to use external domain
    • Configure credential caching
    • Set appropriate timeout values
  2. Authentication:

    • Use Personal Access Tokens (PAT)
    • Configure Git credential managers
    • Verify user permissions on repositories
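For a one-off test with a Personal Access Token, Azure DevOps accepts the PAT as HTTP Basic credentials with an empty user name, and Git can send these through `http.extraHeader`. The Python sketch below builds such a command. Whether the header gets through depends on how the Application Proxy application is pre-authenticated. The helper names are hypothetical. The header appears on the process command line, so prefer a credential manager for routine use.

```python
import base64
import subprocess
from typing import List


def pat_auth_header(pat: str) -> str:
    """HTTP header for Basic auth with an empty user name and the PAT as the password."""
    encoded = base64.b64encode(f":{pat}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {encoded}"


def git_ls_remote_cmd(repo_url: str, pat: str) -> List[str]:
    """Build a git ls-remote command that sends the PAT header for this invocation only."""
    return ["git", "-c", f"http.extraHeader={pat_auth_header(pat)}", "ls-remote", repo_url]


def run_ls_remote(repo_url: str, pat: str) -> subprocess.CompletedProcess:
    """Run the command; inspect returncode and stderr rather than relying on an exception."""
    return subprocess.run(git_ls_remote_cmd(repo_url, pat), capture_output=True, text=True)
```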

Git LFS Issues

Symptoms:

  • LFS objects not downloading
  • "Batch API not found" errors
  • Large file upload failures

Solutions:

  1. LFS Configuration:

    • Update .lfsconfig with external URLs
    • Verify LFS server accessibility
    • Check LFS authentication configuration
  2. Proxy Settings:

    • Configure Application Proxy for LFS endpoints
    • Adjust timeout settings for large files
    • Test LFS operations through proxy
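If Git LFS should use the external URL regardless of how a clone's remote is set, the endpoint can be pinned in a committed `.lfsconfig`. By default, Git LFS derives its endpoint by appending `.git/info/lfs` to the remote URL, adding `.git` only when it is missing. This Python sketch applies the same convention to the external repository URL. It assumes Azure DevOps Server serves LFS at that derived path, which `git lfs env` on a working clone can confirm. The helper names are hypothetical. `git config -f .lfsconfig lfs.url <url>` writes the same setting.

```python
def lfs_url_for(repo_url: str) -> str:
    """Apply Git LFS's default endpoint convention: <remote>[.git]/info/lfs."""
    base = repo_url.rstrip("/")
    if not base.endswith(".git"):
        base += ".git"
    return base + "/info/lfs"


def lfsconfig_text(repo_url: str) -> str:
    """Contents for a .lfsconfig file (git-config syntax) pinning the LFS endpoint."""
    return f"[lfs]\n\turl = {lfs_url_for(repo_url)}\n"
```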

Build and Release Issues

Build Agent Connectivity

Symptoms:

  • Agents show as offline after being configured against the external URL
  • Build failures due to connectivity
  • Agent registration failures

Diagnostic Steps:

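# Check agent service status (agents installed as Windows services are named vstsagent.*)
Get-Service -Name "vstsagent*"

# Test connectivity to external URL
Test-NetConnection -ComputerName devops-external.yourdomain.com -Port 443

# Search agent logs for errors (output includes file name and line number)
Select-String -Path "C:\agent\_diag\*.log" -Pattern "error"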

Solutions:

  1. Agent Configuration:

    • Reconfigure agents with external URL
    • Update agent pool settings
    • Verify agent permissions
  2. Network Configuration:

    • Update firewall rules for agents
    • Configure proxy settings if needed
    • Test agent connectivity
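Agent diagnostics live in the `_diag` folder of the agent directory, typically as `Agent_*.log` for the listener and `Worker_*.log` for jobs. Below is a Python equivalent of the `Select-String` search. Like the PowerShell example, it assumes the agent is installed under `C:\agent`. It uses a case-insensitive substring match, as `Select-String` does by default. The helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Case-insensitive substring match, like Select-String's default behaviour
ERROR_RE = re.compile("error", re.IGNORECASE)


def scan_lines(name: str, lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for each line that mentions an error."""
    for lineno, line in enumerate(lines, start=1):
        if ERROR_RE.search(line):
            yield name, lineno, line.rstrip("\r\n")


def scan_diag_dir(diag_dir: str = r"C:\agent\_diag") -> Iterator[Tuple[str, int, str]]:
    """Scan every *.log file in the agent's _diag folder, in file-name order."""
    for log in sorted(Path(diag_dir).glob("*.log")):
        with log.open(encoding="utf-8", errors="replace") as fh:
            yield from scan_lines(log.name, fh)
```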

Release Pipeline Failures

Symptoms:

  • Service connection failures
  • Webhook delivery failures
  • Deployment target connectivity issues

Solutions:

  1. Service Connections:

    • Update service endpoints to use external URLs
    • Refresh authentication credentials
    • Test service connections
  2. Webhooks:

    • Update webhook URLs in external systems
    • Verify webhook authentication
    • Test webhook delivery
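To test delivery end to end, point a service hook (Web Hooks) subscription at a throwaway receiver and watch for the POST. The receiver must run on a host that Azure DevOps Server can reach. This Python sketch uses only the standard library. The port, the `received` list and the helper names are hypothetical. The receiver has no authentication, so run it only briefly on a trusted network. `send_test_event` simulates a delivery so you can check the receiver itself first.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, List

received: List[Dict[str, Any]] = []  # parsed JSON bodies, in arrival order


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            received.append(json.loads(body or b"{}"))
            self.send_response(200)
        except ValueError:
            self.send_response(400)  # reject payloads that are not valid JSON
        self.end_headers()


def start_receiver(host: str = "0.0.0.0", port: int = 8000) -> HTTPServer:
    """Serve in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), HookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_test_event(url: str, payload: Dict[str, Any]) -> int:
    """POST a JSON payload directly (ignoring proxy environment variables); return the status."""
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=10) as response:
        return response.status
```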

Logging and Monitoring

Enable Detailed Logging

Application Proxy Connector Logging:

<!-- In ApplicationProxyConnectorService.exe.config (connector install folder); back up the file and restart the connector service after editing -->
<configuration>
  <system.diagnostics>
    <trace autoflush="true">
      <listeners>
        <add name="textWriterTraceListener" type="System.Diagnostics.TextWriterTraceListener" initializeData="trace.log" />
      </listeners>
    </trace>
  </system.diagnostics>
</configuration>

IIS Detailed Logging (requires the Failed Request Tracing feature to be installed and enabled for the site):

<!-- In web.config -->
<system.webServer>
  <httpLogging dontLog="false" />
  <tracing>
    <traceFailedRequests>
      <add path="*">
        <traceAreas>
          <add provider="WWW Server" areas="Authentication,Security,Filter,StaticFile,CGI,Compression,Cache,RequestNotifications,Module,Rewrite" verbosity="Verbose" />
        </traceAreas>
        <failureDefinitions>
          <add statusCodes="400-999" />
        </failureDefinitions>
      </add>
    </traceFailedRequests>
  </tracing>
</system.webServer>

Common Log Locations

  • Application Proxy Connector: %ProgramData%\Microsoft\Microsoft AAD Application Proxy Connector\Trace
  • IIS Logs: %SystemDrive%\inetpub\logs\LogFiles
  • Azure DevOps Server Logs: %ProgramData%\Microsoft\Team Foundation\Server Configuration\Logs
  • Windows Event Logs: Applications and Services Logs > Microsoft > AadApplicationProxy > Connector > Admin
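Tallying IIS status codes shows whether failures cluster, for example on 401 during authentication problems or on 5xx behind a 502 from the proxy. This Python sketch reads the `#Fields:` directive to locate the `sc-status` column, so it adapts to whichever fields are enabled. It assumes the default space-separated W3C extended format, in which IIS files are typically named `u_exYYMMDD.log` under a `W3SVC<site id>` folder. The helper name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def status_counts(lines: Iterable[str]) -> Counter:
    """Count sc-status values in IIS W3C extended log lines, honouring #Fields directives."""
    fields: List[str] = []
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()  # column layout for the lines that follow
            continue
        if not line or line.startswith("#") or "sc-status" not in fields:
            continue
        values = line.split(" ")
        if len(values) == len(fields):
            counts[values[fields.index("sc-status")]] += 1
    return counts
```

Usage: `with open(path, encoding="utf-8") as fh: print(status_counts(fh).most_common(5))`.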

Emergency Procedures

Complete Service Outage

  1. Immediate Actions:

    • Check Azure service health
    • Verify on-premises infrastructure
    • Enable internal-only access if needed
  2. Communication:

    • Notify stakeholders
    • Provide status updates
    • Document timeline of events
  3. Recovery Steps:

    • Follow documented recovery procedures
    • Test each component before declaring service restored
    • Conduct post-incident review

Security Incident Response

  1. Immediate Containment:

    • Disable compromised accounts
    • Block suspicious IP addresses
    • Isolate affected systems
  2. Investigation:

    • Collect relevant logs
    • Analyze attack vectors
    • Document evidence
  3. Recovery:

    • Patch vulnerabilities
    • Reset compromised credentials
    • Implement additional security measures

Support Contacts

Microsoft Support

  • Azure Application Proxy: Azure Portal > Help + Support
  • Azure DevOps Server: Microsoft Support Portal
  • Azure AD: Azure Portal > Azure Active Directory > Support

Internal Escalation

  • Level 1: IT Help Desk
  • Level 2: System Administrators
  • Level 3: Security Team / External Consultants

Useful PowerShell Scripts

See the /scripts/powershell/ directory for troubleshooting scripts:

  • test-connectivity.ps1: Comprehensive connectivity testing
  • test-spn.ps1: SPN configuration validation
  • test-saml-sso.ps1: SAML SSO testing and validation
  • configure-spn.ps1: Service Principal Name configuration
  • configure-saml-sso.ps1: SAML Single Sign-On setup
  • configure-ssl.ps1: SSL certificate configuration
  • install-connector.ps1: Application Proxy connector installation

For additional troubleshooting assistance, review the installation and configuration guides or contact Microsoft Support with specific error details and log files.