Ingress migrations are rarely “just routing.” They’re one of the few moments where you’re forced to touch the edge, which means you can also fix the stuff that quietly rots over time:
- inconsistent security headers
- mystery timeouts and body-size limits
- WAF rules living in “someone’s portal”
- certificates handled like sacred relics
- production drift nobody can explain
The core move is simple:
Don’t migrate YAML → YAML. Migrate to a desired state — and encode that state as Terraform.
This write-up is a comprehensive playbook that treats DevOps and Azure Infrastructure as Code (Terraform) as the main story, with the ingress controller swap as the mechanism.
1) Target architecture and “who owns what”
A common (and effective) Azure pattern is:
Client → Azure Front Door (edge) → Application Gateway (gateway) → AKS (services)
Responsibilities by layer
Azure Front Door (outer edge)
- WAF at the global edge (bot, rate-limit, geo, and IP controls; managed rule sets and bot protection require the Premium tier)
- global routing / failover
- traffic shaping (weighted cutovers)
- optional header manipulation with Rules Engine
Application Gateway + AGIC (gateway layer)
- L7 routing closer to the cluster
- standardized header rewrites (your baseline security headers)
- request-size and timeout guardrails
- TLS termination with Key Vault integration
AKS + apps
- final CORS enforcement (never rely only on edge)
- authN/authZ and business logic security
- app-level rate limiting where appropriate
- correct forwarded header / scheme handling
The key DevOps/IaC point: you now have two control planes (Azure + Kubernetes). Terraform should own the Azure edge/gateway and the “wiring,” while Kubernetes/GitOps owns service-level routing objects (Ingress, Gateway API, Services), with explicit integration points between them.
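One concrete form of that explicit integration point is the AKS AGIC add-on pointed at the Terraform-managed gateway. This is a minimal sketch. It assumes the add-on (not a Helm-installed AGIC) and a hypothetical azurerm_kubernetes_cluster named aks. The Contributor grant is the usual starting point; some network layouts need additional roles.

```hcl
# inside azurerm_kubernetes_cluster "aks" ... (hypothetical resource)
ingress_application_gateway {
  gateway_id = azurerm_application_gateway.agw.id
}

# The add-on runs as its own managed identity and must be allowed to program the gateway.
resource "azurerm_role_assignment" "agic_agw" {
  scope                = azurerm_application_gateway.agw.id
  role_definition_name = "Contributor"
  principal_id         = azurerm_kubernetes_cluster.aks.ingress_application_gateway[0].ingress_application_gateway_identity[0].object_id
}

# inside azurerm_application_gateway "agw" ...
lifecycle {
  # AGIC rewrites these blocks from Ingress objects at runtime; without this,
  # every plan (including a nightly drift check) reports them as changes.
  ignore_changes = [
    backend_address_pool,
    backend_http_settings,
    frontend_port,
    http_listener,
    probe,
    redirect_configuration,
    request_routing_rule,
    url_path_map,
    tags,
  ]
}
```

The split this encodes is as follows:
- Terraform owns the gateway, its WAF policy, certificates, and rewrite sets.
- Ingress objects in GitOps own listeners and rules.

AGIC documents an Ingress annotation, appgw.ingress.kubernetes.io/rewrite-rule-set, for attaching a gateway-level rewrite set to the rules it generates. Confirm it against the AGIC version you run.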
2) The migration mindset: inventory → contract → code → tests → gradual cutover
Migrations most often go wrong when teams skip the “contract” step and jump straight to implementation.
Step A — Inventory what NGINX actually does
Capture these per host/path (because they’re often different per route):
- TLS behavior (protocol versions and cipher suites)
- HSTS policy
- CORS allowlist + credential behavior + max-age
- request body size limit
- timeouts (idle/read)
- header rewrites (request + response)
- path normalization quirks
- IP allowlists / rate limits / bot blocks
- special behaviors: websockets, gRPC, long polling, large uploads
Step B — Turn inventory into a “route contract”
Instead of a spreadsheet that dies in SharePoint, put a contract file in the repo. This becomes:
- inputs to Terraform modules (what policies apply where)
- inputs to smoke tests (what to assert pre-prod)
- a reviewable artifact in PRs
Example (illustrative):
```yaml
# edge-contract.yml
hosts:
  - host: api.example.com
    routes:
      - name: api
        path_prefix: /api/
        max_body_mb: 1
        timeout_seconds: 30
        waf_profile: strict-v1
        security_headers: baseline-v1
        cors:
          allowed_origins:
            - https://app.example.com
            - https://admin.example.com
          allow_credentials: true
          max_age_seconds: 600
```
This is where IaC stops being “we used Terraform” and becomes “Terraform expresses intent.”
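Terraform can read that file directly with its built-in file and yamldecode functions. This is a minimal sketch. It assumes the contract sits at the repository root, three levels above infra/env/<env>/. The local and output names are hypothetical; a real module would feed local.routes into resources rather than an output.

```hcl
locals {
  contract = yamldecode(file("${path.module}/../../../edge-contract.yml"))

  # Flatten hosts -> routes into one object keyed by "<host>/<route name>".
  routes = {
    for r in flatten([
      for h in local.contract.hosts : [
        for route in h.routes : merge(route, { host = h.host })
      ]
    ]) : "${r.host}/${r.name}" => r
  }
}

# Example consumer: per-route timeouts, defaulting to 30s when the contract omits one.
output "route_timeouts" {
  value = { for k, r in local.routes : k => try(r.timeout_seconds, 30) }
}
```

With the sample contract above, route_timeouts evaluates to a single entry, "api.example.com/api" = 30.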
3) Terraform foundation: repo layout, state, environments, and drift
If your IaC foundation is shaky, the migration becomes chaos with better vocabulary.
Suggested repo layout (practical, scalable)
```
infra/
  modules/
    frontdoor/
    frontdoor_waf/
    app_gateway/
    appgw_waf/
    keyvault_cert/
    diagnostics/
  env/
    dev/
      main.tf
      variables.tf
      dev.tfvars
    staging/
    prod/
```
Why this layout works:
- modules encode reusable architecture (Front Door, WAF, App Gateway)
- env expresses environment-specific choices (SKU, WAF mode, domains)
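Here is a sketch of what an environment root might look like under this layout. The module input names and values are hypothetical. The point is that each env passes only its own choices (capacity, WAF mode) into shared modules.

```hcl
# infra/env/prod/main.tf
module "app_gateway" {
  source = "../../modules/app_gateway"

  env      = "prod"
  location = var.location
  rg_name  = var.rg_name
  capacity = 3 # dev might run 1
}

module "appgw_waf" {
  source = "../../modules/appgw_waf"

  env      = "prod"
  location = var.location
  rg_name  = var.rg_name
  waf_mode = "Prevention" # dev/staging pass "Detection"
}

# If the modules later move to their own repository, pin a tag so staging and prod run
# identical module code, e.g. source = "git::https://example.com/infra-modules.git//app_gateway?ref=v1.4.0"
```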
Remote state (Azure Storage) and locking
Use an Azure Storage account backend so:
- state is shared
- state locks prevent two applies at once
- every environment has its own state file
Example backend (in each env):
```hcl
terraform {
  required_version = ">= 1.6.0"

  backend "azurerm" {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "sttfstateprod"
    container_name       = "tfstate"
    key                  = "edge-gateway-prod.tfstate"
  }

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.115"
    }
  }
}

provider "azurerm" {
  features {}
}
```
Environment promotion model (DevOps + IaC)
A strong pattern:
- PR runs fmt/validate/plan
- plan output is reviewed and approved
- apply happens via pipeline with a protected environment
- the same module versions and inputs are promoted to staging → prod
Drift detection (IaC that stays true)
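Schedule a nightly terraform plan -detailed-exitcode against prod. The exit code tells you what happened:
- 0 means no changes.
- 1 means the plan itself failed.
- 2 means the plan is non-empty. Treat 2 as drift and open an issue or alert.

This single practice surfaces the “portal changes that nobody admits” before they surprise you during an incident.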
4) Putting security controls where they belong — and encoding them in Terraform
Here’s the big reframing:
Security controls are part of your platform API. Terraform is how you publish and enforce that API.
We’ll structure the controls as:
- Front Door WAF policy (edge protection)
- App Gateway rewrite set (security headers baseline)
- App Gateway request limits / timeouts (abuse + reliability budgets)
- Key Vault + Managed Identity (certificate lifecycle)
- Diagnostics settings (observability as code)
5) Terraform: Key Vault + Managed Identity for certificates (no more “PFX rituals”)
One of the highest ROI upgrades is certificate automation.
What you want
- Certificates stored in Key Vault
- App Gateway uses a managed identity to fetch them
- Rotation happens by updating Key Vault, not by redeploying secrets manually
Terraform sketch (core pieces)
```hcl
resource "azurerm_user_assigned_identity" "agw" {
  name                = "uai-agw-${var.env}"
  location            = var.location
  resource_group_name = var.rg_name
}

resource "azurerm_key_vault" "kv" {
  name                       = "kv-edge-${var.env}"
  location                   = var.location
  resource_group_name        = var.rg_name
  tenant_id                  = var.tenant_id
  sku_name                   = "standard"
  purge_protection_enabled   = true
  soft_delete_retention_days = 90
  enable_rbac_authorization  = true # RBAC mode; use access policies instead if your org prefers them
}

# With RBAC enabled on the vault, grant the gateway identity read access to secrets:
resource "azurerm_role_assignment" "agw_kv_secrets_user" {
  scope                = azurerm_key_vault.kv.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_user_assigned_identity.agw.principal_id
}
```
Then in Application Gateway, reference the cert secret:
```hcl
# inside azurerm_application_gateway ...
ssl_certificate {
  name                = "api-cert"
  key_vault_secret_id = var.kv_cert_secret_id
}
```
Note: how you create/import the certificate into Key Vault varies by org (ACME automation, manual import, DigiCert integration, etc.). The important IaC point is: the gateway reads from Key Vault via identity, and that wiring is versioned.
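Hands-off rotation depends on one wiring detail: the gateway should reference the certificate's versionless secret ID. That lets Application Gateway pick up new versions when it polls Key Vault (Microsoft documents a four-hour polling interval). This is a sketch. It assumes the certificate is created outside this module under a name passed in through the hypothetical var.api_cert_name.

```hcl
variable "api_cert_name" {
  type        = string
  description = "Key Vault certificate name (hypothetical; created by your ACME job or import process)."
}

locals {
  # vault_uri ends with "/" (https://<vault>.vault.azure.net/). Leaving off the version
  # segment yields a versionless secret ID that resolves to the latest version.
  api_cert_secret_id = "${azurerm_key_vault.kv.vault_uri}secrets/${var.api_cert_name}"
}

# inside azurerm_application_gateway "agw" ...
ssl_certificate {
  name                = "api-cert"
  key_vault_secret_id = local.api_cert_secret_id
}

# Also inside azurerm_application_gateway "agw": make sure the identity can read the
# vault before the gateway tries to load the certificate.
depends_on = [azurerm_role_assignment.agw_kv_secrets_user]
```

Role assignments can take a few minutes to propagate. A first apply may still need a retry even with the explicit dependency.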
6) Terraform: Application Gateway with security header rewrites + guardrails
What changes vs NGINX
NGINX often does this per-Ingress with annotations/snippets. That’s flexible, but inconsistent.
With App Gateway, you can enforce a consistent baseline via rewrite rule sets.
A practical baseline header policy
- HSTS (careful with preload: add it only if you truly want it)
- X-Content-Type-Options: nosniff
- Referrer-Policy
- Permissions-Policy (deny unused capabilities)
Terraform example: rewrite rule set (illustrative)
The exact variable names/condition capabilities can vary, so treat this as a pattern:
```hcl
resource "azurerm_application_gateway" "agw" {
  name                = "agw-${var.env}"
  location            = var.location
  resource_group_name = var.rg_name

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.agw.id]
  }

  sku {
    name     = "WAF_v2"
    tier     = "WAF_v2"
    capacity = 2
  }

  # ... gateway_ip_configuration, frontend_port, frontend_ip_configuration, etc.

  rewrite_rule_set {
    name = "security-headers-v1"

    rewrite_rule {
      name          = "baseline-headers"
      rule_sequence = 100

      # Optional: only apply to certain paths/hosts if needed.
      condition {
        variable    = "var_uri_path"
        pattern     = "^/api/.*"
        ignore_case = true
        negate      = false
      }

      response_header_configuration {
        header_name  = "Strict-Transport-Security"
        header_value = "max-age=31536000; includeSubDomains"
      }

      response_header_configuration {
        header_name  = "X-Content-Type-Options"
        header_value = "nosniff"
      }

      response_header_configuration {
        header_name  = "Referrer-Policy"
        header_value = "strict-origin-when-cross-origin"
      }

      response_header_configuration {
        header_name  = "Permissions-Policy"
        header_value = "geolocation=(), microphone=()"
      }
    }
  }

  # Example: attach the rewrite set to a routing rule (structure depends on your listeners/rules).
  # On v2 SKUs every request_routing_rule also needs a priority.
  # request_routing_rule { ... priority = 100, rewrite_rule_set_name = "security-headers-v1" }
}
```
Guardrails: request size and timeouts
This is where security and reliability hold hands. You’re reducing:
- slow-loris style abuse
- accidental giant payloads
- backend pool starvation
In practice you’ll encode:
- per-route timeout limits
- max request body size limits where supported
- backend probe timeouts and unhealthy thresholds
The exact knobs differ by feature and SKU, but the IaC principle is stable:
Define defaults in the module. Allow overrides per route only with justification.
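Here is a sketch of "defaults in the module, overrides with justification." It makes two assumptions:
- The App Gateway WAF policy (section 7) enforces body size.
- Terraform, not AGIC, owns the backend settings.

Variable names are hypothetical. The 8–2000 KB bounds mirror the documented WAF range for CRS 3.2 and later; check them against your rule-set version.

```hcl
variable "max_request_body_kb" {
  type        = number
  default     = 128 # Azure's default inspection limit
  description = "WAF request body limit; raising it needs a linked justification in the PR."

  validation {
    condition     = var.max_request_body_kb >= 8 && var.max_request_body_kb <= 2000
    error_message = "max_request_body_kb must be between 8 and 2000."
  }
}

variable "file_upload_limit_mb" {
  type    = number
  default = 100
}

# inside azurerm_web_application_firewall_policy "agw" ...
policy_settings {
  enabled                     = true
  mode                        = var.waf_mode
  request_body_check          = true
  max_request_body_size_in_kb = var.max_request_body_kb
  file_upload_limit_in_mb     = var.file_upload_limit_mb
}

# inside azurerm_application_gateway "agw" ... (only if Terraform, not AGIC, owns backend settings)
backend_http_settings {
  name                  = "be-api"
  cookie_based_affinity = "Disabled"
  port                  = 443
  protocol              = "Https"
  request_timeout       = 30 # seconds; matches timeout_seconds for /api/ in the contract
}
```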
7) Terraform: WAF “detect → enforce” without drama
WAF migrations fail when:
- prevention mode is turned on too early
- exceptions are added as global disables
- nobody owns the exceptions later
The disciplined pattern
- run WAF in Detection in non-prod (or first sprint)
- tag every exception with an owner and an expiry
- switch prod to Prevention when you’ve observed real traffic
App Gateway WAF (example)
Using an azurerm_web_application_firewall_policy:
```hcl
variable "waf_mode" {
  type    = string
  default = "Detection" # set to "Prevention" in prod tfvars
}

resource "azurerm_web_application_firewall_policy" "agw" {
  name                = "waf-agw-${var.env}"
  location            = var.location
  resource_group_name = var.rg_name

  policy_settings {
    enabled = true
    mode    = var.waf_mode
  }

  managed_rules {
    managed_rule_set {
      type    = "OWASP"
      version = "3.2"
    }

    # Example exclusion (scope it tightly; avoid global disables)
    exclusion {
      match_variable          = "RequestArgNames"
      selector                = "someField"
      selector_match_operator = "Equals"
    }
  }
}
```
Then associate the policy with the App Gateway by ID:
```hcl
# inside azurerm_application_gateway ...
firewall_policy_id = azurerm_web_application_firewall_policy.agw.id
```
The older alternative is an inline waf_configuration block on the gateway itself, setting firewall_mode, rule_set_type, and rule_set_version directly. Use one model or the other. Microsoft recommends WAF policy resources for WAF_v2 gateways.
DevOps rule for exceptions
Every exception is a PR change with:
- link to a ticket
- a scoped selector/path
- an owner
- an expiry date (enforced by review)
Terraform makes this enforceable because your WAF policy becomes code.
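Terraform can enforce the expiry part of that rule directly. This is a variant of the azurerm_web_application_firewall_policy shown above. It assumes:
- Terraform 1.5 or later, for plantimestamp().
- A hypothetical convention where exclusions arrive as a variable carrying ticket, owner, and expiry metadata.

Only the three match fields reach Azure. The metadata exists for review and for the precondition.

```hcl
variable "waf_exclusions" {
  type = list(object({
    match_variable          = string
    selector                = string
    selector_match_operator = string
    ticket                  = string
    owner                   = string
    expires                 = string # RFC 3339 timestamp, e.g. "YYYY-MM-DDT00:00:00Z"
  }))
  default = []
}

resource "azurerm_web_application_firewall_policy" "agw" {
  # ... name, location, resource_group_name, policy_settings as before

  managed_rules {
    managed_rule_set {
      type    = "OWASP"
      version = "3.2"
    }

    dynamic "exclusion" {
      for_each = var.waf_exclusions
      content {
        match_variable          = exclusion.value.match_variable
        selector                = exclusion.value.selector
        selector_match_operator = exclusion.value.selector_match_operator
      }
    }
  }

  lifecycle {
    # Fail the plan if any exclusion is past its expiry date.
    precondition {
      condition     = alltrue([for e in var.waf_exclusions : timecmp(e.expires, plantimestamp()) > 0])
      error_message = "One or more WAF exclusions have expired; renew them via PR or remove them."
    }
  }
}
```

When an expiry passes, the next plan fails until someone renews or removes the exclusion. That includes a scheduled drift-detection plan.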
8) Terraform: Front Door (edge routing) + WAF as code
Front Door Standard/Premium is typically modeled via the azurerm_cdn_frontdoor_* resources.
What you get at the edge
- global anycast entry point
- TLS at the edge
- WAF at the edge (often the first line of defense)
- routing controls that are perfect for migration cutovers
Terraform example: Front Door profile + endpoint + origin group + route (illustrative)
```hcl
resource "azurerm_cdn_frontdoor_profile" "fd" {
  name                = "fd-${var.env}"
  resource_group_name = var.rg_name
  sku_name            = "Premium_AzureFrontDoor"
}

resource "azurerm_cdn_frontdoor_endpoint" "fd_ep" {
  name                     = "fde-${var.env}"
  cdn_frontdoor_profile_id = azurerm_cdn_frontdoor_profile.fd.id
}

resource "azurerm_cdn_frontdoor_origin_group" "og" {
  name                     = "og-agw-${var.env}"
  cdn_frontdoor_profile_id = azurerm_cdn_frontdoor_profile.fd.id

  health_probe {
    path                = "/healthz"
    request_type        = "GET"
    protocol            = "Https"
    interval_in_seconds = 30
  }

  load_balancing {
    sample_size                 = 4
    successful_samples_required = 3
  }
}

resource "azurerm_cdn_frontdoor_origin" "agw_origin" {
  name                           = "agw-origin"
  cdn_frontdoor_origin_group_id  = azurerm_cdn_frontdoor_origin_group.og.id
  host_name                      = var.app_gateway_public_fqdn
  http_port                      = 80
  https_port                     = 443
  origin_host_header             = var.app_gateway_public_fqdn
  certificate_name_check_enabled = true # required argument: validate the origin's TLS certificate name
  priority                       = 1
  weight                         = 1000
}

resource "azurerm_cdn_frontdoor_route" "api" {
  name                          = "route-api"
  cdn_frontdoor_endpoint_id     = azurerm_cdn_frontdoor_endpoint.fd_ep.id
  cdn_frontdoor_origin_group_id = azurerm_cdn_frontdoor_origin_group.og.id
  cdn_frontdoor_origin_ids      = [azurerm_cdn_frontdoor_origin.agw_origin.id]
  patterns_to_match             = ["/api/*"]
  # The provider only accepts https_redirect_enabled = true when both protocols are listed.
  supported_protocols    = ["Http", "Https"]
  forwarding_protocol    = "HttpsOnly"
  https_redirect_enabled = true
  link_to_default_domain = false
  # With link_to_default_domain = false, attach custom domains via cdn_frontdoor_custom_domain_ids.
}
```
Migration superpower: weighted cutovers
If you keep NGINX as a second origin (or a parallel gateway path), you can shift traffic by changing origin weights in code.
DevOps benefit: controlled rollback is a single weight change PR/apply.
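Here is a sketch of the parallel-origin setup. It assumes the legacy NGINX ingress is reachable on a public, TLS-valid hostname supplied through a hypothetical var.nginx_public_fqdn. Two details matter:
- Front Door only splits by weight among origins with the same priority whose latency falls within the origin group's latency sensitivity (additional_latency_in_milliseconds).
- Weights must be between 1 and 1000, so the final step disables or removes the legacy origin rather than setting its weight to 0.

```hcl
variable "agw_weight" {
  type    = number
  default = 50 # with nginx_weight = 950, roughly 5% of traffic goes to App Gateway
}

variable "nginx_weight" {
  type    = number
  default = 950
}

variable "nginx_public_fqdn" {
  type = string # hypothetical: public hostname of the legacy NGINX ingress
}

# Set weight = var.agw_weight on azurerm_cdn_frontdoor_origin.agw_origin, then add:
resource "azurerm_cdn_frontdoor_origin" "nginx_origin" {
  name                           = "nginx-origin"
  cdn_frontdoor_origin_group_id  = azurerm_cdn_frontdoor_origin_group.og.id
  host_name                      = var.nginx_public_fqdn
  origin_host_header             = "api.example.com" # must match a host the NGINX Ingress rules serve
  https_port                     = 443
  certificate_name_check_enabled = true
  priority                       = 1 # same priority as agw_origin, otherwise weights never apply
  weight                         = var.nginx_weight
}

# In azurerm_cdn_frontdoor_route.api, list both origins:
#   cdn_frontdoor_origin_ids = [azurerm_cdn_frontdoor_origin.agw_origin.id, azurerm_cdn_frontdoor_origin.nginx_origin.id]
# In the origin group's load_balancing block, widen latency sensitivity so both origins qualify:
#   additional_latency_in_milliseconds = 1000
```

Each cutover step is then a tfvars change, for example agw_weight = 250 with nginx_weight = 750. Rollback is the reverse PR.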
9) Terraform: Edge WAF policy (Front Door) + “detect → enforce” promotion
Front Door WAF policies can be expressed as code (resource names vary slightly by provider version, but the principle is constant).
Example pattern
- var.edge_waf_mode = Detection in staging
- var.edge_waf_mode = Prevention in prod
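```hcl
variable "edge_waf_mode" {
  type    = string
  default = "Detection"

  validation {
    condition     = contains(["Detection", "Prevention"], var.edge_waf_mode)
    error_message = "edge_waf_mode must be Detection or Prevention."
  }
}

resource "azurerm_cdn_frontdoor_firewall_policy" "waf" {
  # Front Door WAF policy names must be alphanumeric (no hyphens).
  name                = "waffd${var.env}"
  resource_group_name = var.rg_name
  sku_name            = azurerm_cdn_frontdoor_profile.fd.sku_name
  enabled             = true
  mode                = var.edge_waf_mode

  # DRS 2.x ships as Microsoft_DefaultRuleSet; managed rules require the Premium SKU.
  managed_rule {
    type    = "Microsoft_DefaultRuleSet"
    version = "2.1"
    action  = "Block" # only blocks in Prevention mode; Detection mode just logs
  }

  # Add tightly scoped exclusions if needed, with PR ownership discipline
}
```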
Then associate the WAF policy with the endpoint/domain via the relevant security policy association resource (depending on Front Door configuration).
IaC win: WAF posture becomes an environment parameter, not a portal flip.
DevOps win: promotion to enforcement is a controlled release with rollback.
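In the azurerm provider, the association step is azurerm_cdn_frontdoor_security_policy. This sketch binds the edge WAF policy to the endpoint from section 8. For custom domains, add one domain block per azurerm_cdn_frontdoor_custom_domain ID (not shown). patterns_to_match is "/*", which is the value the service currently accepts.

```hcl
resource "azurerm_cdn_frontdoor_security_policy" "waf" {
  name                     = "sp-waf-${var.env}"
  cdn_frontdoor_profile_id = azurerm_cdn_frontdoor_profile.fd.id

  security_policies {
    firewall {
      cdn_frontdoor_firewall_policy_id = azurerm_cdn_frontdoor_firewall_policy.waf.id

      association {
        # An endpoint ID or a custom domain ID both work here.
        domain {
          cdn_frontdoor_domain_id = azurerm_cdn_frontdoor_endpoint.fd_ep.id
        }
        patterns_to_match = ["/*"]
      }
    }
  }
}
```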
10) Observability as code: diagnostics, logs, alerts that actually answer questions
If you can’t correlate Front Door → App Gateway → AKS, your migration will be “successful” right until the first incident.
What to collect
- Front Door access logs + WAF logs
- App Gateway access/performance/firewall logs
- AKS ingress/controller logs (AGIC), plus app logs and traces
- correlation ID propagation (traceparent / x-correlation-id)
Terraform: Diagnostic settings (illustrative)
```hcl
resource "azurerm_log_analytics_workspace" "law" {
  name                = "law-edge-${var.env}"
  location            = var.location
  resource_group_name = var.rg_name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

resource "azurerm_monitor_diagnostic_setting" "agw_diag" {
  name                       = "diag-agw"
  target_resource_id         = azurerm_application_gateway.agw.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.law.id

  enabled_log {
    category = "ApplicationGatewayAccessLog"
  }

  enabled_log {
    category = "ApplicationGatewayFirewallLog"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}
```
Do the same for Front Door resources.
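For Front Door Standard/Premium, diagnostic settings attach to the profile. This sketch reuses the workspace above. The three log categories are the ones Azure exposes for access, WAF, and health-probe logs.

```hcl
resource "azurerm_monitor_diagnostic_setting" "fd_diag" {
  name                       = "diag-fd"
  target_resource_id         = azurerm_cdn_frontdoor_profile.fd.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.law.id

  enabled_log {
    category = "FrontDoorAccessLog"
  }

  enabled_log {
    category = "FrontDoorWebApplicationFirewallLog"
  }

  enabled_log {
    category = "FrontDoorHealthProbeLog"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}
```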
The “tells a story” dashboards
Track:
- WAF matches by rule ID + URL + host
- 4xx/5xx by route
- latency percentiles per hop (edge vs gateway vs app)
- burst traffic vs rate limiting outcomes
- backend pool health probe failures
11) DevOps pipeline: Terraform is the delivery mechanism for your edge
Here’s the workflow that makes IaC real:
CI (every PR)
- terraform fmt -check
- terraform validate
- security scanning (tfsec/checkov/your internal rules)
- terraform plan (published as a PR artifact/comment)
- optional: OPA/Conftest rules to enforce guardrails
CD (merge to main)
- apply to dev/staging automatically
- run smoke tests (routing/headers/CORS)
- run ZAP/k6 in pre-prod
- gated approval to apply to prod
- progressive traffic shift via Front Door weights
Example GitHub Actions-style pipeline (illustrative)
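```yaml
name: edge-iac

on:
  pull_request:
    paths: ["infra/**"]
  push:
    branches: ["main"]
    paths: ["infra/**"]

permissions:
  id-token: write # lets jobs request an OIDC token for Azure (no stored client secret)
  contents: read

env:
  # Assumes an Entra ID app registration with a federated credential for this repo;
  # the azurerm provider and backend read these variables.
  ARM_USE_OIDC: "true"
  ARM_CLIENT_ID: ${{ vars.AZURE_CLIENT_ID }}
  ARM_TENANT_ID: ${{ vars.AZURE_TENANT_ID }}
  ARM_SUBSCRIPTION_ID: ${{ vars.AZURE_SUBSCRIPTION_ID }}

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Terraform fmt/validate
        run: |
          terraform -chdir=infra fmt -check -recursive
          terraform -chdir=infra/env/staging init -backend=false
          terraform -chdir=infra/env/staging validate
      - name: Terraform plan
        run: |
          terraform -chdir=infra/env/staging init
          terraform -chdir=infra/env/staging plan -var-file=staging.tfvars -out tfplan

  apply-staging-and-test:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    needs: [plan]
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Apply (staging)
        run: |
          terraform -chdir=infra/env/staging init
          terraform -chdir=infra/env/staging apply -var-file=staging.tfvars -auto-approve
      - name: Smoke tests (headers + routing + CORS)
        run: |
          ./tests/smoke/headers.sh
          ./tests/smoke/cors.sh
      - name: ZAP baseline (staging)
        run: ./tests/security/zap-baseline.sh
      - name: k6 smoke
        run: ./tests/perf/k6-smoke.sh
```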
The point: the pipeline doesn’t just deploy infra. It asserts your security intent still holds.
12) Testing that matches the contract (so you catch regressions before prod)
Your tests should read from the same contract file that Terraform uses.
Header smoke test idea
For each route:
- request /api/health (or another lightweight endpoint)
- assert baseline headers exist
- assert headers don’t accidentally duplicate or conflict
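Shell example (pass a URL as the first argument to test a different route):
```bash
#!/usr/bin/env bash
set -euo pipefail
URL="${1:-https://api.example.com/api/health}"
# -D - dumps response headers to stdout; -o /dev/null discards the body.
headers=$(curl -sS --fail -D - -o /dev/null "$URL")
require_header() {
  # Here-string instead of `echo | grep -q`: under pipefail, an early grep exit can SIGPIPE echo.
  grep -qi "^$1" <<<"$headers" || { echo "FAIL: missing '$1' on $URL" >&2; exit 1; }
}
require_header "strict-transport-security:"
require_header "x-content-type-options: nosniff"
require_header "referrer-policy:"
# More than one HSTS line usually means both the gateway and the app set it.
[ "$(grep -ci '^strict-transport-security:' <<<"$headers")" -le 1 ] || { echo "FAIL: duplicate Strict-Transport-Security on $URL" >&2; exit 1; }
```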
CORS smoke test idea
- OPTIONS with allowed origin → expect allow-origin = that origin
- OPTIONS with disallowed origin → expect no allow-origin (or a deny behavior)
WAF validation idea
- run in detection for a sprint
- confirm rule hits are understood and exceptions are scoped
- promote to prevention and ensure no false positives break core flows
13) The “tiny before/after” that actually reflects IaC
Before (NGINX Ingress snippets)
- behavior lives in scattered annotations
- hard to guarantee consistency across services
- changes are easy to make ad-hoc
```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "1m"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
      add_header X-Content-Type-Options "nosniff" always;
```
After (Terraform-managed baseline)
- security headers live in App Gateway rewrite set (or Front Door rules if you choose)
- WAF policies live in Terraform
- cert lifecycle is Key Vault + managed identity
- tests ensure policy doesn’t regress
That’s the real upgrade: from “ingress config” → “platform policy as code.”
14) A realistic 30-day rollout (DevOps + IaC aligned)
Week 1 — Contract + module design
- build the route inventory and encode it as edge-contract.yml
- design Terraform modules and variable schema
- define SLOs and regression tests
Week 2 — Provision parallel infrastructure
- deploy Front Door + WAF (Detection)
- deploy App Gateway + rewrite sets + diagnostics
- install/configure AGIC to target the gateway
- validate end-to-end in staging
Week 3 — Controlled traffic shift
- add second origin (old NGINX path) if doing weighted migration
- shift traffic gradually (5% → 25% → 50% → 100%)
- tune WAF exceptions (scoped + expiring) based on real logs
Week 4 — Enforce + harden
- promote WAF to Prevention in prod
- tighten request size/timeouts based on measured reality
- enable drift detection
- remove temporary exceptions and document runbooks
15) Common migration gotchas (the stuff that bites teams at 2am)
These aren’t theoretical; they’re the usual suspects:
- Forwarded headers differ (X-Forwarded-For, X-Forwarded-Proto, host handling) → apps may generate wrong redirects or mixed-content issues
- Path rewrite semantics differ → /api vs /api/ and normalization edge cases
- Health probes: probe path/auth can accidentally mark backends unhealthy
- WebSockets / gRPC: require specific support and timeout expectations
- WAF false positives: especially around JSON bodies, JWTs, file uploads, and certain query patterns
- CORS duplication: if both edge and app add headers you can get inconsistent behavior
- Cert permissions: Key Vault access miswired → gateway can’t fetch cert → outage
DevOps response: make each of these a check in pre-prod, not a lesson learned in prod.
Takeaway
If Terraform shows up only in the last paragraph, it’ll feel unrelated. In a good migration, Terraform is the spine:
- it encodes the desired state (routes, policies, WAF mode, rewrites, cert wiring)
- it enables reviewable change (PRs + plan diffs)
- it enables safe delivery (progressive cutovers and easy rollback)
- it prevents drift (scheduled plans + alerts)
- it turns security controls into repeatable platform behavior
Ingress migrations are rare. Use them to upgrade your security posture and your delivery maturity in one go — and make the edge as boring, auditable, and testable as your application code.
If you want to continue this in the same direction, the next level is to make the edge-contract.yml the single source of truth and generate:
- Terraform route/rule inputs
- smoke tests (headers/CORS)
- documentation (route inventory)
…so policy, infrastructure, and tests literally cannot drift apart.