Turning an Ingress Migration into a Security Upgrade — with DevOps + Terraform IaC as the Backbone (NGINX → Azure Front Door + App Gateway/AGIC)

Ingress migrations are rarely “just routing.” They’re one of the few moments where you’re forced to touch the edge, which means you can also fix the stuff that quietly rots over time:

  • inconsistent security headers
  • mystery timeouts and body-size limits
  • WAF rules living in “someone’s portal”
  • certificates handled like sacred relics
  • production drift nobody can explain

The core move is simple:

Don’t migrate YAML → YAML. Migrate to a desired state — and encode that state as Terraform.

This write-up is a comprehensive playbook that treats DevOps and Azure Infrastructure as Code (Terraform) as the main story, with the ingress controller swap as the mechanism.


1) Target architecture and “who owns what”

A common (and effective) Azure pattern is:

Client → Azure Front Door (edge) → Application Gateway (gateway) → AKS (services)

Responsibilities by layer

Azure Front Door (outer edge)

  • WAF at the global edge (bot/rate/geo/IP controls depending on your plan)
  • global routing / failover
  • traffic shaping (weighted cutovers)
  • optional header manipulation with Rules Engine

Application Gateway + AGIC (gateway layer)

  • L7 routing closer to the cluster
  • standardized header rewrites (your baseline security headers)
  • request-size and timeout guardrails
  • TLS termination with Key Vault integration

AKS + apps

  • final CORS enforcement (never rely only on edge)
  • authN/authZ and business logic security
  • app-level rate limiting where appropriate
  • correct forwarded header / scheme handling

The key DevOps/IaC point: you now have two control planes (Azure + Kubernetes). Terraform should own the Azure edge/gateway and the “wiring,” while Kubernetes/GitOps owns service-level routing objects (Ingress, Gateway API, Services), with explicit integration points between them.
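One integration point can be sketched in Terraform. This sketch makes three assumptions:

- AGIC runs as the AKS add-on.
- The gateway is the azurerm_application_gateway.agw resource shown later.
- The cluster is trimmed to the AGIC-relevant block.

The role assignment and output names are illustrative. AGIC commonly needs Contributor on the gateway, and may need extra read or network permissions depending on where the gateway lives, so check the add-on documentation for your setup.

```hcl
# Trimmed, hypothetical fragment: only the AGIC wiring is shown.
resource "azurerm_kubernetes_cluster" "aks" {
  # ... name, location, resource_group_name, dns_prefix, default_node_pool, identity, etc.

  # Enables the AGIC add-on and points it at the Terraform-managed gateway.
  ingress_application_gateway {
    gateway_id = azurerm_application_gateway.agw.id
  }
}

# AGIC's add-on identity must be allowed to reconfigure the gateway.
resource "azurerm_role_assignment" "agic_agw" {
  scope                = azurerm_application_gateway.agw.id
  role_definition_name = "Contributor"
  principal_id         = azurerm_kubernetes_cluster.aks.ingress_application_gateway[0].ingress_application_gateway_identity[0].object_id
}

# Exported so the Kubernetes/GitOps side can reference the same gateway explicitly.
output "app_gateway_id" {
  value = azurerm_application_gateway.agw.id
}
```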


2) The migration mindset: inventory → contract → code → tests → gradual cutover

Most migrations fail because teams skip the “contract” step and jump straight to implementation.

Step A — Inventory what NGINX actually does

Capture these per host/path (because they’re often different per route):

  • TLS versions and cipher suites
  • HSTS policy
  • CORS allowlist + credential behavior + max-age
  • request body size limit
  • timeouts (idle/read)
  • header rewrites (request + response)
  • path normalization quirks
  • IP allowlists / rate limits / bot blocks
  • special behaviors: websockets, gRPC, long polling, large uploads

Step B — Turn inventory into a “route contract”

Instead of a spreadsheet that dies in SharePoint, put a contract file in the repo. This becomes:

  • inputs to Terraform modules (what policies apply where)
  • inputs to smoke tests (what to assert pre-prod)
  • a reviewable artifact in PRs

Example (illustrative):

# edge-contract.yml
hosts:
  - host: api.example.com
    routes:
      - name: api
        path_prefix: /api/
        max_body_mb: 1
        timeout_seconds: 30
        waf_profile: strict-v1
        security_headers: baseline-v1
        cors:
          allowed_origins:
            - https://app.example.com
            - https://admin.example.com
          allow_credentials: true
          max_age_seconds: 600

This is where IaC stops being “we used Terraform” and becomes “Terraform expresses intent.”
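Here is a minimal sketch of the Terraform side of that contract. It assumes edge-contract.yml sits at the repo root and is read from an environment directory such as infra/env/dev. The code flattens hosts[].routes[] into one map with stable keys, which is what for_each and module inputs expect. The output name is hypothetical.

```hcl
locals {
  # infra/env/<env>/ -> repo root is three levels up (assumed location of the contract)
  contract = yamldecode(file("${path.root}/../../../edge-contract.yml"))

  # One entry per route, keyed "host/route-name", with the host copied into each route.
  routes = {
    for r in flatten([
      for h in local.contract.hosts : [
        for rt in h.routes : merge(rt, { host = h.host })
      ]
    ]) : "${r.host}/${r.name}" => r
  }
}

# Example consumer: per-route timeouts, ready to pass into a module.
output "route_timeouts" {
  value = { for k, r in local.routes : k => r.timeout_seconds }
}
```

With the example contract above, local.routes holds a single key, "api.example.com/api", whose timeout_seconds is 30.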


3) Terraform foundation: repo layout, state, environments, and drift

If your IaC foundation is shaky, the migration becomes chaos with better vocabulary.

Suggested repo layout (practical, scalable)

infra/
  modules/
    frontdoor/
    frontdoor_waf/
    app_gateway/
    appgw_waf/
    keyvault_cert/
    diagnostics/
  env/
    dev/
      main.tf
      variables.tf
      dev.tfvars
    staging/
    prod/

Why this layout works:

  • modules encode reusable architecture (Front Door, WAF, App Gateway)
  • env expresses environment-specific choices (SKU, WAF mode, domains)

Remote state (Azure Storage) and locking

Use an Azure Storage account backend so:

  • state is shared
  • state locks prevent two applies at once
  • every environment has its own state file

Example backend (in each env):

terraform {
  required_version = ">= 1.6.0"

  backend "azurerm" {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "sttfstateprod"
    container_name       = "tfstate"
    key                  = "edge-gateway-prod.tfstate"
  }

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.115"
    }
  }
}

provider "azurerm" {
  features {}
}

Environment promotion model (DevOps + IaC)

A strong pattern:

  • PR runs fmt/validate/plan
  • plan output is reviewed and approved
  • apply happens via pipeline with a protected environment
  • the same module versions and inputs are promoted to staging → prod
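If the modules live in their own repository, pinning an immutable tag is one way to make "the same module versions" literal. In this sketch, the module repo URL and the input names are hypothetical. Staging and prod reference the same ref and differ only in their tfvars.

```hcl
# infra/env/prod/main.tf (staging uses the identical source and ref)
module "frontdoor" {
  # Hypothetical module repo; the tag is what gets promoted from staging to prod.
  source = "git::https://github.com/example-org/edge-modules.git//frontdoor?ref=v1.4.0"

  env           = "prod"
  rg_name       = var.rg_name
  edge_waf_mode = var.edge_waf_mode # "Detection" in staging tfvars, "Prevention" in prod tfvars
}
```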

Drift detection (IaC that stays true)

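Schedule a nightly terraform plan -detailed-exitcode against prod. The exit code tells you what happened:

  • 0 means no changes
  • 2 means the plan is not empty (drift, or code that was merged but never applied)
  • 1 means the plan itself failed

Alert or open an issue on anything other than 0.

This single practice surfaces the “portal changes that nobody admits” within a day instead of at the next incident.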


4) Putting security controls where they belong — and encoding them in Terraform

Here’s the big reframing:

Security controls are part of your platform API. Terraform is how you publish and enforce that API.

We’ll structure the controls as:

  • Front Door WAF policy (edge protection)
  • App Gateway rewrite set (security headers baseline)
  • App Gateway request limits / timeouts (abuse + reliability budgets)
  • Key Vault + Managed Identity (certificate lifecycle)
  • Diagnostics settings (observability as code)

5) Terraform: Key Vault + Managed Identity for certificates (no more “PFX rituals”)

One of the highest-ROI upgrades is certificate automation.

What you want

  • Certificates stored in Key Vault
  • App Gateway uses a managed identity to fetch them
  • Rotation happens by updating Key Vault, not by redeploying secrets manually

Terraform sketch (core pieces)

resource "azurerm_user_assigned_identity" "agw" {
  name                = "uai-agw-${var.env}"
  location            = var.location
  resource_group_name = var.rg_name
}

resource "azurerm_key_vault" "kv" {
  name                       = "kv-edge-${var.env}"
  location                   = var.location
  resource_group_name        = var.rg_name
  tenant_id                  = var.tenant_id
  sku_name                   = "standard"
  purge_protection_enabled   = true
  soft_delete_retention_days = 90
  enable_rbac_authorization  = true # azurerm 3.x name; required for the RBAC role assignment below
}

# If you use RBAC for Key Vault:
resource "azurerm_role_assignment" "agw_kv_secrets_user" {
  scope                = azurerm_key_vault.kv.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_user_assigned_identity.agw.principal_id
}

Then in Application Gateway, reference the cert secret:

# inside azurerm_application_gateway ...
ssl_certificate {
  name                = "api-cert"
  key_vault_secret_id = var.kv_cert_secret_id
}

Note: how you create/import the certificate into Key Vault varies by org (ACME automation, manual import, DigiCert integration, etc.). The important IaC point is: the gateway reads from Key Vault via identity, and that wiring is versioned.
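For the rotation goal above, a common pattern is to give the gateway a versionless secret ID. A renewed certificate in Key Vault is then picked up without a Terraform change; Microsoft documents that the gateway polls Key Vault roughly every four hours. This sketch assumes:

- A certificate named api-cert already exists in the vault.
- Your azurerm version exports versionless_secret_id on the data source.

```hcl
# Hypothetical: certificate "api-cert" was already created or imported into the vault.
data "azurerm_key_vault_certificate" "api" {
  name         = "api-cert"
  key_vault_id = azurerm_key_vault.kv.id
}

# inside azurerm_application_gateway ...
ssl_certificate {
  name = "api-cert"
  # Versionless secret ID: the gateway follows the latest certificate version
  # instead of staying pinned to the version that existed at apply time.
  key_vault_secret_id = data.azurerm_key_vault_certificate.api.versionless_secret_id
}
```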


6) Terraform: Application Gateway with security header rewrites + guardrails

What changes vs NGINX

NGINX often does this per-Ingress with annotations/snippets. That’s flexible, but inconsistent.

With App Gateway, you can enforce a consistent baseline via rewrite rule sets.

A practical baseline header policy

  • HSTS (careful with preload — only if you truly want it)
  • X-Content-Type-Options: nosniff
  • Referrer-Policy
  • Permissions-Policy (deny unused capabilities)

Terraform example: rewrite rule set (illustrative)

The exact variable names/condition capabilities can vary, so treat this as a pattern:

resource "azurerm_application_gateway" "agw" {
  name                = "agw-${var.env}"
  location            = var.location
  resource_group_name = var.rg_name

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.agw.id]
  }

  sku {
    name     = "WAF_v2"
    tier     = "WAF_v2"
    capacity = 2
  }

  # ... gateway_ip_configuration, frontend_port, frontend_ip_configuration, etc.

  rewrite_rule_set {
    name = "security-headers-v1"

    rewrite_rule {
      name          = "baseline-headers"
      rule_sequence = 100

      # Optional: only apply to certain paths/hosts if needed.
      condition {
        variable    = "var_uri_path"
        pattern     = "^/api/.*"
        ignore_case = true
        negate      = false
      }

      response_header_configuration {
        header_name  = "Strict-Transport-Security"
        header_value = "max-age=31536000; includeSubDomains"
      }

      response_header_configuration {
        header_name  = "X-Content-Type-Options"
        header_value = "nosniff"
      }

      response_header_configuration {
        header_name  = "Referrer-Policy"
        header_value = "strict-origin-when-cross-origin" # full URL same-origin, origin only cross-origin, nothing on downgrade
      }

      response_header_configuration {
        header_name  = "Permissions-Policy"
        header_value = "geolocation=(), microphone=()"
      }
    }
  }

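  # Attaching the set: on a hand-managed gateway, reference it from a request_routing_rule
  # (rewrite_rule_set_name = "security-headers-v1"). With AGIC, routing rules are generated
  # from Ingress objects, so attach it with the Ingress annotation
  # appgw.ingress.kubernetes.io/rewrite-rule-set: security-headers-v1 instead.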
}

Guardrails: request size and timeouts

This is where security and reliability hold hands. You’re reducing:

  • slow-loris style abuse
  • accidental giant payloads
  • backend pool starvation

In practice you’ll encode:

  • per-route timeout limits
  • max request body size limits where supported
  • backend probe timeouts and unhealthy thresholds

The exact knobs differ by feature and SKU, but the IaC principle is stable:

Define defaults in the module. Allow overrides per route only with justification.
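In an AGIC setup, ownership matters here:

- AGIC regenerates listeners, routing rules, backend settings and probes from Ingress objects.
- Per-route timeouts and probe paths are therefore usually set through Ingress annotations, such as appgw.ingress.kubernetes.io/request-timeout and appgw.ingress.kubernetes.io/health-probe-path.
- Terraform keeps the WAF body-size limits and ignores the blocks AGIC owns. Without that, the nightly drift plan would flag every AGIC sync.

The sketch below works under those assumptions. It reuses the WAF policy resource from the WAF section, and the limit values mirror the example contract; they are not recommendations.

```hcl
# inside azurerm_web_application_firewall_policy.agw
policy_settings {
  enabled                     = true
  mode                        = var.waf_mode
  request_body_check          = true
  max_request_body_size_in_kb = 1024 # 1 MB, from max_body_mb in edge-contract.yml
  file_upload_limit_in_mb     = 100
}

# inside azurerm_application_gateway.agw
lifecycle {
  # Blocks AGIC rewrites from Ingress objects. Terraform keeps the SKU, identity,
  # rewrite sets, WAF policy association and Key Vault certificates (assuming
  # Ingresses reference them via appgw.ingress.kubernetes.io/appgw-ssl-certificate).
  ignore_changes = [
    backend_address_pool,
    backend_http_settings,
    frontend_port,
    http_listener,
    probe,
    redirect_configuration,
    request_routing_rule,
    url_path_map,
    tags,
  ]
}
```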


7) Terraform: WAF “detect → enforce” without drama

WAF migrations fail when:

  • prevention mode is turned on too early
  • exceptions are added as global disables
  • nobody owns the exceptions later

The disciplined pattern

  • run WAF in Detection in non-prod (or first sprint)
  • tag every exception with an owner and an expiry
  • switch prod to Prevention when you’ve observed real traffic

App Gateway WAF (example)

Using an azurerm_web_application_firewall_policy:

variable "waf_mode" {
  type    = string
  default = "Detection" # set to "Prevention" in prod tfvars
}

resource "azurerm_web_application_firewall_policy" "agw" {
  name                = "waf-agw-${var.env}"
  location            = var.location
  resource_group_name = var.rg_name

  policy_settings {
    enabled = true
    mode    = var.waf_mode
  }

  managed_rules {
    managed_rule_set {
      type    = "OWASP"
      version = "3.2"
    }

    # Example exclusion (scope it tightly; avoid global disables)
    exclusion {
      match_variable          = "RequestArgNames"
      selector                = "someField"
      selector_match_operator = "Equals"
    }
  }
}

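Then associate the policy with the App Gateway:

# inside azurerm_application_gateway ...
firewall_policy_id = azurerm_web_application_firewall_policy.agw.id

Pick one model per gateway:

  • a standalone WAF policy associated as above, which is the approach Microsoft recommends for WAF_v2 and which can also be scoped per listener or per path
  • an inline waf_configuration block (firewall_mode, rule_set_type, rule_set_version)

Don't maintain both for the same gateway, or the policy you review in PRs may not be the one that is actually enforced.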

DevOps rule for exceptions

Every exception is a PR change with:

  • link to a ticket
  • a scoped selector/path
  • an owner
  • an expiry date (enforced by review)

Terraform makes this enforceable because your WAF policy becomes code.
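The sketch below makes that rule mechanical rather than purely social. It assumes Terraform >= 1.5 (for check blocks and plantimestamp()) and a hypothetical waf_exceptions variable. It works in three parts:

- Each exclusion carries its ticket, owner and expiry as data.
- The exclusions are generated from that list.
- A check block warns on every plan, including the nightly drift run, once an exception is past its date.

```hcl
# Hypothetical variable: every exclusion carries its paper trail as data.
variable "waf_exceptions" {
  type = list(object({
    match_variable = string # e.g. "RequestArgNames"
    selector       = string
    ticket         = string
    owner          = string
    expires        = string # "YYYY-MM-DD"
  }))
  default = []
}

# inside azurerm_web_application_firewall_policy.agw -> managed_rules { ... }
dynamic "exclusion" {
  for_each = var.waf_exceptions
  content {
    match_variable          = exclusion.value.match_variable
    selector                = exclusion.value.selector
    selector_match_operator = "Equals"
  }
}

# Warns (does not fail) on every plan once any exception is past its expiry date.
check "waf_exceptions_not_expired" {
  assert {
    condition = alltrue([
      for e in var.waf_exceptions : timecmp(plantimestamp(), "${e.expires}T00:00:00Z") < 0
    ])
    error_message = "A WAF exception has expired; remove it or renew it via PR."
  }
}
```

Azure never reads ticket and owner. They are required fields so that a PR cannot add an exclusion without them.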


8) Terraform: Front Door (edge routing) + WAF as code

Front Door Standard/Premium is typically modeled via the azurerm_cdn_frontdoor_* resources.

What you get at the edge

  • global anycast entry point
  • TLS at the edge
  • WAF at the edge (often the first line of defense)
  • routing controls that are perfect for migration cutovers

Terraform example: Front Door profile + endpoint + origin group + route (illustrative)

resource "azurerm_cdn_frontdoor_profile" "fd" {
  name                = "fd-${var.env}"
  resource_group_name = var.rg_name
  sku_name            = "Premium_AzureFrontDoor"
}

resource "azurerm_cdn_frontdoor_endpoint" "fd_ep" {
  name                     = "fde-${var.env}"
  cdn_frontdoor_profile_id = azurerm_cdn_frontdoor_profile.fd.id
}

resource "azurerm_cdn_frontdoor_origin_group" "og" {
  name                     = "og-agw-${var.env}"
  cdn_frontdoor_profile_id = azurerm_cdn_frontdoor_profile.fd.id

  health_probe {
    path                = "/healthz"
    request_type        = "GET"
    protocol            = "Https"
    interval_in_seconds = 30
  }

  load_balancing {
    sample_size                 = 4
    successful_samples_required = 3
  }
}

resource "azurerm_cdn_frontdoor_origin" "agw_origin" {
  name                          = "agw-origin"
  cdn_frontdoor_origin_group_id = azurerm_cdn_frontdoor_origin_group.og.id

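  host_name                      = var.app_gateway_public_fqdn
  http_port                      = 80
  https_port                     = 443
  origin_host_header             = var.app_gateway_public_fqdn
  certificate_name_check_enabled = true # required in azurerm 3.x; the gateway cert must match host_name

  priority = 1
  weight   = 1000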
}

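resource "azurerm_cdn_frontdoor_route" "api" {
  name                          = "route-api"
  cdn_frontdoor_endpoint_id     = azurerm_cdn_frontdoor_endpoint.fd_ep.id
  cdn_frontdoor_origin_group_id = azurerm_cdn_frontdoor_origin_group.og.id
  cdn_frontdoor_origin_ids      = [azurerm_cdn_frontdoor_origin.agw_origin.id] # required argument

  # With link_to_default_domain = false, the route must be bound to a custom domain
  # (azurerm_cdn_frontdoor_custom_domain and its association, not shown; the variable is hypothetical).
  cdn_frontdoor_custom_domain_ids = [var.api_custom_domain_id]

  patterns_to_match      = ["/api/*"]
  supported_protocols    = ["Http", "Https"] # Http must be accepted for the HTTPS redirect to apply
  forwarding_protocol    = "HttpsOnly"
  https_redirect_enabled = true
  link_to_default_domain = false
}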

Migration superpower: weighted cutovers

If you keep NGINX as a second origin (or a parallel gateway path), you can shift traffic by changing origin weights in code.

DevOps benefit: controlled rollback is a single weight change PR/apply.
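This sketch of the second origin makes two assumptions:

- nginx_public_fqdn is a hypothetical variable holding the existing NGINX ingress FQDN.
- agw_origin's weight is switched to var.agw_weight, and the route's cdn_frontdoor_origin_ids lists both origins.

Front Door weights must be between 1 and 1000 and apply between origins of equal priority, so 50 vs 950 is roughly a 5% first step. Weights only apply among origins inside the origin group's latency sensitivity window (additional_latency_in_milliseconds), so confirm both origins qualify before trusting the split.

```hcl
# Hypothetical cutover knobs, set per environment in tfvars.
variable "agw_weight" {
  type    = number
  default = 50 # ~5% of traffic against nginx_weight = 950

  validation {
    condition     = var.agw_weight >= 1 && var.agw_weight <= 1000
    error_message = "Front Door origin weights must be between 1 and 1000."
  }
}

variable "nginx_weight" {
  type    = number
  default = 950

  validation {
    condition     = var.nginx_weight >= 1 && var.nginx_weight <= 1000
    error_message = "Front Door origin weights must be between 1 and 1000."
  }
}

resource "azurerm_cdn_frontdoor_origin" "nginx_origin" {
  name                          = "nginx-origin"
  cdn_frontdoor_origin_group_id = azurerm_cdn_frontdoor_origin_group.og.id

  host_name                      = var.nginx_public_fqdn # hypothetical: existing NGINX ingress FQDN
  origin_host_header             = var.nginx_public_fqdn
  certificate_name_check_enabled = true

  priority = 1 # same priority as agw_origin, so the weights decide the split
  weight   = var.nginx_weight
}
```

Weights cannot go to zero. The final 100% step therefore means disabling the NGINX origin (enabled = false) or removing it; rollback before that is a tfvars change back to the previous split.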


9) Terraform: Edge WAF policy (Front Door) + “detect → enforce” promotion

Front Door WAF policies can be expressed as code (resource names vary slightly by provider version, but the principle is constant).

Example pattern

  • var.edge_waf_mode = Detection in staging
  • var.edge_waf_mode = Prevention in prod

variable "edge_waf_mode" {
  type    = string
  default = "Detection"
}

resource "azurerm_cdn_frontdoor_firewall_policy" "waf" {
  name                = "waf-fd-${var.env}"
  resource_group_name = var.rg_name
  sku_name            = azurerm_cdn_frontdoor_profile.fd.sku_name

  enabled = true
  mode    = var.edge_waf_mode

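  managed_rule {
    type    = "DefaultRuleSet"
    version = "2.1"
    action  = "Block" # required; DRS 2.x uses anomaly scoring, and blocking only happens in Prevention mode
  }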

  # Add tightly scoped exclusions if needed, with PR ownership discipline
}

Then associate the WAF policy with the endpoint/domain via the relevant security policy association resource (depending on Front Door configuration).
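This sketch of that association binds the WAF to the default endpoint from the previous section. In production you would more likely pass a custom domain's resource ID. The provider documents /* as the only accepted patterns_to_match value here.

```hcl
resource "azurerm_cdn_frontdoor_security_policy" "waf" {
  name                     = "secpol-waf-${var.env}"
  cdn_frontdoor_profile_id = azurerm_cdn_frontdoor_profile.fd.id

  security_policies {
    firewall {
      cdn_frontdoor_firewall_policy_id = azurerm_cdn_frontdoor_firewall_policy.waf.id

      association {
        domain {
          # Endpoint ID or custom domain ID; custom domains are the usual choice in prod.
          cdn_frontdoor_domain_id = azurerm_cdn_frontdoor_endpoint.fd_ep.id
        }
        patterns_to_match = ["/*"]
      }
    }
  }
}
```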

IaC win: WAF posture becomes an environment parameter, not a portal flip.
DevOps win: promotion to enforcement is a controlled release with rollback.


10) Observability as code: diagnostics, logs, alerts that actually answer questions

If you can’t correlate Front Door → App Gateway → AKS, your migration will be “successful” right until the first incident.

What to collect

  • Front Door access logs + WAF logs
  • App Gateway access/performance/firewall logs
  • AKS ingress/controller logs (AGIC), plus app logs and traces
  • correlation ID propagation (traceparent / x-correlation-id)

Terraform: Diagnostic settings (illustrative)

resource "azurerm_log_analytics_workspace" "law" {
  name                = "law-edge-${var.env}"
  location            = var.location
  resource_group_name = var.rg_name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

resource "azurerm_monitor_diagnostic_setting" "agw_diag" {
  name                       = "diag-agw"
  target_resource_id         = azurerm_application_gateway.agw.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.law.id

  enabled_log {
    category = "ApplicationGatewayAccessLog"
  }

  enabled_log {
    category = "ApplicationGatewayFirewallLog"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

Do the same for Front Door resources.
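For a Front Door Standard/Premium profile, the equivalent might look like the sketch below. The log categories are the ones those profiles expose, but verify them against your workspace before building queries on them.

```hcl
resource "azurerm_monitor_diagnostic_setting" "fd_diag" {
  name                       = "diag-fd"
  target_resource_id         = azurerm_cdn_frontdoor_profile.fd.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.law.id

  enabled_log {
    category = "FrontDoorAccessLog"
  }

  enabled_log {
    category = "FrontDoorWebApplicationFirewallLog"
  }

  enabled_log {
    category = "FrontDoorHealthProbeLog"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}
```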

The “tells a story” dashboards

Track:

  • WAF matches by rule ID + URL + host
  • 4xx/5xx by route
  • latency percentiles per hop (edge vs gateway vs app)
  • burst traffic vs rate limiting outcomes
  • backend pool health probe failures

11) DevOps pipeline: Terraform is the delivery mechanism for your edge

Here’s the workflow that makes IaC real:

CI (every PR)

  • terraform fmt -check
  • terraform validate
  • security scanning (tfsec/checkov/your internal rules)
  • terraform plan (published as PR artifact/comment)
  • optional: OPA/Conftest rules to enforce guardrails

CD (merge to main)

  • apply to dev/staging automatically
  • run smoke tests (routing/headers/CORS)
  • run ZAP/k6 in pre-prod
  • gated approval to apply to prod
  • progressive traffic shift via Front Door weights

Example GitHub Actions-style pipeline (illustrative)

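name: edge-iac

on:
  pull_request:
    paths: ["infra/**"]
  push:
    branches: ["main"]
    paths: ["infra/**"]

# OIDC federation to Azure: no long-lived client secrets stored in the repo.
permissions:
  id-token: write
  contents: read

env:
  # Read by both the azurerm provider and the azurerm backend.
  # The vars.* names are placeholders for your own repo settings.
  ARM_USE_OIDC: "true"
  ARM_CLIENT_ID: ${{ vars.AZURE_CLIENT_ID }}
  ARM_TENANT_ID: ${{ vars.AZURE_TENANT_ID }}
  ARM_SUBSCRIPTION_ID: ${{ vars.AZURE_SUBSCRIPTION_ID }}

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Terraform fmt/validate
        run: |
          terraform -chdir=infra fmt -check -recursive   # covers modules/ as well as env/
          terraform -chdir=infra/env/staging init -backend=false
          terraform -chdir=infra/env/staging validate
      - name: Terraform plan
        run: |
          terraform -chdir=infra/env/staging init
          terraform -chdir=infra/env/staging plan -out tfplan

  apply-staging-and-test:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    needs: [plan]
    environment: staging   # protected environment: approvals and scoped settings live here
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Apply (staging)
        # Re-plans against current state; for prod, prefer applying the saved, reviewed plan.
        run: |
          terraform -chdir=infra/env/staging init
          terraform -chdir=infra/env/staging apply -auto-approve

      - name: Smoke tests (headers + routing + CORS)
        run: |
          ./tests/smoke/headers.sh
          ./tests/smoke/cors.sh

      - name: ZAP baseline (staging)
        run: ./tests/security/zap-baseline.sh

      - name: k6 smoke
        run: ./tests/perf/k6-smoke.sh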

The point: the pipeline doesn’t just deploy infra. It asserts your security intent still holds.


12) Testing that matches the contract (so you catch regressions before prod)

Your tests should read from the same contract file that Terraform uses.

Header smoke test idea

For each route:

  • request /api/health (or a lightweight endpoint)
  • assert baseline headers exist
  • assert headers don’t accidentally duplicate or conflict

Pseudo-shell example:

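#!/usr/bin/env bash
set -euo pipefail

URL="${1:-https://api.example.com/api/health}"

# --fail: HTTP errors fail the script; -D -: dump response headers to stdout
headers=$(curl -sS --fail -D - -o /dev/null "$URL")

# Here-strings instead of echo | grep: avoids SIGPIPE failures under pipefail.
# Each baseline header must appear exactly once (catches missing and duplicated headers).
for name in strict-transport-security x-content-type-options referrer-policy; do
  count=$(grep -ci "^${name}:" <<<"$headers" || true)  # grep -c prints 0 on no match
  [ "$count" -eq 1 ] || { echo "FAIL: ${name} appears ${count} times" >&2; exit 1; }
done

grep -qi "^x-content-type-options: nosniff" <<<"$headers" \
  || { echo "FAIL: x-content-type-options is not nosniff" >&2; exit 1; }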
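A live smoke test only runs after apply. To catch a header regression at plan time, before anything ships, Terraform 1.6+ can assert on the planned rewrite set with terraform test. This sketch assumes:

- The test file sits next to the configuration that declares azurerm_application_gateway.agw (hypothetically modules/app_gateway/tests/).
- The indexing matches the single rule set and rule from the example.
- Azure credentials for plan are available. Terraform 1.7+ mock providers are an alternative.

```hcl
# modules/app_gateway/tests/security_headers.tftest.hcl (hypothetical path)
variables {
  env      = "test"
  location = "westeurope"
  rg_name  = "rg-edge-test"
  # ...plus any other required module inputs
}

run "baseline_headers_in_rewrite_set" {
  command = plan

  assert {
    condition = contains(
      [for h in azurerm_application_gateway.agw.rewrite_rule_set[0].rewrite_rule[0].response_header_configuration : h.header_name],
      "Strict-Transport-Security"
    )
    error_message = "Baseline rewrite set no longer sets Strict-Transport-Security."
  }

  assert {
    condition = anytrue([
      for h in azurerm_application_gateway.agw.rewrite_rule_set[0].rewrite_rule[0].response_header_configuration :
      h.header_name == "X-Content-Type-Options" && h.header_value == "nosniff"
    ])
    error_message = "X-Content-Type-Options must be nosniff."
  }
}
```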

CORS smoke test idea

  • OPTIONS with allowed origin → expect allow-origin = that origin
  • OPTIONS with disallowed origin → expect no allow-origin (or a deny behavior)
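Preflight (OPTIONS) checks stay in the shell smoke test, because per its documentation the hashicorp/http data source only issues GET, HEAD and POST. The simple-request half of the contract can still run as Terraform check blocks on every plan, including the nightly drift run. This sketch assumes Terraform >= 1.5, the http provider added to required_providers, and the hosts from the example contract. The disallowed origin is a made-up value.

```hcl
# Check blocks report failures (including request errors) as warnings, not errors.
check "cors_allowed_origin" {
  data "http" "allowed" {
    url             = "https://api.example.com/api/health"
    request_headers = { Origin = "https://app.example.com" } # on the allowlist in edge-contract.yml
  }

  assert {
    condition     = lookup(data.http.allowed.response_headers, "Access-Control-Allow-Origin", "") == "https://app.example.com"
    error_message = "Allowed origin is not echoed in Access-Control-Allow-Origin."
  }
}

check "cors_disallowed_origin" {
  data "http" "disallowed" {
    url             = "https://api.example.com/api/health"
    request_headers = { Origin = "https://not-allowed.example.net" } # hypothetical origin outside the allowlist
  }

  assert {
    condition     = lookup(data.http.disallowed.response_headers, "Access-Control-Allow-Origin", "") == ""
    error_message = "A disallowed origin received an Access-Control-Allow-Origin header."
  }
}
```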

WAF validation idea

  • run in detection for a sprint
  • confirm rule hits are understood and exceptions are scoped
  • promote to prevention and ensure no false positives break core flows

13) The “tiny before/after” that actually reflects IaC

Before (NGINX Ingress snippets)

  • behavior lives in scattered annotations
  • hard to guarantee consistency across services
  • changes are easy to make ad hoc

metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "1m"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
      add_header X-Content-Type-Options "nosniff" always;

After (Terraform-managed baseline)

  • security headers live in App Gateway rewrite set (or Front Door rules if you choose)
  • WAF policies live in Terraform
  • cert lifecycle is Key Vault + managed identity
  • tests ensure policy doesn’t regress

That’s the real upgrade: from “ingress config” → “platform policy as code.”


14) A realistic 30-day rollout (DevOps + IaC aligned)

Week 1 — Contract + module design

  • build route inventory and encode it as edge-contract.yml
  • design Terraform modules and variable schema
  • define SLOs and regression tests

Week 2 — Provision parallel infrastructure

  • deploy Front Door + WAF (Detection)
  • deploy App Gateway + rewrite sets + diagnostics
  • install/configure AGIC to target the gateway
  • validate end-to-end in staging

Week 3 — Controlled traffic shift

  • add second origin (old NGINX path) if doing weighted migration
  • shift traffic gradually (5% → 25% → 50% → 100%)
  • tune WAF exceptions (scoped + expiring) based on real logs

Week 4 — Enforce + harden

  • promote WAF to Prevention in prod
  • tighten request size/timeouts based on measured reality
  • enable drift detection
  • remove temporary exceptions and document runbooks

15) Common migration gotchas (the stuff that bites teams at 2am)

These aren’t theoretical; they’re the usual suspects:

  • Forwarded headers differ (X-Forwarded-For, X-Forwarded-Proto, host handling) → apps may generate wrong redirects or mixed-content issues
  • Path rewrite semantics differ → /api vs /api/ and normalization edge cases
  • Health probes: probe path/auth can accidentally mark backends unhealthy
  • WebSockets / gRPC: require specific support and timeout expectations
  • WAF false positives: especially around JSON bodies, JWTs, file uploads, and certain query patterns
  • CORS duplication: if both edge and app add headers you can get inconsistent behavior
  • Cert permissions: Key Vault access miswired → gateway can’t fetch cert → outage

DevOps response: make each of these a check in pre-prod, not a lesson learned in prod.


Takeaway

If Terraform shows up only in the last paragraph, it’ll feel unrelated. In a good migration, Terraform is the spine:

  • it encodes the desired state (routes, policies, WAF mode, rewrites, cert wiring)
  • it enables reviewable change (PRs + plan diffs)
  • it enables safe delivery (progressive cutovers and easy rollback)
  • it detects drift early (scheduled plans + alerts)
  • it turns security controls into repeatable platform behavior

Ingress migrations are rare. Use them to upgrade your security posture and your delivery maturity in one go — and make the edge as boring, auditable, and testable as your application code.

If you want to continue this in the same direction, the next level is to make the edge-contract.yml the single source of truth and generate:

  • Terraform route/rule inputs
  • smoke tests (headers/CORS)
  • documentation (route inventory)

That way policy, infrastructure, and tests all derive from one file and cannot quietly drift apart.
