2026-05-15 — Fast Container App debug loop

Motivation

Each Azure Container Apps debug cycle has been costing 5-10 minutes:

edit → az acr build (~2-3 min) → az deployment group create (~1-2 min)
     → wait revision (~60 s) → az containerapp logs show (snapshot)

Over the last week the team burned several of these cycles rediscovering that the bugs in question are app-logic bugs, not infra bugs: the Celery shared_task routing trap, redis startup ordering, the terminal exec_server contract, and route registration order vs. the frontend_proxy catch-all. All of them are reproducible without ARM, without ACR, and without a private endpoint.

The existing scripts/dev/docker-compose.local.yml only carried two sidecars (api + frontend), so none of those failure modes could be caught locally.

User-facing change

A new three-tier debug loop, documented in scripts/dev/README.md:

| Tier | Tool | Cycle time | Catches |
|------|------|------------|---------|
| 1 | uv run pytest -q api/tests | seconds | Pure unit logic |
| 2 | docker compose -f scripts/dev/docker-compose.full.yml up --build | ~5 s incremental | Celery routing, redis-wait, terminal exec, route order, WS proxy, SPA fallback |
| 3 | scripts/dev/quick-deploy.sh <sidecar> [--logs] | ~1-2 min | MI / private endpoint / real Storage |

Tier 2 is new: a 6-sidecar mirror of the Container App. It has api/ bind-mounted into api/worker/beat for live uvicorn --reload, a real redis:7-alpine, the real terminal sidecar (exec_server bound to 0.0.0.0 via EXEC_HOST for cross-container reach), and vite dev with HMR for the SPA. The stack is wired with the same env contract the Bicep uses (CELERY_BROKER_URL, TERMINAL_EXEC_UPSTREAM, EXEC_TOKEN, etc.), so the upstream URLs are the only values rebound.

Tier 3 is new: a one-image quick deploy that does a single az acr build for the changed sidecar and a single az containerapp update --container-name <x> --image <new> (no Bicep, no env / probe / secret changes). When the api image changes, worker and beat are bumped to the same tag automatically — they share elb-api, and leaving them on a stale tag was the root cause of last week's "fix lands but task still runs old code" confusion.
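The Tier 3 flow can be sketched roughly as follows. This is not the contents of scripts/dev/quick-deploy.sh; it is a minimal illustration under stated assumptions. The registry, resource group, and app names (exampleacr, example-rg, example-app), the elb-<sidecar> image naming for sidecars other than api, the ./<sidecar> build context, and the TAG and DRY_RUN variables are all hypothetical. Treating elb-api as the shared image repository is an inference from the text. The --logs option is omitted.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a one-image quick deploy; NOT the real quick-deploy.sh.
set -euo pipefail

# Assumed names: replace with the real registry, resource group, and app.
ACR_NAME="${ACR_NAME:-exampleacr}"
RESOURCE_GROUP="${RESOURCE_GROUP:-example-rg}"
APP_NAME="${APP_NAME:-example-app}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the az commands, 0 = execute them

# Containers that must move together when a given sidecar's image changes.
containers_for() {
  case "$1" in
    api) echo "api worker beat" ;;  # worker and beat share the api image
    *)   echo "$1" ;;
  esac
}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

quick_deploy() {
  local sidecar="$1"
  local tag="${TAG:-dev-$(date +%Y%m%d%H%M%S)}"
  local image="elb-${sidecar}:${tag}"   # image naming is an assumption

  # One build for the changed sidecar (build context path is assumed).
  run az acr build --registry "$ACR_NAME" --image "$image" "./${sidecar}"

  # One image-only update per affected container; no Bicep, env, or probes.
  local c
  for c in $(containers_for "$sidecar"); do
    run az containerapp update \
      --name "$APP_NAME" \
      --resource-group "$RESOURCE_GROUP" \
      --container-name "$c" \
      --image "${ACR_NAME}.azurecr.io/${image}"
  done
}

# Run only when a sidecar name is passed on the command line.
if [ "$#" -gt 0 ]; then
  quick_deploy "$@"
fi
```

Keeping the api → api, worker, beat mapping in one function is what prevents the stale-tag situation described above: there is no code path that bumps api alone.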

Files

No production code changed. The terminal sidecar's exec_server already honours EXEC_HOST (default 127.0.0.1); compose just sets it to 0.0.0.0 so the api container on the bridge network can reach terminal:7682. ttyd's loopback bind is unchanged in production.
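One way to confirm the cross-container reach described above is a small TCP wait helper. This is a sketch, not something in the repository: the function name wait_for_port is hypothetical, and it assumes bash is available wherever it runs, since it relies on bash's /dev/tcp redirection.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the repository.

# Succeeds as soon as a TCP connection to host:port can be opened and
# fails once timeout_s seconds have passed. Requires bash (/dev/tcp).
wait_for_port() {
  local host="$1" port="$2" timeout_s="${3:-30}"
  local deadline=$(( SECONDS + timeout_s ))
  while (( SECONDS < deadline )); do
    # Open and immediately drop a connection inside a subshell.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

Run from inside the api container, wait_for_port terminal 7682 would exercise the path that the EXEC_HOST=0.0.0.0 setting opens up, and wait_for_port redis 6379 would cover the redis startup ordering case, assuming the compose service names are terminal and redis and that redis listens on its default port.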

Validation evidence

$ bash -n scripts/dev/quick-deploy.sh && echo OK
OK

$ docker compose -f scripts/dev/docker-compose.full.yml config -q && echo OK
OK

A live-build smoke test is intentionally not run as part of this commit: docker compose ... up --build triggers two image builds (api + terminal) that take several minutes. The developer running the loop pays that cost once, then iterates at near-zero marginal cost via the bind mounts.

Out of scope

  • No change to postprovision.sh. Sidecar layout / env / probes / secrets still flow through Bicep there.
  • No change to the production EXEC_TOKEN story — the dev token is a fixed string in compose only.
  • No CI integration of docker-compose.full.yml. (Tier 1 pytest already runs in CI.)