CLI Rolling Update (git pull + build + deploy)
This page is the workstation-driven path for rolling out new code to a
deployed dashboard. It is paired with the script
scripts/dev/cli-upgrade.sh.
Quick rolling update (TL;DR)
Pick the path that matches who you are. Both deploy all six sidecars
(api / worker / beat / frontend / terminal / redis) by rebuilding the
three custom images (elb-api, elb-frontend, elb-terminal) and
swapping the Container App template via postprovision.sh.
You are deploying a tagged release from origin/main (or a
release branch) without local edits. This is the safest path —
the SPA header vA.B.<build> · <short-sha> will match exactly
what is in git, so future "what shipped?" questions are trivial.
# 1. Land the release on your workstation.
git fetch --tags origin
git checkout main && git pull --ff-only
# 2. Preview the plan (no build, no PATCH).
scripts/dev/cli-upgrade.sh full --dry-run
# 3. Deploy.
scripts/dev/cli-upgrade.sh full --yes
- No --allow-dirty: the script refuses to proceed if the tree is dirty, which is exactly the guardrail you want. --pull is intentionally not passed in step 3, because you already pulled in step 1 and saw what landed.
You are iterating on api/, web/, terminal/, or infra/ and
want to ship the working tree. Commit first so the SPA header
SHA matches the deployed code (az acr build packages whatever
is on disk regardless of git state — see
§ "Working tree, git, and the SPA header").
# 1. Commit (or stash, then unstash after deploy).
git add -A && git commit -m "feat(scope): summary"
# 2. Preview the plan.
scripts/dev/cli-upgrade.sh full --dry-run
# 3. Deploy.
scripts/dev/cli-upgrade.sh full --yes
If you absolutely must deploy uncommitted edits (e.g. a quick
production hotfix you will commit immediately after), add
--allow-dirty to the step 3 command to acknowledge the SHA mismatch.
Then commit the same diff right after the deploy succeeds,
and record the commit SHA in the per-feature change note under
docs/features_change/.
Only edited api/ code (no infra/, terminal/, or sidecar
layout change)? The faster api scope (scripts/dev/cli-upgrade.sh api)
rebuilds one image and patches api+worker+beat in ~60 s.
For either path: snapshot + /api/health/ready poll + auto-rollback still
run. Tune the budget with --health-timeout 300 when the terminal
sidecar was rebuilt or the app was scaled to zero.
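The scope choice described above can be approximated mechanically from the list of changed paths. The sketch below is an illustration only: pick_scope is a hypothetical helper that is not part of cli-upgrade.sh, and mapping web/ to the frontend scope is an assumption based on the frontend workflow on this page. Anything outside api/ or web/, or a mix of the two, falls back to full.

```shell
# Hypothetical helper (not part of cli-upgrade.sh): read changed paths on
# stdin, one per line, and print the narrowest scope that covers them all.
# Possible use: git diff --name-only origin/main...HEAD | pick_scope
pick_scope() {
  scope=""
  while IFS= read -r path; do
    case "$path" in
      "")    continue ;;
      api/*) this="api" ;;
      web/*) this="frontend" ;;   # assumption: web/ maps to the frontend scope
      *)     echo "full"; return 0 ;;
    esac
    if [ -z "$scope" ]; then
      scope="$this"
    elif [ "$scope" != "$this" ]; then
      echo "full"; return 0       # mixed api/ and web/ edits
    fi
  done
  echo "${scope:-full}"           # empty input: fall back to the widest scope
}
```

Treat the printed scope as a suggestion and still preview it with --dry-run before deploying.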
Prefer the in-browser upgrade when possible
The browser-driven In-app Upgrade does the
same thing without a workstation: it polls the configured git remote
for a new release tag, runs az acr build for the three sidecar
images, PATCHes the Container App template, and auto-rolls back on
failure. Use the CLI path only when that flow is not available.
When to use which path
| Situation | Use |
|---|---|
| UPGRADE_GIT_REMOTE is configured and the SPA is reachable | In-app Upgrade (no shell needed). |
| In-app upgrade is disabled (UPGRADE_GIT_REMOTE unset) or no UpgradeAdmin is available | cli-upgrade.sh <scope> from a workstation that has az login. |
| Sidecar layout / probes / scale rules changed (anything outside container images) | cli-upgrade.sh full, which runs the full postprovision.sh template swap. |
| The SPA is down, so the browser cannot drive a rollback | cli-upgrade.sh rollback against the snapshot file. |
| You only edited code in api/ and want a 60-second cycle | quick-deploy.sh api directly (no snapshot envelope). |
What the script does (envelope around quick-deploy.sh / postprovision.sh)
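As a rough outline pieced together from the preflight, health-check, and deploy-history sections of this page (not taken from the script's source), the envelope behaves approximately as follows. The function name run_envelope and the four injected step commands are hypothetical; the printed strings reuse the result values from the deploy history table.

```shell
# Hypothetical outline of the envelope. Each argument names a command or
# function standing in for one step; none of these names come from the script.
run_envelope() {
  snapshot_step="$1"; deploy_step="$2"; health_step="$3"; rollback_step="$4"

  # Capture the current image refs before anything is changed.
  "$snapshot_step" || { echo "aborted"; return 1; }

  # Build and PATCH (quick-deploy.sh or postprovision.sh in the real flow).
  "$deploy_step" || { echo "build_in_progress"; return 1; }

  # Gate on readiness; roll back to the snapshot if the new tag is unhealthy.
  if "$health_step"; then
    echo "success"; return 0
  fi
  # Simplification: a failed rollback step and a rollback that applies but
  # stays unhealthy are both reported as rollback_failed here.
  if "$rollback_step" && "$health_step"; then
    echo "upgrade_failed_rolled_back"
  else
    echo "rollback_failed"
  fi
  return 1
}
```

Passing the steps as arguments keeps the control flow readable without touching Azure; the real script also runs the preflight checks and records history around these steps.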
Working tree, git, and the SPA header
az acr build packages the current working tree (filtered by
.dockerignore) as the build context. It does not care whether files
are staged, committed, or pushed — whatever is on disk at build time
goes into the image. --allow-dirty only suppresses the dirty-tree
guardrail; it does not change what gets packaged.
The SPA header vA.B.<build> · <short-sha> is resolved on the build
host by scripts/dev/quick-deploy.sh
and scripts/dev/postprovision.sh
and passed to az acr build as --build-arg. The short-sha comes from
git rev-parse --short HEAD, i.e. the last commit. Consequence:
| Pre-deploy git state | Code shipped | SPA header SHA | Traceability |
|---|---|---|---|
| Clean (committed) | HEAD | matches HEAD | ✅ trivial — git show <sha> reproduces it |
| Dirty (--allow-dirty) | working tree | matches HEAD (the last commit, not your edits) | ⚠ header lies: diff exists only on your laptop |
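The two rows of the table can be told apart before a deploy with the same two git commands the page already mentions. The sketch below is illustrative: build_label is a hypothetical helper, and the -dirty suffix (borrowed from the git describe --dirty convention) is not something this page says the script emits.

```shell
# Hypothetical helper: given the short SHA and the output of
# `git status --porcelain`, print a label that exposes uncommitted edits.
# Possible use:
#   build_label "$(git rev-parse --short HEAD)" "$(git status --porcelain)"
build_label() {
  sha="$1"; porcelain="$2"
  if [ -z "$porcelain" ]; then
    echo "$sha"          # clean tree: the SHA fully describes the shipped code
  else
    echo "$sha-dirty"    # dirty tree: the SHA alone would understate the build
  fi
}
```

A label like this in a change note makes it obvious later that the header SHA did not describe everything that shipped.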
Verification when you want to confirm a specific file made it into the deployed image:
az containerapp exec \
--name "$CONTAINER_APP_NAME" --resource-group "$AZURE_RESOURCE_GROUP" \
--container api --command "sha256sum /app/api/main.py"
sha256sum api/main.py # local comparison
Same hash → shipped as intended. Different → check .dockerignore
or whether a later build stage overwrote the file.
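Because sha256sum prints the digest followed by the path, and the path differs between the container and the workstation, only the first field should be compared. A minimal sketch, assuming both inputs are plain sha256sum output lines (same_digest is a hypothetical helper; any extra output that az containerapp exec may print would need stripping first):

```shell
# Hypothetical helper: succeed when two `sha256sum` output lines carry the
# same digest, ignoring the file path that follows it.
same_digest() {
  a="${1%% *}"   # keep everything before the first space (the digest)
  b="${2%% *}"
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, same_digest "$remote_line" "$(sha256sum api/main.py)" returns 0 when the deployed file matches the local one, where remote_line holds the line captured from the container.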
Preflight checklist
The script enforces these automatically and refuses to proceed if any fails:
| Check | What it guards against |
|---|---|
| az account show succeeds | Stale or missing az login |
| AZURE_RESOURCE_GROUP, ACR_NAME, ACR_LOGIN_SERVER, CONTAINER_APP_NAME, CONTAINER_APP_FQDN are set (auto-loaded from azd env get-values) | Pointing at the wrong app |
| git status --porcelain is empty | Building with uncommitted edits silently shipping debug code (override with --allow-dirty) |
| --pull only on the branch you started on | Accidental pull of a feature branch into main |
| git pull --ff-only | Non-fast-forward pulls leaving a merge commit you did not intend |
| Snapshot of current revision + image refs taken before any PATCH | Losing the previous tags to roll back to |
| Workload Storage parity: refuses when publicNetworkAccess=Disabled AND no approved Private Endpoint exists on the account | Deploying into a state where the Container App has no network path to Storage (worker would fail every minute on 403 AuthorizationFailure). Override with --skip-parity-check. |
| Exclusive lock on the snapshot file (flock on /tmp/elb-upgrade-snapshot-<app>.json.lock) | Two operators racing concurrent deploys against the same Container App and corrupting the rollback snapshot; the second run is rejected with a clear error |
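The environment-variable row of the checklist can be sketched as a small guard. This is a minimal illustration, not the script's actual implementation: require_env is a hypothetical name, and the error wording is invented.

```shell
# Hypothetical preflight helper: fail if any named variable is unset or empty.
# Possible use:
#   require_env AZURE_RESOURCE_GROUP ACR_NAME ACR_LOGIN_SERVER \
#     CONTAINER_APP_NAME CONTAINER_APP_FQDN || exit 1
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup; the names are literals supplied by the caller.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "preflight: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Reporting every missing name before failing saves a round trip compared with stopping at the first one.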
Deploy history
Every run appends one JSON line per terminal outcome to
$ELB_UPGRADE_HISTORY (default ~/.elb-upgrade-history.jsonl):
{"ts":"2026-05-23T03:22:48Z","scope":"full","app":"ca-elb-dashboard","tag":"20260523122407-58cc179","head_sha":"58cc179","result":"success","elapsed_seconds":127,"message":""}
Possible result values:
| Result | Meaning |
|---|---|
| success | Upgrade completed; /api/health/ready returned 200 within the timeout |
| dry_run | Skipped: dry-run never writes a history entry (the recording function early-returns) |
| parity_rejected | Storage parity preflight blocked the deploy |
| build_in_progress | The build step (quick-deploy.sh or postprovision.sh) was running when the script exited; this is the last successful state before a build failure |
| upgrade_failed_rolled_back | New tag failed /api/health/ready; auto-rollback to the snapshot succeeded |
| rollback_failed | Auto-rollback PATCH applied but /api/health/ready still fails; manual intervention needed |
| rollback_success | Explicit cli-upgrade.sh rollback scope completed and healthy |
| aborted_by_user | Interactive Proceed? prompt was declined |
| aborted | Catch-all for Ctrl+C, SIGTERM, internal errors, or any path that exited before setting an explicit result |
Useful queries:
# Most recent 5 runs
tail -5 ~/.elb-upgrade-history.jsonl | jq .
# Outcome counts in the last 30 days (date -d is GNU date syntax)
jq -r --arg since "$(date -u -d '30 days ago' +%Y-%m-%d)" \
'select(.ts > $since) | .result' \
~/.elb-upgrade-history.jsonl | sort | uniq -c | sort -rn
# Average elapsed_seconds for successful 'full' deploys (prints nothing if there are none)
jq -r 'select(.result=="success" and .scope=="full") | .elapsed_seconds' \
~/.elb-upgrade-history.jsonl | awk '{s+=$1; n++} END {if (n) print s/n}'
The file is best-effort: a missing $HOME or read-only filesystem never
blocks a deploy. Each entry is a single short line (well under 4 KiB)
written through the shell's O_APPEND redirect, which on a local
filesystem lands at end-of-file without interleaving with concurrent
writers, so no additional locking is needed.
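A best-effort append of this kind can be sketched in a few lines. The function below is an assumption-laden illustration rather than the script's recorder: record_history is a hypothetical name, the argument order is invented, and the sketch assumes the field values contain no double quotes or backslashes (it does no JSON escaping).

```shell
# Hypothetical recorder: append one JSON line and never fail the caller.
# Arguments: scope app tag head_sha result elapsed_seconds message
record_history() {
  history_file="${ELB_UPGRADE_HISTORY:-${HOME:-}/.elb-upgrade-history.jsonl}"
  ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  # One printf call produces the whole line; >> opens the file in append mode.
  # Errors (missing directory, read-only filesystem) are swallowed on purpose.
  {
    printf '{"ts":"%s","scope":"%s","app":"%s","tag":"%s","head_sha":"%s","result":"%s","elapsed_seconds":%s,"message":"%s"}\n' \
      "$ts" "$1" "$2" "$3" "$4" "$5" "$6" "$7" >> "$history_file"
  } 2>/dev/null || true
}
```

The trailing || true is what makes the write best-effort: a failed append changes nothing about the deploy's own exit status.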
Recommended workflow
Routine code-only update (api sidecar)
# 1. Pull, build, deploy api+worker+beat, then auto-rollback on /api/health/ready failure.
scripts/dev/cli-upgrade.sh api --pull
# 2. Watch the new revision's logs (optional).
scripts/dev/cli-upgrade.sh api --pull --logs
Frontend SPA bundle change
# Vite build args (VITE_AZURE_CLIENT_ID etc.) are picked up by quick-deploy.sh
# from azd env values automatically — no manual env juggling.
scripts/dev/cli-upgrade.sh frontend --pull
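Sidecar layout / Bicep / terminal base image changed
Use the full scope (scripts/dev/cli-upgrade.sh full), which runs the full
postprovision.sh template swap. For Bicep changes under infra/, run
azd provision first, because the script does not re-apply infra.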
Roll back from a workstation
# Read the snapshot taken on the most recent upgrade run on this workstation
# and re-PATCH every sidecar back to those image refs.
scripts/dev/cli-upgrade.sh rollback --yes
The snapshot file is per-app (/tmp/elb-upgrade-snapshot-<app>.json by
default; override with ELB_UPGRADE_SNAPSHOT). If you move workstations
between the upgrade and the rollback, copy the snapshot file across — or
fall back to the manual rollback below.
Manual rollback (when the script is unavailable)
The script's safety net is a single az containerapp update --container-name <name> --image <previous-image>
per sidecar. Reproduce it by hand:
# 1. Find the previous active revision (the one BEFORE the broken one).
az containerapp revision list \
--name "$CONTAINER_APP_NAME" --resource-group "$AZURE_RESOURCE_GROUP" \
--query "sort_by([], &properties.createdTime)[-2:].{name:name, active:properties.active, created:properties.createdTime}" \
-o table
# 2. Pull its per-sidecar image refs.
az containerapp revision show \
--name "$CONTAINER_APP_NAME" --resource-group "$AZURE_RESOURCE_GROUP" \
--revision "<previous-revision-name>" \
--query "properties.template.containers[].{name:name, image:image}" \
-o table
# 3. PATCH each container back to the captured image.
az containerapp update \
--name "$CONTAINER_APP_NAME" --resource-group "$AZURE_RESOURCE_GROUP" \
--container-name api --image "$ACR_LOGIN_SERVER/elb-api:<previous-tag>"
# (repeat for worker, beat, frontend, terminal as needed)
# 4. Confirm readiness with the deep probe (repeat until it returns 200).
curl -fsS "https://$CONTAINER_APP_FQDN/api/health/ready"
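Repeating step 3 by hand for every sidecar is error-prone, so it can help to generate the commands from the image refs captured in step 2 and review them before running anything. The sketch below only prints text. It assumes tab-separated "container, image" input, for example from a step 2 query that projects [name, image] with -o tsv; print_rollback_cmds is a hypothetical helper.

```shell
# Hypothetical generator: read "<container><TAB><image>" lines on stdin and
# print one `az containerapp update` command per line. Nothing is executed.
print_rollback_cmds() {
  while IFS="$(printf '\t')" read -r container image; do
    [ -n "$container" ] || continue   # skip blank lines
    # The variable references stay literal so the printed command is reusable.
    printf 'az containerapp update --name "$CONTAINER_APP_NAME" --resource-group "$AZURE_RESOURCE_GROUP" --container-name %s --image %s\n' \
      "$container" "$image"
  done
}
```

Printing rather than executing keeps a human in the loop, which matters when the script's own safety net is unavailable.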
Health-check budget
The script polls https://<fqdn>/api/health/ready every 5 seconds for
--health-timeout seconds (default 180). Tune it with
--health-timeout 300 when:
- The terminal sidecar was rebuilt (cold container, large layer).
- The Container App was scaled to zero before the upgrade (revision warmup).
- A managed-identity refresh is in progress (typically <30 s).
/api/health/ready is the deep readiness probe — it checks the Redis
broker, the Managed Identity credential, the terminal sidecar's loopback
exec server, and a cheap list_tables(top=1) call against the workload
Storage Table data plane. A 200 means the api sidecar is up AND every
critical downstream is actually reachable. On any 503 the script dumps
the response body to stderr so you can see which component is down
before the auto-rollback kicks in.
The cheap /api/health (liveness) endpoint stays in place for Container
Apps platform probes. Never use it as a deploy verification gate: it
does not call Azure at all.
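The polling described at the top of this section (one probe every 5 seconds until --health-timeout elapses) can be sketched as a generic loop. This is a simplified illustration, not the script's code: poll_until is a hypothetical name, the probe is passed in as a command so the loop can be exercised without Azure, and time spent inside the probe itself is not counted against the budget.

```shell
# Hypothetical polling loop: run the probe command every <interval> seconds
# until it succeeds or roughly <timeout> seconds of waiting have passed.
# Possible use:
#   poll_until 180 5 curl -fsS -o /dev/null "https://$CONTAINER_APP_FQDN/api/health/ready"
poll_until() {
  timeout_s="$1"; interval_s="$2"; shift 2
  elapsed=0
  while :; do
    if "$@"; then
      return 0                         # probe succeeded: healthy
    fi
    elapsed=$((elapsed + interval_s))
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1                         # budget exhausted: caller should roll back
    fi
    sleep "$interval_s"
  done
}
```

A non-zero return is the point at which the auto-rollback described on this page would start.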
Common failure modes
| Symptom | Most likely cause | Fix |
|---|---|---|
| ACR no longer carries the snapshotted tags (rollback) | ACR retention policy purged the previous tag. | Bump retention before next upgrade: az acr config retention update --registry "$ACR_NAME" --status enabled --days 180 --type UntaggedManifests. Re-build the older release locally to restore the missing tag. |
| Auto-rollback says PATCH succeeded but /api/health still 5xx | The previous tag also depends on a sidecar image that was purged, OR Storage / Key Vault private endpoint is down. | Inspect az containerapp logs show --container api --type system --tail 100 and az containerapp logs show --container api --tail 100. |
| git pull --ff-only failed | A teammate force-pushed or the working branch is diverged. | Rebase locally and resolve manually; do not pass --allow-dirty to bypass. |
| 403 on az containerapp update | Caller's az login identity lacks Contributor on the Container App. | Use the deploying account, or have the deployer add a Container Apps Contributor role assignment. |
| New revision crash-loops with ImagePullBackOff | Build succeeded but ACR pull permission for the Container App's MI is broken. | Run scripts/dev/postprovision.sh once to re-grant AcrPull. |
| Health check passes but the SPA fails to load | VITE_API_BASE_URL leaked from web/.env.local into the frontend build. | The script unsets it; if you bypassed it, cli-upgrade.sh frontend --pull will overwrite. |
| Preflight rejects with Storage '...' is unreachable from the Container App | Workload Storage is publicNetworkAccess=Disabled (most often left over from a local-debug storage-public-access.sh off / local-run.sh storage-off / auth-off) AND the deployment never created Private Endpoints (LOCKDOWN_PRIVATE_NETWORKING=false). | Quick: scripts/dev/storage-public-access.sh on --account <acct> --rg <rg>. Proper: azd env set LOCKDOWN_PRIVATE_NETWORKING true && azd provision. Last-resort override: --skip-parity-check (workload will still fail Storage calls). |
| /api/health/ready returns 503 with azure_storage: down in the body | The api sidecar can reach Azure AD but not the Storage data plane. Same cause as the preflight rejection above, OR transient Azure outage, OR the workload Managed Identity is missing Storage Table Data Contributor on the workload storage account. | Confirm MI role: az role assignment list --assignee <mi-principalId> --scope <storage-id>. If correct, run the Storage recovery from the row above. The azure_storage.error_class field in the same body (e.g. HttpResponseError, ServiceRequestError, ClientAuthenticationError) tells you the SDK exception category at a glance. |
| Preflight rejects with another cli-upgrade run holds /tmp/elb-upgrade-snapshot-<app>.json.lock | Another cli-upgrade.sh is already running against the same Container App on this workstation, or a previous run was killed before releasing the flock(2) advisory lock. | If a peer is genuinely deploying, wait for them to finish. If the lock is stale (no cli-upgrade.sh process exists), remove the lockfile: rm /tmp/elb-upgrade-snapshot-<app>.json.lock. |
What this script does not do
- No azd provision. Infra under infra/*.bicep is not re-applied. Use azd up (or azd provision && cli-upgrade.sh full) for Bicep changes.
- No multi-revision blue/green. The bundled Container App is minReplicas: 1, maxReplicas: 1, revisionsMode: single. Rollback is a fast re-PATCH, not a revision activate.
- No cross-tenant deploy. The script honours the current az login context; there is no tenant-switching flag.
- No automatic git push. It only pulls. Whatever you build is the tip of the branch on the workstation at that moment.
Related references
- Deployment Reference: the prerequisites, Bicep modules, and the full azd up flow.
- In-app Upgrades: the browser-driven equivalent.
- Runtime Plan: RBAC + identity matrix the az containerapp update PATCH depends on.
- Container Apps Architecture: sidecar layout and the quick-deploy.sh constraints.