Troubleshooting
Symptom-first index for the errors most teams hit while signing in to or driving the dashboard. Each section is self-contained — start with the heading that matches what you see on screen or in a log.
For onboarding-time questions (how do I find the App Registration clientId, how do I grant RBAC to a teammate, etc.) start with Joining An Existing Deployment instead. This page is for things that are already broken.
Setup Required screen, or AADSTS700038 on sign-in
Symptom
- The SPA renders a "Setup Required" glass card instead of the Sign in page, OR
- The Microsoft sign-in popup reports AADSTS700038: 00000000-0000-0000-0000-000000000000 is not a valid application identifier (the reported value may also be any non-UUID string).
Cause
The SPA was built with no VITE_AZURE_CLIENT_ID, or with the placeholder all-zero UUID that historically shipped in web/.env.example. The build sent that placeholder to Microsoft Entra and Entra rejected it.
Fix
- If you are running locally, bind your clone to the existing azd environment. Once bound, local-run.sh web auto-exports VITE_AZURE_CLIENT_ID from API_CLIENT_ID in azd env, so you do not edit web/.env.local for the clientId.
- If you cannot run azd env refresh, paste the clientId directly into web/.env.local as the value of VITE_AZURE_CLIENT_ID.
- If the Container App rendered this in a deployed environment, it means API_CLIENT_ID was empty when the frontend image was built. Re-run azd provision (or scripts/dev/postprovision.sh) so the App Registration is created/resolved and --build-arg VITE_AZURE_CLIENT_ID=$API_CLIENT_ID_VAL reaches the next az acr build.
Full clientId discovery flow: Joining An Existing Deployment → Bind your clone.
Sign-in succeeds but Dashboard cards show "access_denied"
Symptom
You signed in fine through the deployed SPA, but one or more cards (Storage, ACR, AKS, BLAST Databases) show access_denied. Browser DevTools shows HTTP 403 with AuthorizationPermissionMismatch from a Storage / ARM endpoint.
Cause
The deployed SPA itself uses the shared managed identity for Azure calls, so seeing access_denied in the deployed surface usually means the MI lost a role assignment (most often after azd down followed by a fresh azd up, which creates a new MI object id).
If you are running the local backend instead, DefaultAzureCredential is using your az login identity, and your account has no RBAC on the workload Storage / ACR / RG yet.
Fix — deployed dashboard (MI lost roles)
Re-run the MI role checklist:
source <(azd env get-values -e <YOUR_ENV> | sed 's/^/export /')
# Then re-run the role assignments from docs/auth.md §0.
Full checklist: Auth → §0 Post-Deploy Permissions Checklist.
Fix — local backend (your account has no roles)
# A. Self-grant (you need User Access Administrator on the workload RG).
scripts/dev/grant-local-rbac.sh # add --dry-run to preview
# B. Deployer grants to a teammate's account.
scripts/dev/grant-local-rbac.sh --user teammate@contoso.onmicrosoft.com
Wait 1-5 minutes for RBAC propagation, then restart scripts/dev/local-run.sh api.
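Because propagation time varies, polling can be less error-prone than a fixed wait. The following is a minimal sketch, not part of the repo scripts: retry_until is a hypothetical helper, and the probe command in the usage note is only one possible choice.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry_until <attempts> <sleep_seconds> <command...>
retry_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry_until 10 30 az storage container list --account-name "$STORAGE_ACCOUNT" --auth-mode login -o none` probes the data plane every 30 seconds for up to 5 minutes. STORAGE_ACCOUNT is a placeholder for your workload account name, and the probe only succeeds if the account is also reachable from your network.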
Full RBAC story: Joining An Existing Deployment → RBAC for the new teammate.
Dashboard cards show "network_blocked"
Symptom
Storage-backed cards (BLAST Databases, Queries, Results) show network_blocked. The deployed dashboard itself works.
Cause
The workload Storage account has publicNetworkAccess: Disabled (the production default). The deployed Container App reaches Storage over private endpoints from inside the VNet, but your laptop cannot reach the private endpoint. This is expected for the deployed dashboard rendered from a laptop, and for the local backend when run from outside the VNet.
Fix
Use the explicit local-debug helper to open a short IP-allowlisted window for your caller IP only — never defaultAction: Allow, never bypass: AzureServices:
scripts/dev/local-run.sh storage-on # publicNetworkAccess=Enabled with defaultAction=Deny + your IP in ipRules
# ... debug ...
scripts/dev/local-run.sh storage-off # restore publicNetworkAccess=Disabled
Status check:
The helper refuses to run inside a Container App (CONTAINER_APP_NAME guard), so it cannot accidentally weaken production. The local backend may also auto-open with LOCAL_DEBUG_AUTO_OPEN_STORAGE=true — see .github/copilot-instructions.md §9.
Do not leave the network surface open after debugging. The Storage card itself shows the current publicNetworkAccess value so you can confirm it is back to Disabled.
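If you prefer to confirm the state from a shell, the sketch below reads the two relevant properties with the plain az CLI and labels them. It is an illustration rather than the repo's own helper: classify_storage_network and storage_network_status are hypothetical names, and the labels are chosen here only for readability.

```shell
# Hypothetical helper: label the Storage network state from two properties.
#   $1 = publicNetworkAccess (Enabled | Disabled)
#   $2 = networkRuleSet.defaultAction (Allow | Deny)
classify_storage_network() {
  local public_access="$1" default_action="$2"
  if [ "$public_access" = "Disabled" ]; then
    echo "closed"        # production default
  elif [ "$default_action" = "Deny" ]; then
    echo "ip-allowlist"  # transient local-debug window
  else
    echo "open"          # not an expected state; close it
  fi
}

# Query the live account. Requires az login and read access to the account.
storage_network_status() {
  local account="$1" rg="$2" values
  values=$(az storage account show --name "$account" --resource-group "$rg" \
    --query "[publicNetworkAccess, networkRuleSet.defaultAction]" -o tsv) || return 1
  # The two values may arrive separated by a tab or a newline; split on both.
  # shellcheck disable=SC2086
  set -- $values
  classify_storage_network "${1:-}" "${2:-}"
}
```

After storage-off, `storage_network_status <account> <resource-group>` should print closed.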
Sign-in works but Dashboard shows no workspace
Symptom
You signed in, no error message, but the Dashboard shows the empty Setup Wizard ("Select your subscription / resource group / Storage account / ACR") instead of a workspace.
Cause
The dashboard discovers workspaces by scanning subscriptions for a Storage account tagged for ElasticBLAST. Either:
- your account does not have Reader on the workload subscription, or
- the Storage account is missing the expected tag, or
- the workspace was deployed in a different subscription than the one selected by az account set.
Fix
- Confirm your tenant / subscription. It must match the tenant the deployment lives in.
- Confirm Reader on the workload subscription (ask the deployer to grant it if missing).
- Use the Setup Wizard once to pick the subscription / resource group / Storage account / ACR explicitly. The selection persists per browser.
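The first two checks can be scripted with the az CLI. This is a sketch under assumptions: the function names are hypothetical, and the Reader check only looks for that specific role at subscription scope.

```shell
# Print the tenant and subscription the az CLI is currently pointed at.
show_active_account() {
  az account show --query "{tenant:tenantId, subscription:id, name:name}" -o table
}

# Succeed if the principal holds Reader at the subscription scope,
# counting inherited and group-based assignments.
#   $1 = user principal name or object id, $2 = subscription id
has_reader_on_subscription() {
  local assignee="$1" subscription_id="$2" count
  count=$(az role assignment list \
    --assignee "$assignee" \
    --role Reader \
    --scope "/subscriptions/$subscription_id" \
    --include-inherited --include-groups \
    --query "length(@)" -o tsv) || return 1
  [ "$count" -gt 0 ]
}
```

A broader role such as Contributor or Owner also grants read access, so a failed Reader check is a hint to investigate rather than proof that discovery cannot work.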
Sign-in popup blocked, or redirect URI mismatch (AADSTS50011)
Symptom
- The popup closes without signing in, or
- Entra reports AADSTS50011: The reply URL specified in the request does not match the reply URLs configured for the application.
Cause
The Container App URL was not registered as a SPA redirect URI on the App Registration. This can happen if you redeployed to a new resource group or renamed the Container App.
Fix
scripts/dev/postprovision.sh adds the deployed Container App origin automatically. To do it by hand, follow Deployment Reference → Redirect URI After Deployment.
Keep http://localhost:8090 registered as well if you also run the SPA locally.
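To see whether an origin is already registered before changing anything, a read-only check along these lines may help. It is a sketch: the function names are hypothetical, and it assumes the App Registration stores its SPA redirect URIs under spa.redirectUris as returned by az ad app show.

```shell
# Reduce a URL to its origin (scheme://host[:port]).
origin_of() {
  printf '%s\n' "$1" | sed -E 's#^([a-zA-Z][a-zA-Z0-9+.-]*://[^/]+).*#\1#'
}

# List the SPA redirect URIs on the App Registration ($1 = clientId).
list_spa_redirect_uris() {
  az ad app show --id "$1" --query "spa.redirectUris[]" -o tsv
}

# Succeed if the origin of URL $2 is registered exactly on app $1.
redirect_uri_registered() {
  local client_id="$1" want
  want=$(origin_of "$2")
  list_spa_redirect_uris "$client_id" | grep -Fxq "$want"
}
```

The comparison is deliberately exact, because a registered URI that differs only by a trailing slash or path can still produce AADSTS50011.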
Local scripts/dev/local-run.sh web does not pick up the clientId
Symptom
You ran azd env refresh, then scripts/dev/local-run.sh web, but the SPA still shows "Setup Required".
Cause
The auto-pull only triggers when VITE_AZURE_CLIENT_ID is empty or the all-zero placeholder. A stale web/.env.local from an older clone may have a non-empty value baked in.
Fix
# Check what is actually exported.
grep '^VITE_AZURE_CLIENT_ID' web/.env.local
# Either delete the line (auto-pull will fill it from azd env), or paste the correct value.
azd env get-values | grep '^API_CLIENT_ID='
Then restart scripts/dev/local-run.sh web. The log line [local-run] Picked up VITE_AZURE_CLIENT_ID from azd env (...) on stderr confirms the auto-pull fired.
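The rule described under Cause can be expressed as a small check. This is a sketch of the documented behaviour, not the script's actual code: dotenv_value and client_id_state are hypothetical helpers, and the parsing assumes simple KEY=value lines.

```shell
# Print the value of KEY from a dotenv-style file, with one layer of
# surrounding double quotes removed. If the key appears more than once,
# the last line is used here. Prints nothing if the file or key is missing.
dotenv_value() {
  local key="$1" file="$2"
  [ -f "$file" ] || return 0
  grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2- | sed -e 's/^"//' -e 's/"$//'
}

# Describe what the auto-pull is expected to do for a given current value.
client_id_state() {
  case "${1:-}" in
    "" | 00000000-0000-0000-0000-000000000000) echo "auto-pull" ;;
    *) echo "pinned" ;;
  esac
}
```

Running `client_id_state "$(dotenv_value VITE_AZURE_CLIENT_ID web/.env.local)"` prints pinned when a stale value would block the auto-pull.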
Local debug as your real az-login identity (one-shot)
Symptom
You want the local dashboard caller chip to show your real UPN instead of anonymous, and you want the BLAST Databases / Storage cards to actually load data (not degraded access_denied) — without running four scripts by hand every session.
Cause
The default local dev mode flips AUTH_DEV_BYPASS=true (anonymous caller) and the workload Storage account starts with publicNetworkAccess: Disabled plus zero RBAC on your az login identity. Three things have to flip together for real auth to work end-to-end:
- RBAC: your account needs Storage Blob/Table Data Contributor (+ Reader on the RG, AcrPull on ACR).
- Storage network: the account must be reachable from your laptop (the explicit local-debug allowlist, never defaultAction: Allow in production).
- Bypass off: AUTH_DEV_BYPASS=false in .env and VITE_AUTH_DEV_BYPASS=false in web/.env.local, plus a restart of api + vite.
Fix — single command
# Enable real MSAL login + ensure RBAC + open storage + restart api/web.
scripts/dev/local-debug-auth.sh on
# or:
scripts/dev/local-run.sh auth-on
# When you finish: revert to anonymous bypass + close storage network.
# RBAC is intentionally NOT revoked (cheap to keep).
scripts/dev/local-run.sh auth-off
# Print current state without mutating anything.
scripts/dev/local-run.sh auth-status
The script is idempotent and re-runnable. It auto-detects the workload storage account, ACR, and API_CLIENT_ID from azd env get-values; pass --storage NAME --storage-rg RG to target a specific deployment when multiple stelbdashboard* accounts exist in your subscription.
Useful flags:
| Flag | Effect |
|---|---|
| --storage NAME --storage-rg RG | Target a specific deployment (when azd env default ≠ the one your SPA uses). |
| --acr NAME --acr-rg RG | Override the ACR used for the AcrPull role assignment. |
| --skip-rbac | Skip the role-assignment step (if RBAC is already verified). |
| --skip-storage | Skip the storage network toggle (if you already opened it). |
| --skip-restart | Apply env changes only; restart api + vite yourself. |
| --no-close-storage | (off only) leave storage open; only flip the bypass flags. |
Permission requirements:
- az login as a user with Storage Blob Data Contributor and User Access Administrator (or Owner) on the workload Storage account scope. The script pre-checks az role assignment list and fails fast if you cannot read assignments at that scope.
- Microsoft.Storage/storageAccounts/write on the account (for the network toggle).
- jq and curl on PATH (already required by sibling dev scripts).
After auth-on succeeds, open http://localhost:8090, complete the MSAL sign-in, and the caller chip should now show your UPN. /api/me will return your real oid / upn instead of the synthetic 00000000-… dev-bypass identity.
Charter §9 reminder — close the network when done. publicNetworkAccess: Enabled is a transient local-debug state. Running auth-off is enough; if you only want to close the network, scripts/dev/local-run.sh storage-off works.
In-app upgrade flow
The header badge never appears
Set UPGRADE_GIT_REMOTE on the deployed Container App and wait for the
30-minute discovery beat (or hit Check remote on /upgrade). The
URL must end in .git and resolve to a public HTTPS endpoint. The
upgrade subsystem is intentionally inert until the env is set —
upgrades.md has the full env table.
"Start" returns 403
Your caller oid is not in UPGRADE_ADMIN_OIDS and you do not carry
the UpgradeAdmin app role. Add your oid to the env (comma-separated)
or grant the app role:
RG=$(azd env get-value AZURE_RESOURCE_GROUP)
APP=$(azd env get-value CONTAINER_APP_NAME)
MY_OID=$(az ad signed-in-user show --query id -o tsv)
EXISTING=$(az containerapp show --name "$APP" --resource-group "$RG" \
--query "properties.template.containers[?name=='api'].env[?name=='UPGRADE_ADMIN_OIDS'].value | [0]" -o tsv)
az containerapp update --name "$APP" --resource-group "$RG" \
--set-env-vars UPGRADE_ADMIN_OIDS="${EXISTING:+$EXISTING,}$MY_OID"
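The append above adds your oid even when it is already listed. If you re-run the snippet often, a guard like the following keeps the list free of duplicates. It is a sketch, and append_oid is a hypothetical helper rather than something the repo ships.

```shell
# Hypothetical helper: append an oid to a comma-separated list unless it is
# already present. Matching is on whole entries, not substrings.
#   $1 = existing list (may be empty), $2 = oid to add
append_oid() {
  local existing="$1" new="$2"
  case ",$existing," in
    *",$new,"*) printf '%s\n' "$existing" ;;
    *) printf '%s\n' "${existing:+$existing,}$new" ;;
  esac
}
```

It slots into the update call as `--set-env-vars UPGRADE_ADMIN_OIDS="$(append_oid "$EXISTING" "$MY_OID")"`.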
"Start" returns 409: upgrade already in progress
The upgradestate row is not idle. Inspect the row state on the
/upgrade page; if a previous attempt left it in failed_pre or
failed_rollout, transition it back to idle by clicking Rollback
(if a snapshot exists) or by clearing the row manually with the Azure
Storage Explorer / az storage entity replace.
Upgrade stayed in rolling_out past the budget
The reconciler's stuck guard moves the row to failed_rollout after
15 minutes — or as fast as 2 minutes when the ACA template clearly
does not carry the target version. If the new revision is actually
unhealthy:
- Read the per-component build log on /upgrade (or curl /api/upgrade/jobs/<job_id>/build-log/api).
- Click Rollback; the dashboard refuses if ACR no longer carries the snapshot tags (see the next section).
- If even the api sidecar is unreachable, copy the Recovery commands from /upgrade (or /api/upgrade/escape-hatch) and paste them into any az login-ed shell.
Rollback says "ACR no longer carries the snapshotted tags"
Retention has purged at least one of the per-sidecar image tags. The
rollback PATCH would succeed but ACA would crashloop on
ImagePullBackOff. Recovery options:
- Re-build the older release locally: azd up from a checkout of the prior git tag rebuilds the missing tags with the same names.
- Or pick a forward upgrade to a known-good newer tag instead of rolling back.
Bump ACR retention so the next rollback succeeds:
az acr config retention update \
--registry "$(azd env get-value PLATFORM_ACR_NAME)" \
--status enabled --days 180 --type UntaggedManifests
Build logs are empty / 404
The Append Blob is only created when the az acr build for that
component actually starts. If the upgrade entered failed_pre during clone or
before the building state, no log was produced. Inspect the row's
phase_detail and the audit history (/api/upgrade/history).
azd env refresh fails with "no environment selected"
Symptom
Running azd env refresh in a fresh clone fails with "no environment selected".
Cause
The clone has never had an azd environment created. azd env refresh only binds an environment that already exists in your local clone.
Fix
azd env new creates the local stub; azd env refresh then fills it from the deployment outputs.
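The two steps can be wrapped so they always run in order. This is a minimal sketch: bind_azd_env is a hypothetical wrapper, the environment name is whatever the deployer used, and azd env new may still prompt for a subscription and location.

```shell
# Create the local environment stub, then fill it from the deployment.
#   $1 = name of the existing azd environment to bind to
bind_azd_env() {
  local env_name="${1:-}"
  if [ -z "$env_name" ]; then
    echo "usage: bind_azd_env <environment-name>" >&2
    return 2
  fi
  # Refresh only runs if the stub was created successfully.
  azd env new "$env_name" && azd env refresh -e "$env_name"
}
```

If the environment already exists locally, azd env new fails and the refresh is skipped; in that case run azd env refresh -e with the same name directly.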
Where to go next
- Joining An Existing Deployment: happy path for the same workflow.
- Auth: full RBAC matrix for the managed identity, and the post-deploy permissions checklist.
- Deployment Reference: manual azd flow, redirect URI setup, lockdown.