Sidecar Reporter Fallback
Motivation
The deployed Container App revision can run the frontend and terminal containers successfully while the dashboard still marks those sidecars as unavailable. The cause was that their standalone metrics reporters exited when /sys/fs/cgroup/cpu.stat was unavailable, so the API stopped receiving sidecar:metrics:<name> heartbeats in Redis.
User-facing change
The Sidecars card now receives heartbeat snapshots from frontend and terminal even when cgroup v2 files are not mounted. In that environment the reporters publish procfs-based self-process metrics with source: "procfs" instead of exiting.
API / IaC diff summary
- Updated web/cgroup_reporter.py and terminal/cgroup_reporter.py to fall back from cgroup v2 files to /proc/self/stat and /proc/self/status.
- No API route, storage, RBAC, or Bicep changes.
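The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in web/cgroup_reporter.py or terminal/cgroup_reporter.py: the function names, the metric field names (cpu_usec, memory_bytes), the "cgroup" source label, and the use of memory.current are all assumptions. Only cpu.stat, /proc/self/stat, /proc/self/status, and source: "procfs" come from this change note.

```python
import os
from pathlib import Path
from typing import Optional


def parse_cpu_stat(text: str) -> dict:
    """Parse cgroup v2 cpu.stat ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    return stats


def parse_proc_stat_cpu_usec(text: str, ticks_per_sec: int) -> int:
    """Return utime + stime from a /proc/<pid>/stat line, in microseconds."""
    # The comm field may contain spaces or ')' characters, so split after the last ')'.
    fields = text.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3), so utime/stime (fields 14/15) sit at 11/12.
    utime, stime = int(fields[11]), int(fields[12])
    return (utime + stime) * 1_000_000 // ticks_per_sec


def parse_proc_status_rss(text: str) -> int:
    """Return VmRSS from /proc/<pid>/status in bytes (0 if the line is absent)."""
    for line in text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1]) * 1024  # value is reported in kB
    return 0


def collect_metrics(
    cgroup_root: Path,
    proc_root: Path = Path("/proc/self"),
    ticks_per_sec: Optional[int] = None,
) -> dict:
    """Prefer cgroup v2 files; fall back to procfs instead of raising."""
    try:
        cpu = parse_cpu_stat((cgroup_root / "cpu.stat").read_text())
        # memory.current is an assumed second input; the note only names cpu.stat.
        memory = int((cgroup_root / "memory.current").read_text())
        return {
            "source": "cgroup",  # hypothetical label for the non-fallback path
            "cpu_usec": cpu["usage_usec"],
            "memory_bytes": memory,
        }
    except (OSError, KeyError, ValueError):
        # cgroup files missing or unreadable: report self-process metrics.
        if ticks_per_sec is None:
            ticks_per_sec = os.sysconf("SC_CLK_TCK")
        return {
            "source": "procfs",
            "cpu_usec": parse_proc_stat_cpu_usec(
                (proc_root / "stat").read_text(), ticks_per_sec
            ),
            "memory_bytes": parse_proc_status_rss((proc_root / "status").read_text()),
        }
```

Calling collect_metrics(Path("/nonexistent")) on a Linux host would take the procfs branch, which mirrors the smoke test of forcing CGROUP_ROOT to a missing path. Note that the procfs numbers describe only the reporter's own process, not the whole container.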
Validation evidence
- python3 -m py_compile web/cgroup_reporter.py terminal/cgroup_reporter.py
- uv run ruff check api web/cgroup_reporter.py terminal/cgroup_reporter.py
- uv run pytest -q api/tests → 604 passed.
- Local fallback smoke: forced CGROUP_ROOT to a missing path and verified both scripts select source: "procfs".
- ACR build and Container App rollout:
  - elb-frontend:20260518020906-reporterfix built in 91 seconds.
  - elb-terminal:20260518020906-reporterfix built in 337 seconds.
  - ca-elb-control--0000044 became RunningAtMaxScale/Healthy.
  - All six containers reported Running and ready=true.
- Internal Redis heartbeat check from the redis sidecar returned both sidecar:metrics:frontend and sidecar:metrics:terminal payloads with source: "procfs".
- Public endpoints:
  - / → HTTP 200.
  - /api/health → HTTP 200, revision ca-elb-control--0000044.
  - /api/terminal/health → HTTP 200, status: "ok".
- ACR was restored to publicNetworkAccess: Disabled and networkRuleSet.defaultAction: Deny after the build.
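The Redis heartbeat check in the list above could be scripted along these lines. This is a hedged sketch, not the check that was actually run: it assumes the heartbeat payloads are JSON objects with a top-level source field, and check_heartbeats and EXPECTED_SIDECARS are hypothetical names. The function takes the raw values already fetched from Redis (by any client) so that it stays independent of a specific Redis library.

```python
import json

# Sidecars named in this note; keys follow the sidecar:metrics:<name> pattern.
EXPECTED_SIDECARS = ("frontend", "terminal")


def check_heartbeats(raw_by_key, expected_source="procfs"):
    """Return {sidecar name: problem}; an empty dict means every check passed."""
    problems = {}
    for name in EXPECTED_SIDECARS:
        raw = raw_by_key.get(f"sidecar:metrics:{name}")
        if raw is None:
            problems[name] = "missing heartbeat"
            continue
        try:
            payload = json.loads(raw)  # assumption: payloads are JSON
        except (TypeError, ValueError):
            problems[name] = "payload is not valid JSON"
            continue
        source = payload.get("source") if isinstance(payload, dict) else None
        if source != expected_source:
            problems[name] = f"unexpected source: {source!r}"
    return problems
```

An empty result corresponds to the outcome recorded above: both sidecars published a payload, and both reported source: "procfs".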