k8s_monitoring — shared ThreadPoolExecutor (drop per-call spawn)

Motivation

k8s_warmup_status and _warmup_pods_and_logs each created a fresh ThreadPoolExecutor(...) per call inside a with block. On every monitor poll (dashboards poll every 4-8 s, multiplied across users) the worker spawned and tore down 6 + 12 threads, so the pthread_create cost and Python's _thread.start_new_thread overhead were paid again on each poll.

User-facing change

None. Same fan-out behaviour, lower per-call overhead, no thread exhaustion under heavy polling.

API / IaC diff

  • api/services/k8s_monitoring.py
      • Added _k8s_fanout_pool(), which returns a process-shared ThreadPoolExecutor(max_workers=_K8S_FANOUT_POOL_MAX_WORKERS). The default is 16 workers and can be overridden with the K8S_FANOUT_POOL_MAX_WORKERS environment variable.
      • Registered atexit.register(_shutdown_k8s_fanout_pool) so the pool is torn down on interpreter shutdown.
      • k8s_warmup_status and _warmup_pods_and_logs now reuse the shared pool via pool.submit(...) / pool.map(...) instead of spawning a new executor per call.

Validation

  • uv run pytest -q api/tests/test_k8s_warmup_status_parallel.py api/tests/test_k8s_release_stale_warmup_jobs.py — 8 passed.
  • uv run ruff check api/services/k8s_monitoring.py — clean.