k8s_monitoring: shared ThreadPoolExecutor (drop per-call spawn)
Motivation
k8s_warmup_status and _warmup_pods_and_logs each created a fresh
ThreadPoolExecutor(...) per call inside a with block. On every monitor
poll (4-8 s dashboard cadence × multiple users) the worker spawned and
tore down 6 + 12 threads, so the pthread_create cost plus Python's
_thread.start_new_thread overhead added up.
User-facing change
None. Same fan-out behaviour, lower per-call overhead, no thread exhaustion under heavy polling.
API / IaC diff
api/services/k8s_monitoring.py
- Added _k8s_fanout_pool() returning a process-shared ThreadPoolExecutor (max_workers=_K8S_FANOUT_POOL_MAX_WORKERS, default 16), with env override K8S_FANOUT_POOL_MAX_WORKERS.
- atexit.register(_shutdown_k8s_fanout_pool) so the pool is torn down on interpreter shutdown.
- k8s_warmup_status and _warmup_pods_and_logs now reuse the shared pool via pool.submit(...) / pool.map(...) instead of spawning a new executor per call.
Validation
- uv run pytest -q api/tests/test_k8s_warmup_status_parallel.py api/tests/test_k8s_release_stale_warmup_jobs.py → 8 passed.
- uv run ruff check api/services/k8s_monitoring.py → clean.