BLAST DB hardening round 2 — concurrency, signatures, security, cancel UX¶
Date: 2026-05-22 Scope: backend + frontend
Motivation¶
Round 1 (2026-05-22-blast-db-download-hardening.md) closed the 20 most visible critique items for the DB download surface. A widened critique surfaced 20 more issues across adjacent surfaces (sharding, oracle, terminal WebSocket, NCBI catalogue resilience, frontend reactive bugs). This change closes those.
User-facing change¶
- Cancel a stuck download. Catalog rows now show a "Cancel" button during in-flight copies; clicking aborts every pending blob copy and flips the row to a clean state immediately (no 2 h stale-recovery wait).
- Retry button works even while another DB is downloading. Previously
the per-row "Retry" was greyed by the global
downloadDisabledlock for partial / init_failed rows; now retries are per-DB. - No more repeated partial-copy toasts. The completion-detection
effect dedupes by
(db, terminal phase)so a 10 s poll loop doesn't re-toast the same error. - Multi-shard DBs now correctly detect updates. Update detection
uses a composite signature (hash of N evenly-spaced
.tar.gz.md5ETags) instead of a single shard's ETag, so an NCBI rotation that touches only the last shard still surfaces as "Update available". - NCBI throttling is contained. Repeated 403/5xx trips a circuit breaker for 2 min — users see a fast, actionable error instead of N seconds of timeout per call and NCBI sees zero traffic from us during cooldown.
API / IaC diff summary¶
New routes¶
POST /api/storage/prepare-db/{db_name}/cancel— abort pending blob copies + writecopy_status.phase=cancelledmetadata.
Hardened routes¶
POST /api/blast/databases/{db}/shard— daemon now uses the ETag-aware_update_metadatahelper for every write so a concurrent prepare-db / warmup writer cannot clobber the shard fields.POST /api/blast/databases/{db}/oracle— refuses (409) when the DB'scopy_status.phaseispartial,init_failed, orcopying(was: only checkedupdate_in_progress).GET /api/blast/databases/check-updates— comparison precedence is nowcomposite_signature>signature_etag>source_version vs snapshot. Legacy DBs without ETag fields fall back transparently.POST /api/storage/prepare-db— records a DB-ops audit JobState (op=prepare_db) so the existing/api/audit/logsurface picks it up.
Hardened services¶
api/services/ncbi_catalogue.py:- Composite signature samples up to
NCBI_SIGNATURE_SAMPLE_COUNT(default 8).tar.gz.md5ETags and SHA-256s them into a 16-char marker. httpx.Clientnow has an explicit 30 s timeout._BLAST_VOLUME_SUFFIXESadds.nos,.pos,.nxm,.pxmso volume counts stay accurate for protein DBs.api/routes/storage/common.py:- Circuit breaker (
_NCBI_BREAKER_THRESHOLDconsecutive failures,_NCBI_BREAKER_COOLDOWNseconds) on_resolve_latest_dir+_list_keys. Refuses inbound calls while open. api/routes/storage/prepare_db.py:- Lock registry GC: caps at 256 entries, evicts free locks first.
- Cooperative shutdown event (
_SHUTDOWN_EVENT) interrupts the 60 s poll sleep on SIGTERM. - ETag-aware metadata writes also write
composite_signatureon promotion. api/services/db_ops_audit.py(new): focused helper that creates a JobState + jobhistory entry forprepare_db,shard,oracle, andprepare_db_cancelactions.api/routes/terminal_ws.py: CSWSH defence — WebSocket upgrade now refuses Origin headers that are not same-origin or inTERMINAL_WS_ALLOWED_ORIGINS.TERMINAL_WS_ALLOW_ANY_ORIGIN=trueescape hatch for local dev only.
Frontend¶
web/src/components/cards/storage/useBlastDb.ts:- Dedup ref so partial-copy toasts fire once per (db, phase).
- New
handleCancelaction calls the cancel route + clears local state + drops the dedup entry so a retry produces a fresh toast. web/src/components/cards/storage/useDbPreviews.ts:- Memoised
byNamemap (downstreamuseMemono longer re-fires per render). skipNamesparameter — modal passes the set of already-Ready DBs so the per-modal-open HEAD volume drops.web/src/components/cards/storage/BlastDbRow.tsx:onCancelprop wired; "Cancel" button shows during in-flight copy.- "Retry" Get button is per-DB (no longer gated by another DB's in-flight download).
web/src/api/monitoring.ts:cancelPrepareBlastDbtyped client.web/src/api/blast.ts:BlastDatabasegainscomposite_signature;checkUpdatesresponse addscomposite_signatureper entry.
Validation evidence¶
New tests:
- api/tests/test_ncbi_breaker_composite.py — circuit breaker open/close
+ composite signature detects later-shard rotation.
- api/tests/test_prepare_db_routes.py — 409 on live daemon, cancel
aborts pending copies, cancel refuses when already completed.
- api/tests/test_terminal_ws_origin.py — same-origin allowed, unknown
origin rejected, explicit allowlist works, bypass flag permits all.
- api/tests/test_blast_databases_check_updates.py — legacy ETag-empty
falls back to snapshot diff; composite signature takes precedence.
Lint clean on every changed file.
Out of scope¶
- The composite signature still samples a fixed N (default 8)
.tar.gz.md5ETags. A truly bulletproof signature would HEAD every md5, but that costs N HEADs perpreview_databasecall. 8 covers every multi-shard layout NCBI publishes today. _SHUTDOWN_EVENTis wired into the poll loop but not yetset()from any lifespan / signal handler — that requires a separate change toapi/main.pyand adds risk to the unrelated request-handling path. Once set, stale-recovery + ETag idempotence already cover the partial-completion case.- The frontend build currently fails on a pre-existing TS error in
web/src/pages/UpgradePage.tsx(unrelated WIP — unusedconfirmBreakingstate). Not addressed here.