DB Order Oracle Cache
Motivation
Positive-hit Web BLAST equivalence is blocked by top-N tied-hit selection at the max_target_seqs boundary. A query-specific strict top-N oracle can prove equality, but it is too expensive to discover during every BLAST submit. BLAST submit must stay fast.
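
The boundary problem can be shown with a toy example. This is a minimal sketch and not BLAST's actual selection logic: the accessions, scores, and the `db_rank` mapping (accession to position in database order) are all hypothetical, and it assumes the oracle can be read as an ordered accession list.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (accession, score); toy data, not real BLAST output


def top_n(hits: List[Hit], n: int, db_rank: Dict[str, int]) -> List[str]:
    """Keep the n best hits, breaking score ties by position in db_rank."""
    # Accessions missing from the oracle share the rank len(db_rank),
    # so they sort after every accession the oracle knows about.
    unknown = len(db_rank)
    ordered = sorted(
        hits,
        # Higher score first, then lower DB-order rank, then accession
        # as a last-resort deterministic tie-break.
        key=lambda h: (-h[1], db_rank.get(h[0], unknown), h[0]),
    )
    return [accession for accession, _ in ordered[:n]]


hits = [("ACC_C", 50.0), ("ACC_A", 90.0), ("ACC_B", 50.0), ("ACC_D", 50.0)]
# Hypothetical oracle: accession -> position in database order.
db_rank = {"ACC_A": 0, "ACC_D": 1, "ACC_B": 2, "ACC_C": 3}

print(top_n(hits, 2, db_rank))
```

Three hits tie at the cutoff here, so which one survives depends entirely on the tie-break order. Two runs agree at the boundary only if they share that order.
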
User-facing change
- The BLAST database manager now exposes an order-oracle build action for warmed databases.
- The action is intended to run when a DB snapshot changes, or when the user explicitly clicks the build button.
- Precise sharded BLAST submits no longer need to generate oracle data on the search path. Cached DB-order oracle use is an explicit submit opt-in so the default path does not download very large full-database oracle parts.
- Database rows surface cached oracle status and ready part counts.
- Warmup status now detects stale completed warmup Jobs that are pinned to nodes that disappeared after an AKS stop/start cycle.
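
The stale-warmup rule in the last item can be sketched as a pure function. This is an illustrative sketch, not the code in the backend: the `ShardJob` type, the `warmup_status` name, and the `Missing` and `Running` states are assumptions; only `Ready` and `Stale` are status names this document reports.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class ShardJob:
    """Hypothetical view of one warmup Job for one shard."""
    shard: str
    node_name: str   # node the Job was pinned to
    completed: bool


def warmup_status(jobs: Iterable[ShardJob], ready_nodes: Set[str]) -> str:
    """Classify warmup for one database from its shard Jobs."""
    jobs = list(jobs)
    if not jobs:
        return "Missing"   # assumed state name
    if not all(job.completed for job in jobs):
        return "Running"   # assumed state name
    # Every Job completed, but a Job pinned to a node that is no longer
    # Ready does not describe the current cluster.
    if any(job.node_name not in ready_nodes for job in jobs):
        return "Stale"
    return "Ready"


jobs = [ShardJob("00", "node-old", True), ShardJob("01", "node-new", True)]
print(warmup_status(jobs, ready_nodes={"node-new"}))
```

Passing the set of Ready node names as an argument keeps the rule testable without a live cluster.
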
API and runtime diff
- Added `POST /api/blast/databases/{db_name}/oracle` to create one Kubernetes Job per warmed shard.
- Added `api/services/db_order_oracle.py` for stable oracle status/part paths and Kubernetes Job manifest generation.
- Added `api/services/blast_oracles.py` so tie-order oracle normalization, source-version checks, Storage uploads, and finalizer pointer manifests live outside the Celery task orchestrator.
- Added `api/services/blast_db_metadata.py` so DB-name extraction and `{db}-metadata.json` lookup are shared by submit config generation and oracle attachment instead of being duplicated in task code.
- `GET /api/blast/databases` now includes `db_order_oracle` metadata from `blast-db/metadata/oracles/<db>/status.json` and part blobs.
- `api/tasks/blast.py` now attaches `metadata/tie-order-oracle-urls.txt` only when the submit payload explicitly sets `use_db_order_oracle=true`, the cached parts are complete, and the oracle `source_version` matches the downloaded database metadata when both are available.
- `terminal/patch_elastic_blast.py` now patches the finalizer to download DB-order oracle part URLs, concatenate them in part order, and export `ELB_TIE_ORDER_FILE`.
- `api/services/warmup_jobs.py` now reports shard node names and host paths; `api/services/k8s_monitoring.py` marks warmup as `Stale` when completed Jobs target nodes that are no longer Ready.
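
Two of the runtime rules above, the attach gate and the part-order concatenation, can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in `api/tasks/blast.py` or `terminal/patch_elastic_blast.py`: the function names and signatures are hypothetical, and it reads "when both are available" as "skip the version comparison if either side is missing".

```python
from typing import Mapping, Optional


def should_attach_oracle(
    payload: Mapping[str, object],
    ready_parts: int,
    expected_parts: int,
    oracle_source_version: Optional[str],
    db_source_version: Optional[str],
) -> bool:
    """Decide whether the cached DB-order oracle may be attached to a submit."""
    # 1. Explicit opt-in only: a missing or false flag keeps the default path small.
    if payload.get("use_db_order_oracle") is not True:
        return False
    # 2. Every expected part must be present.
    if expected_parts <= 0 or ready_parts != expected_parts:
        return False
    # 3. Stale protection: compare versions only when both are known (assumed reading).
    if oracle_source_version and db_source_version:
        return oracle_source_version == db_source_version
    return True


def concat_parts(parts: Mapping[int, str]) -> str:
    """Join part bodies in numeric part order.

    Keys are part indexes; sorting them as integers avoids the
    lexicographic trap where "10" sorts before "2".
    """
    return "".join(parts[index] for index in sorted(parts))


print(should_attach_oracle({"use_db_order_oracle": True}, 10, 10, "v1", "v1"))
print(concat_parts({1: "ACC_B\n", 0: "ACC_A\n"}), end="")
```

Keeping the gate a pure function of the payload and cached status makes each of the three conditions easy to cover with a focused test.
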
Validation evidence
- Focused backend tests: `uv run pytest -q api/tests/test_db_order_oracle.py api/tests/test_warmup_jobs.py api/tests/test_storage_data.py api/tests/test_blast_tasks.py` -> `111 passed`.
- Full backend tests: `uv run pytest -q api/tests` -> `604 passed`.
- Backend lint: `uv run ruff check api` -> `All checks passed!`.
- Frontend build previously passed for the UI button/status change; rerun pending after final live probe.
- Live AKS check: cluster `elb-cluster` is `Succeeded/Running`.
- Live warmup remediation: existing `core_nt` warmup Jobs were stale after AKS restart because they targeted removed nodes (`...00a` through `...00j`). They were released and recreated on current Ready nodes (`...00u` through `...013`). Backend warmup status now reports `core_nt` as `Ready` with `10/10` completed shards.
- Live network remediation: `elbstg01.blob.core.windows.net` initially resolved to public IP `20.150.4.36` inside AKS while `publicNetworkAccess` was `Disabled`, causing warmup `AuthorizationFailure`. Created blob private endpoint `pe-elbstg01-blob`, private DNS zone `privatelink.blob.core.windows.net`, VNet link, and DNS zone group. AKS now resolves `elbstg01.blob.core.windows.net` to private IP `10.224.0.15`.
- Live oracle build: created 10 DB-order oracle Jobs for `core_nt` run `20260517164853-89081927`. All 10 completed and uploaded parts under `blast-db/metadata/oracles/core_nt/parts/20260517164853-89081927/`.
- Uploaded `blast-db/metadata/oracles/core_nt/status.json` from inside AKS with `status=ready`, `expected_parts=10`, and `ready_parts=10`.
- Per-shard oracle completion counts: `00=14215475`, `01=14220903`, `02=14110105`, `03=13988357`, `04=14146638`, `05=14312905`, `06=14380566`, `07=14053353`, `08=14232474`, `09=10285005` accessions.
- Follow-up safety regression on 2026-05-18: focused tests `uv run pytest -q api/tests/test_blast_submit_route_options.py api/tests/test_blast_tasks.py api/tests/test_storage_data.py api/tests/test_sharded_merge.py` reported `103 passed`; SRP follow-up tests `uv run pytest -q api/tests/test_blast_db_metadata.py api/tests/test_blast_oracles.py api/tests/test_blast_tasks.py api/tests/test_blast_submit_route_options.py api/tests/test_storage_data.py api/tests/test_sharded_merge.py api/tests/test_compare_blast_web_xml_outfmt6.py` reported `109 passed`; full backend tests `uv run pytest -q api/tests` reported `635 passed`; coverage now verifies explicit DB-order oracle opt-in, submit forwarding of oracle controls, source-version stale protection, merge oracle handling, extracted oracle and DB metadata service boundaries, URL-shaped DB parsing, and strict oracle accession type validation.
- Local smoke hardening on 2026-05-18: `scripts/dev/local-run.sh smoke` reported `27/27 passed` against `http://127.0.0.1:8085`; the smoke probe now supplies the required AKS `subscription_id`, reads complete JSON bodies for large API responses, and rejects non-http(s) smoke URLs.
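
The oracle evidence above can be cross-checked mechanically. The sketch below is an assumption-laden illustration: it uses only the three `status.json` fields reported here (the real file may carry more), and `oracle_ready` and `parse_counts` are hypothetical helper names.

```python
import json
from typing import Dict

# The three status fields and the per-shard counts reported above.
STATUS_JSON = '{"status": "ready", "expected_parts": 10, "ready_parts": 10}'
COUNTS = (
    "00=14215475,01=14220903,02=14110105,03=13988357,04=14146638,"
    "05=14312905,06=14380566,07=14053353,08=14232474,09=10285005"
)


def oracle_ready(status: Dict[str, object]) -> bool:
    """True when the status is 'ready' and every expected part is present."""
    expected = status.get("expected_parts")
    return (
        status.get("status") == "ready"
        and isinstance(expected, int)
        and expected > 0
        and status.get("ready_parts") == expected
    )


def parse_counts(text: str) -> Dict[str, int]:
    """Parse 'shard=count' pairs separated by commas."""
    pairs = (item.split("=", 1) for item in text.split(","))
    return {shard.strip(): int(count) for shard, count in pairs}


status = json.loads(STATUS_JSON)
counts = parse_counts(COUNTS)

# One count per expected part, and a total across all shards.
print(oracle_ready(status), len(counts) == status["expected_parts"], sum(counts.values()))
```

A check like this catches a status file that claims `ready` while a shard's part is missing from the per-shard record.
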
Residual risk
- DB-order oracle is a cached tie-breaker, not yet a proven replacement for a query-specific strict top-N membership oracle on F3L/core_nt. Because full `core_nt` oracle parts are large, the default precise submit path remains small and evidence-focused; the next live proof should opt in deliberately and compare against same-snapshot Web/full-run evidence.
- The live `rg-elb-01` workload resources are older than the Container Apps IaC target. The private endpoint repair was applied directly to the running environment; the active `rg-elb-ca` IaC already contains private endpoints for its managed storage account.