2026-05-17 - Web BLAST searchsp defaults
Motivation
Precise BLAST sharding must use the same effective search space on every shard
to match a full-database or NCBI Web BLAST-equivalent statistical model. The
core_nt calibration measured a full-database Statistics_eff-space of
32156241807668, but the submit UI and pre-flight path still required callers
to provide that value manually.
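A short worked example may help show why the search space has to agree across shards. It uses the standard Karlin-Altschul relation, E = effective search space × 2^(−bit score). The shard value below is a made-up illustration, not a measured number; only the `core_nt` figure comes from this note.

```python
def evalue_from_bit_score(bit_score: float, effective_search_space: int) -> float:
    """Karlin-Altschul relation: E = effective_search_space * 2^(-bit_score)."""
    return effective_search_space * 2.0 ** (-bit_score)


# Calibrated full-database value for core_nt, taken from this note.
FULL_DB_SEARCHSP = 32156241807668

# Hypothetical value a single shard might compute on its own (illustration only).
HYPOTHETICAL_SHARD_SEARCHSP = FULL_DB_SEARCHSP // 10

bit_score = 50.0
full_db_evalue = evalue_from_bit_score(bit_score, FULL_DB_SEARCHSP)
shard_evalue = evalue_from_bit_score(bit_score, HYPOTHETICAL_SHARD_SEARCHSP)

# The same alignment looks roughly ten times more significant on the shard
# unless every shard is forced to use the full-database search space.
print(full_db_evalue, shard_evalue)
```

Because the E-value scales linearly with the search space, any per-shard difference shifts E-values and can change which hits pass an E-value cutoff.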
User-facing Change
- The BLAST database list now exposes verified Web BLAST-compatible search-space metadata for calibrated databases.
- Selecting `core_nt` automatically sends `db_effective_search_space = 32156241807668` in pre-flight and submit payloads.
- The Algorithm Parameters panel shows the selected database's calibrated search space when one is available.
- Users can still override the automatic default by putting an explicit `-searchsp` value in Additional options.
- Pre-flight now accepts the UI's `aks_cluster_name` field, matching the submit route normalization.
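The override rule in the list above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names `explicit_searchsp` and `effective_searchsp` are hypothetical, and the sketch assumes the Additional options field is a shell-style string in which `-searchsp` is followed by its value as a separate token.

```python
import shlex
from typing import Optional


def explicit_searchsp(additional_options: str) -> Optional[int]:
    """Return the value following -searchsp, or None when the flag is absent."""
    tokens = shlex.split(additional_options)
    # Skip the last token: a trailing -searchsp has no value to read.
    for index, token in enumerate(tokens[:-1]):
        if token == "-searchsp":
            # int() raises ValueError if the value is not an integer.
            return int(tokens[index + 1])
    return None


def effective_searchsp(additional_options: str,
                       database_default: Optional[int]) -> Optional[int]:
    """An explicit user override wins; otherwise fall back to the database default."""
    override = explicit_searchsp(additional_options)
    return override if override is not None else database_default
```

For example, `effective_searchsp("", 32156241807668)` falls back to the calibrated default, while `effective_searchsp("-searchsp 5", 32156241807668)` keeps the user's value.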
API / Task Diff Summary
- Added `api.services.web_blast_searchsp` as the single source of verified database defaults.
- Enriched `api.services.storage_data.list_databases` rows with `web_blast_searchsp`, scope, and evidence fields.
- Updated `/api/blast/pre-flight` and `/api/blast/jobs` normalization to inject the verified default only when no explicit search-space override is present.
- Added `searchsp` alias handling for older payloads, mapping it to `db_effective_search_space` at the HTTP boundary.
- Wired the React submit page to include the database default in pre-flight and submit requests.
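The normalization described in this list might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual route code: the payload keys `db` and `additional_options`, the registry name `VERIFIED_WEB_BLAST_SEARCHSP`, and the function `normalize_payload` are all hypothetical. Only `searchsp`, `db_effective_search_space`, and the `core_nt` value come from this note.

```python
import shlex
from typing import Any, Dict

# Hypothetical registry of verified defaults; only core_nt is calibrated so far.
VERIFIED_WEB_BLAST_SEARCHSP: Dict[str, int] = {"core_nt": 32156241807668}


def normalize_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map the legacy alias, then inject the default only when nothing overrides it."""
    out = dict(payload)  # never mutate the caller's payload

    # Legacy alias: older clients send `searchsp`.
    if "searchsp" in out:
        alias_value = out.pop("searchsp")
        if out.get("db_effective_search_space") is None:
            out["db_effective_search_space"] = alias_value

    # An explicit -searchsp in the free-form options also counts as an override.
    option_tokens = shlex.split(out.get("additional_options") or "")
    has_cli_override = "-searchsp" in option_tokens

    if out.get("db_effective_search_space") is None and not has_cli_override:
        default = VERIFIED_WEB_BLAST_SEARCHSP.get(out.get("db"))
        if default is not None:
            out["db_effective_search_space"] = default
    return out
```

Keeping the alias mapping and default injection in one function at the HTTP boundary means the pre-flight and submit routes can share the same behavior, which is the intent the list describes.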
Validation Evidence
- Reset local Celery/Redis state by deleting/recreating `elb-dev-redis`; queues were empty after restart.
- Restarted local API, worker, beat, web, and terminal-exec under `.logs/local/20260517T035004Z-1039155/`.
- `curl http://127.0.0.1:8085/api/health` returned `status: ok`.
- `curl http://127.0.0.1:8085/api/health/celery` showed the worker registered `api.tasks.blast.submit` and queue lengths of `0`.
- `POST /api/health/celery/enqueue-noop?message=searchsp-reset-check` completed with Celery state `SUCCESS`.
- `api.services.terminal_exec.run(["elastic-blast", "--version"])` returned `elastic-blast 1.5.0.post63+e3e9f51`.
- `POST /api/blast/pre-flight` with `aks_cluster_name`, `core_nt`, precise sharding, and no explicit `db_effective_search_space` returned `ready: true`.
- `GET /api/blast/databases` returned `core_nt.web_blast_searchsp = 32156241807668`.
- Existing real AKS evidence: `blastn-batch-s00..s09-job-000-138d8383` were `Complete`; shard pod logs showed BLAST running with `-searchsp 32156241807668`; shard XML reported `Statistics_eff-space = 32156241807668`.
- Fresh real AKS evidence: dashboard job `d07a6a3a-208d-4606-87ff-33304bc7e7dd`, ElasticBLAST job `job-9c58c936101042ea996681485be97da5`, and Kubernetes jobs `blastn-batch-s00..s09-job-000-5be97da5` all used the injected `-searchsp 32156241807668`. The shard jobs completed in `17s` to `19s` on warmed local SSD nodes.
- The latest shard job manifest used hostPath `/workspace` mounted at `/blast/blastdb` with `subPath: blast`; the only init container was `import-query-batches`, which completed in about 2 seconds. No DB download init container ran on the warmed path.
- Patched and live-reran `elb-finalizer-5be97da5` after fixing the finalizer image and azcopy wildcard. The finalizer completed, downloaded 10 shard XML files, merged `0` hits from `1` query, uploaded `merged_results.out.gz` and `merge-report.json`, and wrote `metadata/SUCCESS.txt`. The final rerun after XML DB-name normalization completed in `46s`.
- Final evidence is in `docs/temp/core-nt-searchsp/fresh-2026-05-17/live-finalizer-5be97da5/`. `merged_vs_baseline_stats.json` reports `all_result_statistics_match: true` against the full DB baseline: `db_len=1041443571674`, `db_num=125619662`, `eff_space=32156241807668`, `hsp_len=33`, `hit_count=0`, and `hsp_count=0`. The merged top-level `BlastOutput_db` is normalized to `core_nt`; the XML is not byte-identical to the VM baseline because the baseline contains the local filesystem DB path.
- Browser verification on `http://127.0.0.1:8090/blast/submit` selected `core_nt` and showed the submit summary with `Searchsp: 32156241807668`.
- Fresh smoke submit `b9a1c180-06a9-449d-b12f-aefc12ff42bc` for `16S_ribosomal_RNA` verified query upload, result metadata creation, and terminal-exec launch of `elastic-blast submit`. It was cancelled before BLAST pod execution because PVC `blast-dbs-pvc-rwm` stayed Pending with `storageclass.storage.k8s.io "azureblob-nfs-premium" not found`. Leftover `job/init-pv` and `pvc/blast-dbs-pvc-rwm` were deleted after evidence capture.
- The job detail page refreshed to `failed / submit_failed` and displayed the persisted submit error summary instead of the empty fallback message.
- `uv run pytest -q api/tests/test_blast_submit_route_options.py api/tests/test_smoke.py -q`: passed.
- `uv run pytest -q api/tests/test_blast_submit_route_options.py api/tests/test_smoke.py api/tests/test_storage_data.py api/tests/test_blast_config_sharding.py api/tests/test_sharded_merge.py`: 94 passed.
- `uv run ruff check api/services/web_blast_searchsp.py api/services/storage_data.py api/tests/test_blast_submit_route_options.py api/tests/test_smoke.py`: passed.
- `uv run pytest -q api/tests/test_sharded_merge.py`: 3 passed.
- `uv run ruff check api/services/web_blast_searchsp.py api/services/storage_data.py api/tests/test_sharded_merge.py terminal/patch_elastic_blast.py`: passed.
- `get_errors` for `terminal/merge-sharded-results.sh`, `terminal/patch_elastic_blast.py`, `api/services/web_blast_searchsp.py`, and `api/tests/test_sharded_merge.py`: no errors found.
- `cd web && npm run test -- src/pages/blastSubmit/taxonomyFilter.test.ts`: passed.
- `cd web && npm run build`: passed with the existing Vite chunk-size warning.
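A statistics check of the kind recorded in the evidence above could be scripted along these lines. This is only a sketch: it is not the code that produced `merged_vs_baseline_stats.json`, and the helper names `read_statistics` and `statistics_match` are hypothetical. It assumes BLAST XML output (`-outfmt 5`), where each `Statistics` element carries `Statistics_db-num`, `Statistics_db-len`, `Statistics_hsp-len`, and `Statistics_eff-space` children.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

# Short field name -> BLAST XML tag inside each <Statistics> element.
STAT_TAGS = {
    "db_num": "Statistics_db-num",
    "db_len": "Statistics_db-len",
    "hsp_len": "Statistics_hsp-len",
    "eff_space": "Statistics_eff-space",
}


def read_statistics(xml_text: str) -> List[Dict[str, int]]:
    """Return one dict of integer statistics per <Statistics> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for stats in root.iter("Statistics"):
        # float() first so values written in float notation still parse;
        # a missing tag raises TypeError, which is acceptable for a sketch.
        rows.append({name: int(float(stats.findtext(tag)))
                     for name, tag in STAT_TAGS.items()})
    return rows


def statistics_match(xml_texts: Iterable[str], expected: Dict[str, int]) -> bool:
    """True when every Statistics block agrees with `expected` on the keys it lists."""
    rows = [row for text in xml_texts for row in read_statistics(text)]
    if not rows:
        return False  # no statistics found counts as a failed check
    return all(row[key] == value for row in rows for key, value in expected.items())
```

Passing only `{"eff_space": 32156241807668}` as `expected` checks the search space alone, while passing all four fields compares a result file against the full-database baseline values listed above.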
Notes
Only core_nt has a verified default in this change. Other databases remain
unset until their own repeated Web BLAST/ElasticBLAST evidence is captured.