Web BLAST 10-Example Equivalence Harness¶
Motivation¶
The bundled BLAST submit examples were expected to match the 10 FASTA benchmark files used for NCBI Web BLAST vs sharded ElasticBLAST equivalence checks. The UI only shipped 9 examples, and the existing EQ14 strict Web XML oracle runner was hard-coded to one MPXV FASTA.
User-Facing Change¶
- The BLAST submit examples now include all 10 benchmark FASTA files, including
SARS-CoV-2_orf1ab_NC_045512.2.fasta. - The bundled FASTA text now byte-for-byte matches the sibling benchmark source set.
- Query example tests now enforce the exact 10-file source set and verify declared sequence lengths from the FASTA body.
- The New Search page now submits unsharded databases as unsharded even if stale database metadata reports
sharded: truewith only a single shard set. - The BLAST job detail page now shows result files as soon as output blobs exist, even if the stored dashboard phase is still
submitted.
API / IaC Diff Summary¶
- No production IaC surface changed.
scripts/dev/eq14-core-nt-webxml-sharded.shis now parameterized by query ConfigMap key, taxonomy filter, Entrez query, Web database, and candidate-pool size.- Added
scripts/dev/eq15-core-nt-webxml-example-suite.shto prepare the 10-query AKS runner ConfigMap and launch strict-oracle jobs one example at a time. - The EQ14/EQ15 sharded validation path is explicitly
core_nt-only. It refuses non-core_ntdatabases so unsharded databases are never accidentally run through the 10-shard layout. - Backend submit config generation now suppresses stale sharding options when Storage metadata says the selected database has no prepared shard layout.
- The BLAST submit UI now applies the same prepared-shard check to preflight and submit payloads. If a restored draft still says
precisebut the selected database has noshard_sets, the browser sendssharding_mode=off,db_auto_partition=false, and noshard_sets. scripts/dev/compare-blast-web-xml-outfmt6.pyJSON reports now include first missing accession samples forweb_onlyandcandidate_onlycases.scripts/dev/aks-equivalence-runner.sh job-downnow removes the generated*-user-scriptConfigMap as well as the main script ConfigMap.scripts/dev/docker-compose.full.ymlnow passes the Azurite table/blob endpoints to the API, worker, and beat sidecars, and mounts the host Azure CLI cache into the terminal sidecar for local exec auth.terminal/exec_server.pynow forwards an explicit child process Azure auth environment soazcopy,kubectl, andelastic-blastsubprocesses do not depend on interactive shell profile state.terminal/patch_elastic_blast.pynow patches the upstream AKS templates for the local-SSD path withworkload=blasttolerations/node selectors, unique init-SSD job names per ElasticBLAST job suffix, and init wait selectors filtered byelb-job-id.- The BLAST results state loader now falls back to Azure settings stored in the job payload when the result URL has no query string.
Validation Evidence¶
cd web && npm run test -- src/pages/blastSubmit/queryExamples.test.ts— passed, 2 tests.cd web && npm run build— passed. Vite emitted the existing large chunk warning.uv run pytest -q api/tests/test_compare_blast_web_xml_outfmt6.py— passed, 6 tests.uv run pytest -q api/tests/test_blast_tasks.py api/tests/test_compare_blast_web_xml_outfmt6.py— passed, 73 tests.uv run ruff check scripts/dev/compare-blast-web-xml-outfmt6.py api/tests/test_compare_blast_web_xml_outfmt6.py— passed.cd web && npm run test -- src/pages/blastSubmit/taxonomyFilter.test.ts src/pages/blastSubmit/shardingAvailability.test.ts— passed, 16 tests.bash -n scripts/dev/aks-equivalence-runner.sh scripts/dev/eq14-core-nt-webxml-sharded.sh scripts/dev/eq15-core-nt-webxml-example-suite.sh— passed.EQ14_DB_NAME=18S_fungal_sequences bash scripts/dev/eq14-core-nt-webxml-sharded.sh dummy.fa— rejected with exit code 2 before workspace side effects.scripts/dev/eq15-core-nt-webxml-example-suite.sh prepare— createdelb-equivalence/eq14-core-nt-webxml-toolswith 10 benchmark FASTA files.- Live AKS/Web BLAST strict-oracle run for
mpxv-f3l-nc-003310: - First run with
MAX_TARGET_SEQS=5000: strict oracle produced 498/500 shared accessions, no value mismatches. - Second run with
MAX_TARGET_SEQS=50000: still 498/500 shared accessions, no value mismatches. - Missing Web accessions were stable across runs:
PZ346335.1,PZ346334.1. blastdbcmdprobe across all 10 warmedcore_nt_shard_00..09databases skipped both accessions on every shard, proving the remaining mismatch is DB snapshot drift rather than merge/order/candidate cutoff logic.- Local six-sidecar browser validation on
http://127.0.0.1:18080submitted all 10 New Search examples from the browser and confirmed.out.gzplusSUCCESS.txtartifacts for every job:
| Example | Job ID | Database | Aggregate |
|---|---|---|---|
P. falciparum 18S - chr1 |
294e4b99-0351-4ab3-855f-839af613362a |
18S_fungal_sequences |
ok, 299 hits |
MPXV F3L - NC_003310.1 |
7c9f1547-1d41-492d-a099-fdc6a1206e82 |
elb_compare_tiny |
no_hits, 0 hits |
MPXV F3L - NC_063383.1 |
bdf595c1-6569-49ce-a8e8-785fa5aa177e |
elb_compare_tiny |
no_hits, 0 hits |
P. falciparum 18S - chr5 |
22a1731e-cf4f-4211-bb84-b2039fbdfc1d |
18S_fungal_sequences |
ok, 344 hits |
P. falciparum 18S - chr7 |
de30dbbb-0bbb-4602-a9f5-f69ea361b0a7 |
18S_fungal_sequences |
ok, 344 hits |
P. falciparum 18S - chr13 |
35a39944-92e1-40e6-9bbf-5a6a39c30322 |
18S_fungal_sequences |
ok, 297 hits |
P. falciparum 18S - chr11 |
cdd565ca-6850-4949-80df-150b93cafd28 |
18S_fungal_sequences |
ok, 297 hits |
SARS-CoV-2 N gene |
bcad1cd8-761b-44bf-a608-c88d22bfaacb |
elb_compare_tiny |
no_hits, 0 hits |
SARS-CoV-2 RdRP |
9d8e19f7-5ead-418f-ba99-01a674e70ad8 |
elb_compare_tiny |
no_hits, 0 hits |
SARS-CoV-2 ORF1ab |
495ef1d3-7690-483f-a05f-0b726b6db8f9 |
elb_compare_tiny |
no_hits, 0 hits |
- Browser evidence: the BLAST results page for
22a1731e-cf4f-4211-bb84-b2039fbdfc1drendered the result table while job state still readsubmitted, includingbatch_000-blastn-18S_fungal_sequences.out.gzandSUCCESS.txt. - AKS evidence: repeated browser-suite submits created unique
init-ssd-<suffix>-N,submit-jobs-<suffix>,elb-finalizer-<suffix>, andblastn-batch-...-<suffix>jobs without name collisions or stale init selector failures.
Follow-Up Required¶
Exact 500/500 equivalence for the current NCBI Web BLAST core_nt result requires refreshing the workload core_nt database snapshot from NCBI, regenerating shard manifests, and warming the 10-shard AKS node cache before rerunning the 10-example suite.
The MPXV and SARS bundled UI examples currently target the tiny local comparison database, so their browser-suite validation proves pipeline completion and artifact rendering rather than biological hits. Viral hit validation should be repeated against refreshed core_nt after the database snapshot refresh.