Web BLAST equivalence contract
Motivation
The control plane is intended to replace NCBI Web BLAST for supported Azure workloads while running much faster on warmed AKS node-local shards. That claim needs a stricter default than approximate sharding: every Web-compatible run must use prepared shards plus full-database search-space correction, and comparator evidence must separate true mismatches from top-N tie-boundary diagnostics.
User-facing change
- Warmed, prepared databases now prefer the `Web-equivalent shard` mode instead of `Fast shard` on the BLAST submit page. `Fast shard` remains available as an explicit throughput probe, but its copy no longer implies full Web BLAST equivalence.
- The Web CSV comparison helper now reports `tie_window_equivalent` and can exit successfully with `--accept-tie-window` when strict order fails only inside a shared top-N score class. Strict equality remains the final pass criterion for Web-equivalence claims.
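The difference between strict equality and tie-window equivalence can be sketched as follows. This is a minimal illustration, not the comparison helper's actual logic: the `(accession, score)` hit shape and the function names are hypothetical, and it assumes "tie window" means that each contiguous run of equal scores contains the same accessions in both result lists.

```python
from itertools import groupby

# Hypothetical hit shape: (accession, bit_score), already in rank order.
Hit = tuple[str, float]


def score_classes(hits: list[Hit]) -> list[tuple[float, frozenset[str]]]:
    """Collapse a ranked hit list into contiguous equal-score classes."""
    return [
        (score, frozenset(acc for acc, _ in group))
        for score, group in groupby(hits, key=lambda h: h[1])
    ]


def compare_orders(web: list[Hit], candidate: list[Hit]) -> dict[str, bool]:
    """Report strict order equality and the weaker tie-window equivalence."""
    return {
        # Strict: identical accessions and scores in identical positions.
        "exact_order": web == candidate,
        # Tie window: order may differ, but only inside an equal-score class.
        "tie_window_equivalent": score_classes(web) == score_classes(candidate),
    }


web = [("A", 100.0), ("B", 90.0), ("C", 90.0)]
swapped_in_tie = [("A", 100.0), ("C", 90.0), ("B", 90.0)]
wrong_rank = [("B", 90.0), ("A", 100.0), ("C", 90.0)]

print(compare_orders(web, swapped_in_tie))
print(compare_orders(web, wrong_rank))
```

In this sketch a swap of `B` and `C` inside the shared score class is a diagnostic, while moving `B` above a higher-scoring hit is a true mismatch.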
API/IaC diff summary
- No IaC changes.
- Frontend sharding availability now defaults eligible warmed DBs to `precise` and labels that mode as `Web-equivalent shard`.
- `scripts/dev/compare-blast-web-csv.py` now mirrors the Web XML/outfmt6 comparator's tie-window diagnostic shape for CSV evidence.
- `terminal/merge-sharded-results.sh` now records `tie_cutoff_overflow_count` and `tie_cutoff_queries` when `max_target_seqs` truncates a tied score class.
- `terminal/merge-sharded-results.sh` now accepts `ELB_TIE_ORDER_FILE`; when a same-snapshot accession order oracle is supplied, tied score classes are ordered by oracle rank before fallback ordinal. `ELB_TIE_ORDER_STRICT=1` also excludes non-oracle hits before truncation when the oracle defines top-N membership.
- `terminal/patch_elastic_blast.py` now patches the finalizer to download `${ELB_RESULTS}/${ELB_METADATA_DIR}/tie-order-oracle.txt`, export it as `ELB_TIE_ORDER_FILE`, and enable strict oracle mode before merging.
- `/api/blast/submit` options now accept `tie_order_oracle_text` or `tie_order_oracle_accessions`; the worker uploads that data to `results/<job>/metadata/tie-order-oracle.txt` for the finalizer.
- `scripts/dev/infer-blast-tie-order.py` records offline tie-order inference attempts and scores synthetic order keys against Web evidence.
- `docs/blast-searchsp-discovery.md` now includes the runtime equivalence contract: warmed prepared shards, `sharding_mode=precise`, verified full-DB `-searchsp`, merge-supported output, and comparator evidence.
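The oracle-aware merge described above can be sketched in Python. This is an illustration under stated assumptions, not a port of `terminal/merge-sharded-results.sh`: the `Hit` fields, the `merge_hits` name, and the meaning of `ordinal` (arrival order across shards) are hypothetical, and strict mode is simplified to "drop every hit the oracle does not list".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    accession: str
    bit_score: float
    ordinal: int  # hypothetical fallback key: arrival order across shards


def merge_hits(hits, max_target_seqs, oracle=None, strict=False):
    """Return (kept_hits, tie_cutoff_overflow) for one query."""
    # Oracle rank: position of each accession in the supplied order file.
    rank = {acc: i for i, acc in enumerate(oracle or [])}
    if strict and rank:
        # Simplified strict mode: the oracle defines top-N membership.
        hits = [h for h in hits if h.accession in rank]
    ordered = sorted(
        hits,
        # Score descending, then oracle rank, then fallback ordinal.
        key=lambda h: (-h.bit_score, rank.get(h.accession, len(rank)), h.ordinal),
    )
    kept = ordered[:max_target_seqs]
    overflow = 0
    if kept and len(ordered) > max_target_seqs:
        cutoff = kept[-1].bit_score
        # Hits tied with the last emitted hit that truncation discarded.
        overflow = sum(1 for h in ordered[max_target_seqs:] if h.bit_score == cutoff)
    return kept, overflow


pool = [Hit("A", 50.0, 0), Hit("B", 50.0, 1), Hit("C", 50.0, 2), Hit("D", 60.0, 3)]

kept, overflow = merge_hits(pool, max_target_seqs=2)
print([h.accession for h in kept], overflow)

kept, overflow = merge_hits(pool, max_target_seqs=2, oracle=["D", "C"], strict=True)
print([h.accession for h in kept], overflow)
```

Without an oracle the tied class `A`, `B`, `C` is cut arbitrarily by ordinal and two tied hits overflow; with a strict oracle the kept list follows the oracle order and nothing overflows.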
Validation evidence
- `uv run pytest -q api/tests/test_compare_blast_web_csv.py api/tests/test_blast_config_sharding.py api/tests/test_blast_tasks.py` → 109 passed.
- `uv run pytest -q api/tests/test_blast_tasks.py::test_upload_tie_order_oracle_writes_finalizer_metadata api/tests/test_blast_tasks.py::test_upload_tie_order_oracle_rejects_oversized_payload api/tests/test_sharded_merge.py api/tests/test_compare_blast_web_csv.py` → 10 passed.
- `uv run pytest -q api/tests` → 593 passed.
- `cd web && npm run test -- shardingAvailability` → 4 passed.
- `cd web && npm run build` → passed; Vite emitted the existing large chunk warning.
- Runtime observation: local compose sidecars are healthy; dashboard/API show `elb-cluster` Running in `rg-elb-01`; `core_nt` warmup is Ready on 10/10 shards; worker `reconcile_auto_warmup` returns `already_ready`.
- `/api/blast/pre-flight` with an existing precise sharded `core_nt` payload returns `ready: true`, `critical_blockers: 0`, and `sharding_precision.precision_level: precise_single_query`.
- Comparator rerun: no-hit `core_nt` calibration remains strictly equivalent (`canonical-compare.json` reports `equivalent: true`, `difference_count: 0`). Current F3L positive-hit Web XML/CSV evidence remains non-equivalent to the final sharded top-500 candidate. A wide-pool XML comparison reports `shared_accessions: 500`, `web_only: 0`, `value_mismatch_count: 0`, and `tie_window_equivalent: true`, confirming the next work item is top-N tie/order merge optimization rather than candidate generation.
- Merge diagnostic rerun on the wide F3L candidate pool reports `total_input_hits: 11261`, `tie_break_count: 11085`, and `tie_cutoff_overflow_count: 8620`; the cutoff score class has 9,120 tied hits and only 500 can be emitted.
- Tie-order inference evaluated 249 synthetic keys; the best key still reached only `top500_overlap: 33`, so local metadata does not justify a fabricated production tie-breaker.
- Oracle proof: using the Web top-500 accession list as `ELB_TIE_ORDER_FILE` with strict mode against the same wide pool yields strict comparator success: `equivalent: true`, `exact_order: true`, `shared_accessions: 500`, `web_only: 0`, `candidate_only: 0`, `value_mismatch_count: 0`, and `tie_cutoff_overflow_count: 0`.
- 16S same-snapshot proof: using the local full-run XML accession order as a strict oracle against contiguous sharded XML artifacts yields exact 500/500 accession order; remaining canonical XML differences are `Hit_id` GI-prefix and five `Hit_def` provenance differences from synthetic FASTA shard DB regeneration.
- `uv run ruff check api/tasks/blast.py api/routes/stubs.py api/tests/test_blast_tasks.py terminal/patch_elastic_blast.py scripts/dev/infer-blast-tie-order.py scripts/dev/compare-blast-web-csv.py api/tests/test_compare_blast_web_csv.py api/tests/test_sharded_merge.py` → passed.
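The pass criterion behind the comparator evidence above can be expressed as a small gate. This is a sketch only: it assumes the comparator report is a flat JSON object using the field names quoted in the bullets, the `is_strict_pass` and `exit_code` helpers are hypothetical, and the two sample reports are illustrative values modelled on the oracle run and the wide-pool run, not copies of real report files.

```python
import json

# Fields that must be zero for a strict Web-equivalence claim.
STRICT_ZERO_FIELDS = ("web_only", "candidate_only", "value_mismatch_count")


def is_strict_pass(report: dict) -> bool:
    """Strict equality: same hits, same values, same order."""
    return (
        report.get("equivalent") is True
        and report.get("exact_order") is True
        and all(report.get(field) == 0 for field in STRICT_ZERO_FIELDS)
    )


def exit_code(report: dict, accept_tie_window: bool = False) -> int:
    """0 on strict pass, or on tie-window equivalence when explicitly accepted."""
    if is_strict_pass(report):
        return 0
    if accept_tie_window and report.get("tie_window_equivalent") is True:
        return 0
    return 1


# Illustrative report modelled on the oracle proof.
oracle_run = json.loads(
    '{"equivalent": true, "exact_order": true, "shared_accessions": 500,'
    ' "web_only": 0, "candidate_only": 0, "value_mismatch_count": 0}'
)
# Illustrative report modelled on the wide-pool comparison; the false
# values for "equivalent" and "exact_order" are assumptions.
wide_pool = {
    "equivalent": False,
    "exact_order": False,
    "shared_accessions": 500,
    "web_only": 0,
    "value_mismatch_count": 0,
    "tie_window_equivalent": True,
}

print(exit_code(oracle_run))
print(exit_code(wide_pool))
print(exit_code(wide_pool, accept_tie_window=True))
```

Keeping the tie-window acceptance behind an explicit flag mirrors the rule that strict equality remains the final pass criterion.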