BLAST results / submit form — defensive hardening pass¶
Motivation¶
After landing the BLAST results aggregate / alignments / export endpoints and the BLAST-submit Duplicate / Export config flow, a critical-review sweep surfaced eight load-bearing issues that would either lie to the researcher, allow an authenticated user to fish in unrelated jobs / storage accounts, or silently corrupt the form state on re-hydration.
This change closes all eight. Every fix carries a regression test.
User-facing changes¶
- Results analytics no longer pretend "no hits found" when every result
blob read failed (RBAC missing, storage outage, etc.). The endpoint now
returns
status: "degraded"withdegraded_reason: "all_reads_failed"and aread_failurescounter so the SPA can surface a real "storage unreachable" toast. - Result CSV/TSV export returns 503 instead of a misleading header-only file when every blob read fails.
- Duplicate-job hydration is stricter: a forged or stale config snapshot
that carries
program: "rm -rf /",sharding_mode: "wormhole", or a wrong-typed field (string instead of boolean, etc.) is silently dropped for that field — the rest of the form still loads. Previously these poisoned values would land inFormStateand only surface at submit time as a validation error from the backend. isNucleotideProgramin the BLAST-submit query section no longer classifiestblastnas nucleotide. tblastn's query is protein, so the reverse-complement / GC% UI is now correctly hidden for that program.SeqStatssplitsnCountfromambiguous. The ambiguous-IUPAC warning chip in QuerySection now fires only on R/Y/S/W/K/M/B/D/H/V (matchinghasAmbiguousBases), and the FASTA stats display can show N separately if the UI wants to in future.
Security-facing changes¶
_validate_result_blob_namerejects..,?,#,\\,%2e,%2f, emptyjob_id,job_idcontaining/or.., and a leading slash in the remainder after the prefix. Previously only the prefix and bare..were checked._blob_servicevalidates the storage account name against^[a-z0-9]{3,24}$before constructing the URL. Stops a forged querystring from redirecting the api sidecar's MI to an attacker-controlled URL (e.g.storage_account=victim.blob.core.windows.net)._ensure_job_read_allowedfails closed on Table Storage lookup failure whenAUTH_DEV_BYPASSis not set. Previously a Storage outage would let any authenticated user read any job. Dev bypass keeps the fail-open behaviour because the synthetic identity has no real OID.
API / IaC diff summary¶
api/services/storage_data.py:- Added
_STORAGE_ACCOUNT_NAME_RE(^[a-z0-9]{3,24}$) and a check in_blob_servicethat raisesValueErrorbefore the URL is built. api/routes/stubs.py:_validate_result_blob_name: expanded checks (see above)._ensure_job_read_allowed: fail-closed unless dev bypass.blast_job_results_aggregate: returnsstatus: "degraded"withdegraded_reason: "all_reads_failed"when every blob read fails; surfacesread_failures+truncatedon every response; catches an unexpected exception inaggregate_blast_hitsand degrades cleanly.blast_job_results_export: raises 503{"code": "all_reads_failed"}when every blob read fails.web/src/pages/blastSubmit/configSerializer.ts:normaliseFormFields: per-field type validation.program/sharding_modemust match enum; numeric fields require finite numbers; boolean fields require booleans; everything else requires strings. Mismatches are dropped for that field only.partialFormFromJobPayload: validatesprogramagainst the enum.web/src/pages/blastSubmit/fastaUtils.ts:SeqStatsaddsnCountand narrowsambiguousto IUPAC-beyond-N.baseCompositionupdates accordingly.web/src/pages/blastSubmit/QuerySection.tsx:isNucleotideProgramremovestblastn; adds a comment explaining why (protein query for tblastn means reverse-complement / GC% don't apply).
No infra changes. No new dependencies.
Validation evidence¶
uv run pytest -q api/tests
...
581 passed in 21.90s
cd web && npm test
Test Files 14 passed (14)
Tests 131 passed (131) # configSerializer adds 4 new tests
# storage_data adds 10 new tests
# results_routes adds 4 new tests
# fastaUtils assertion updated
cd web && npm run build
✓ built in 4.65s
New tests added:
api/tests/test_blast_results_routes.py:test_alignments_rejects_backslash_traversaltest_alignments_rejects_url_encoded_traversaltest_aggregate_degraded_when_all_reads_failtest_export_degraded_when_all_reads_failapi/tests/test_storage_data.py:test_blob_service_rejects_invalid_account_names(parametrised, 9 cases)test_blob_service_accepts_valid_account_namesweb/src/pages/blastSubmit/configSerializer.test.ts:drops invalid program values (defence in depth)drops invalid sharding_mode valuesdrops booleans where strings are expected and vice versarejects unrecognised program values in payloadweb/src/pages/blastSubmit/fastaUtils.test.ts:- Updated
counts ambiguous bases separately from Nassertion.
Out of scope¶
- Surfacing
degraded/truncatedin the BLAST analytics panel UI — the backend now sends the fields; a follow-up frontend change should render them as warning chips. Tracked for the next pass. - The fail-closed change in
_ensure_job_read_allowedis only exercised by the existing tests because they all run underAUTH_DEV_BYPASS=true. A dedicated test that bypasses the dev flag would require a fixture for the full MSAL caller — deferred.