BLAST submit: surface the full NCBI standard nucleotide catalogue in the database dropdown¶
Motivation¶
A user comparing our submit page with NCBI Web BLAST asked whether we
could match NCBI's behaviour of listing the full Standard databases
reference catalogue inside the dropdown (core_nt, nt, refseq_*,
wgs, est, sra, tsa, tls, htgs, pat, RefSeq_Gene, gss,
dbsts, pdb, …) rather than only the databases that happen to be in
our blast-db storage container.
Today we only show what /api/blast/databases returns, so the operator
cannot discover that, e.g., refseq_rna is a thing NCBI exposes.
User-facing change¶
- The Standard / rRNA·ITS / Genomic+transcript tabs now show the full NCBI reference catalogue alongside what is actually downloaded.
- The tab badge reads
"<downloaded> /+<catalogue>"— e.g. the Standard tab on a fresh cluster shows2 /+11, meaning the operator has two databases in storage and eleven more they could choose to add. The+Nportion is grey so the eye still anchors on the downloaded count. - The dropdown now uses two
<optgroup>s: In storage (ready) lists the immediately submittable databases (unchanged behaviour); Available from NCBI (not downloaded yet) lists the catalogue gap as disabled options so the operator can read the canonical NCBI labels, sizes and short descriptions without being able to submit against a database that is not yet downloaded. - A small "Add one from the Dashboard Storage card" hint sits below the
dropdown whenever the catalogue gap is non-empty, linking to
/(the Storage card already drives the actual download). - The categorisation regex has been tightened so
refseq_*databases now surface under Standard databases (matching NCBI), and the newwgs/tsa/est/refseq_genomes/refseq_reference_genomesentries surface under Genomic + transcript.
No backend or infra changes — the catalogue is the existing
DB_CATALOG in web/src/components/cards/storageDbCatalog.ts,
extended with NCBI's standard nucleotide list.
Diff summary¶
web/src/components/cards/storageDbCatalog.ts- +14 catalogue entries:
refseq_select,refseq_rna,refseq_reference_genomes,refseq_genomes,wgs,est,sra,tsa,tls,htgs,pat,RefSeq_Gene,gss,dbsts,pdb,28S_fungal_sequences. Sizes are NCBI's 2026 published figures, rounded conservatively (they only drive the download UX). web/src/pages/blastSubmit/DatabaseSection.tsxdatabaseCategoryByName(name, source)extracted so it can classify both downloadedBlastDatabaserecords and catalogue strings.notDownloadedCatalogue(category, downloaded, programType)filtersDB_CATALOGby category + nucleotide/protein type and drops anything already in storage.- Per-tab badge now shows downloaded + faded catalogue count, with a
descriptive
titletooltip. - Dropdown re-rendered with
<optgroup>s for "in storage" and "available from NCBI" (the latter disabled). - Helper hint with a
Downloadicon links to/when the catalogue gap is non-empty. web/src/theme/glass.css.blast-search-set-tab__hint(faded+Ncount beside the badge)..blast-db-add-hint(dashed accent-tinted helper banner under the dropdown).
Validation¶
cd web && npx tsc --noEmit→ clean.cd web && npx eslint --max-warnings 0 src/pages/blastSubmit/DatabaseSection.tsx src/components/cards/storageDbCatalog.ts→ clean.cd web && npm run build→ built in ~8 s.cd web && npx vitest run→ 19 files, 185 tests passed.- Manual (Playwright + screenshot): Standard tab now shows
2 /+11, the dropdown listscore_ntandelb_compare_tinyunder "In storage" and eleven NCBI databases under "Available from NCBI" (each suffixed with— Not downloaded). Helper hint reads "11 more NCBI nucleotide databases are listed above as reference. Add one from the Dashboard Storage card to make it submittable."
Out of scope¶
- Auto-triggering downloads from the submit page. The Storage card already owns the download flow and the catalogue size estimates here intentionally route the operator through that flow.
- Protein-side catalogue beyond
pdb(the NCBI screenshot the user referenced was the nucleotide tab).