Taxonomy tab — organism fallback + Blast Name column (NCBI parity)
Motivation
The Recent searches → Taxonomy tab rendered 100% of hits as
"Unclassified" because the default BLAST outputs we receive
(`-outfmt 6` with the 12 `std` columns, and `-outfmt 5` legacy XML)
do not include `sscinames` / `staxids`. NCBI's BLAST UI fills the same
screen with "Organism / Blast Name / Score / Number of Hits" by
resolving the subject title to NCBI Taxonomy server-side.
This change covers Phases 1 and 2 of the three-phase NCBI-parity plan
discussed with the user. Phase 3 (changing the default outfmt to carry
`sscinames staxids`) is deferred pending an explicit go-ahead, because
it touches the sharded-merge engine and the submit pipeline.
User-facing change
- Taxonomy → Organism now lists actual scientific names instead of a single "Unclassified" bucket. Example: a Monkeypox virus search on core_nt now renders one "Monkeypox virus / viruses / 100 hits" row, matching NCBI's BLAST UI.
- A new Blast Name column shows the NCBI group derived from the lineage chain (`viruses`, `bacteria`, `mammals`, `plants`, `fungi`, `eukaryotes`, …).
- A faint `~` marker appears after organism names that came from the `stitle` heuristic (not from the BLAST output's `sscinames`). Hovering the taxid link shows a tooltip when the taxid was resolved via NCBI eutils by organism name.
- The Organism sub-tab now always requests lineage enrichment, so the Blast Name column is populated by default. Per-taxid eutils calls remain cached and capped at the top 20 organisms.
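The Blast Name derivation from the lineage chain can be pictured as a walk up the ancestry until a node with a known group is found. This is a minimal sketch, assuming a root-to-leaf list of lineage node names and a hypothetical lookup table (`BLAST_NAME_BY_NODE`, `derive_blast_name` are illustrative names, not the code in `blast_result_analytics.py`):

```python
from typing import Optional, Sequence

# Hypothetical mapping from NCBI Taxonomy lineage node names to Blast Name
# groups; the PR's real table is not reproduced here.
BLAST_NAME_BY_NODE = {
    "Viruses": "viruses",
    "Bacteria": "bacteria",
    "Eukaryota": "eukaryotes",
    "Fungi": "fungi",
    "Viridiplantae": "plants",
    "Mammalia": "mammals",
}


def derive_blast_name(lineage: Sequence[str]) -> Optional[str]:
    """Return the group of the most specific mapped node in a root-to-leaf lineage."""
    # Walk from the leaf end so narrower groups (mammals) win over broader ones (eukaryotes).
    for node in reversed(lineage):
        group = BLAST_NAME_BY_NODE.get(node)
        if group is not None:
            return group
    return None  # nothing mapped: leave the Blast Name cell empty
```

Walking from the leaf end is the design choice that matters: a mammal's lineage also contains `Eukaryota`, and checking the most specific ancestor first keeps it from collapsing into `eukaryotes`.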
API / IaC diff summary
- `api/services/blast_result_analytics.py`
  - New helper `extract_organism_from_stitle()` cuts the subject title at NCBI-style stopwords (`isolate`, `strain`, `chromosome`, …) and strips curator prefixes (`PREDICTED:`, `TPA:`, `MAG:`, …).
  - `rollup_taxonomy()` falls back to that helper when `sscinames`/`staxids` are absent, emitting `organism_source: "stitle"` so the UI can flag best-effort rows.
  - `enrich_taxonomy_with_lineage()` resolves organism → taxid via `taxonomy.search_taxonomy()` when the row lacks a taxid, then derives `blast_name` from the lineage chain. A new meta key, `name_resolved`, counts how many rows used the lookup path.
- `api/routes/blast/result_analytics.py`: the initial `lineage_meta` shape carries the new `name_resolved` key even when `include_lineage=false`.
- `web/src/api/blast.ts`: `BlastTaxonomyRow` is extended with `blast_name`, `organism_source`, `taxid_source`.
- `web/src/pages/blastResults/analytics/TaxonomyPanel.tsx`: always passes `include_lineage: true`; `OrganismTable` gains a Blast Name column and surfaces the new heuristic / name-lookup hints.
- No infra / Bicep changes.
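The stitle fallback in the service helpers can be sketched as below. This is a minimal Python sketch, not the code in `blast_result_analytics.py`: it uses only the stopwords and prefixes named above (the real lists are longer, as the "…" indicates), it assumes a title with no recognised stopword should stay Unclassified, and `rollup_sketch` is a hypothetical stand-in for `rollup_taxonomy()` that marks a row `"stitle"` only when every hit behind it came from the heuristic.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Optional

# Only the prefixes and stopwords named in this PR; the real lists are longer.
CURATOR_PREFIXES = ("PREDICTED:", "TPA:", "MAG:")
STOPWORDS = ("isolate", "strain", "chromosome")
UNCLASSIFIED = "Unclassified"

# A stopword must be a whole word preceded by whitespace ("virus isolate X", not "isolates").
_STOP_RE = re.compile(r"\s+(?:" + "|".join(STOPWORDS) + r")\b", re.IGNORECASE)


def extract_organism_from_stitle(stitle: str) -> Optional[str]:
    """Best-effort organism name from a BLAST subject title (sketch)."""
    title = stitle.strip()
    # Drop leading curator prefixes; loop in case several are stacked.
    changed = True
    while changed:
        changed = False
        for prefix in CURATOR_PREFIXES:
            if title.startswith(prefix):
                title = title[len(prefix):].lstrip()
                changed = True
    match = _STOP_RE.search(title)
    if match is None:
        return None  # no recognised cut point: prefer Unclassified over a guess
    organism = title[: match.start()].strip(" ,;")
    return organism or None


def rollup_sketch(hits: list[dict]) -> list[dict]:
    """Hypothetical stand-in for rollup_taxonomy(): count hits per organism."""
    counts: Counter[str] = Counter()
    heuristic_only: dict[str, bool] = {}
    for hit in hits:
        name = (hit.get("sscinames") or "").strip()
        from_stitle = False
        if not name:  # sscinames takes precedence whenever it is present
            name = extract_organism_from_stitle(hit.get("stitle") or "") or UNCLASSIFIED
            from_stitle = name != UNCLASSIFIED
        counts[name] += 1
        # Stays True only while every hit for this organism came from the heuristic.
        heuristic_only[name] = heuristic_only.get(name, True) and from_stitle
    rows = []
    for name, n in counts.most_common():
        if name == UNCLASSIFIED:
            source = None
        else:
            source = "stitle" if heuristic_only[name] else "sscinames"
        rows.append({"organism": name, "organism_source": source, "hits": n})
    return rows
```

For example, `"TPA: Escherichia coli strain K-12 substr. MG1655"` loses its `TPA:` prefix and is cut at `strain`, yielding `"Escherichia coli"`.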
Validation
- `uv run pytest -q api/tests` → 977 passed (16 new tests in `api/tests/test_blast_result_analytics_organism.py` cover the stitle heuristic, `sscinames` precedence, `blast_name` derivation, and name-lookup tolerance for missing/blank results).
- `uv run ruff check api/services/blast_result_analytics.py api/routes/blast/result_analytics.py api/tests/test_blast_result_analytics_organism.py` → clean.
- Frontend type changes verified against the existing `TaxonomyPanel` usage (the TS build error in `ExecutionStepsCard.tsx` is pre-existing and unrelated; confirmed by stashing the taxonomy edits and re-running `npm run build`).
Limitations / next steps
- The heuristic is best-effort: exotic `stitle` formats may still bucket as Unclassified, which is preferable to a mislabel.
- Name → taxid resolution adds one eutils call per distinct organism (cached). For result sets with many distinct species, this adds a few hundred ms the first time the Taxonomy tab is opened.
- Phase 3 (changing the BLAST submit default to emit `sscinames staxids` natively) is deferred. It would remove the heuristic and eutils path for future jobs, but requires updating the sharded-merge script, the parser's column handling for outfmt 6 without `# Fields:` headers, and a number of submit-pipeline tests.
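The parser change Phase 3 needs can be sketched briefly. The `# Fields:` comment lines come from outfmt 7; plain outfmt 6 has none, so the parser must rebuild the column list from the `-outfmt` specifier it submitted. This is a minimal sketch, assuming that specifier string is available at parse time; `columns_for_outfmt` and `parse_tabular_row` are hypothetical names, not the project's parser, and the `std` expansion follows recent BLAST+ `-help` output.

```python
from __future__ import annotations

# The 12 columns that the `std` keyword expands to (per recent BLAST+ -help output).
STD_COLUMNS = (
    "qaccver", "saccver", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def columns_for_outfmt(spec: str) -> list[str]:
    """Expand a tabular '-outfmt' specifier such as '6 std sscinames staxids'."""
    parts = spec.split()
    if not parts or parts[0] != "6":
        raise ValueError(f"not a headerless tabular specifier: {spec!r}")
    columns: list[str] = []
    for field in parts[1:] or ["std"]:  # a bare '6' means 'std'
        columns.extend(STD_COLUMNS if field == "std" else [field])
    return columns


def parse_tabular_row(line: str, columns: list[str]) -> dict[str, str]:
    """Zip one tab-separated outfmt 6 line onto the expected columns."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(columns):
        # Fail loudly instead of silently shifting sscinames/staxids into the wrong fields.
        raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
    return dict(zip(columns, values))
```

Raising on a field-count mismatch is deliberate: with no header to cross-check, a misaligned row is otherwise indistinguishable from a valid one.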
Follow-up — Descriptions table column trim (NCBI parity)
- Removed the `Query` and `Shard` columns from the Descriptions table. The query is already selectable in the filter bar and visible in the Alignments tab, so it does not need a column; the source shard is an internal artefact (`merged_results.out.gz` for every row in a sharded run) that added no diagnostic value.
- Renamed `Organism` → `Scientific Name` and moved it immediately to the right of `Description`, matching NCBI's BLAST UI column order ("Description / Scientific Name / Max Score / …").
- Each row now falls back to the new `organismFromStitle` helper when the BLAST output lacks `sscinames`/`staxids`. The helper mirrors the backend `extract_organism_from_stitle` heuristic, so the table shows "Monkeypox virus" instead of "—" for every row in a typical core_nt search. This is locked in by `web/src/pages/blastResults/analytics/helpers.test.ts` (11 cases, same fixture as the backend test).
Follow-up — Scientific Name detail modal
The Descriptions table now opens a read-only NCBI Taxonomy detail
modal when the user clicks a Scientific Name cell. The eutils-backed
routes that already power the BLAST submit Taxonomy picker
(`/api/blast/taxonomy/{search, detail, image, tree}`) are reused via
the existing 24 h server cache plus a 24 h React Query `staleTime`, so
re-opening the same taxon within that window costs zero extra NCBI calls.
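The server-side half of that reuse behaves like a time-to-live cache keyed by route and argument. This is a minimal sketch, assuming an in-process dictionary and an injectable clock so expiry can be tested; `TTLCache` and `get_or_fetch` are hypothetical names, not the project's actual cache implementation.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Hashable
from typing import Any

DAY_SECONDS = 24 * 60 * 60  # the 24 h window described above


class TTLCache:
    """In-process cache whose entries are refetched once they are ttl seconds old."""

    def __init__(self, ttl: float = DAY_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no upstream (NCBI) call
        value = fetch()  # missing or expired: call upstream and restamp
        self._entries[key] = (now, value)
        return value
```

Keying by something like `("detail", taxid)` means that reopening the modal for the same taxon inside the window is served from the cache; the client-side `staleTime` additionally avoids the round trip to the API.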
- New component: `web/src/components/taxonomy/TaxonomyDetailModal.tsx`. A centred glass-card modal with a Wikipedia thumbnail (falling back to `TaxonomyDefaultIcon` when no image is available), a key/value facts panel (taxid + NCBI Browser icon link, division, parent, up to 5 synonyms, last updated), and the reusable `<LineageTree>` for the ancestry chain. Focus trap, Escape-to-close, and `aria-labelledby` are wired through `useFocusTrap`.
- Trigger: in `web/src/pages/blastResults/analytics/BlastHitsTable.tsx`, the Scientific Name cell is now a `<ScientificNameCell>` button that passes `{ name, taxid, source }` to the modal. When the row already carries a numeric `staxids`, the new `parseLeadingTaxid` helper hands it straight to `getTaxonomyDetail`, so the modal skips the name lookup. When the name was parsed heuristically from `stitle` (`source: "stitle"`), the modal shows a hint on a "not found" result so the user knows the resolution was approximate.
- Cleanup: removed an orphaned `<th>Shard</th>` left over from the earlier column trim, so the table header now matches the body cells.
- Visual polish: separated the read-only modal's image styles from the broader Taxonomy picker image column so the thumbnail can no longer overlap the facts area; replaced the cramped definition list with two-column fact cards; hid noisy `no rank` badges; shortened NCBI timestamps to dates; tuned the compact lineage tree scale; and made lineage nodes open their NCBI Taxonomy Browser records in a new tab.
- No backend changes. All four taxonomy routes were already implemented and cached; only the SPA gained a new entry point into them.
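`parseLeadingTaxid` lives in the TypeScript helpers, and its exact rules are not listed here. The following Python sketch illustrates the routing described in the Trigger bullet, assuming `staxids` arrives either as an integer or as BLAST's semicolon-separated string, and that only a positive leading integer counts. `parse_leading_taxid` and `choose_lookup` are hypothetical names.

```python
from __future__ import annotations

import re
from typing import Optional, Union

_LEADING_DIGITS = re.compile(r"\s*(\d+)")


def parse_leading_taxid(staxids: Union[int, str, None]) -> Optional[int]:
    """Return the first taxid in a staxids value, or None if there is no usable one."""
    if staxids is None or isinstance(staxids, bool):
        return None
    if isinstance(staxids, int):
        return staxids if staxids > 0 else None
    match = _LEADING_DIGITS.match(staxids)  # e.g. "10244;9606" -> "10244"
    if match is None:
        return None
    taxid = int(match.group(1))
    return taxid if taxid > 0 else None


def choose_lookup(name: str, staxids: Union[int, str, None]) -> tuple[str, Union[int, str]]:
    """Go straight to the detail route when a taxid is known, else search by name."""
    taxid = parse_leading_taxid(staxids)
    return ("detail", taxid) if taxid is not None else ("search", name)
```

Falling back to a name search only when no taxid parses is what lets rows with real `staxids` skip the lookup entirely, while heuristic rows still get a best-effort resolution.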
Validation
- `cd web && npm test` → 29 files / 256 tests pass (11 new `parseLeadingTaxid` cases in `web/src/pages/blastResults/analytics/helpers.test.ts`).
- `cd web && npm run build` → clean Vite build, no TypeScript errors (existing chunk-size warning only).
- `npx eslint` on the touched frontend files → clean.