Parallelize split-child report + artifact downloads
Motivation
Two split-merge code paths were strictly sequential:
- `_load_split_child_merge_reports`: one HTTPS round trip (RTT) per child to fetch its `tinymerge-report.json`. A 100-shard split paid 100 sequential round trips before the parent merge could even start aggregating.
- `_verify_split_child_result_artifacts`: one `_result_blob_map` call per child (which itself does a `list_blobs(prefix=child/)`). Same N×RTT shape, blocking the finalize task.
Total: a 100-shard parent merge spent ~200 sequential HTTPS round trips (100 report reads + 100 blob listings) before any productive work happened.
User-facing change
None semantically: the same dicts are returned in the same input order. With at most 4 requests in flight, each phase of the parent merge step now takes roughly ⌈N/4⌉ round trips instead of N. For N ≤ 4 children, that is about the slowest single child report read plus the slowest single list call.
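The following back-of-the-envelope latency model makes some simplifying assumptions. Every request costs about one RTT, children respond in similar time, and the 4-worker cap is the only limit; thread-pool overhead is ignored. Both function names are hypothetical and used only for illustration.

```python
import math


def sequential_round_trips(children: int) -> int:
    # Before: one report read plus one list call per child, all back to back.
    return 2 * children


def fanned_out_round_trips(children: int, max_workers: int = 4) -> int:
    # After: each of the two phases runs at most max_workers requests at once,
    # so each phase costs about ceil(children / max_workers) RTTs.
    return 2 * math.ceil(children / max_workers)


print(sequential_round_trips(100), fanned_out_round_trips(100))  # 200 50
print(fanned_out_round_trips(3))  # 2: one slowest report read + one slowest list call
```

Under these assumptions, the 100-shard case drops from ~200 RTTs to ~50, an idealized 4× improvement bounded by the concurrency cap.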
API / IaC diff
`api/tasks/blast/split_pipeline.py`:
- `_load_split_child_merge_reports` now fans out via `ThreadPoolExecutor(max_workers=min(4, len(children)))` and uses `pool.map`, so the returned list keeps input order. The concurrency cap matches the existing `stream_blob_bytes` 4-permit budget, so we do not exceed the `BlobServiceClient` pool.
- `_verify_split_child_result_artifacts` keeps the upfront "not completed → `ValueError`" validation sequential, then probes every child's blob map in parallel through the same fan-out shape. Missing-artifact aggregation stays sequential, since it only walks the already-materialized status dicts.
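Below is a minimal sketch of the fan-out shape described above, using only the standard library's `concurrent.futures.ThreadPoolExecutor`. It is not the PR's code. `load_child_reports` and `verify_child_artifacts` are hypothetical stand-ins for the two private helpers. The injected `fetch_report`/`result_blob_map` callables, the `"child"`/`"state"` status keys, and the `required` artifact list are all assumptions.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence

# Assumed constant mirroring the existing 4-permit stream_blob_bytes budget.
MAX_BLOB_FANOUT = 4


def _fan_out(children: Sequence[str], fn: Callable[[str], Any]) -> list[Any]:
    """Apply fn to every child concurrently; results come back in input order."""
    if not children:
        # ThreadPoolExecutor rejects max_workers=0 with ValueError, so skip the pool.
        return []
    with ThreadPoolExecutor(max_workers=min(MAX_BLOB_FANOUT, len(children))) as pool:
        # pool.map yields results in the order of `children`, not completion order;
        # a worker's exception is re-raised when list() reaches that result.
        return list(pool.map(fn, children))


def load_child_reports(
    children: Sequence[str], fetch_report: Callable[[str], dict[str, Any]]
) -> list[dict[str, Any]]:
    # Hypothetical stand-in for _load_split_child_merge_reports.
    return _fan_out(children, fetch_report)


def verify_child_artifacts(
    statuses: Sequence[dict[str, Any]],
    result_blob_map: Callable[[str], dict[str, str]],
    required: Sequence[str],
) -> dict[str, list[str]]:
    # Hypothetical stand-in for _verify_split_child_result_artifacts.
    # 1) Cheap validation stays sequential and fails fast before any network I/O.
    for status in statuses:
        if status["state"] != "completed":
            raise ValueError(f"child {status['child']} is not completed")
    # 2) Probe every child's blob map in parallel (one list call per child).
    children = [status["child"] for status in statuses]
    blob_maps = _fan_out(children, result_blob_map)
    # 3) Aggregate missing artifacts sequentially over the materialized results.
    missing: dict[str, list[str]] = {}
    for child, blob_map in zip(children, blob_maps):
        absent = [name for name in required if name not in blob_map]
        if absent:
            missing[child] = absent
    return missing


if __name__ == "__main__":
    def slow_report(child: str) -> dict[str, Any]:
        # Later shards answer faster, so completion order is roughly reversed.
        time.sleep(0.01 * (10 - int(child.split("-")[1])))
        return {"child": child}

    reports = load_child_reports([f"shard-{i}" for i in range(10)], slow_report)
    print([r["child"] for r in reports][:3])  # ['shard-0', 'shard-1', 'shard-2']
```

`pool.map` is what keeps the "same input order" guarantee without any re-sorting. The empty-list guard matters because `min(4, len(children))` is 0 when there are no children, and `ThreadPoolExecutor` rejects that.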
Validation
- `uv run pytest -q api/tests/test_blast_tasks.py`: 120 passed (XML + tabular merge + `verify_split_child_result_artifacts` cases unchanged).
- `uv run ruff check api/tasks/blast/split_pipeline.py`: clean.