2026-05-22 — Documentation SEO and GEO baseline¶
Motivation¶
The MkDocs site at https://dotnetpower.github.io/elb-dashboard/ was missing the basics that classic search engines and the newer generative-engine crawlers (ChatGPT, Claude, Perplexity, Google AI overviews) expect:
- No per-page
<meta name="description">beyond a single site-wide fallback, so search snippets and shared links all looked identical. - No Open Graph / Twitter Card metadata, so previews on chat tools and social platforms degraded to bare URL strings.
- No JSON-LD structured data, so neither classic search engines nor LLM crawlers had a machine-readable summary of what the site documents.
- No
robots.txt, so AI crawlers (GPTBot,ClaudeBot,PerplexityBot,Google-Extended, …) had no explicit allow rule. - No
llms.txtorllms-full.txt, the emerging llmstxt.org standard for letting LLM crawlers ingest a project in one fetch.
User-facing change¶
- Search snippets, social previews, and AI-generated answers now use the proper title and a page-specific description for every published page.
- The home page exposes a structured Key Facts summary and a
Frequently Asked Questions section, both also emitted as JSON-LD
(
WebSite,Organization,SoftwareApplication,FAQPage). - Inner pages emit
TechArticleJSON-LD with the per-page description and canonical URL. https://dotnetpower.github.io/elb-dashboard/robots.txtexplicitly allows AI crawlers and points at the sitemap.https://dotnetpower.github.io/elb-dashboard/llms.txtprovides a curated documentation map for LLM crawlers;llms-full.txtconcatenates every navigable page into a single ~7,000-line corpus.
Implementation summary¶
| Area | Change |
|---|---|
mkdocs.yml |
Richer site_description, site_author, copyright, theme.custom_dir: docs/overrides, favicon, extra features (navigation.indexes, search.share, content.action.edit, content.tooltips), extra.og defaults, extra.llms paths, plugins: search, hooks: scripts/docs/llms_hooks.py. |
docs/overrides/main.html |
Extends Material's base.html; overrides {% block extrahead %} to inject Open Graph + Twitter Card + theme-color + a JSON-LD @graph on the home page and TechArticle JSON-LD on every other page. Reads optional per-page jsonld: frontmatter for FAQ / HowTo extras. |
docs/robots.txt |
Explicit Allow: / for every major AI crawler plus the sitemap reference. |
docs/llms.txt |
llmstxt.org-style site map: quote-block summary, core concepts, curated link list. |
scripts/docs/llms_hooks.py |
on_post_build hook concatenates every navigable markdown page into site/llms-full.txt (skips features_change/**, temp/**, overrides/**). |
docs/*.md (top-level) |
Added YAML frontmatter title: / description: to 11 pages plus the home index.md and changelog.md. |
docs/index.md |
New Key Facts admonition and Frequently Asked Questions section; jsonld: frontmatter feeds a FAQPage block. |
No source code under api/, web/, infra/, or terminal/ was
touched. No new runtime dependencies — the build hook uses only the
standard library and the existing mkdocs-material / pymdown-extensions
stack.
Validation evidence¶
$ uv run ruff check scripts/docs/llms_hooks.py
All checks passed!
$ uv run mkdocs build --strict
INFO - Documentation built in 4.52 seconds
Spot-checked against the generated site/:
| Check | Result |
|---|---|
site/robots.txt exists, lists GPTBot/ClaudeBot/Perplexity/Google-Extended |
✅ |
site/llms.txt exists, llmstxt.org format |
✅ |
site/llms-full.txt exists, 7,116 lines |
✅ |
site/sitemap.xml has 391 <loc> entries |
✅ |
site/index.html has 2 JSON-LD blocks: @graph[WebSite, Organization, SoftwareApplication] + FAQPage (6 Q&A) |
✅ |
site/auth/index.html, site/high-level-architecture/index.html, site/user-guide/index.html emit TechArticle JSON-LD |
✅ |
Every page carries page-specific <meta name="description">, OG, and Twitter Card tags |
✅ |
<meta name="robots" content="index,follow,max-image-preview:large,max-snippet:-1"> present |
✅ |
Out of scope¶
- mkdocs-material
socialplugin (auto-generated PNG social cards) is available but needscairosvg+Pillow+ systemlibcairo; not added to keep the build pipeline portable. The static logo OG image works as a sensible fallback. - No analytics, no third-party tag managers.
- Tag-based navigation pages (mkdocs-material
tagsplugin) deferred — notags:frontmatter yet to justify a tag index.
Update — auto-generated social cards enabled¶
Reversed the "out of scope" decision above. libcairo2, libgdk-pixbuf,
and fonts-dejavu are already installed on the supported build hosts
(Ubuntu / Debian / GitHub-hosted runners), so the pure-Python additions
(cairosvg, Pillow) are enough to enable the plugin.
pyproject.toml[dependency-groups].devaddscairosvg>=2.7,<3andpillow>=10.0,<12.mkdocs.ymlenables the social plugin with a brand-aligned card layout (background_color: "#1f1d3a",color: "#f5f4ff", Roboto font).docs/overrides/main.htmldrops itsog:type/og:title/og:description/og:url/og:image/twitter:card/twitter:title/twitter:description/twitter:imageblocks — the plugin now emits all of them, and duplicating them caused the SVG logo URL to shadow the new PNG card on consumers that honour the first occurrence (Slack, Discord). The override now only emitsog:site_name,og:locale,meta name="description",meta name="robots",theme-color,application-name, and the JSON-LD blocks.- Long SEO titles (e.g. "ElasticBLAST Control Plane — Browser-only BLAST
on Azure") were truncating in the 1200×630 layout. Affected pages now
carry a short card title under
social.cards_layout_options.title/descriptionin their frontmatter (the plugin mergespage.meta.socialwith the site-levelcards_layout_options, per its_config()implementation).
Validation:
$ uv run mkdocs build --strict
INFO - Documentation built in 27.09 seconds # cold (first card render)
INFO - Documentation built in 6.13 seconds # warm (cache hit)
$ find site/assets/images/social -name '*.png' | wc -l
392 # one card per published page
$ du -sh site/assets/images/social
23M
$ for f in site/index.html site/auth/index.html site/get-started/index.html ...
1 1 # exactly one og:title and one og:image per page (no duplicates)
Cold builds take ~30 s; warm builds (.cache/plugin/social/) stay near
the original ~5 s. The cache directory is already gitignored.
Update — BreadcrumbList, HowTo, full description coverage, TL;DR, self-hosted Mermaid¶
Second pass to close the gaps called out in the post-baseline review.
- BreadcrumbList JSON-LD on every inner page
(
docs/overrides/main.html). Walkspage.ancestors(deepest-first → reversed) and emits Home → … → page. Section nodes without aurlare still emitted asListItem.namewithoutitem, which schema.org permits. Verified shapes: Home → User Guide → DashboardHome → Agent Reference → Repo LayoutHome → Releases → v0.1.0Home → Auth(top-level pages, no intermediate section)- HowTo JSON-LD on
docs/get-started.md— sixHowToStepentries matching the page's##sections, withtool,supply,totalTime, and per-stepurlanchors. Google rich-result eligible. - Description frontmatter filled in on 19 pages that previously fell
back to the site-level description: every
docs/copilot/*.md, everydocs/user-guide/*.md, anddocs/releases/*.md. Coverage check across 27 published pages shows 0 fallbacks. - TL;DR callouts (
!!! tip "TL;DR") added near the top ofdocs/index.md's neighbours: Get Started, High Level Architecture, Auth, Deployment Reference, Container Apps Migration, and User Guide. Improves both human skim-reading and the chance that an LLM crawler cites the page accurately. - Self-hosted Mermaid: dropped the
unpkg.com/mermaid@10.9.1CDN reference inmkdocs.ymlextra_javascriptand replaced it withdocs/javascripts/vendor/mermaid.min.js(3.2 MB pinned to 10.9.1). Thepreconnecthint indocs/overrides/main.htmlwas removed in the same change. Sitegrep -r unpkg site/only matches Material's own bundled JS now (out of our control). - JSON-LD hardening: every author-supplied string in the JSON-LD
blocks now goes through Jinja's
| tojsonfilter so embedded double-quotes, backslashes, or control characters can't break the surrounding JSON. Discovered by a page whose description contained the literal phrase"where to edit", which produced an unparseableTechArticleblock on the previous pass.
Validation:
$ uv run mkdocs build --strict
INFO - Documentation built in 6.32 seconds
# All 7 sampled pages parse cleanly, 0 invalid JSON blocks:
$ uv run python -c "..." # see scripts/dev (one-off, not committed)
site/copilot/repo-layout/index.html Breadcrumb: [Home, Agent Reference, Repo Layout]
site/user-guide/dashboard/index.html Breadcrumb: [Home, User Guide, Dashboard]
site/user-guide/results/index.html Breadcrumb: [Home, User Guide, Results]
site/releases/v0.1.0/index.html Breadcrumb: [Home, Releases, v0.1.0]
site/auth/index.html Breadcrumb: [Home, Auth]
site/get-started/index.html types=[TechArticle, BreadcrumbList, HowTo]
site/index.html types=[@graph(WebSite+Org+SoftwareApplication), FAQPage]
TOTAL INVALID JSON BLOCKS: 0
Self-assessed SEO/GEO score after this pass: roughly A− (≈88/100) on the rubric used for the baseline review. Remaining headroom is in off-page authority (backlinks, GSC/Bing registration) and Material's own bundled JS still referencing unpkg, neither of which is addressable from this repo alone.
Update — critical hardening pass¶
A skeptical re-audit of the two earlier passes found three real defects that the "everything builds clean" smoke tests had hidden.
1. Template was published as a public asset (CRITICAL)¶
theme.custom_dir pointed at docs/overrides/, which sits inside
docs_dir. MkDocs therefore copied docs/overrides/main.html verbatim
into site/overrides/main.html. Anyone visiting
https://dotnetpower.github.io/elb-dashboard/overrides/main.html would
see the raw Jinja template (including, for example, the per-page
page.meta.jsonld plumbing). Search engines would also index it as
garbage HTML.
Fix: moved docs/overrides/ → repo-root overrides/ (sibling of
mkdocs.yml) and updated theme.custom_dir: overrides in
mkdocs.yml. MkDocs no longer treats it as docs
content. Verified site/overrides/ does not exist after a clean build.
2. HTML entities leaked into JSON-LD values¶
MkDocs renders page.title through the markdown pipeline, so an H1
like Authentication & Authorization arrives in templates as the
HTML-escaped string Authentication & Authorization. The earlier
pass embedded that directly via | tojson, publishing
"name": "Authentication & Authorization" to crawlers — 6 such
warnings were detected across the features_change archive.
Fix: added an unescape_html Jinja filter in
scripts/docs/llms_hooks.py
(registered via the on_env hook) and applied it to every
page-derived string in overrides/main.html
JSON-LD blocks (page_title, page_descr, crumbs.name,
config.site_author). Post-fix audit: 0 entity warnings across 790
JSON-LD blocks on 400 generated HTML pages.
3. Average og:title was 82 chars (Google truncates at ~60)¶
The previous round set long descriptive titles like
"Repository Layout — ElasticBLAST Control Plane (Agent Reference)",
then the social plugin appended - {site_name}, producing 100-char
og:titles. Google truncates SERP snippets at roughly 60 characters.
Fix: rewrote every page's frontmatter title: to be concise (the
plugin already appends the site name automatically). Average og:title
length dropped from 82.5 → 51.1 characters; the longest is now
66 chars; 0 pages ≥70 chars.
Smaller hardening items¶
llms-full.txtnow excludes both off-nav pages (copilot/security-audit-followup.md,copilot/version-management.md). The security-audit page documents open follow-up items from the 2026-05-22 security sweep — surfacing it to AI crawlers would advertise unfixed weaknesses. Corpus shrunk from 7,116 → 6,664 lines.- Verified every HowTo step URL anchor on Get Started matches a real
page slug (
#choose-your-path,#before-you-start, …). - Schema.org required-field check across the home, Get Started, Auth, Dashboard, and Repo Layout pages: every node (FAQPage, HowTo, TechArticle, BreadcrumbList, WebSite, Organization, SoftwareApplication) carries its required properties.
- Verified Mermaid script integrity: pinned local copy is 3.2 MB,
sha384-WmdflGW9aGfoBdHc4rRyWzYuAjEmDwMdGdiPNacbwfGKxBW/SO6guzuQ76qjnSlr. Did not add an SRI attribute to the<script>tag —extra_javascriptdoesn't exposeintegrityplumbing, and the file is now same-origin anyway, so the SRI benefit is marginal.
Final validation¶
$ uv run ruff check scripts/docs/llms_hooks.py
All checks passed!
$ uv run mkdocs build --strict
INFO - Documentation built in 6.21 seconds # warm
INFO - Documentation built in 28.45 seconds # cold (cards regenerated)
$ python -c "audit script"
JSON-LD parse errors: 0 (790 blocks / 400 pages)
JSON-LD entity warnings: 0
og:title avg / max: 51.1 / 66 chars
overrides/main.html in site: False
sensitive pages in llms-full.txt: False
Re-revised score after hardening: ~92/100 (A). The remaining points are still external-only (off-page authority, social-graph backlinks).