feat(seo): static sitemap.xml with git-based lastmod #222
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a400d681f1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Deploying with
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs | superwall-docs | c5373bd | Jun 26 2026, 10:46 PM |
💡 Codex Review
Reviewed commit: d83f1de65f
Replace the request-time sitemap route (which stamped every URL with new Date() on each crawl, training Google to ignore <lastmod>) with a build-time static sitemap whose <lastmod> comes from real git history.
- New scripts/generate-sitemap.ts (runs in the build chain): one git log pass for per-file dates, resolves <include> deps into content/shared so shared edits bump the right pages, and serves dist/client/docs/sitemap.xml.
- Shallow clones omit <lastmod> rather than publish one wrong date; git failures degrade gracefully instead of breaking the build.
- src/lib/sitemap.ts refactored to pure, testable, worker-safe helpers.
- Remove runtime route src/routes/sitemap[.]xml.ts (regenerates routeTree).
Two follow-ups from PR review on the sitemap generator:
- Deploy environments (Cloudflare Workers Builds) shallow-clone with no fetch-depth setting, which left the deployed sitemap with no <lastmod>. Detect a shallow clone and deepen it with 'git fetch --unshallow' (anonymous; the repo is public). Falls back to omitting <lastmod> if history still can't be obtained; never fails the build.
- /docs/changelog renders <ChangelogTimeline/>, which imports the committed src/lib/changelog-entries.json. Add that data file as a supplemental source so changelog regenerations bump the page's date.
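The dependency handling described in these two commits (a page's date should reflect its own file, any shared files it pulls in through <include>, and supplemental data such as the changelog JSON) can be sketched as below. This is a minimal illustration under assumptions, not the code in scripts/generate-sitemap.ts: `collectSources`, `latestDate`, the `IncludeGraph` shape, and the sample paths and dates are all hypothetical.

```typescript
// Hypothetical shapes; the real generator's types are not shown in this PR.
type DateMap = Map<string, string>;        // file path -> latest commit date, "YYYY-MM-DD"
type IncludeGraph = Map<string, string[]>; // file path -> files it pulls in via <include>

/** Collect a page's own source, its supplemental files, and all transitive includes. */
function collectSources(entry: string, includes: IncludeGraph, supplemental: string[] = []): string[] {
  const seen = new Set<string>();
  const stack: string[] = [entry, ...supplemental];
  while (stack.length > 0) {
    const file = stack.pop();
    if (file === undefined || seen.has(file)) continue; // skip repeats (also guards include cycles)
    seen.add(file);
    for (const dep of includes.get(file) ?? []) stack.push(dep);
  }
  return Array.from(seen);
}

/** Latest known date among the sources, or undefined so the caller can omit <lastmod>. */
function latestDate(sources: string[], dates: DateMap): string | undefined {
  let latest: string | undefined;
  for (const file of sources) {
    const date = dates.get(file);
    // "YYYY-MM-DD" strings order chronologically under plain string comparison.
    if (date !== undefined && (latest === undefined || date > latest)) latest = date;
  }
  return latest;
}

// Illustrative data only: page.mdx includes a.mdx, which includes b.mdx.
const sampleIncludes: IncludeGraph = new Map([
  ["content/docs/page.mdx", ["content/shared/a.mdx"]],
  ["content/shared/a.mdx", ["content/shared/b.mdx"]],
]);
const sampleDates: DateMap = new Map([
  ["content/docs/page.mdx", "2026-06-01"],
  ["content/shared/b.mdx", "2026-06-20"],
]);
const pageDate = latestDate(collectSources("content/docs/page.mdx", sampleIncludes), sampleDates);
console.log(pageDate); // "2026-06-20": the shared file's later date wins
```

Returning undefined instead of a default date keeps the "omit rather than guess" policy in the caller's hands.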
d83f1de to 622c80a
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 622c80a.
If 'git rev-parse --is-shallow-repository' itself errors, the old code treated the repo as non-shallow and still computed dates — which on a shallow clone could publish clustered, misleading <lastmod> values. Now an unreadable depth probe omits <lastmod> instead, matching the script's 'never publish wrong dates' policy. Also drops the redundant post-fetch re-check (a successful --unshallow already implies full history) and the now-unused isShallowRepository helper. Addresses Cursor Bugbot review on PR #222.
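The decision flow this commit describes (probe the clone depth, deepen if shallow, otherwise give up on dates) can be sketched as a small function with the git runner injected. This is a sketch under assumptions, not the script's actual code: `ensureFullHistory`, `RunGit`, and the "full" / "unavailable" result are hypothetical names, the runner is assumed to throw on a non-zero exit, and treating any probe output other than "true" or "false" as unreadable is this sketch's own choice.

```typescript
// Hypothetical contract: runs `git <args>` and returns stdout, throwing on failure.
type RunGit = (args: string[]) => string;
type HistoryStatus = "full" | "unavailable";

/** Decide whether per-file dates can be trusted, deepening a shallow clone if needed. */
function ensureFullHistory(runGit: RunGit): HistoryStatus {
  let probe: string;
  try {
    probe = runGit(["rev-parse", "--is-shallow-repository"]).trim();
  } catch {
    return "unavailable"; // unreadable probe: omit <lastmod> rather than risk wrong dates
  }
  if (probe === "false") return "full";
  if (probe !== "true") return "unavailable"; // unexpected output is treated like a failed probe
  try {
    runGit(["fetch", "--unshallow"]);
    return "full"; // a successful --unshallow already implies full history, so no re-check
  } catch {
    return "unavailable"; // degrade gracefully; the build itself must not fail
  }
}

// Example with a fake runner that simulates a shallow clone with working network access.
const status = ensureFullHistory((args) => (args[0] === "rev-parse" ? "true\n" : ""));
console.log(status); // "full"
```

In a Node build script the runner could wrap `execFileSync("git", args, { encoding: "utf8" })` from `node:child_process`, which throws when git exits non-zero; injecting it keeps the policy testable without a repository.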

What & why
`/docs/sitemap.xml` was generated at request time on the Cloudflare Worker and stamped every URL with `new Date().toISOString()`. That tells Google "every page changed just now" on every crawl, so Google learns to ignore our `<lastmod>` entirely (it only trusts the field when it's consistently accurate).
This replaces that with accurate, build-time `<lastmod>` derived from git history, served as a static asset. Verified end-to-end on a Cloudflare preview deploy: 592 URLs, all dated.
Approach
Cloudflare Workers have no filesystem / git at request time, so dates are resolved during `bun run build` (Node, full repo), the same pattern as the existing `generate-static-cache` / `generate-search-index` post-build scripts. The site has no runtime content source (MDX is compiled into the bundle), so the page set is fixed at build time and a live route buys nothing. The generated `dist/client/docs/sitemap.xml` is served at `/docs/sitemap.xml`, the same delivery path proven by `search-index.json` (confirmed in preview: HTTP 200, `application/xml`).
How dates are computed
- A single `git log --no-merges --name-only --pretty=format:…%cs` pass builds a file → latest-commit-date (YYYY-MM-DD) map (1 subprocess, not ~590).
- Docs pages map to `content/docs/<page.path>`.
- `<include>` dependencies are resolved transitively: 107 pages render shared bodies from `content/shared/**`, so an edit to a shared file bumps every page that includes it.
- `/docs/changelog` renders `<ChangelogTimeline/>`, which imports the committed `src/lib/changelog-entries.json`; that file is added as a supplemental source so changelog regenerations bump the page's date.
- `/docs` → `src/routes/index.tsx`; `/home` (301→dashboard) → dashboard content.
- Landing URLs (`/ios`, `/android`, …) inherit their content source and only get a priority bump (single source of truth).
- When no date is known, `<lastmod>` is omitted (never falls back to `new Date()`).
Robustness: shallow clones self-heal
Deploy environments (Cloudflare Workers Builds) shallow-clone with no fetch-depth setting, which would otherwise leave every page date-less. The generator detects a shallow clone and deepens it with `git fetch --unshallow` (anonymous; the repo is public). Verified in the Cloudflare build log: `✓ Fetched full git history → 592 urls (592 with <lastmod>)`. If history still can't be obtained, it omits `<lastmod>` rather than publish a wrong date, and never fails the build (git errors degrade gracefully).
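The single-pass date map from "How dates are computed" can be sketched as a parser over the command's output. The PR elides the exact `--pretty` format, so this sketch assumes each commit is printed as a marker line (`@@` followed by the `%cs` date) and then its file names. `COMMIT_MARKER` and `parseGitLogDates` are hypothetical names, and the sample log is made up for illustration.

```typescript
// Assumed output shape, e.g. from:
//   git log --no-merges --name-only --pretty=format:@@%cs
// The "@@" prefix is this sketch's assumption; the PR elides the real format string.
const COMMIT_MARKER = "@@";

/** Build a file -> latest commit date ("YYYY-MM-DD") map from one git log output. */
function parseGitLogDates(output: string): Map<string, string> {
  const latest = new Map<string, string>();
  let currentDate: string | undefined;
  for (const rawLine of output.split("\n")) {
    const line = rawLine.replace(/\r$/, "");
    if (line === "") continue; // blank separator between commits
    if (line.startsWith(COMMIT_MARKER)) {
      currentDate = line.slice(COMMIT_MARKER.length);
      continue;
    }
    if (currentDate === undefined) continue; // file name seen before any commit header
    const known = latest.get(line);
    // Keep the max rather than trusting log order; ISO dates compare correctly as strings.
    if (known === undefined || currentDate > known) latest.set(line, currentDate);
  }
  return latest;
}

// Illustrative log: a.mdx was touched in both commits, b.mdx only in the newer one.
const sampleLog = [
  "@@2026-06-23",
  "content/docs/a.mdx",
  "content/shared/b.mdx",
  "",
  "@@2026-06-10",
  "content/docs/a.mdx",
].join("\n");
console.log(parseGitLogDates(sampleLog).get("content/docs/a.mdx")); // "2026-06-23"
```

One subprocess feeding a lookup map is what makes the cost independent of the page count; a file absent from the map simply yields no date.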
Changes
- `src/lib/sitemap.ts` (pure, worker-safe): `getSitemapSourceEntries` (dedupe + priority merge), `attachLastModified` (date resolution injected by caller), optional `<lastmod>`.
- `scripts/generate-sitemap.ts`: new build-time generator (git dates, include + component data resolution, shallow self-heal, graceful degradation), wired into `build`.
- Removed `src/routes/sitemap[.]xml.ts` (+ regenerated `routeTree.gen.ts`).
- `src/lib/seo-routes.test.ts`: updated for the new API.
Testing
- `bun test`: 69 pass.
- `<lastmod>`, valid XML (`xmllint`); `/docs/changelog` correctly reflects max(wrapper, changelog JSON); include resolution verified.
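The "dedupe + priority merge" step listed under Changes might look like the sketch below. The merge rule used here (one entry per URL, keep the highest priority, union the source files) is an assumption about what the helper does, and `SourceEntry` / `mergeSourceEntries` and the sample paths are hypothetical rather than the API in `src/lib/sitemap.ts`.

```typescript
// Hypothetical entry shape for illustration.
interface SourceEntry {
  url: string;
  priority: number;
  sources: string[]; // files whose git history dates this URL
}

/** Collapse duplicate URLs into one entry, keeping the highest priority and all sources. */
function mergeSourceEntries(entries: SourceEntry[]): SourceEntry[] {
  const byUrl = new Map<string, SourceEntry>();
  for (const entry of entries) {
    const existing = byUrl.get(entry.url);
    if (existing === undefined) {
      // Copy so later merges never mutate the caller's objects.
      byUrl.set(entry.url, { url: entry.url, priority: entry.priority, sources: [...entry.sources] });
      continue;
    }
    existing.priority = Math.max(existing.priority, entry.priority);
    for (const source of entry.sources) {
      if (existing.sources.indexOf(source) === -1) existing.sources.push(source);
    }
  }
  return Array.from(byUrl.values()); // Map iteration preserves first-seen order
}

// A page listed once as regular content and once as a landing URL with a priority bump.
const merged = mergeSourceEntries([
  { url: "/docs/ios", priority: 0.5, sources: ["content/docs/ios/index.mdx"] },
  { url: "/docs/ios", priority: 0.9, sources: ["content/docs/ios/index.mdx"] },
]);
console.log(merged.map((e) => `${e.url} ${e.priority}`)); // one entry for /docs/ios with priority 0.9
```

Because both entries point at the same source file, the landing URL inherits the page's date and differs only in priority.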
Note on current dates
~586 of ~590 pages currently share `2026-06-23` because of recent bulk commits (#218/#219). That's accurate git history; dates diverge naturally as pages are edited individually.
Notes
`/home` is a redirecting URL in the sitemap (pre-existing); left as-is.
Note
Low Risk
SEO/build pipeline change only; graceful degradation on git issues and no auth or runtime behavior changes beyond sitemap delivery path.
Overview
Replaces request-time sitemap generation (every URL stamped with `new Date()`) with a build-time static `dist/client/docs/sitemap.xml`, served like `search-index.json` on Cloudflare Workers where git/fs are unavailable at runtime.
`scripts/generate-sitemap.ts` runs after `vite build`, loads the docs page list via a lightweight Vite SSR pass, maps each URL to backing source files (MDX paths, transitive `<include>` deps, and supplemental files like changelog JSON), and sets `<lastmod>` from a single `git log` pass. Shallow CI clones are deepened with `git fetch --unshallow` when possible; git failures omit `<lastmod>` without failing the build.
`src/lib/sitemap.ts` is refactored into pure helpers: `getSitemapSourceEntries` (dedupe, static landing priorities, source path mapping), `attachLastModified` (injected date resolution), and XML that only emits `<lastmod>` when a date is known. The TanStack `sitemap[.]xml` worker route is removed; tests in `seo-routes.test.ts` cover the new API.
Reviewed by Cursor Bugbot for commit c5373bd.
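The last two behaviors in the overview (date resolution injected by the caller, and XML that only emits `<lastmod>` when a date is known) can be illustrated with a short sketch. The names `SitemapUrl`, `withLastModified`, and `renderSitemap`, plus the example.com URLs, are hypothetical; the real helpers in `src/lib/sitemap.ts` may be shaped differently.

```typescript
// Hypothetical URL record for illustration.
interface SitemapUrl {
  loc: string;
  priority?: number;
  lastmod?: string; // "YYYY-MM-DD"; absent when no trustworthy date exists
}

function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

/** Attach dates via a resolver supplied by the caller, keeping this module free of git/fs. */
function withLastModified(
  urls: SitemapUrl[],
  resolveDate: (loc: string) => string | undefined,
): SitemapUrl[] {
  return urls.map((url) => {
    const lastmod = resolveDate(url.loc);
    return lastmod === undefined ? { ...url } : { ...url, lastmod };
  });
}

/** Render a sitemap, emitting <lastmod> only for URLs that actually have a date. */
function renderSitemap(urls: SitemapUrl[]): string {
  const body = urls.map((url) => {
    const parts = [`    <loc>${escapeXml(url.loc)}</loc>`];
    if (url.lastmod !== undefined) parts.push(`    <lastmod>${escapeXml(url.lastmod)}</lastmod>`);
    if (url.priority !== undefined) parts.push(`    <priority>${url.priority.toFixed(1)}</priority>`);
    return `  <url>\n${parts.join("\n")}\n  </url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...body,
    "</urlset>",
  ].join("\n") + "\n";
}

// Only the first URL has a known date, so only it gets a <lastmod> element.
const knownDates = new Map([["https://example.com/docs/a", "2026-06-23"]]);
const xml = renderSitemap(
  withLastModified(
    [{ loc: "https://example.com/docs/a", priority: 0.8 }, { loc: "https://example.com/docs/b" }],
    (loc) => knownDates.get(loc),
  ),
);
console.log(xml);
```

Passing the resolver in is what lets the same helpers run in a Worker or a test with a stub, while the build script supplies a git-backed lookup.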