Do not replace .md with .html in non-relative links in docs.
Until now, the generated docs linked to nonexistent .html files on GitHub,
and README.md was even wrongly rewritten to README.html in some code examples.
For example, the configurable-attributes doc contained:
<a href="https://github.com/bazelbuild/bazel-skylib/blob/master/docs/selects_doc.html"><code class="highlighter-rouge">selects</code></a>
After the fix, we will have:
<a href="https://github.com/bazelbuild/bazel-skylib/blob/master/docs/selects_doc.md"><code class="highlighter-rouge">selects</code></a>
Full recursive diff of the `./scripts/serve-docs.sh --target` output
before and after this fix:
https://gist.github.com/tetromino/fa590eff74db10ac0815773cb46d821b
Note that parsing Markdown or HTML with sed is generally impossible to
get right, but since we already do it, we can continue for now; the
alternative would be to add an intelligent Markdown and HTML parser to
the pipeline.
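As a rough sanity check, the new sed expression can be exercised on its
own. The sketch below is illustrative only and is not part of this change:
rewrite_md_links is a hypothetical helper name, and the relative link is a
made-up sample line; the absolute link is the skylib URL quoted above.

```shell
#!/bin/bash
# Hypothetical helper wrapping only the .md -> .html expression from this change.
rewrite_md_links() {
  sed -e 's,\([( "'\''][a-zA-Z0-9/._-]*\)\.md,\1.html,g'
}

# Made-up relative markdown link: the url follows '(' and contains no ':',
# so the expression matches and .md becomes .html.
relative=$(printf '%s\n' 'See [skylark](skylark/concepts.md).' | rewrite_md_links)

# Absolute link: the ':' after "https" ends the run of allowed url characters
# before any ".md" is reached, so the line is left unchanged.
absolute=$(printf '%s\n' '<a href="https://github.com/bazelbuild/bazel-skylib/blob/master/docs/selects_doc.md">' | rewrite_md_links)

echo "$relative"   # See [skylark](skylark/concepts.html).
echo "$absolute"   # <a href="https://github.com/bazelbuild/bazel-skylib/blob/master/docs/selects_doc.md">
```

This only covers the two link shapes shown; it does not demonstrate that
every other .md occurrence in the docs tree is handled correctly.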
Fixes https://github.com/bazelbuild/bazel/issues/6285
RELNOTES: None.
PiperOrigin-RevId: 316705091
diff --git a/site/jekyll-tree.sh b/site/jekyll-tree.sh
index 905ae46..97cd453 100755
--- a/site/jekyll-tree.sh
+++ b/site/jekyll-tree.sh
@@ -109,7 +109,18 @@
local tempf=$(mktemp -t bazel-doc-XXXXXX)
chmod +w $f
- cat "$f" | sed 's,\.md,.html,g;s,Blaze,Bazel,g;s,blaze,bazel,g' > "$tempf"
+ # Replace .md with .html only in relative links to other Bazel docs.
+ # sed regexp explanation:
+ # \( and \) delimits a capturing group
+ # \1 inserts the capture
+ # [( "'\''] character preceding a url in markdown syntax (open paren
+ # or space) or html syntax (a quote); note that '\'' embeds
+ # a single quote in a bash single-quoted string.
+ # [a-zA-Z0-9/._-]* zero or more legal url characters but not ':' - meaning
+ # that the url is not absolute.
+ cat "$f" | \
+ sed -e 's,\([( "'\''][a-zA-Z0-9/._-]*\)\.md,\1.html,g' \
+ -e 's,Blaze,Bazel,g;s,blaze,bazel,g' > "$tempf"
cat "$tempf" > "$f"
}