Pandoc, Pagefind and Make

Recently I’ve refreshed my approach to website generation using three programs.

Pandoc does the heavy lifting. It renders all the HTML pages, a CITATION.cff (from the project’s codemeta.json) and an about.md file (also from the project’s codemeta.json). This is done with three Pandoc templates. Pandoc can also be used to render man pages following a simple page recipe.

I’ve recently adopted Pagefind for indexing the HTML for the project’s website and providing a full text search UI suitable for a static website. The Pagefind indexes can be combined with your group or organization’s static website to provide a rich cross-project search (exercise left for another post).

Finally I orchestrate the site construction with GNU Make. I do this with a simple dedicated Makefile called website.mak.
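
Because the file is not named Makefile, Make has to be pointed at it with the -f option. The sketch below shows that invocation against a throwaway stand-in for website.mak. The scratch directory and the one-line build target are illustrative assumptions, not the real file.

```shell
# Create a scratch directory holding a stand-in website.mak (illustrative only).
workdir=$(mktemp -d)

# Recipe lines must start with a tab, so printf writes "\t" explicitly.
printf 'build:\n\t@echo "building site"\n' > "$workdir/website.mak"

if command -v make >/dev/null 2>&1; then
  # -f names the makefile to read; "build" is the target to run.
  output=$(cd "$workdir" && make -f website.mak build)
else
  output="make not found"
fi
echo "$output"
```

In a real project directory the equivalent would simply be `make -f website.mak`.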

website.mak

The website.mak file is relatively simple.

    #
    # Makefile for running pandoc on all Markdown docs ending in .md
    # NOTE: recipe lines must begin with a tab character.
    #
    PROJECT = PROJECT_NAME_GOES_HERE

    # Path to pandoc; empty when pandoc is not installed.
    PANDOC = $(shell which pandoc)

    # $(sort) drops the duplicate entry once about.md exists on disk.
    MD_PAGES = $(sort $(shell ls -1 *.md) about.md)

    # Escape the dot and anchor the match so only a trailing ".md" changes.
    HTML_PAGES = $(sort $(shell ls -1 *.md | sed -E 's/\.md$$/.html/') about.html)

    build: $(HTML_PAGES) $(MD_PAGES) pagefind

    about.md: .FORCE
    	cat codemeta.json | sed -E 's/"@context"/"at__context"/g;s/"@type"/"at__type"/g;s/"@id"/"at__id"/g' >_codemeta.json
    	if [ -f "$(PANDOC)" ]; then echo "" | pandoc --metadata title="About $(PROJECT)" --metadata-file=_codemeta.json --template codemeta-md.tmpl >about.md; fi
    	if [ -f _codemeta.json ]; then rm _codemeta.json; fi

    $(HTML_PAGES): $(MD_PAGES) .FORCE
    	pandoc -s --to html5 $(basename $@).md -o $(basename $@).html \
    		--metadata title="$(PROJECT) - $@" \
    		--lua-filter=links-to-html.lua \
    		--template=page.tmpl
    	git add $(basename $@).html

    pagefind: .FORCE
    	pagefind --verbose --exclude-selectors="nav,header,footer" --bundle-dir ./pagefind --source .
    	git add pagefind

    clean:
    	@if [ -f index.html ] || [ -f README.html ]; then rm -f *.html; fi

    .FORCE:

Only the “PROJECT” value needs to be set. Typically this is just the name of the repository’s base directory.
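
The HTML_PAGES list is derived from the Markdown file names by swapping the extension. Here is a minimal sketch of that mapping on its own, with the dot escaped and the match anchored so that only a trailing ".md" is rewritten. The function name md_to_html and the sample file names are hypothetical.

```shell
# Map Markdown file names (one per line on stdin) to HTML file names.
md_to_html() {
  # "\." matches a literal dot and "$" anchors the match at the end of the name.
  sed -E 's/\.md$/.html/'
}

# Sample names; amd.md shows why the dot needs escaping.
pages=$(printf '%s\n' index.md README.md amd.md | md_to_html)
echo "$pages"
```

With an unescaped, unanchored pattern such as `s/.md/.html/g`, the dot matches any character, so a name like amd.md would come out as ".html.html". Inside a Makefile the end anchor has to be written as `$$`.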

Pandoc, filters and templates

When I write my Markdown documents I link to Markdown files instead of the HTML versions. This serves two purposes. First, GitHub can use these links directly. Second, if you decide to repurpose the website as a Gopher or Gemini resource, linking to the Markdown file makes more sense. To convert the “.md” names to “.html” when I render the HTML I use a simple Lua filter called links-to-html.lua.

    -- links-to-html.lua
    -- Rewrite ".md" in each link target to ".html".
    function Link(el)
      el.target = string.gsub(el.target, "%.md", ".html")
      return el
    end
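
To check what the filter does before wiring it into the Makefile, it can be run against a one-line document. This is a minimal sketch that assumes pandoc is on your PATH. The scratch directory, sample.md and the INSTALL.md link are made up for the example.

```shell
# Work in a scratch directory so nothing in the project is touched.
workdir=$(mktemp -d)

# A copy of the filter described above.
cat > "$workdir/links-to-html.lua" <<'EOF'
function Link(el)
  el.target = string.gsub(el.target, "%.md", ".html")
  return el
end
EOF

# A hypothetical page that links to another Markdown file.
printf 'See the [install notes](INSTALL.md).\n' > "$workdir/sample.md"

if command -v pandoc >/dev/null 2>&1; then
  # Render an HTML fragment with the filter applied.
  result=$(pandoc --lua-filter="$workdir/links-to-html.lua" --to html5 "$workdir/sample.md")
else
  result="pandoc not found"
fi
echo "$result"
```

The href in the rendered anchor should point at INSTALL.html rather than INSTALL.md.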

The “page.tmpl” file provides a nice wrapper to the Markdown rendered as HTML by Pandoc. It includes the site navigation and project copyright information in the wrapping HTML. It is based on the default Pandoc page template with some added markup for navigation and copyright info in the footer. I also update the link to the CSS to conform with our general site branding requirements. You can generate a basic template using Pandoc.

    pandoc --print-default-template=html5

I also use Pandoc to generate an “about.md” file describing the project and author info. The content of the about.md is taken directly from the project’s codemeta.json file after I’ve renamed the “@” JSON-LD fields (those cause problems for Pandoc). You can see the preparation of a temporary “_codemeta.json” in website.mak, using cat and sed to rename the fields. I then use a Pandoc template to render the Markdown from it.

    ---
    title: $name$
    ---

    About this software
    ===================

    $name$ $version$
    ----------------

    $if(author)$
    ### Authors

    $for(author)$
    - $it.givenName$ $it.familyName$
    $endfor$
    $endif$

    $if(description)$
    $description$
    $endif$

    $if(license)$- License: $license$$endif$
    $if(codeRepository)$- GitHub: $codeRepository$$endif$
    $if(issueTracker)$- Issues: $issueTracker$$endif$

    $if(programmingLanguage)$
    ### Programming languages

    $for(programmingLanguage)$
    - $programmingLanguage$
    $endfor$
    $endif$

    $if(operatingSystem)$
    ### Operating Systems

    $for(operatingSystem)$
    - $operatingSystem$
    $endfor$
    $endif$

    $if(softwareRequirements)$
    ### Software Requirements

    $for(softwareRequirements)$
    - $softwareRequirements$
    $endfor$
    $endif$

    $if(relatedLink)$
    ### Related Links

    $for(relatedLink)$
    - [$it$]($it$)
    $endfor$
    $endif$

This same technique can be repurposed to render a CITATION.cff if needed.
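
The renaming step that prepares the metadata for either template can be tried in isolation. The sketch below runs the same sed expression used in website.mak over a small sample. The function name rename_at_fields and the sample JSON are illustrative, not taken from a real project.

```shell
# Rename the JSON-LD "@" keys so Pandoc can read them as ordinary metadata fields.
rename_at_fields() {
  sed -E 's/"@context"/"at__context"/g;s/"@type"/"at__type"/g;s/"@id"/"at__id"/g'
}

# A made-up, single-line codemeta fragment for demonstration.
sample='{"@context": "https://doi.org/10.5063/schema/codemeta-2.0", "@type": "SoftwareSourceCode", "name": "example"}'

renamed=$(printf '%s\n' "$sample" | rename_at_fields)
echo "$renamed"
```

Only the quoted key names are matched, so an “@” that appears inside a value (an email address, for example) is left alone.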

Pagefind

Pagefind provides three levels of functionality. First, it generates the indexes for a full text search of your project’s HTML pages. Second, it builds the necessary search UI for your static site. Third, it can serve the site locally while you work on it. I include the search UI via a Markdown document that embeds the HTML markup described at Pagefind.app’s Getting started page. When I invoke Pagefind I set the --bundle-dir option to “pagefind” rather than “_pagefind”. The reason is that GitHub Pages ignores the “_pagefind” directory (it probably ignores all directories with a “_” prefix).
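
As a rough illustration of the Markdown document that carries the search UI, the sketch below writes one out with a here-document. The file name search.md and the scratch directory are assumptions. The markup follows what I recall of Pagefind’s getting started example, with the asset paths pointed at the “pagefind” bundle directory, so check it against the documentation for the Pagefind version you run.

```shell
# Write a hypothetical search.md into a scratch directory.
outdir=$(mktemp -d)

# The quoted 'EOF' keeps the shell from expanding anything inside the markup.
cat > "$outdir/search.md" <<'EOF'
Search
======

<link href="pagefind/pagefind-ui.css" rel="stylesheet">
<script src="pagefind/pagefind-ui.js"></script>
<div id="search"></div>
<script>
  window.addEventListener('DOMContentLoaded', function () {
    new PagefindUI({ element: "#search" });
  });
</script>
EOF

cat "$outdir/search.md"
```

In a real project the file would sit next to the other Markdown pages, so the Makefile renders it to search.html like any other page. Pandoc passes the raw HTML through to the output.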

If you need a quick static web server while you’re writing and developing your documentation website Pagefind can provide that using the --serve option. Assuming you’re in your project’s directory then something like this should do the trick.

    pagefind --source . --bundle-dir=pagefind --serve