iustin - Hakyll basics

As part of my migration to Hakyll, I had to spend quite a bit of time understanding how it works before I became somewhat “at-home” with it. There are many posts that show “how to do x”, but not so many that explain its inner workings. Let me try to fix that: at its core, Hakyll is nothing more than a combination of make and m4 all in one. Simple, right? Let’s see :)

Note: in the following, basic proficiency with Haskell is assumed.

Monads and data types

Rules

The first area (the make equivalent), more precisely the Rules monad, concerns itself with the rules for mapping source files into output files, or creating output files from scratch.

Key to this mapping is the concept of an Identifier, which is a name in an abstract namespace. Most of the time (e.g. for all the examples in the upstream Hakyll tutorial) this identifier actually maps to a real source file, but this is not required; you can create an identifier from any string value.

The similarity, or relation, to file paths manifests in two ways:

the Identifier data type, although opaque, is internally implemented as a simple data type consisting of a file path and a “version”; the file path here points to the source file (if any), while the version is rather a variant of the item (not a numeric version!).
if the identifier has been included in a rule that sets a route, it will have an output file (retrievable in the Compiler monad via getRoute).
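As a toy model of the “file path plus version” idea, here is a plain Haskell sketch. It is not Hakyll’s code: ToyIdentifier, toyFromFilePath and toySetVersion are hypothetical stand-ins that loosely mirror the opaque Identifier, fromFilePath and setVersion. It shows that identity is purely by value: two identifiers built independently from the same string are equal, while setting a version turns the same path into a distinct identifier. Load the file in GHCi to play with it.

```haskell
-- Toy model, NOT Hakyll's implementation (all names hypothetical):
-- an identifier is a file path plus an optional "version", where the
-- version names a variant of the item rather than a numeric version.
data ToyIdentifier = ToyIdentifier
  { toyPath    :: FilePath
  , toyVersion :: Maybe String
  } deriving (Eq, Ord, Show)

-- Analogue of fromFilePath: any string works, whether or not a file
-- with that name exists.
toyFromFilePath :: FilePath -> ToyIdentifier
toyFromFilePath p = ToyIdentifier { toyPath = p, toyVersion = Nothing }

-- Analogue of setVersion: same path, different variant.
toySetVersion :: Maybe String -> ToyIdentifier -> ToyIdentifier
toySetVersion v ident = ident { toyVersion = v }

-- A post built twice from the same string, plus a "raw" variant of it.
postA, postA', postRaw :: ToyIdentifier
postA   = toyFromFilePath "posts/hello.md"
postA'  = toyFromFilePath "posts/hello.md"
postRaw = toySetVersion (Just "raw") postA
```

Here `postA == postA'` holds even though the two values were constructed separately, while `postRaw /= postA` despite sharing the same path.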

In effect, the Rules monad is all about taking source files (as identifiers) or creating them from scratch, and mapping them to output locations, while also declaring how to transform (or create) the contents of the source into the output (more on this later). Anyone can create an identifier value via fromFilePath, but “registering” identifiers in the rules monad is done by one of:

matching input files, via match, or matchMetadata, which takes a Pattern that describes, well, matching source files (on the file-system):
```
match :: Pattern -> Rules () -> Rules ()
```

creating an abstract identifier, via create:

```
create :: [Identifier] -> Rules () -> Rules ()
```

Note: I’m probably misusing the term “registered” here. It’s not the specific value that is registered, but the identifier’s file path. Once this string value has been registered, one can use a different identifier value with a similar string (value) in various function calls.

Note: whether we use match or create doesn’t matter; only the actual values matter. So match "foo.bar" is equivalent to create ["foo.bar"]: match here takes the list of identifiers from the file-system, but does not associate them with the files themselves; it’s just a way to get the list of strings.
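The “match is just a way to get the list of strings” view can be sketched with a tiny model, assuming a simplified pattern language. All names here are hypothetical: toyGlob understands only `*` (which, like `*` in Hakyll patterns, does not cross a `/`), toyMatch filters a hard-coded list standing in for the file-system, and toyCreate returns its names unchanged. Both end up producing the same plain path values.

```haskell
-- Toy glob (hypothetical): '*' matches any run of characters within a
-- single path segment (it never crosses a '/'); other characters are
-- matched literally.
toyGlob :: String -> FilePath -> Bool
toyGlob [] s = null s
toyGlob ('*' : ps) s =
  -- try every split point inside the current path segment
  any (toyGlob ps) [drop n s | n <- [0 .. length (takeWhile (/= '/') s)]]
toyGlob (p : ps) (c : cs) = p == c && toyGlob ps cs
toyGlob _ [] = False

-- Hypothetical analogue of match: the pattern only selects names from
-- the source listing; the result is a plain list of paths.
toyMatch :: String -> [FilePath] -> [FilePath]
toyMatch pat sources = filter (toyGlob pat) sources

-- Hypothetical analogue of create: the names are taken as given.
toyCreate :: [FilePath] -> [FilePath]
toyCreate = id

-- A stand-in for the files found on disk.
sourceFiles :: [FilePath]
sourceFiles = ["foo.bar", "index.html", "about.html", "posts/a.html"]
```

In GHCi, `toyMatch "*.html" sourceFiles` yields `["index.html","about.html"]`, and `toyMatch "foo.bar" sourceFiles` is the same list as `toyCreate ["foo.bar"]`.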

The second argument to the match/create calls is another action in the Rules monad, in which we process the identifiers and describe how to transform them.

This transformation has, as described, two aspects: how to map the file path to an output path, via the Routes data type, and how to compile the body, in the Compiler monad.
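A rough, hypothetical model of those two aspects working together: a route is a plain function on paths, a “compiler” is a plain function on the body, and building the site yields a map from output paths to output contents. The names ToyRule, toyBuild and the example data are invented for illustration; the real Rules and Compiler monads additionally handle dependencies, metadata, missing routes and much more.

```haskell
import qualified Data.Map.Strict as Map
import System.FilePath (replaceExtension, takeExtension)

-- Toy rule (hypothetical, much simpler than Hakyll's Rules/Compiler):
-- a route maps the input path to an output path, and a "compiler"
-- maps the input body to the output body.
data ToyRule = ToyRule
  { ruleRoute   :: FilePath -> FilePath
  , ruleCompile :: String -> String
  }

-- Run every (selector, rule) pair over the sources it selects and
-- collect the results, keyed by output path.
toyBuild :: Map.Map FilePath String
         -> [(FilePath -> Bool, ToyRule)]
         -> Map.Map FilePath String
toyBuild sources rules = Map.fromList
  [ (ruleRoute rule path, ruleCompile rule body)
  | (selects, rule) <- rules
  , (path, body)    <- Map.toList sources
  , selects path
  ]

-- Example sources, standing in for files on disk.
exampleSources :: Map.Map FilePath String
exampleSources = Map.fromList
  [ ("style.css", "body {}")
  , ("about.txt", "hello")
  ]

-- Copy the CSS as-is; turn .txt into .html by wrapping the body in <p>
-- (a placeholder for a real conversion such as Markdown to HTML).
exampleRules :: [(FilePath -> Bool, ToyRule)]
exampleRules =
  [ ((== "style.css"), ToyRule id id)
  , ((== ".txt") . takeExtension,
      ToyRule (`replaceExtension` "html") (\b -> "<p>" ++ b ++ "</p>"))
  ]
```

Building `exampleSources` with `exampleRules` gives two outputs: `style.css` unchanged and `about.html` containing `<p>hello</p>`.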

Name mapping

The name mapping starts with the route call, which lifts the routes into the rules monad.

The routing has the usual expected functionality:

idRoute :: Routes, which maps 1:1 the input file name to the output one.
setExtension :: String -> Routes, which changes the extension of the filename, or sets it (if there wasn’t any).
constRoute :: FilePath -> Routes, which is special in that it always results in the same output filename, whatever the input; this is obviously useful only for rules matching a single identifier.
and a few more options, like building the route based on the identifier (customRoute), building it based on metadata associated to the identifier (metadataRoute), composing routes, match-and-replace, etc.
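The basic routes can be modelled as plain functions from input path to output path. This is a simplified sketch with hypothetical names, not Hakyll’s Routes type (which operates on identifiers and may yield no route at all). toySetExtension is built on System.FilePath’s replaceExtension, which, as far as I know, is also what Hakyll’s setExtension uses; toyComposeRoutes applies the first route and then the second, matching the documented behaviour of composeRoutes.

```haskell
import System.FilePath (replaceExtension)

-- Toy route (hypothetical): just a function on paths. Hakyll's Routes
-- works on identifiers, can consult metadata, and may yield no route.
type ToyRoute = FilePath -> FilePath

-- Like idRoute: the output path is the input path.
toyIdRoute :: ToyRoute
toyIdRoute = id

-- Like setExtension: replace the extension, or add one if there is none.
toySetExtension :: String -> ToyRoute
toySetExtension ext path = replaceExtension path ext

-- Like constRoute: the same output whatever the input.
toyConstRoute :: FilePath -> ToyRoute
toyConstRoute out _ = out

-- Like composeRoutes: apply the first route, then the second.
toyComposeRoutes :: ToyRoute -> ToyRoute -> ToyRoute
toyComposeRoutes r1 r2 = r2 . r1

-- A custom route (in the spirit of customRoute): put everything under blog/.
underBlog :: ToyRoute
underBlog path = "blog/" ++ path
```

For example, `toyComposeRoutes (toySetExtension "html") underBlog "a.md"` gives `"blog/a.html"`, and for the input `"foo.bar"` the identity route and `toyConstRoute "foo.bar"` agree.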

All in all, routes offer all the needed functionality for mapping.

Note that how we declare the input identifier and how we compute the output route are irrelevant; only the actual values matter. So for an identifier with name (file path) foo.bar, route idRoute is equivalent to route (constRoute "foo.bar").

Compilation

Now into slightly more interesting territory, as we’re moving beyond just file paths :) Lifting a compiler into the rules monad is done via the compile function:

```
compile :: (Binary a, Typeable a, Writable a) => Compiler (Item a) -> Rules ()
```

The Compiler monad result is an Item a, which is just an identifier with a body (of type a). This type variable a means we can return any Writable item. Many of the compiler functions work with/return String, but the flexibility to use other types is there.
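To illustrate that the body can be any type, here is a toy version of Item and of the Writable idea. ToyItem, ToyWritable and toyWrite are hypothetical; Hakyll’s actual Item pairs an Identifier with a body, and its Writable class writes an item to a file, whereas this sketch just returns the rendered text. Mapping over the body changes its type but keeps the identifier.

```haskell
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE FlexibleInstances #-}

-- Toy item (hypothetical): an identifier, here just a path, together
-- with a body of any type.
data ToyItem a = ToyItem
  { toyItemId   :: FilePath
  , toyItemBody :: a
  } deriving (Eq, Show, Functor)

-- Toy stand-in for Writable: how a body type becomes output text.
-- (Hakyll's real class writes to a file; this one returns a String.)
class ToyWritable a where
  toyRender :: a -> String

instance ToyWritable String where
  toyRender = id

instance ToyWritable Int where
  toyRender = show

-- "Write" an item: pair its identifier with the rendered body.
toyWrite :: ToyWritable a => ToyItem a -> (FilePath, String)
toyWrite item = (toyItemId item, toyRender (toyItemBody item))

-- A String item, and the same item with its body mapped to a word count.
post :: ToyItem String
post = ToyItem "posts/a.md" "hello world"

wordCount :: ToyItem Int
wordCount = fmap (length . words) post
```

Here `toyWrite wordCount` is `("posts/a.md","2")`: the identifier survived the `fmap`, only the body changed type.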

The functionality in this module revolves around four topics:

The current identifier

First the very straightforward functions for the identifier itself:

getUnderlying :: Compiler Identifier, just returns the identifier
getUnderlyingExtension :: Compiler String, returns the extension

And then for the body (data) of the identifier (mostly copied from the haddock of the module):

getResourceBody :: Compiler (Item String), returns the full contents of the matched source file as a string, but without the metadata preamble, if there was one.
getResourceString :: Compiler (Item String), returns the full contents of the matched source file as a string.
getResourceLBS :: Compiler (Item ByteString), equivalent to the above but as lazy bytestring.
getResourceFilePath :: Compiler FilePath, returns the file path of the resource we are compiling.
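The difference between getResourceBody and getResourceString is the metadata preamble. A deliberately simplified, hypothetical sketch of that split on plain strings follows; it assumes the preamble is delimited by lines consisting of exactly `---`, while Hakyll’s own metadata handling is more thorough (it also parses the YAML).

```haskell
-- Toy metadata split (hypothetical, simplified): if the text starts with
-- a "---" line, everything up to and including the next "---" line is
-- treated as metadata and dropped from the body.
toyResourceBody :: String -> String
toyResourceBody src = case lines src of
  ("---" : rest) -> case break (== "---") rest of
    (_meta, _closing : body) -> unlines body
    _                        -> src  -- unterminated preamble: keep as-is
  _ -> src

-- Toy analogue of getResourceString: the raw contents, untouched.
toyResourceString :: String -> String
toyResourceString = id

-- A source file with a metadata preamble.
sample :: String
sample = unlines ["---", "title: Hello", "---", "Body text."]
```

With this input, `toyResourceBody sample` is `"Body text.\n"`, while `toyResourceString sample` still contains the `title: Hello` line.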

More or less, these return the data so that one can do arbitrary things with it, and are the cornerstone of a static site compiler. One could implement a simple “copy” compiler by doing just:

```
match "*.html" $ do
  -- route to the same path, per earlier explanation.
  route idRoute
  -- the compiler just returns the raw contents of the source file (as a lazy ByteString).
  compile getResourceLBS
```

All the other functions in the module work on arbitrary identifiers.

Routing

I’m used to Yesod and its safe routes functionality. Hakyll has something slightly weaker, but with programmer discipline it can provide a similar level of “I know this will point to the right thing” (and maybe correct escaping as well). Enter the:

```
getRoute :: Identifier -> Compiler (Maybe FilePath)
```

function which I alluded to earlier, and which—either for the current identifier or another identifier—returns the destination file path, which is useful for composing links (as in HTML links) to it.

For example, instead of hard-coding the path to the archive page as /archive.html, one can do the following:

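```
let archiveId = "archive.html" :: Identifier

create [archiveId] $ do
  -- build here the archive page
  …

-- later in the index page
create ["index.html"] $ do
  …
  compile $ do
    -- compute the actual url (getRoute returns Nothing if there is no route):
    archivePath <- getRoute archiveId
    archiveUrl <- maybe (fail "archive page has no route") (return . toUrl) archivePath
    -- then use it in the creation of the index.html page
```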

The reuse of archiveId above ensures that if the actual path to the archive page changes (renames, site reorganisation, etc.), then all the links to it (assuming, again, discipline of not hard-coding them) are automatically pointing to the right place.
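The same discipline can be modelled without Hakyll. In this hypothetical sketch, a Map plays the role of the routes declared in the rules, toyGetRoute mirrors getRoute’s Maybe result, and toyToUrl only prepends a leading slash (Hakyll’s toUrl may do more, such as escaping). Because archiveId is named in exactly one place, changing the archive’s route changes the generated link and nothing else.

```haskell
import qualified Data.Map.Strict as Map

-- Toy route table (hypothetical): identifier path -> output path.
-- In Hakyll this information comes from the rules; here it is a Map.
type RouteTable = Map.Map FilePath FilePath

-- Like getRoute: Nothing if the identifier has no route.
toyGetRoute :: RouteTable -> FilePath -> Maybe FilePath
toyGetRoute table ident = Map.lookup ident table

-- Simplified toUrl: just ensure a leading slash (no escaping here).
toyToUrl :: FilePath -> String
toyToUrl p@('/' : _) = p
toyToUrl p           = '/' : p

-- The single place where the archive identifier is named.
archiveId :: FilePath
archiveId = "archive.html"

-- The index page asks for the archive's URL instead of hard-coding it;
-- Left stands in for a compiler failure.
indexLink :: RouteTable -> Either String String
indexLink table = case toyGetRoute table archiveId of
  Nothing   -> Left ("no route for " ++ archiveId)
  Just path -> Right ("<a href=\"" ++ toyToUrl path ++ "\">Archive</a>")
```

Routing `archive.html` to `archive/index.html` instead of to itself changes the link from `/archive.html` to `/archive/index.html` without touching `indexLink`.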

Working with other identifiers

Getting to the interesting aspect now. In the compiler monad, one can ask for any other identifier, whether it was already loaded/compiled or not—the monad takes care of tracking dependencies/compiling automatically/etc.

There are two main functions:

load :: (Binary a, Typeable a) => Identifier -> Compiler (Item a), which returns a single item, and
loadAll :: (Binary a, Typeable a) => Pattern -> Compiler [Item a], which returns a list of items, based on the same patterns used in the rules monad.

If the requested identifier/pattern does not match identifiers actually declared in the “parent” rules monad, then these calls will fail (as in monadic fail).

The use of other identifiers in a compiler step is what allows moving beyond “input file to output file”; aggregating a list of pages (e.g. blog posts) into a single archive page is the most obvious example.
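Here is a toy illustration of load, loadAll and aggregation. It assumes every item has already been compiled into a simple Map (Hakyll instead compiles dependencies on demand and tracks them), and uses Either String in place of monadic failure. Store, toyLoad, toyLoadAll and toyArchive are hypothetical names.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (isPrefixOf)

-- Toy store (hypothetical) of already-compiled items: path -> body.
type Store = Map.Map FilePath String

-- Like load: fails (here: Left) for an identifier that is not known.
toyLoad :: Store -> FilePath -> Either String String
toyLoad store ident =
  maybe (Left ("unknown identifier: " ++ ident)) Right (Map.lookup ident store)

-- Like loadAll: every known identifier selected by the predicate.
toyLoadAll :: Store -> (FilePath -> Bool) -> [(FilePath, String)]
toyLoadAll store selects = filter (selects . fst) (Map.toList store)

-- Aggregation: an archive body listing every post.
toyArchive :: Store -> String
toyArchive store = unlines
  [ "<li>" ++ path ++ "</li>" | (path, _) <- toyLoadAll store ("posts/" `isPrefixOf`) ]

exampleStore :: Store
exampleStore = Map.fromList
  [ ("posts/a.md", "first post")
  , ("posts/b.md", "second post")
  , ("about.md", "about me")
  ]
```

`toyArchive exampleStore` lists only the two posts, and `toyLoad exampleStore "missing.md"` is a `Left`.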

But sometimes getting just the final result of the compilation step (of other identifiers) is not flexible enough: in the case of HTML output, this is the entire page, including the <html><head>…</head> part, not only the body we might be interested in. So, to ease aggregation, one uses snapshots.

Snapshots

Snapshots allow, well, snapshotting the intermediate result under a specific name, to allow later retrieval:

saveSnapshot :: (Binary a, Typeable a) => Snapshot -> Item a -> Compiler (Item a), to save a snapshot
loadSnapshot :: (Binary a, Typeable a) => Identifier -> Snapshot -> Compiler (Item a), to load a snapshot, similar to load
loadAllSnapshots :: (Binary a, Typeable a) => Pattern -> Snapshot -> Compiler [Item a], similar to loadAll

One can save an arbitrary number of snapshots at various steps of the compilation, and then re-use them.

Note: load and loadAll are actually just the snapshot variants, with a hard-coded value for the snapshot. As I write this, the value is "_final", so it’s probably best not to use the underscore prefix for one’s own snapshots. A bit of a shame that this is not done better, type-wise.
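A toy model of snapshots, assuming the store is simply a Map keyed by identifier and snapshot name: compilePost saves the bare content under a hypothetical "content" snapshot before wrapping it in a full page, and toyLoad is literally toyLoadSnapshot with "_final", mirroring how load relates to loadSnapshot. Apart from the "_final" value, all names are invented for the example.

```haskell
import qualified Data.Map.Strict as Map

-- Snapshot names are plain strings.
type Snapshot = String

-- Toy snapshot store (hypothetical): (identifier path, snapshot) -> body.
type SnapshotStore = Map.Map (FilePath, Snapshot) String

-- The hard-coded name used by load at the time of writing.
finalSnapshot :: Snapshot
finalSnapshot = "_final"

-- Like saveSnapshot: record a body under a snapshot name.
toySaveSnapshot :: Snapshot -> FilePath -> String -> SnapshotStore -> SnapshotStore
toySaveSnapshot snap ident body = Map.insert (ident, snap) body

-- Like loadSnapshot (Nothing stands in for a failure).
toyLoadSnapshot :: SnapshotStore -> FilePath -> Snapshot -> Maybe String
toyLoadSnapshot store ident snap = Map.lookup (ident, snap) store

-- Like load: loadSnapshot with the final snapshot name.
toyLoad :: SnapshotStore -> FilePath -> Maybe String
toyLoad store ident = toyLoadSnapshot store ident finalSnapshot

-- Toy compilation of one post: save the bare content as "content",
-- then wrap it in a full page, which is saved as the final result.
compilePost :: FilePath -> String -> SnapshotStore -> SnapshotStore
compilePost ident body =
    toySaveSnapshot finalSnapshot ident page
  . toySaveSnapshot "content" ident content
  where
    content = "<p>" ++ body ++ "</p>"
    page    = "<html><head></head><body>" ++ content ++ "</body></html>"

store :: SnapshotStore
store = compilePost "posts/a.md" "hi" Map.empty
```

An archive page would read the "content" snapshot (just `<p>hi</p>`), while `toyLoad` returns the full page.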

What next?

We have rules to transform things, including smart name transforming, and we have compiler functionality to transform the data. But everything mentioned until now is very generic, fundamental functionality, bare-bones to the bone (ha!).

With just this functionality, you have everything needed to build an actual site. But starting at this level would be too tedious even for hard-core fans of DIY, so Hakyll comes with some built-in extra functionality.

And that will be the next post in the series. This one is too long already :)