Hakyll basics
Posted on March 21, 2018 with tags haskell. Part 1 of a 1-part series on hakyll.
As part of my migration to Hakyll, I had to spend quite a bit of time
understanding how it works before I became somewhat “at home” with
it. There are many posts that show “how to do x”, but not so many
that explain its inner workings. Let me try to fix that: at its core,
Hakyll is nothing more than a combination of make and m4, all in one.
Simple, right? Let’s see :)
Note: in the following, basic proficiency with Haskell is assumed.
Monads and data types
Rules
The first area (the make equivalent), more precisely the Rules monad,
concerns itself with the rules for mapping source files into output
files, or creating output files from scratch.
Key to this mapping is the concept of an Identifier, which is a name
in an abstract namespace. Most of the time (e.g. for all the examples
in the upstream Hakyll tutorial) this identifier actually maps to a
real source file, but this is not required; you can create an
identifier from any string value.
The similarity, or relation, to file paths manifests in two ways:
- the Identifier data type, although opaque, is internally implemented as a simple data type consisting of a file path and a “version”; the file path here points to the source file (if any), while the version is rather a variant of the item (not a numeric version!).
- if the identifier has been included in a rule, it will have an output file (retrievable in the Compiler monad, via getRoute).
In effect, the Rules monad is all about taking source files (as
identifiers) or creating them from scratch, and mapping them to output
locations, while also declaring how to transform (or create) the
contents of the source into the output (more on this later). Anyone
can create an identifier value via fromFilePath, but “registering”
them into the rules monad is done by one of:
- matching input files, via match or matchMetadata, which take a Pattern that describes, well, matching source files (on the file-system):
    match :: Pattern -> Rules () -> Rules ()
- creating an abstract identifier, via create:
    create :: [Identifier] -> Rules () -> Rules ()
Note: I’m probably misusing the term “registered” here. It’s not the specific value that is registered, but the identifier’s file path. Once this string value has been registered, one can use a different identifier value with a similar string (value) in various function calls.
Note: whether we use match or create doesn’t matter; only the actual
values matter. So a match "foo.bar" is equivalent to
create ["foo.bar"]: match here takes the list of identifiers from the
file-system, but does not associate them with the files themselves;
it’s just a way to get the list of strings.
The second argument to the match/create calls is another rules monad,
in which we process the identifiers and declare how to transform them.
This transformation has, as described, two aspects: how to map the file path to an output path, via the Routes data type, and how to compile the body, in the Compiler monad.
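To make the two registration styles concrete, here is a minimal sketch of a site definition. The paths notes/*.txt and generated/hello.txt are made up for illustration, and route and compile are only used in their simplest form here, since the following sections cover them.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Hakyll

main :: IO ()
main = hakyll $ do
    -- match: the identifiers come from files found on the file-system.
    -- ("notes/*.txt" is a hypothetical source directory.)
    match "notes/*.txt" $ do
        route idRoute
        compile getResourceString

    -- create: the identifier is abstract, no source file is needed.
    -- ("generated/hello.txt" is a hypothetical name.)
    create ["generated/hello.txt"] $ do
        route idRoute
        -- makeItem wraps a value into an Item for the current identifier.
        compile $ makeItem ("Hello from a created identifier\n" :: String)
```

The OverloadedStrings extension is what lets plain string literals stand in for both Pattern and Identifier values.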
Name mapping
The name mapping starts with the route call, which lifts the routes
into the rules monad.
The routing has the usual expected functionality:
- idRoute :: Routes, which maps the input file name 1:1 to the output one.
- setExtension :: String -> Routes, which changes the extension of the filename, or sets it (if there wasn’t any).
- constRoute :: FilePath -> Routes, which is special in that it always results in the same output filename, which is obviously useful only for rules matching a single identifier.
- and a few more options, like building the route based on the identifier (customRoute), building it based on metadata associated with the identifier (metadataRoute), composing routes, match-and-replace, etc.
All in all, routes offer all the needed functionality for mapping.
Note that how we declare the input identifier and how we compute the
output route is irrelevant; what matters is the actual values. So for
an identifier with name (file path) foo.bar, route idRoute is
equivalent to route (constRoute "foo.bar").
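The routes listed above can be sketched side by side as follows. All source paths (css/*, posts/*.txt, pages/front.txt, images/*) are hypothetical, and the compilers are the simplest ones available, since compilation is the subject of the next section.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Hakyll

main :: IO ()
main = hakyll $ do
    -- idRoute: css/site.css is written to css/site.css.
    match "css/*" $ do
        route idRoute
        compile copyFileCompiler

    -- setExtension: posts/hello.txt is written to posts/hello.html.
    match "posts/*.txt" $ do
        route $ setExtension "html"
        compile getResourceBody

    -- constRoute: a fixed output path, so the rule should match
    -- a single identifier only.
    match "pages/front.txt" $ do
        route $ constRoute "index.html"
        compile getResourceBody

    -- customRoute: compute the output path from the identifier;
    -- here images/logo.png is written to static/images/logo.png.
    match "images/*" $ do
        route $ customRoute $ \ident -> "static/" ++ toFilePath ident
        compile copyFileCompiler
```

Output paths are relative to the destination directory of the site.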
Compilation
Slightly more interesting territory here, as we’re moving beyond
just file paths :) Lifting a compiler into the rules monad is done
via the compile function:
    compile :: (Binary a, Typeable a, Writable a) => Compiler (Item a) -> Rules ()
The result of the Compiler monad is an Item a, which is just an
identifier with a body (of type a). This type variable a means we can
return any Writable item. Many of the compiler functions work
with/return String, but the flexibility to use other types is there.
The functionality in this module revolves around four topics:
The current identifier
First the very straightforward functions for the identifier itself:
- getUnderlying :: Compiler Identifier, just returns the identifier
- getUnderlyingExtension :: Compiler String, returns the extension
And then for the body (data) of the identifier (mostly copied from the haddock of the module):
- getResourceBody :: Compiler (Item String), returns the full contents of the matched source file as a string, but without the metadata preamble, if there was one.
- getResourceString :: Compiler (Item String), returns the full contents of the matched source file as a string.
- getResourceLBS :: Compiler (Item ByteString), equivalent to the above but as a lazy bytestring.
- getResourceFilePath :: Compiler FilePath, returns the file path of the resource we are compiling.
More or less, these return the data so that one can do arbitrary things to it, and they are the cornerstone of a static site compiler. One could implement a simple “copy” compiler by doing just:
    match "*.html" $ do
        -- route to the same path, per earlier explanation.
        route idRoute
        -- the compiler just returns the body of the source file.
        compile getResourceLBS
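Going one small step beyond copying, the sketch below combines getUnderlying and getResourceBody into a custom compiler. The name shoutCompiler and the notes/*.txt pattern are invented for this example; only the Hakyll functions it calls come from the library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (toUpper)
import Hakyll

-- | Hypothetical compiler: upper-cases the body of the current source
-- file and prepends a line naming the identifier being compiled.
shoutCompiler :: Compiler (Item String)
shoutCompiler = do
    ident <- getUnderlying      -- the identifier we are compiling
    body  <- getResourceBody    -- its contents, minus any metadata
    let header = "source: " ++ toFilePath ident ++ "\n"
    -- Item is a Functor, so fmap transforms just the body.
    return $ fmap (\s -> header ++ map toUpper s) body

main :: IO ()
main = hakyll $
    match "notes/*.txt" $ do
        route idRoute
        compile shoutCompiler
```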
All the other functions in the module work on arbitrary identifiers.
Routing
I’m used to Yesod and its safe routes functionality. Hakyll has
something slightly weaker, but with programmer discipline it can allow
similar levels of “I know this will point to the right thing” (and
maybe correct escaping as well). Enter the:
getRoute :: Identifier -> Compiler (Maybe FilePath)
function which I alluded to earlier, and which—either for the current identifier or another identifier—returns the destination file path, which is useful for composing links (as in HTML links) to it.
For example, instead of hard-coding the path to the archive page as
/archive.html, one can instead do the following:
    let archiveId = "archive.html"
    create [archiveId] $ do
        -- build here the archive page
        …
    -- later in the index page
    create ["index.html"] $ do
        …
        compile $ do
            -- compute the actual url; getRoute returns a Maybe,
            -- hence the extra fmap:
            archiveUrl <- fmap toUrl <$> getRoute archiveId
            -- then use it in the creation of the index.html page
            …
The reuse of archiveId above ensures that if the actual path to the
archive page changes (renames, site reorganisation, etc.), then all
the links to it (assuming, again, the discipline of not hard-coding
them) automatically point to the right place.
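For reference, a complete and deliberately tiny version of this idea could look like the sketch below. The page contents are placeholders, and failing the compilation when the archive has no route is one possible choice, not something Hakyll mandates.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Hakyll

-- Single definition of the archive identifier, shared by both rules.
archiveId :: Identifier
archiveId = "archive.html"

main :: IO ()
main = hakyll $ do
    create [archiveId] $ do
        route idRoute
        -- Placeholder body; a real site would build the archive here.
        compile $ makeItem ("<p>archive placeholder</p>" :: String)

    create ["index.html"] $ do
        route idRoute
        compile $ do
            -- getRoute yields Nothing if the identifier has no route.
            mroute <- getRoute archiveId
            case mroute of
                Nothing   -> fail "no route for the archive page"
                Just path ->
                    -- toUrl turns the output path into a site URL.
                    makeItem $
                        "<a href=\"" ++ toUrl path ++ "\">Archive</a>"
```

Changing the route of the archive rule (for instance to a constRoute with another path) changes the link in index.html without touching the index rule.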
Working with other identifiers
Getting to the interesting aspect now. In the compiler monad, one can ask for any other identifier, whether it was already loaded/compiled or not—the monad takes care of tracking dependencies/compiling automatically/etc.
There are two main functions:
- load :: (Binary a, Typeable a) => Identifier -> Compiler (Item a), which returns a single item, and
- loadAll :: (Binary a, Typeable a) => Pattern -> Compiler [Item a], which returns a list of items, based on the same patterns used in the rules monad.
If the identifier/pattern requested do not match actual identifiers declared in the “parent” rules monad, then these calls will fail (as in monadic fail).
The use of other identifiers in a compiler step is what allows moving beyond “input file to output file”; aggregating a list of pages (e.g. blog posts) into a single archive page is the most obvious example.
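A minimal sketch of such an aggregation, assuming a hypothetical notes/*.txt set of sources and an invented all-notes.txt output, might be:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Hakyll

main :: IO ()
main = hakyll $ do
    -- The individual items; these must be declared for loadAll to
    -- have something to return.
    match "notes/*.txt" $ do
        route idRoute
        compile getResourceBody

    -- The aggregate: one output built from all of the items above.
    create ["all-notes.txt"] $ do
        route idRoute
        compile $ do
            -- The annotation tells load/loadAll which body type to expect.
            notes <- loadAll "notes/*.txt" :: Compiler [Item String]
            makeItem $ concatMap itemBody notes
```

The type annotation is needed because load and loadAll are polymorphic in the body type, and nothing else in this snippet pins it down.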
But sometimes getting just the final result of the compilation step
(of other identifiers) is not flexible enough: in the case of HTML
output, this includes the entire page, including the
<html><head>…</head> part, not only the body we might be interested
in. So, to ease any aggregation, one uses snapshots.
Snapshots
Snapshots allow, well, snapshotting the intermediate result under a specific name, to allow later retrieval:
- saveSnapshot :: (Binary a, Typeable a) => Snapshot -> Item a -> Compiler (Item a), to save a snapshot
- loadSnapshot :: (Binary a, Typeable a) => Identifier -> Snapshot -> Compiler (Item a), to load a snapshot, similar to load
- loadAllSnapshots :: (Binary a, Typeable a) => Pattern -> Snapshot -> Compiler [Item a], similar to loadAll
One can save an arbitrary number of snapshots at various steps of the compilation, and then re-use them.
Note: load and loadAll are actually just the snapshot variants, with a
hard-coded value for the snapshot. As I write this, the value is
"_final", so it’s probably best not to use the underscore prefix for
one’s own snapshots. A bit of a shame that this is not done better,
type-wise.
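Putting the snapshot functions together, here is a small sketch. The snapshot name "content", the posts/*.txt sources and the hand-written HTML wrapper are all assumptions made for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Hakyll

main :: IO ()
main = hakyll $ do
    match "posts/*.txt" $ do
        route $ setExtension "html"
        compile $ do
            body <- getResourceBody
            -- Save the bare body before wrapping it in a full page.
            _ <- saveSnapshot "content" body
            return $ fmap
                (\b -> "<html><body>" ++ b ++ "</body></html>")
                body

    create ["archive.html"] $ do
        route idRoute
        compile $ do
            -- Load the saved bodies, not the final full pages.
            posts <- loadAllSnapshots "posts/*.txt" "content"
                         :: Compiler [Item String]
            makeItem $ concatMap itemBody posts
```

Using loadAll instead of loadAllSnapshots in the archive rule would pull in each post's complete page, wrapper included, which is exactly the problem snapshots are meant to avoid.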
What next?
We have rules to transform things, including smart name transforming, and we have compiler functionality to transform the data. But everything mentioned until now is very generic, fundamental functionality, bare-bones to the bone (ha!).
With just this functionality, you have everything needed to build an actual site. But starting at this level would be too tedious even for hard-core fans of DIY, so Hakyll comes with some built-in extra functionality.
And that will be the next post in the series. This one is too long already :)