iustin - Types as control flow constructs

A bug I’ve recently seen in production code gave me the idea for this blog post. Smarter people have probably already written better things on this topic, so this is mostly for myself, to better summarise my own thoughts. Corrections are welcome, please leave a comment!

Let’s say we have a somewhat standard API in Python or C++:

a function (init_t) to create/initialise our data type, which returns an object of type t
the object then has some properties or methods that we can call

Signalling failures to initialise the object can be done in two ways: either by raising an exception, or by returning a null/None result. There are advantages and disadvantages to both:

if using exceptions, then you must be sure to catch and handle all possible exceptions that the initialisation function can raise (otherwise more of your code can be aborted than you want)
if using None as return value, then you must check that the value is not None before using it, otherwise you get an exception (in Python) or a null pointer dereference (in C++)

The None model can create latent bugs, for example in the following code:

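# process_all is an illustrative name; init_t is the initialiser described above.
def process_all(input_list):
  ok = True
  for arg in input_list:
    t = init_t(arg)

    if t is None:
      # Initialisation failed: remember it and skip this item.
      ok = False
      continue

    t.process()

  return ok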

The presence of the continue statement there is critical. Removing it, or moving it after other statements that work with t, will result in a bug. So value-based returns force us to introduce control points manually, and the compiler (e.g. in C++) has no way to check that we placed them correctly.

So it would seem that this kind of latent bug pushes us towards the exception model, with its drawbacks.

Let’s look at how this interface would be implemented in (IMHO) idiomatic Haskell (where the a and b types represent the input and output types):

initT :: a -> Maybe T
processT :: T -> b

my_fn =
 …
 case initT arg of
   Nothing -> -- handle failure
   Just v -> processT v

Yes, this can be written better, but it’s beside the main point. The main point is that by introducing a wrapper type “around” our main type (T), we are forced via the type system to handle the failure case. We can’t simply pass the result of initT to a function which accepts T, because it won’t type check. And, no matter what we do with the result value, there are no exceptions involved here, so we only have to think about types/values, and not control flow changes. In effect, types become automatically-validated control-flow instructions. Or so it looks to me ☺.
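As a rough sketch of what “written better” could look like, here are three equivalent ways of consuming the Maybe. The concrete T (a port number), the parsing rule, and the names viaCase, viaMaybe and viaFmap are all made up for illustration; only initT and processT come from the example above.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for T: a port number known to be in range.
newtype T = T Int

-- Initialisation can fail, and the return type says so.
initT :: String -> Maybe T
initT s = case readMaybe s of
  Just n | n > 0 && n < 65536 -> Just (T n)
  _                           -> Nothing

processT :: T -> String
processT (T n) = "listening on port " ++ show n

-- Note: 'processT (initT arg)' is rejected by the type checker,
-- because a 'Maybe T' is not a 'T'.

-- Explicit case analysis, as in the example above.
viaCase :: String -> String
viaCase arg = case initT arg of
  Nothing -> "invalid port"
  Just v  -> processT v

-- The same logic using the Prelude's 'maybe' function.
viaMaybe :: String -> String
viaMaybe = maybe "invalid port" processT . initT

-- Postpone the failure handling: the result stays inside Maybe.
viaFmap :: String -> Maybe String
viaFmap = fmap processT . initT
```

In all three variants the failure branch is still visible in the types: either it is handled on the spot, or the caller receives a Maybe and has to deal with it.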

So using types properly, we can avoid the exception-vs-return-value debate, and have all the advantages without the disadvantages of either. If that is the case, why isn’t this technique used more in other languages? At least in statically typed languages, it would be possible to implement it (I believe), via a C++ template, for example. In Python, you can’t actually apply it, as there’s no static way of enforcing the correct decapsulation. I was very saddened to see that Google’s Go language, which is quite recent, has many examples where initialisation functions return a pair value, err := …, keeping the actual value separate from the error, which makes it no safer than Python. It might be that polymorphic types are not as easy to work with, or it might be the lack of pattern matching. In any case, I don’t think this was the last time I’ve seen a null pointer dereference (or the equivalent AttributeError: 'NoneType' object has no attribute …). Sadly ☹
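To make the difference between a separate value-plus-error result and a wrapper type more concrete, here is a small sketch that models both styles in Haskell. All names (parseProduct, parseSum, doubleUnchecked, doubleChecked) are hypothetical, and the tuple version only imitates the shape of such an API, not any particular language’s library.

```haskell
import Text.Read (readMaybe)

-- Product-style result: a value and an optional error travel side by side,
-- so nothing stops a caller from reading the value and ignoring the error.
parseProduct :: String -> (Int, Maybe String)
parseProduct s = case readMaybe s of
  Just n  -> (n, Nothing)
  Nothing -> (0, Just ("not a number: " ++ s))

-- Sum-style result: either an error or a value, never both.
parseSum :: String -> Either String Int
parseSum s = case readMaybe s of
  Just n  -> Right n
  Nothing -> Left ("not a number: " ++ s)

-- Type checks, and silently doubles the placeholder 0 on bad input.
doubleUnchecked :: String -> Int
doubleUnchecked s = 2 * fst (parseProduct s)

-- The Int is only reachable by going through the Either.
doubleChecked :: String -> Either String Int
doubleChecked = fmap (2 *) . parseSum
```

The unchecked version is the typed cousin of the missing continue statement: the compiler accepts it, and the mistake only shows up at run time as a wrong result.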

You can even go further in Haskell and introduce more control flow structure via wrapper types. Please bear with another contrived example: an HTML form that gets some input data from the user, validates it, saves it to the database, and echoes it back to the user. Without types, you would have to perform these steps manually, and ensure they are kept in the correct order when doing future modifications. With types, you only have to design the types correctly and export only the smart constructors (but not the plain ones):

module Foo ( ValidatedValue
           , validateValue
           , RecordId
           , CommittedValue
           , commitValue
           , buildFeedbackForm
           ) where

data ValidatedValue a = ValidatedValue a

validateValue :: a -> Maybe (ValidatedValue a)

data RecordId = …

data CommittedValue a = CommittedValue a RecordId

commitValue :: ValidatedValue a -> CommittedValue a

buildFeedbackForm :: CommittedValue a -> HTMLDoc

From these types, it follows more or less that the only correct workflow is:

get a value from the user
validate it
commit it, getting a transaction/record ID
send the response to the user

In other words:

-- validateValue returns a Maybe, so the remaining steps are mapped over it
handleFeedbackForm = fmap (buildFeedbackForm . commitValue) . validateValue
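For the curious, here is a self-contained sketch of the whole workflow that should compile as written. It rests on several assumptions that are not part of the interface above: the payload is fixed to String, validation just rejects empty input, RecordId wraps an Int, HTMLDoc is a plain String, and commitValue is a pure placeholder (a real one would presumably live in IO). Because validateValue returns a Maybe, the later steps are applied with fmap.

```haskell
-- In a real module the export list would leave out the data constructors,
-- so that values can only be built through the functions below.

newtype ValidatedValue a = ValidatedValue a

-- Hypothetical record ID; its real definition is left open.
newtype RecordId = RecordId Int deriving (Eq, Show)

data CommittedValue a = CommittedValue a RecordId

-- Hypothetical stand-in for an HTML document type.
type HTMLDoc = String

-- Assumed validation rule: reject empty input.
validateValue :: String -> Maybe (ValidatedValue String)
validateValue "" = Nothing
validateValue s  = Just (ValidatedValue s)

-- Pure placeholder: fakes a record ID instead of talking to a database.
commitValue :: ValidatedValue String -> CommittedValue String
commitValue (ValidatedValue s) = CommittedValue s (RecordId (length s))

buildFeedbackForm :: CommittedValue String -> HTMLDoc
buildFeedbackForm (CommittedValue s (RecordId n)) =
  "<p>Saved \"" ++ s ++ "\" as record " ++ show n ++ "</p>"

-- The only order in which the pieces fit together.
handleFeedbackForm :: String -> Maybe HTMLDoc
handleFeedbackForm = fmap (buildFeedbackForm . commitValue) . validateValue
```

Swapping two steps, or skipping validation, simply does not type check, which is the whole point of the exercise.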

There are still issues here, e.g. the type a is completely hidden behind the wrapper types, and we can’t recover some basic properties (even if we use newtype, unless we use GeneralizedNewtypeDeriving). But it does offer a way to improve control flow safety.
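A minimal sketch of the GeneralizedNewtypeDeriving route, assuming a made-up Describe class as the “basic property” we want to recover, and an equally made-up ValidatedInt wrapper with its smart constructor:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Hypothetical class standing in for some property of the wrapped type.
class Describe a where
  describe :: a -> String

instance Describe Int where
  describe n = "the number " ++ show n

-- The wrapper reuses Int's Describe instance instead of hiding it.
newtype ValidatedInt = ValidatedInt Int
  deriving (Eq, Ord, Describe)

-- Hypothetical smart constructor: only non-negative numbers are valid.
validateInt :: Int -> Maybe ValidatedInt
validateInt n
  | n >= 0    = Just (ValidatedInt n)
  | otherwise = Nothing
```

Some care is needed when choosing which classes to lift: deriving something like Num would also hand out fromInteger, which builds wrapped values without going through the smart constructor.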

And that is my 0.02 currency unit for today.