Sunday 14 August 2016

On type-level programming in Haskell: types and constraints

When I started working on the Kind interpreter in Haskell, I had a few goals:

  • Get the interpreter working both with minimal hassle and efficiently, which Haskell was reputed to be a good language for
  • Refresh my functional programming skills, which had been a little neglected for a few yearsdecades.
  • Learn about Haskell's type system, which was reputed to be one of the best in a mainstream language.

While working on the object system for Kind's interpreter, I got a perfect chance to explore the third of those goals, because I wanted to do something that is perhaps a little harder in Haskell than it should be: store data of an unknown type in a variable, and access it later without needing to guess its exact type.

As an example, in a typical object-oriented language, I can do something like this:

class MyDataObject {
   Showable value;   // interface defines 'show()' method that returns a string
   // constructor/accessors go here
}

MyDataObject obj1 = new MyDataObject (32);

out.print (obj1.getValue().show());

Note that the last line there does not know or need to know that the value stored in obj1 is an integer; it could in fact be anything that supports the appropriate interface.

It turns out that doing this in Haskell is hard. There is a standard type, Data.Dynamic that is somewhat similar to MyDataObject in that it allows you to package up an arbitrary value and forget its actual type at compile time, but the problem is that the only way to get the value back out again is if you know the value's type. But Haskell's equivalent to the Showable interface I used above isn't a type, but a constraint. Which I now need to explain...

Types versus Constraints

In an Object Oriented language, interfaces and classes each have their own type, and those types have subtype and supertype relationships to each other. We can happily convert between one type and its supertypes (although the other way around either requires a runtime check [static languages], or could fail at runtime if an unsupported operation is used [dynamic languages]). Everything is a type.

In Haskell, things are a little more complicated. To a first approximation, it supports three entirely distinct kinds* of type:

  • Data types - things that can have values. In the typical OO language, classes, interfaces, primitives, structures, enumerations, delegates, events, or just about anything else you can think of is a data type. In Haskell, data types are merely the surface level. We also have:
  • Type constructors. These are like functions that return types. They are vaguely equivalent to a generic type in an OO language, except that they have a first class existence of their own, and can never actually be *used* without full instantiation (cf Java, where I can have an ArrayList<String> and simply assign it to a variable of type ArrayList -- in Haskell you'd get a kind error if you tried to do that).
  • Type constraints, of which there is only one variety in vanilla un-extended Haskell: the type class. These are like predicates on types, that state whether a particular set of functions is available for them.

This last point is critical, and something a lot of novice Haskell programmers miss. When we see some code like this:

class MyTypeClass t where
    someFunction :: t -> t
    anotherFunction :: t -> t -> Int

data Imp = Imp Int

instance MyTypeClass Imp where
    someFunction (Imp i) = i + 1
    anotherFunction (Imp a) (Imp b) = a - b

f :: MyTypeClass t => t -> t -> t
f a b | anotherFunction (someFunction a) b == 0 = a
      | otherwise = f (someFunction a) b

we might look at this from the background of spending most of our time in object-oriented environments and say that MyTypeClass is like an interface or abstract class. The use of the name "class" which seems familiar reinforces that thought. And at it's surface, it is a bit like that; you can use it to do some of the same things. But in its details it is different.

One thing to note is that the functions in MyTypeClass do not have the same type in different instances. In the instance for MyTypeClass Imp, someFunction for example has type Imp -> Imp, while for another instance, perhaps MyTypeClass Gnome, the type would be different (in this case, Gnome -> Gnome). In an object-oriented language, the function would be specialised for the first argument, but the remaining arguments and the result would not be specialised. In such a situation, the types of the functions would look something like this:

  someFunction :: Imp -> MyTypeClass
  anotherFunction :: Imp -> MyTypeClass -> Int

Note that this isn't valid Haskell. The reason for this is because MyTypeClass isn't a type; Haskell doesn't know how to generically handle objects that have instances of it, because of the way type classes are implemented (which is something I'll discuss in more detail later on) -- in every case, the type of a value either received or returned from a function must be known by the caller (and indeed in cases where a value is returned, the caller gets to specify what type is returned).

It turns out that the fact that we can require *all* of the values mentioned in a type class to be exactly the same type is particularly useful. Without the ability to do this, for example, you can't implement a generic Monad type, which means that all of the abstract operations that can be performed on Monads in a language like Haskell (e.g. sequencing operations, collecting results of lists of operations, folding lists of operations, applying monadic actions to lists of values and collecting the results, lfiting functions to operate on monadic types, and so on) must be implemented for each individual monadic type in a traditional object-oriented language. Less esoterically, you can implement type-safe comparison where you don't have to worry about the possibility of two types that aren't mutually comparable being found at runtime: that simply can't happen with a comparison type class, the compiler will catch the problem.

But this flexibility does come at a cost: type classes cannot be considered types, at least not in a sound manner, because the types of different implementations' functions are not compatible, so cannot be passed to each other. Make them compatible, and we lose the flexibility that allows a typesafe Monad or Comparable type.

So why can't Dynamic return a value of a type class?

Now, we have the background to get back to our original point. To get the value out of a Dynamic container, we need to specify its type. It looks like this:

  frobnicateString :: Dynamic -> String
  frobnicateString dyn = case (fromDyn dyn) :: Maybe String of
                             Nothing -> "wasn't a string"
                             Just str -> reverse str

In the case statement, we force fromDyn to attempt to return a Maybe String (i.e. either a string or an indicator that no string is available). Haskell could have worked out that was what I was trying to do, but I've left the type signature in for clarity's sake. fromDyn checks to see whether the value passed to it contains a string; if it does it returns it in a "Just" wrapper, otherwise it returns "Nothing". But we can't specify a type class for it to return, because type classes aren't types. Haskell needs to know what the type of a value is in order to work with it; that's a basic constraint of how the language works. In the next post, I'll start looking at how we can solve this problem.

No comments:

Post a Comment